Improved classifier accuracy

I am very happy to announce this performance update that means that classification will have better accuracy than before.

When I was building a new topic classifier based on the IAB taxonomy I did notice some weird behaviour for classes with much less training data than the others. As I started to investigate this I was able to understand how the overall classification could be improved, not only those with low training data. After weeks of testing different implementations I found a few improvements that significantly gave better results on the test datasets.

In short classifiers are much more robust and less sensitive to imbalanced data.

This update doesn’t affect any api endpoints it will only give you better probabilities.

I might write a short post on the technicalities of this update.