A sentiment analyzer tells you whether a text is positive or negative. For example, “I love the new Mad Max Fury road” (positive) or “i am not impressed by the bike” (negative). The Sentiment classifier hosted by uClassify is very popular, so I decided to spend some time improving it.
The goal was to improve classification accuracy, especially for short texts such as Twitter messages, Facebook statuses or other snippets, while maintaining high-quality results on longer texts that carry more information.
The old Sentiment classifier was built from 40k Amazon product reviews. The straightforward way to improve a classifier is to add more data. Thanks to the Internet, we were able to find multiple data sources to train our classifier on. In fact, it’s now trained on 2.8 million documents!
The results are very good: accuracy on large documents (reviews) went from about 75% to 83%, and accuracy on tweets went from 63% to about 77%.
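For readers unfamiliar with the metric, accuracy here is simply the share of texts whose predicted label matches the true one. The snippet below is a minimal sketch of that calculation, not the evaluation code used for the numbers above; the `accuracy` helper and the four sample labels are made up for illustration.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labeled example")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for four held-out texts (not real uClassify output).
actual = ["positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "positive", "positive"]

# Three of the four predictions match the true labels.
print(accuracy(predicted, actual))
```

An accuracy of 77% on tweets means that roughly 77 out of every 100 test tweets received the correct label.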
Image by Anna Gathu