A text classifier places documents into their relevant classes (categories), for example by placing spam in the spam folder or web pages about Artificial Intelligence into the AI category. There are different types of text classifiers; the one I will be addressing here is a machine learning one!
To make the classifier understand where documents should go, you must first train it. To train it, you manually set up two or more classes (e.g. spam and legitimate) and describe each class by showing it typical documents. In the case of a spam classifier, you would train the classifier on spam and legitimate documents, basically saying to Mrs. Classifier “Hey, look at this bunch of documents, they are all spam!” after which you show her the legitimate documents: “and these are legitimate!”
By doing so the classifier learns characteristics for each class. This is called supervised training. The training documents are often referred to as the training corpus.
Once a classifier has been trained, it can be used to find out to which of the predefined classes a previously unseen document most likely belongs. You ask Mrs. Classifier something like “To which of the classes (I have trained you on) is this document most likely to belong?” She would then kindly answer something like “I am 96% certain that it should go into the spam folder.”
It’s not necessary to stop training a classifier when you start classifying. Training and classifying can take place at the same time.
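To make the train-then-classify idea concrete, here is a minimal sketch in Python. It assumes a multinomial naive Bayes model with Laplace smoothing, which is one common way to build such a classifier; this post does not say which algorithm the service actually uses. The class name `TinyNaiveBayes`, its methods and the toy documents are all made up for illustration and have nothing to do with the real XML API.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Toy multinomial naive Bayes text classifier (illustrative assumption only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # class name -> word -> count
        self.doc_counts = Counter()              # class name -> number of training documents
        self.vocabulary = set()                  # every word seen during training

    @staticmethod
    def tokenize(text):
        # Deliberately naive: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, class_name, text):
        """Show the classifier one typical document for a class."""
        words = self.tokenize(text)
        self.word_counts[class_name].update(words)
        self.doc_counts[class_name] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return {class name: probability} for a previously unseen document."""
        if not self.doc_counts:
            raise ValueError("train the classifier on at least one document first")
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        # The +1 reserves one slot for words never seen in training.
        vocab_size = len(self.vocabulary) + 1
        log_scores = {}
        for class_name, doc_count in self.doc_counts.items():
            counts = self.word_counts[class_name]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)  # class prior
            for word in words:
                # Laplace (add-one) smoothing avoids zero probabilities.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            log_scores[class_name] = score
        # Turn log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


clf = TinyNaiveBayes()
clf.train("spam", "win money now cheap pills win prize")
clf.train("legitimate", "meeting agenda for the project review tomorrow")

result = clf.classify("win a cheap prize now")
best = max(result, key=result.get)
print(best, round(result[best], 2))  # prints: spam 0.96

# Training can continue after classification has started.
clf.train("legitimate", "lunch tomorrow with the project team")
```

Because `train` only updates counts, there is no separate "finished training" step in this sketch: every later call to `classify` simply uses whatever has been learned so far.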
Using our XML API you can communicate with “Mrs. Classifier”!
Some of you may have experienced problems registering since it has been impossible to click on the password textbox (only tab worked). This bug has now been fixed.
Thanks to Marcus Endicott, who reported it. He has an interesting blog on Artificial Intelligence and natural language processing (which I believe is one of the hardest domains of AI). There is also a demo of a chat bot on his page:
– You: what is your name?
– VagaBot: My name is Ralf.
– You: Are you having a good time online?
– VagaBot: Single men will not travel as fast as a pair of women or a mixed couple but should make good time.
We are very happy to announce that yet another site is using the uClassify web service! ofaust.com is a literature expert that finds out which classical author a text most resembles. The developers let us know that it has been trained on over 80 different works by classical authors such as Plato, Shakespeare, Tolstoy and of course Goethe.
The beta is now up and running. Please sign up and create your own web site using cool classifications!
Today we are very pleased to announce the beta release of a new web service that allows everyone to access text classifiers for free. In short, by using a web API (much as you would use, e.g., the Google Maps API), everyone can create and train their own classifiers.
Two sites using the API already exist; be inspired and come up with your own classifiers:
Typealyzer.com – Analyzes the personality of a blog author.
GenderAnalyzer.com – Figures out if a text is written by a man or woman.
During beta we will test the server for usability, stability, scalability and performance.
All comments and feedback are much appreciated!
Jon Kågström, Roger Karlsson and Emil Kågström.