uClassify Corpus Tool BETA

With the uClassify Corpus Tool you will be able to build and test classifiers locally without any programming involved. It’s included in the distribution of the uClassify server. You can download the server evaluation version here.

Classifier representation

This tool is really simple to use. To represent a classifier, create a directory on your hard drive with the name of the classifier. Then create one subdirectory for each class belonging to this classifier.

For example:
c:\corpus\sentiment (classifier ‘sentiment’ directory)
c:\corpus\sentiment\positive (class ‘positive’ belonging to classifier ‘sentiment’)
c:\corpus\sentiment\negative (class ‘negative’ belonging to classifier ‘sentiment’)

Now you just fill each class directory with documents belonging to that class. For example, put positive Amazon reviews in the ‘c:\corpus\sentiment\positive’ folder and negative reviews in the ‘c:\corpus\sentiment\negative’ folder.
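If your labeled documents already live in a script or a spreadsheet export, the directory layout described above can be generated rather than created by hand. The following is a minimal Python sketch, not part of the uClassify distribution: the function name write_corpus, the ‘.txt’ extension, the file naming scheme and the UTF-8 encoding are all assumptions made for illustration.

```python
from pathlib import Path
import tempfile


def write_corpus(root, classifier, docs_by_class):
    """Write one file per document under <root>/<classifier>/<class>/.

    write_corpus is a hypothetical helper; file names, the .txt extension
    and UTF-8 encoding are assumptions, not requirements stated by the tool.
    """
    classifier_dir = Path(root) / classifier
    for class_name, docs in docs_by_class.items():
        class_dir = classifier_dir / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i, text in enumerate(docs):
            (class_dir / f"doc{i:04d}.txt").write_text(text, encoding="utf-8")
    return classifier_dir


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    corpus = write_corpus(
        tmp,
        "sentiment",
        {
            "positive": ["Loved this book.", "Works perfectly."],
            "negative": ["Broke after two days."],
        },
    )
    for path in sorted(corpus.rglob("*.txt")):
        print(path.relative_to(tmp))
```

Pointing the root argument at your own corpus folder instead of a temporary directory gives you a classifier directory with one subdirectory per class.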

Testing a classifier

To test this classifier you run the uClassify Corpus Tool:
uclassifytool.exe -test c:\corpus\sentiment

This will output some basic performance metrics such as accuracy, macro precision, macro recall and the F1 measure, which combines precision and recall. Some per-class statistics are shown as well.

To calculate custom metrics on a classifier you can export a confusion matrix with the flag ‘-outcm’. This will allow you to calculate a lot of other measurements on the classifier. You may also output per-class (one vs. all) statistics with the ‘-outpc’ flag.
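As an illustration of what a confusion matrix lets you compute, here is a minimal Python sketch that derives accuracy, macro precision, macro recall and an F1 value from a matrix held in memory. It does not read the tool's exported file, whose format is not described here. The orientation (rows are actual classes, columns are predicted classes) and the choice to compute F1 from the macro-averaged precision and recall are assumptions; the tool may define these differently.

```python
def metrics_from_confusion(cm):
    """Compute basic metrics from a square confusion matrix.

    Assumption: cm[i][j] is the number of documents whose actual class is i
    and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))

    precisions, recalls = [], []
    for k in range(n):
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precisions.append(cm[k][k] / predicted_k if predicted_k else 0.0)
        recalls.append(cm[k][k] / actual_k if actual_k else 0.0)

    macro_p = sum(precisions) / n if n else 0.0
    macro_r = sum(recalls) / n if n else 0.0
    # F1 as the harmonic mean of macro precision and macro recall (assumed).
    f1 = 2 * macro_p * macro_r / (macro_p + macro_r) if macro_p + macro_r else 0.0

    return {
        "accuracy": correct / total if total else 0.0,
        "macro_precision": macro_p,
        "macro_recall": macro_r,
        "f1": f1,
    }


# Hypothetical two-class matrix: rows/columns are (positive, negative).
example = [[8, 2],
           [1, 9]]
for name, value in metrics_from_confusion(example).items():
    print(f"{name}: {value:.3f}")
```

Other measurements, such as per-class error rates, can be read off the same matrix by summing the relevant rows and columns.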

Building a classifier

To build a classifier:
uclassifytool.exe -build c:\corpus\sentiment

This will create a binary that the uClassify server can read. It’s basically a frequency distribution with some additional information. The resulting file will be called ‘sentiment.dat’ and placed in the root dir of the classifier (in this case ‘c:\corpus\sentiment\sentiment.dat’).
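To give a feel for what a frequency distribution over a corpus looks like, here is a rough Python sketch that counts word occurrences per class from a classifier directory. It is only an illustration of the idea: it does not reproduce the ‘.dat’ format or the additional information stored in it, and the tokenization (lowercased runs of word characters), the UTF-8 decoding and the function name class_word_counts are assumptions.

```python
from collections import Counter
from pathlib import Path
import re


def class_word_counts(classifier_dir):
    """Return {class name: Counter of word frequencies} for a corpus directory.

    Hypothetical helper; the tokenization used here is an assumption and is
    not necessarily what the uClassify tool does.
    """
    counts = {}
    for class_dir in sorted(Path(classifier_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # skip plain files, e.g. a previously built .dat file
        counter = Counter()
        for doc in class_dir.iterdir():
            if doc.is_file():
                text = doc.read_text(encoding="utf-8", errors="ignore")
                counter.update(re.findall(r"\w+", text.lower()))
        counts[class_dir.name] = counter
    return counts
```

Calling class_word_counts on the ‘sentiment’ directory would return one counter for ‘positive’ and one for ‘negative’, which is a handy sanity check on a corpus before building it.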

Now you can just copy this file to your local uClassify server classifier directory.

Public data sets

There are hundreds of public data sets that you can test the classifier on. You can just download them, unzip them and put their documents in a directory structure that the uClassify Corpus Tool understands. To mention a few:

Future

For now the .dat files can only be used by your own local uClassify server. However, we are looking into ways to make it possible to upload .dat classifier files to uclassify.com so they can be used via the web API.

uClassify Tool Screenshot