Update: Thursday May 14 (read-only mode)

Exciting times! I’ve decided to push the next update out on Thursday May the 14th (2015). Normally you won’t notice updates but this one is huge.

I’m migrating servers from the old ‘Classic’ Amazon EC2 to their new cloudy thing. This requires a DNS update, which takes time to propagate across the internet before the move is completely done.

Read Only Mode during the transition

Since this also involves a database migration step, I will set uClassify to ‘read-only’ until it’s done. This means that all read calls (classify etc.) should continue to work during the transition, while write calls (creating and training classifiers) won’t go through. You won’t be able to register as a new user during this time either. DNS updates usually take about 48 hours.

What will be new

First, I’ve done extensive testing to make sure the API will behave exactly the same. If I haven’t missed anything, your app will continue to work without any changes.

The major ‘visible’ changes are:

– A new responsive Bootstrap UI (the vanilla theme; somehow cosmetics always end up last on my priority list 🙂)

– To make the site more secure, it will be served entirely over SSL (don’t worry, all the API links without https:// will still work).

– It will be possible to sign in via Twitter, Facebook and Google.

– You can train classifiers by uploading files.

This is the first of a few major updates for uClassify. It doesn’t introduce much new cool fancy stuff, but it’s a very important update that paves the way for the things I actually want to add, such as a JSON API.

If you have any questions please don’t hesitate to contact me. (contact AT uclassify DOT com)

Sentiment analysis with keyword extraction

Lately we have been getting a lot of requests to our sentiment classifier, many from social media analytics companies. In fact, our sentiment analysis is now the most popular classifier at uClassify!

I just wanted to share something that could be useful for you. By using our latest API call, ‘classifyKeywords’, you can see which keywords are the strongest triggers for the positive and negative classes. This could reveal additional valuable information for your clients.

For example, if you use the keyword analysis on a long product review, you could use the keywords to extract the sentences where the product is mentioned in a positive or negative way. Why not highlight it in green or red? Highlighting sentences will give a very good overview for human reviewers.
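
As a rough illustration, here is a minimal Python sketch of that idea. It assumes you have already pulled the keyword lists for the positive and negative classes out of a classifyKeywords response; the function name, markup and example keywords below are placeholders of my own, not part of the API:

import re

def highlight_sentences(text, positive_keywords, negative_keywords):
    # Wrap sentences containing class-triggering keywords in colored spans.
    out = []
    for sentence in re.split(r'(?<=[.!?])\s+', text):
        words = set(w.lower() for w in re.findall(r'\w+', sentence))
        if words & set(positive_keywords):
            out.append('<span style="background: lightgreen">%s</span>' % sentence)
        elif words & set(negative_keywords):
            out.append('<span style="background: salmon">%s</span>' % sentence)
        else:
            out.append(sentence)
    return ' '.join(out)

print(highlight_sentences(
    'The camera is excellent. The battery died after a day.',
    positive_keywords=['excellent', 'great'],
    negative_keywords=['died', 'broken']))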

Here is how an XML request looks (just swap ‘classify’ for ‘classifyKeywords’):

<?xml version="1.0" encoding="utf-8" ?>
<uclassify xmlns="http://api.uclassify.com/1/RequestSchema" version="1.01">
  <texts>
    <textBase64 id="tweet1">bm93IHNvbWV0aW1lcyBpIHdvbmRlciB3aGF0</textBase64>
  </texts>
  <readCalls readApiKey="YOUR_READ_API_KEY_HERE">
    <classifyKeywords id="ClassifyKeywords" username="uClassify" classifierName="Sentiment" textId="tweet1"/>
  </readCalls>
</uclassify>

You can find more info about ‘classifyKeywords’ here.

The sentiment classifier is described in more detail here.


Keywords API

With the keywords API you can extract relevant, discriminating words from texts, which opens up a lot of possibilities for developers. Keywords can be used for tag clouds and to answer questions such as why a text was classified into a particular class. Compared to ordinary tag clouds they bring an extra angle, as they are not the overall keywords of the text but only those for a certain genre. For example, you can find out which parts of a text make it read as male using the gender classifier, while at the same time running it through the mood classifier to find keywords that indicate happy parts.

After testing the keywords API for a while, I’ve now made it public in the XML API. It works exactly like the classify API, but you also get back a list of keywords for each class.

In short this is how a call can look:

<?xml version="1.0" encoding="utf-8" ?>
<uclassify xmlns="http://api.uclassify.com/1/RequestSchema" version="1.01">
  <texts>
    <textBase64 id="UnknownText1">bm93IHNvbWV0aW1lcyBpIHdvbmRlciB3aGF0</textBase64>
  </texts>
  <readCalls readApiKey="YOUR_READ_API_KEY_HERE">
    <classifyKeywords id="ClassifyKeywords" classifierName="MySpamClassifier" textId="UnknownText1"/>
  </readCalls>
</uclassify>

Example response:

<?xml version="1.0" encoding="utf-8" ?>
<uclassify xmlns="http://api.uclassify.com/1/ResponseSchema" version="1.01">
  <status success="true" statusCode="2000"/>
  <readCalls>
    <classifyKeywords id="ClassifyKeywords">
      <classification textCoverage="0.96">
        <class className="Legitimate" p="0.12"/>
        <class className="Spam" p="0.88"/>
      </classification>
      <keywords>
        <class className="Legitimate">uclassify jon computer urlai</class>
        <class className="Spam">viagra cheap pills</class>
      </keywords>
    </classifyKeywords>
  </readCalls>
</uclassify>

More info is available in the XML API documentation.
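
To give a feel for how this can be consumed from code, here is a small Python sketch that posts the request above and pulls the per-class keywords out of the response. Note that posting the document straight to http://api.uclassify.com is my assumption based on these examples; check the XML API documentation for the authoritative endpoint:

import urllib.request
import xml.etree.ElementTree as ET

NS = '{http://api.uclassify.com/1/ResponseSchema}'

request_xml = b"""<?xml version="1.0" encoding="utf-8" ?>
<uclassify xmlns="http://api.uclassify.com/1/RequestSchema" version="1.01">
  <texts>
    <textBase64 id="UnknownText1">bm93IHNvbWV0aW1lcyBpIHdvbmRlciB3aGF0</textBase64>
  </texts>
  <readCalls readApiKey="YOUR_READ_API_KEY_HERE">
    <classifyKeywords id="ClassifyKeywords" classifierName="MySpamClassifier" textId="UnknownText1"/>
  </readCalls>
</uclassify>"""

# Assumption: the XML document is sent as a plain HTTP POST body.
response = urllib.request.urlopen('http://api.uclassify.com', request_xml)
root = ET.fromstring(response.read())

# <class> elements under <classification> carry a 'p' attribute;
# those under <keywords> carry the keyword list as text instead.
for cls in root.iter(NS + 'class'):
    if 'p' not in cls.attrib:
        print(cls.get('className'), '->', (cls.text or '').split())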

Happy New Year!

Classifier Visualization

I’m currently working on a new keywords API for uClassify. It will allow users to get information about which words are good discriminators for certain classes. To test the API, I spent last weekend building a visualization application for urlai.com.

Here is a screenshot of how the visualization prototype shows data:


I would very much like to get some feedback on this. You can try it here; please comment below.


API change: Moved textCoverage into ApiVersion 1.01

The last release of the API introduced a new feature called textCoverage. That release was a bit premature: textCoverage was supposed to go into API version 1.01 so as not to break any of our users’ response parsers.

If you have not changed anything in your parser during the last couple of days, this should not affect you. If anyone was quick enough to start using textCoverage under version ‘1.00’, this change means it will disappear from the responses, and you will need to bump the version to 1.01. I am really sorry about that.

Bumping the XML version

For XML, just change the version number from ‘1.00’ to ‘1.01’: <uclassify xmlns="http://api.uclassify.com/1/RequestSchema" version="1.01">

Bumping the URL API version

Here you need to add a new parameter, ‘version’, and set it to the version you want: http://uclassify.com/browse/uClassify/Text Language/ClassifyUrl?readkey=YOUR_READ_API_KEY_HERE&url=http%3a%2f%2fblog.uclassify.com&version=1.01

You can read more about version handling here.

I am really sorry for any disturbance this may have caused. Let me know if you need any support with this.

uClassify Corpus Tool BETA

With the uClassify Corpus Tool you can build and test classifiers locally without any programming involved. It’s included in the distribution of the uClassify server; you can download the server evaluation version here.

Classifier representation

This tool is really simple to use. To represent a classifier, simply create a directory on your hard drive with the name of the classifier. Then create subdirectories for each class belonging to the classifier.

For example:
c:\corpus\sentiment (classifier ‘sentiment’ directory)
c:\corpus\sentiment\positive (class ‘positive’ belonging to classifier ‘sentiment’)
c:\corpus\sentiment\negative (class ‘negative’ belonging to classifier ‘sentiment’)

Now you just fill each class directory with documents belonging to that class. For example, put positive Amazon reviews in the ‘c:\corpus\sentiment\positive’ folder and negative reviews in the ‘c:\corpus\sentiment\negative’ folder.
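
If you are scripting the corpus preparation, the layout is trivial to generate. Here is a minimal Python sketch; the documents are placeholders for whatever reviews you actually have:

import os

# Classifier 'sentiment' with classes 'positive' and 'negative',
# mirroring the c:\corpus\sentiment layout described above.
corpus = {
    'positive': ['Great product, works perfectly.'],
    'negative': ['Broke after two days, very disappointed.'],
}

for class_name, documents in corpus.items():
    class_dir = os.path.join(r'c:\corpus\sentiment', class_name)
    os.makedirs(class_dir, exist_ok=True)
    for i, text in enumerate(documents):
        path = os.path.join(class_dir, 'doc%05d.txt' % i)
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text)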

Testing a classifier

To test this classifier you run the uClassify Corpus Tool:
uclassifytool.exe -test c:\corpus\sentiment

This will output some basic performance metrics such as accuracy, macro precision, macro recall and the F1 measure between precision and recall. Some per-class statistics are also shown.

In order to calculate custom metrics on a classifier you can export a confusion matrix with the ‘-outcm’ flag. This will allow you to calculate many other measurements on the classifier. You can also output per-class (one vs all) statistics with the ‘-outpc’ flag.
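
As an illustration of the kind of custom metrics you can derive, here is a small Python sketch that computes accuracy, macro precision, macro recall and F1 from a confusion matrix. The numbers are made up, and the rows-are-actual/columns-are-predicted orientation is my assumption rather than the documented ‘-outcm’ format:

classes = ['positive', 'negative']
cm = [[80, 20],   # actual positive: 80 predicted positive, 20 negative
      [10, 90]]   # actual negative: 10 predicted positive, 90 negative

n = len(classes)
total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(n)) / total

precisions, recalls = [], []
for i in range(n):
    tp = cm[i][i]
    predicted = sum(cm[r][i] for r in range(n))  # column sum
    actual = sum(cm[i])                          # row sum
    precisions.append(tp / predicted if predicted else 0.0)
    recalls.append(tp / actual if actual else 0.0)

macro_p = sum(precisions) / n
macro_r = sum(recalls) / n
f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

print('accuracy=%.3f macro precision=%.3f macro recall=%.3f F1=%.3f'
      % (accuracy, macro_p, macro_r, f1))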

Building a classifier

To build a classifier:
uclassifytool.exe -build c:\corpus\sentiment

This will create a binary file that the uClassify server can read. It’s basically a frequency distribution with some additional information. The resulting file will be called ‘sentiment.dat’ and placed in the root directory of the classifier (in this case ‘c:\corpus\sentiment\sentiment.dat’).

Now you can just copy this file to your local uClassify server classifier directory.

Public data sets

There are hundreds of public data sets that you can test classifiers on. Just download one, unzip it and put its documents in a directory structure that the uClassify Corpus Tool understands.

Future

For now the .dat files can only be used by your own local uClassify server. However, we are looking into ways to make it possible to upload .dat classifier files to uclassify.com so they can be used via the web API.

uClassify Tool Screenshot

Added text coverage score to classification responses

When you classify texts you get back class probabilities. Sometimes it’s hard to know what those are based on, so I’ve added a new score called ‘text coverage’.

Text coverage is the proportion of the words in the text being classified that are found in the training data. This helps users determine how trustworthy the probabilities are. For example, suppose you send a text with 10000 words to the language classifier and get back a high English probability but a low text coverage (say 0.01). This means that only 100 of the 10000 words were recognized by the language classifier. A reasonable cause could be that the text is written in an unknown language but contains some English words (quotations, borrowed words etc). It’s up to the user to decide how to handle this. Sometimes low text coverage scores are fine; it’s highly dependent on the domain.
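
In client code the check is a one-liner. A minimal sketch of how you might gate on the score; the 0.3 threshold is an arbitrary example, not a recommendation:

def trustworthy(text_coverage, min_coverage=0.3):
    # Only act on a classification when enough of the text was recognized.
    return text_coverage >= min_coverage

# 10000-word text, high English probability, but only 1% of words recognized.
print(trustworthy(text_coverage=0.01))  # False: treat the result with care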

The text coverage can be found as an attribute in the <classification> tag and is called ‘textCoverage’.

Let me know if you have any questions about this.

Feedback, anyone?

One of the most popular published classifiers is Language detection, which classified more than 800000 texts last week. However, this is just the tip of the iceberg, as most classifiers are unpublished (about 500 classifiers). Not all classifiers are active, of course, but a good number are, which brings me to my question: how is it working for you? I’m not getting much feedback or many support requests – I am not sure if this is good or bad.

If you read this and are using uClassify for a project, feel more than free to contact me with any positive or negative feedback on any aspect (classifier performance, documentation, response times or anything unclear). You can leave a comment or e-mail me at this address: contact at uclassify dot com. <– Are spambots able to read this nowadays?

Over and out!

Using published classifiers

We’ve just made it possible for everyone with a (free) uClassify account to access public classifiers.

Once a classifier is published, everyone can use it via the GUI or the web API, and in return authors get a link to their website from everyone who uses their classifiers. This should hopefully inspire more people to share their cool classifiers!

As an example of a published classifier check out the mood classifier by prfekt.se. Here is the list of all published classifiers.

More memory or smaller memories?

In order to build a classification server that can handle thousands of classifiers and process huge amounts of data, we knew we would eventually have to do some major optimizations. To avoid doing any work prematurely, we postponed all optimizations that would require design changes or reduce code readability until we were absolutely sure where to improve.

When we ran our first test in May it was obvious what our first bottleneck would be – the memory consumption of classifiers. It was really bad: raw classifier data expanded by a factor of about 5 when loaded into primary memory – a tiny classifier of 1Mb would take 5Mb as soon as it was fetched into memory. It was really easy to pinpoint the memory thieves.

Partners in crime – STL strings and maps

We were using STL maps to hold frequency distributions for tokens (features). Each token was mapped to its frequency, i.e. map<string, unsigned int>. This is a very convenient and straightforward way to do it, but the memory overhead is not very attractive.

VS2005 STL string memory overhead

The actual sizes of types vary between platforms and STL implementations (these numbers are from the STL that comes with VS2005 on 32 bit Windows XP).

Each string takes at least 32 bytes:
size_type _Mysize = 4 bytes (string size)
size_type _Myres = 4 bytes (reserve size)
_Elem _Buf = 16 bytes (internal buffer for strings shorter than 16 bytes)
_Elem* _Ptr = 4 bytes (pointer to strings that don’t fit in the buffer)
this* = 4 bytes (this pointer)

The best-case overhead for an STL string is 16 bytes, when the internal buffer is filled exactly. The worst case is for empty strings or strings longer than 15 bytes, which gives an overhead of 32 bytes. So the string overhead varies from 16 to 32 bytes.

VS2005 STL map memory overhead

Each entry in a map consists of an STL pair – the key and the value (first and second). A pair only has the memory overhead of the this pointer (4 bytes), plus whatever is inherited from the types it’s composed of. However, the map is a colored (red-black) tree consisting of linked nodes. Each pair is stored in a node, and nodes have quite a heavy memory overhead:

_Genptr _Left = 4 bytes (pointer to the left subtree)
_Genptr _Parent = 4 bytes (pointer to the parent)
_Genptr _Right = 4 bytes (pointer to the right subtree)
char _Color = 1 byte (the color of the node)
char _Isnil = 1 byte (true if the node is the head)
this* = 4 bytes (this pointer)

So there is an 18-byte overhead per node and 4 bytes per pair, which sums to 22 bytes.

Strings in maps

Now, inserting a string shorter than 16 bytes into a map<string, unsigned int> will consume 32+22+4=58 bytes. It could be even more if memory alignment kicks in for any of the allocations. In most cases this is perfectly fine and not even worth optimizing. In our case, however, a memory overhead factor of 5 was not acceptable. Our language classifier takes about 14Mb on disk and should not take much more when loaded into memory – yet it blew up to about 65Mb. As it consists of 43 languages with probably around 30000 unique words per class (language), it gets really bloated.

One solution

We needed to maintain the search and insertion speed of maps (time complexity O(log n)) but get rid of the overhead. Insertions are needed when classifiers are trained.

Maintaining search speed

Since we had already limited features to a maximum length of 32 bytes, we could use that information to create what we call memory lanes. A memory lane consists only of tokens of the same size, each followed by its frequency. In that manner we created 32 lanes: lane 1 with all tokens of size 1, lane 2 with all tokens of size 2 and so on. Tokens in memory lanes are sorted so we can use binary search.

Memory lane 1 could look like this (tokens of size 1, each followed by its frequency):
a0031i0018y0003

and memory lane 3 like this:
can0011far0004the0019zoo0001

By doing so we get rid of all the overhead while maintaining O(log n) search.
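
Our implementation is in C++, but the layout is easy to sketch in Python. Each lane is one contiguous string of sorted fixed-width records (a token plus, as in the examples above, a 4-digit frequency), so a record’s offset follows from its index and an ordinary binary search works:

def lane_lookup(lane, token, freq_digits=4):
    # Binary-search one memory lane; return the token's frequency or None.
    record = len(token) + freq_digits
    lo, hi = 0, len(lane) // record
    while lo < hi:
        mid = (lo + hi) // 2
        start = mid * record
        key = lane[start:start + len(token)]
        if key == token:
            return int(lane[start + len(token):start + record])
        elif key < token:
            lo = mid + 1
        else:
            hi = mid
    return None

lane3 = 'can0011far0004the0019zoo0001'  # memory lane 3 from above
print(lane_lookup(lane3, 'the'))  # 19
print(lane_lookup(lane3, 'cat'))  # None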

Maintaining insertion speed (almost)

Maps allow fast insertions in O(log n), so we kept an intermediate map for each memory lane. When a classifier is trained, new tokens go into the map, while the frequencies of tokens that already exist in the memory lane are increased in place. When the training session is over, the intermediate maps are merged into their respective memory lanes. This can be done in O(n) and is the major penalty. Note that explicit sorting is never required, since maps are ordered. Another penalty occurs when both the map and the memory lane contain tokens – at that point two lookups can happen (first in the memory lane, and if the token doesn’t exist there, a search through the map is required).
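
Here is a Python sketch of that merge step, with a dict standing in for the intermediate STL map. In C++ the map is already ordered, so the sort below is only needed because Python dicts aren’t:

import heapq

def merge_lane(lane, pending, token_size, freq_digits=4):
    # Decode the existing (already sorted) fixed-width records.
    record = token_size + freq_digits
    old = [(lane[i:i + token_size], int(lane[i + token_size:i + record]))
           for i in range(0, len(lane), record)]
    # O(n) merge of two sorted sequences; no explicit sorting of the lane.
    merged = heapq.merge(old, sorted(pending.items()))
    return ''.join('%s%0*d' % (tok, freq_digits, f) for tok, f in merged)

lane1 = 'a0031i0018y0003'                      # memory lane 1 from above
print(merge_lane(lane1, {'o': 2, 'u': 1}, 1))  # a0031i0018o0002u0001y0003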

This solution reduced memory consumption by a factor of 4-5, at the penalty of having to merge new training data into the memory lanes every now and then. This is perfectly fine for us, as training often decreases over time (the training data gets good enough) while classification increases.

A similar optimization for Java is described on the LingPipe blog.