classifier – uClassify blog

Download evaluation server

It’s now possible to evaluate the uClassify server locally. We have built a new version of the server that can be downloaded for free and run on Windows.

With the evaluation version you can test one of the server’s key features, classification speed, without going through the web API, which can be slow when large volumes of data have to be sent over the web. This is important for anyone who wants to make sure the server can handle big volumes of data before purchasing a commercial license.

The only restriction is that it has to be restarted after every 10,000 calls, just as a reminder that it’s only for evaluation =)

Have a look at the server manual and download it! ...and let us know what you think!

UrlAi.com – who are you?

We have created a new service called UrlAi.com. The basic concept is to run blog posts through a number of classifiers over time. To begin with we use Gender, Age, Mood and Tonality, but the system is dynamic, so we can add new classifiers at any time. If you have created a classifier that would fit on urlai.com, let us know!

Some ideas

We have many ideas for how to develop this project further. For example, right now we only show a summary pie chart, and it would be nice to see posts over time as well. User feedback for online training and classifier improvement may also be possible. Another option is to make classified posts searchable, for example letting users see the mood of everyone who mentioned ‘Avatar’.

Some kudos

Just want to thank the people who have been involved in this project: Roger Karlsson for coding, Johanna Forsman for the awesome logo and Mattias Östmar for sharing his Tonality and Mood classifiers. Mattias has also contributed many ideas around this, being the idea fountain he is 😀

Artificial Intelligence to determine an author’s age

We have just released ageanalyzer.com, a site that reads a blog and guesses the age of the author!

Background

Our writing style reflects us in many ways; for example, texts written in anger probably differ from texts written in joy. Reading a text intuitively gives us a clue about the author, as we start forming a picture of them in our heads. Sometimes it’s easy to pinpoint how we got this picture, and at other times it’s harder.

We wanted to know if we could give computers the same intuition. In this particular project we are interested in finding out whether a computer can tell the age of an author, given only a text.

To do this experiment we collected 7000 blogs that had age information in the profile and split them into six age groups: 13-17, 18-25, 26-35, 36-50, 51-65 and 65+. We then created a classifier on uClassify and fed it the training data. Voilà!
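The post doesn’t show how the blogs were sorted into the six groups. Here is a minimal sketch, assuming each blog arrives as a (text, age) pair. Because both 51-65 and 65+ include 65, this sketch assumes that 65 belongs to 51-65 and that 65+ means 66 and older. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Age bands from the post. Both "51-65" and "65+" mention 65, so this
# sketch ASSUMES 65 falls in "51-65" and "65+" means 66 and older.
AGE_GROUPS = [
    (13, 17, "13-17"),
    (18, 25, "18-25"),
    (26, 35, "26-35"),
    (36, 50, "36-50"),
    (51, 65, "51-65"),
]


def age_group(age: int) -> Optional[str]:
    """Return the group label for an age, or None if the author is under 13."""
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    if age > 65:
        return "65+"
    return None


def split_by_group(blogs: List[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Hypothetical helper: bucket (text, age) pairs into training classes."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for text, age in blogs:
        label = age_group(age)
        if label is not None:  # skip ages outside every band
            groups[label].append(text)
    return dict(groups)
```

Each resulting list would then serve as the training texts for one class of the classifier.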

Expected results

After running tests on the training data (10-fold cross-validation), it was clear that our classifier was able to find differences between the six age groups. We expect around 30% of blogs to be classified correctly, compared to a baseline of about 17% (1 in 6) that would be expected if the classifier were just guessing at random.
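The post doesn’t say how the folds were built. The following is a generic sketch of 10-fold cross-validation and is not uClassify’s own evaluation code. Each tenth of the data is held out once while the other nine tenths are used for training, and accuracy is averaged over the ten rounds. `train_and_predict` stands in for a hypothetical function that trains on the given texts and labels and then classifies the held-out texts. The random baseline is simply 1/6.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every item is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def cross_validate(texts: List[str], labels: List[str],
                   train_and_predict: Callable[[List[str], List[str], List[str]], List[str]],
                   k: int = 10) -> float:
    """Average accuracy over k folds (assumes len(texts) >= k)."""
    scores = []
    for train, test in k_fold_indices(len(texts), k):
        predicted = train_and_predict([texts[i] for i in train],
                                      [labels[i] for i in train],
                                      [texts[i] for i in test])
        scores.append(accuracy(predicted, [labels[i] for i in test]))
    return sum(scores) / len(scores)


# A uniform random guess over six groups is right with probability 1/6,
# whatever the class balance: the ~17% baseline mentioned in the post.
RANDOM_BASELINE = 1 / 6
```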

We have added a poll to the site to help us see how well (or poorly) it works!

Try AgeAnalyzer out here!

URL classification API with JSON responses

Many people have contacted me asking for JSON support. I haven’t used it myself before, but it seems to be a widely used format, so I decided to implement support for it in our new URL API.

Just add the argument output=json to the end of a classification URL request.

Please let me know if there is anything that looks weird in the JSON API or if something should be redesigned.
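Here is a minimal Python sketch of the same idea. It appends output=json to an existing request URL and decodes the body. The helper names are hypothetical, and the sketch assumes the endpoint still responds as described in this post.

```python
import json
from urllib.request import urlopen


def with_json_output(url: str) -> str:
    """Append output=json, using ? or & depending on whether a query exists."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}output=json"


def fetch_classification(url: str) -> dict:
    """Request the JSON variant of a classification URL (makes a network call)."""
    with urlopen(with_json_output(url)) as response:
        return json.load(response)


# Usage (needs a real read key in place of YourReadKeyHere):
# fetch_classification("http://uclassify.com/browse/uClassify/GenderAnalyzer_v5/"
#                      "ClassifyText?readkey=YourReadKeyHere&text=beer")
```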

Beer… is the word used more by men or by women?

http://uclassify.com/browse/uClassify/GenderAnalyzer_v5/ClassifyText?readkey=YourReadKeyHere&text=beer&output=json

If you click the link, you will be asked to save a file because of the MIME content type application/json. This is what the response looks like:

{
  "version" : "1.00",
  "success" : true,
  "statusCode" : 2000,
  "errorMessage" : "",
  "cls1" :
  {
    "female" : 0.15623,
    "male" : 0.84377
  }
}
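Once the JSON is decoded, picking the most likely class takes only a few lines. The field names below are taken from the sample response. It is an assumption that "cls1" holds the class probabilities for the requested classifier, and the function name is hypothetical.

```python
import json
from typing import Tuple

# The sample response from this post, with straight quotes.
SAMPLE = """{
    "version" : "1.00",
    "success" : true,
    "statusCode" : 2000,
    "errorMessage" : "",
    "cls1" : { "female" : 0.15623, "male" : 0.84377 }
}"""


def top_class(body: str, key: str = "cls1") -> Tuple[str, float]:
    """Return the most probable class and its probability from a response."""
    data = json.loads(body)
    if not data.get("success"):
        raise ValueError(f"classification failed: {data.get('errorMessage')}")
    probabilities = data[key]  # maps class name -> probability
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


print(top_class(SAMPLE))  # ('male', 0.84377)
```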

New API for classification

We are really happy to announce that we have just released a URL API to make it easier to access classifiers. This means that users can classify via a URL, for example to find out the language of this text snippet from Wikipedia:

“En apprentissage automatique, le terme de classifieur linéaire représente une famille d’algorithmes de classement statistique.”

http://www.uclassify.com/browse/uClassify/Text-Language/ClassifyText?readKey=YourReadKeyHere&text=En%20apprentissage%20automatique%2C%20le%20terme%20de%20classifieur%20lin%C3%A9aire%20repr%C3%A9sente%20une%20famille%20d%E2%80%99algorithmes%20de%20classement%20statistique.

You can also classify URLs directly. For example, this call confirms that Roger’s blog is written by a male:

http://www.uclassify.com/browse/uClassify/GenderAnalyzer_v5/ClassifyUrl?readKey=YourReadKeyHere&url=http%3A%2F%2Fwww.rogerkarlsson.com&removeHtml=1
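Building these URLs by hand means percent-encoding the text or page address, as in the examples above. Below is a minimal sketch using Python’s standard library. The parameter names (readKey, text, url, removeHtml) are copied from the URLs in this post, and the helper names are hypothetical. Passing `quote_via=quote` encodes spaces as %20, which matches the examples.

```python
from urllib.parse import quote, urlencode

BASE = "http://www.uclassify.com/browse"


def classify_text_request(user: str, classifier: str, read_key: str, text: str) -> str:
    """Build a ClassifyText URL; the text is percent-encoded as UTF-8."""
    query = urlencode({"readKey": read_key, "text": text}, quote_via=quote)
    return f"{BASE}/{quote(user)}/{quote(classifier)}/ClassifyText?{query}"


def classify_url_request(user: str, classifier: str, read_key: str,
                         page_url: str, remove_html: bool = True) -> str:
    """Build a ClassifyUrl URL that asks the server to fetch and classify a page."""
    params = {"readKey": read_key, "url": page_url, "removeHtml": int(remove_html)}
    query = urlencode(params, quote_via=quote)  # ":" and "/" become %3A and %2F
    return f"{BASE}/{quote(user)}/{quote(classifier)}/ClassifyUrl?{query}"


print(classify_url_request("uClassify", "GenderAnalyzer_v5",
                           "YourReadKeyHere", "http://www.rogerkarlsson.com"))
```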

The new API is documented here.

Language Detection

Just wanted to mention that we have a freely available language classifier called Text Language. I believe it works very well on texts. It’s built from the 4000 most common words of each language.
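The post doesn’t describe how Text Language uses its word lists internally. For illustration only, here is a much simpler technique that is not uClassify’s classifier. It keeps the N most common words of each language and picks the language whose list covers the most words of the input. Splitting on word characters, as done here, works poorly for languages written without spaces, such as Japanese. The function names and toy corpora are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Set


def train_profiles(corpora: Dict[str, str], top_n: int = 4000) -> Dict[str, Set[str]]:
    """Keep the top_n most common words of each language's sample text."""
    profiles = {}
    for language, text in corpora.items():
        counts = Counter(re.findall(r"\w+", text.lower()))
        profiles[language] = {word for word, _ in counts.most_common(top_n)}
    return profiles


def detect_language(text: str, profiles: Dict[str, Set[str]]) -> str:
    """Pick the language whose word list contains the most input words."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in vocab for w in words) for lang, vocab in profiles.items()}
    return max(scores, key=scores.get)


profiles = train_profiles({
    "english": "the cat sat on the mat and the dog barked",
    "swedish": "katten satt på mattan och hunden skällde",
})
print(detect_language("the dog and the cat", profiles))  # english
```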

I recently added Hungarian so now it can detect any of these 33 languages: Arabic, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian and Vietnamese.

Do you think a language is missing? Let me know

Fidel or Franco?

Just wanted to share that another interesting uClassify application has been created. This application classifies Spanish web pages by Fidel or Franco alignment. (Note that the text has to be in Spanish for it to work properly.)

Even though I think this page is intended for fun, I believe that such classifiers can be used for commercial purposes. Imagine a political party that wants to find bloggers with a right- or left-wing alignment to help drive its election campaign. It could run a huge number of blogs (collected from some data source) through the classifier and find the ones it is looking for.
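The screening step described above amounts to a filter over classifier output. Here is a minimal sketch, assuming `classify` is any function that returns a class-to-probability mapping, for example a wrapper around a classification API call. The class names "Fidel" and "Franco", the 0.8 threshold and the function name are all assumptions.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def find_aligned(blogs: Iterable[Tuple[str, str]],
                 classify: Callable[[str], Dict[str, float]],
                 target_class: str,
                 threshold: float = 0.8) -> List[str]:
    """Return ids of blogs whose probability for target_class >= threshold."""
    matches = []
    for blog_id, text in blogs:
        probabilities = classify(text)  # e.g. {"Fidel": 0.3, "Franco": 0.7}
        if probabilities.get(target_class, 0.0) >= threshold:
            matches.append(blog_id)
    return matches
```

A high threshold keeps only blogs the classifier is confident about, at the cost of missing borderline ones.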

Good work!

Attempt at predicting stocks for 15 May 2009

Finally Friday! Today’s classifier predictions seem to vary more than yesterday’s.

Prediction for 15 May 2009

It’s now 08:15 in the morning on 15 May (GMT+1, Stockholm), and this is a prediction for today:

Some predicted winners (closing price > opening price):
GPI, ANAD, TXI, FWRD, KMT, ATPG, AVP, JBLU, AAWW, DD, ESL, USG, ALV, KBR, ANF, AYR, GEF, STAA, ADPT, TSO, LINTA, ME, GIB, WCC, BC, CTSH, ACTG, AIR, BGC, HIT, AKS, USTR, CBI, DRQ, AVT, EXPE, COG, FCG, ALB, ATI, ACXM, EAC, DOW, BWS, ETH, LTD, CBT, AYI, CHRS, CIM, SCHN, FNM, DWSN, ARW, TER, TKTM, AMSG, DUG, CEDC, FXP, AUO, TRW, ENER, GLBL, CHL, CONN, OI, BMS, JOE, GYMB, HXL, PETM, PLL, ADBE, CHU, AFFX, COL, ALOG, EMN, COH, APC, CPX, GGB, ADVNB, EEQ, AXE, TIE, ENG, AFL, AN, A, FO, X, WIT, JRCC, MDC, DY, AAI, KBW, CHRD, OIS, JOSB, IBI, ASPM, CAJ, CTX, TEX

Some predicted losers (closing price < opening price):
TTMI, UNT, UTSI, GT, ABT, WAB, AWI, WGOV, EQIX, CLR, CRBC, TWI, ID, CAM, UEPS, CLF, DCI, IP, DEI, UYM, CNTF, DVN, LDSH, GNK, WRES, UBS, UFPI, GLF, WHR, AAPL, DVR, EGN, NOA, CHK, JCI, CEO, WWW, CBR, FIW, GGG, DSI, ACV, XEC, COMS, UA, HTCH, ENZN, GWR, OC, AMR, CRS, ARQL, KWK, WIN

Attempt at predicting stocks for 14 May 2009

It turns out that all this manual work takes a lot of time, so I’ll automate the process this weekend. I will keep posting predictions in the meantime. As soon as it has been automated I will post accuracies, which is going to be interesting =)

Again today, it was hard to find predictions for stocks that may go up: out of the 200 most reliable classifiers, only 15 predicted a winner.

Prediction for 14 May 2009

It’s now 08:30 in the morning on 14 May (GMT+1, Stockholm), and this is a prediction for today:

Some predicted winners (closing price > opening price):
ESL, FXP, MDC, DUG, PLL, PETM, ADPT, DVN, COG, LDSH, ACV, TSO, CRS, DLB, PTNR

Some predicted losers (closing price < opening price):
CPX, TTMI, UNT, FNM, HIT, UTSI, KMT, GT, WAB, AWI, GIB, GPI, DD, ME, ANAD, CHRD, TXI, WGOV, EQIX, ATPG, DOW, AIR, ALV, AVP, USG, CLR, CRBC, ACTG, ATI, DWSN, CIM, ACXM, TWI, CAM, ID, UEPS, COH, AYR, CLF, BWS, ALOG, CEDC, ADVNB, ANF, IP, DCI, GNK, UYM, AFL

Attempt at predicting stocks for 12 May 2009

A couple of months ago I wrote about a project that would use classifiers to predict tomorrow’s stock market. I’ve now found some time to implement the very basics: a database with stock history, more than 3000 classifiers trained on this data and some simple classification scripts.

Instead of waiting, pondering and validating that everything works properly, I’ve decided to do this “live”, starting with predictions of what may happen to some of the stocks today. Some corrections (bug fixes etc.) will probably be needed along the way before we can tell whether this is working or not.

I’ll select stocks that seem to have worked well for classification in the past, and I’ll probably present 100 of them at a time.
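The post doesn’t say how reliability is measured. Here is a minimal sketch of the selection step, assuming a hypothetical mapping from ticker to that stock classifier’s historical accuracy and a mapping from ticker to today’s prediction ("winner" meaning closing price > opening price, as in the posts). All names are hypothetical.

```python
from typing import Dict, List, Tuple


def most_reliable(past_accuracy: Dict[str, float], n: int = 100) -> List[str]:
    """Tickers whose classifiers scored best historically (ties keep input order)."""
    return sorted(past_accuracy, key=past_accuracy.get, reverse=True)[:n]


def split_predictions(predictions: Dict[str, str],
                      tickers: List[str]) -> Tuple[List[str], List[str]]:
    """Split the selected tickers into predicted winners and predicted losers."""
    winners = [t for t in tickers if predictions.get(t) == "winner"]
    losers = [t for t in tickers if predictions.get(t) == "loser"]
    return winners, losers
```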

Prediction for 12 May 2009

It’s now 08:00 in the morning on 12 May (GMT+1, Stockholm), and this is a prediction for today:

Some predicted winners (closing price > opening price):
DWSN, CEDC, DCI, AVT, FWRD, CTSH, UFPI, GEF, ALV, FXP, ESL, DUG, AUO, DRQ, EXPE, DLB, UNT, AKS, XEC, TXI, CONN, FCG, EGN, BWS, AAPL, GIB, COG, AWI, DOW, WCC, EAC, DD, CIM, FIW, EQIX, HIT, USG, ETH, AYR, UEPS, AAWW, COMS

Some predicted losers (closing price < opening price):
WWW, ATI, CP, CRBC, ECA, WHR, TRW, DISCA, GGG, USTR, GWR, CBI, DISH, CLF, ALB, CRS, GLT, AYI, TKTM, AMSG, FFIV, CHL, ARW, DVR, ENER, TER, DEI, CHRS, GLS, CBT, ACXM, GLBL, UBS, CBR, DSI, GT, TTMI, CVC, ANF, TWI, FNM, CKP, ADPT, TSO, CLR, UTSI, GPI, ANAD, AVP, CAM, GLF, ATPG, CBB, ACTG, AIR

Tomorrow I will report how many were classified correctly. Until then, you can check for yourself at, for example, Yahoo! Finance.
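Checking a day’s predictions means comparing each ticker’s closing price with its opening price. Here is a minimal sketch, assuming the prices are collected into a hypothetical mapping from ticker to (open, close), for example copied from Yahoo! Finance. Skipping tickers that closed unchanged or have no price data is also an assumption.

```python
from typing import Dict, Tuple


def prediction_accuracy(predictions: Dict[str, str],
                        prices: Dict[str, Tuple[float, float]]) -> float:
    """Fraction of "winner"/"loser" predictions that matched the day's move."""
    scored = 0
    correct = 0
    for ticker, predicted in predictions.items():
        if ticker not in prices:
            continue  # no price data for this ticker
        open_price, close_price = prices[ticker]
        if close_price == open_price:
            continue  # unchanged: neither a winner nor a loser
        actual = "winner" if close_price > open_price else "loser"
        scored += 1
        correct += predicted == actual
    return correct / scored if scored else 0.0
```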