Language Detection

June 11th, 2009

Just wanted to mention that we have a freely available language classifier called Text Language. I believe it works very well on running text. It’s built from the 4000 most common words of each language.

I recently added Hungarian so now it can detect any of these 33 languages: Arabic, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian and Vietnamese.
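To give a feel for how common-word lists can identify a language, here is a minimal sketch. It is not the actual Text Language classifier: the word lists are tiny hand-picked examples rather than the 4000 most common words, the simple hit counting is an assumption made for illustration, and `COMMON_WORDS` and `detect_language` are hypothetical names.

```python
import re

# Hypothetical, tiny word lists for illustration only. A real classifier
# would use thousands of common words per language.
COMMON_WORDS = {
    "English": {"the", "and", "is", "of", "to", "in", "it", "that", "on"},
    "Swedish": {"och", "att", "det", "är", "en", "på", "som", "jag", "inte"},
    "Spanish": {"el", "la", "de", "que", "y", "en", "los", "es", "no"},
}


def detect_language(text):
    """Return the language whose common words occur most often in text.

    Returns None when no word from any list is found.
    """
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        language: sum(1 for token in tokens if token in words)
        for language, words in COMMON_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(detect_language("The cat is on the roof"))
print(detect_language("Jag är inte på det"))
```

Longer texts contain more common words, which is one reason this kind of approach tends to be more reliable on full paragraphs than on single words.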

Do you think a language is missing? Let me know!

Fidel or Franco?

June 4th, 2009

Just wanted to share that another interesting uClassify application has been created. This application classifies Spanish web pages for Fidel or Franco alignment. (Note that the text has to be in Spanish for it to work properly.)

Even though I think this page is intended for fun, I believe that such classifiers can be used for commercial purposes. Imagine a political party that wants to find bloggers with a right- or left-wing alignment to help drive its election campaign. It could run a huge number of blogs (collected from some data source) through the classifier and find the ones it is looking for.

Good work!

Mood classifier is helping to save the world

May 24th, 2009

Recently Team Curious created a really interesting website, MDGActors, which uses a mix of APIs to recognize actors for the Millennium Development Goals (MDG). This is how they describe it:

“Tackling the toughest problems facing the world today is a big job. What if we could highlight the people that are making a big difference and call out the people that aren’t? We think that it would be inspiring—Umair Haque thinks it could change the world.” Read their complete manifesto

They scan relevant articles for personal names and classify the sentences around them with the Mood classifier (created by Mattias). They call it ‘Empathize’ and describe it as:

“If an author voices particularly strong sentiment in a sentence, an icon is added next to the sentence to indicate that feeling. Phrases with hearts are usually about people doing good, and phrases with condescending frowns usually signal more controversial topics.”

I find the idea very interesting, being able to pinpoint heroes and anti-heroes for a good cause. Great work!

Another interesting application from Team Curious is Spin, where you enter any keyword and it tells you the sentiment of the results.

Entering George Bush came back with 14 negative and 0 positive voices.

A negative voice about George Bush

Entering ‘Terminator Salvation’ showed 2 negative and 11 positive voices.

A positive voice about Terminator Salvation

Their project is part of Microsoft’s Imagine Cup competition. I really wish you guys the best of luck!

Stock prediction results week 20

May 18th, 2009

In short, a much longer evaluation period is needed to verify whether there is any significance in these classifications. The classifiers did quite well on Tuesday and Wednesday (60% and 83% correct), badly on Thursday (36%), and about what random guessing would give on Friday (48%). I may set up a site that feeds predictions automatically over a long period of time. It would also be really interesting to feed stock news into the training data. Below are the details of how the classifications turned out, if anyone is interested!
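To get a rough feel for whether a single day's hit rate could be plain luck, it can be compared with coin flipping. The sketch below is not part of the original experiment. It assumes every stock is an independent 50/50 guess, which is optimistic because stocks tend to move together on the same day, and `p_at_least` is a name made up for this illustration.

```python
from math import comb


def p_at_least(correct, total, p=0.5):
    """Probability of getting `correct` or more hits out of `total`
    independent guesses that each succeed with probability `p`
    (the upper tail of a binomial distribution)."""
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(correct, total + 1)
    )


# Daily tallies taken from the accuracy lines in this post.
for day, correct, total in [
    ("Tuesday", 58, 97),
    ("Wednesday", 43, 52),
    ("Friday", 78, 161),
]:
    print(f"{day}: {correct}/{total}, chance of doing at least this well: "
          f"{p_at_least(correct, total):.4f}")
```

A small probability for one day does not by itself prove anything, since a whole market moving in one direction can make many predictions right or wrong at once. That is exactly why a longer evaluation period is needed.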

Friday 15/5

Predicted winners, actual winners: TXI, AVP, AAWW, DD, GEF, WCC, CTSH, ACTG, HIT, AKS, DRQ, AVT, ALB, ACXM, BWS, LTD, CBT, CHRS, DWSN, DUG, FXP, BMS, GYMB, PETM, AFFX, COL, ALOG, ADVNB, AXE, AN, WIT, MDC, DY, OIS, JOSB, IBI, ASPM, TEX

Predicted winners, actual losers: GPI, ANAD, FWRD, KMT, ATPG, JBLU, ESL, USG, ALV, KBR, ANF, AYR, STAA, ADPT, TSO, LINTA, ME, GIB, BC, AIR, BGC, USTR, CBI, EXPE, COG, FCG, ATI, EAC, DOW, ETH, AYI, CIM, SCHN, FNM, ARW, TER, TKTM, AMSG, CEDC, AUO, TRW, ENER, GLBL, CHL, CONN, OI, JOE, HXL, PLL, ADBE, CHU, EMN, COH, APC, CPX, GGB, EEQ, TIE, ENG, AFL, A, FO, X, JRCC, AAI, KBW, CHRD, CAJ, CTX

Predicted losers, actual losers: UNT, GT, ABT, WAB, AWI, WGOV, EQIX, CLR, CRBC, CAM, UEPS, CLF, IP, DEI, UYM, DVN, LDSH, WRES, UBS, UFPI, GLF, WHR, DVR, EGN, NOA, CHK, JCI, CEO, CBR, FIW, DSI, ACV, XEC, HTCH, ENZN, OC, AMR, CRS, KWK, WIN

Predicted losers, actual winners: TTMI, UTSI, TWI, ID, DCI, CNTF, GNK, AAPL, WWW, GGG, COMS, UA, GWR, ARQL

Accuracy (78/161) : 48%

Thursday 14/5

Predicted winners, actual winners: ESL, MDC, PLL, PETM, DVN, COG, LDSH, ACV, TSO, CRS, DLB, PTNR

Predicted winners, actual losers: FXP, DUG, ADPT

Predicted losers, actual losers: KMT, WAB, CRBC, DWSN, ADVNB

Predicted losers, actual winners: CPX, TTMI, UNT, FNM, HIT, UTSI, GT, AWI, GIB, GPI, DD, ME, ANAD, CHRD, TXI, WGOV, EQIX, ATPG, DOW, AIR, ALV, AVP, USG, CLR, ACTG, ATI, CIM, ACXM, TWI, CAM, ID, UEPS, COH, AYR, CLF, BWS, ALOG, CEDC, ANF, IP, DCI, GNK, UYM, AFL

Accuracy (17/47) : 36%

Wednesday 13/5

Predicted winners, actual winners: (none)

Predicted winners, actual losers: HIT, EGN, GIB, DLB, TTMI, GT

Predicted losers, actual losers: FWRD, DWSN, CEDC, UTSI, FNM, KMT, AWI, TXI, ANAD, ME, WGOV, GPI, ATPG, AVP, EQIX, DOW, AIR, ALV, CLR, USG, CRBC, ATI, ACTG, CIM, ACXM, ID, TWI, CAM, AYR, CLF, BWS, USTR, FCG, EAC, DEI, WCC, EXPE, CBI, ALB, GLF, UFPI, WHR, AVT

Predicted losers, actual winners: ESL, ADPT, ANF

Accuracy (43/52) : 83%

Tuesday 12/5

Predicted winners, actual winners: DWSN, UFPI, FXP, ESL, DUG, EXPE, DLB, CONN, DD, COMS

Predicted winners, actual losers: CEDC, DCI, AVT, FWRD, CTSH, GEF, ALV, AUO, DRQ, UNT, AKS, XEC, TXI, FCG, EGN, BWS, AAPL, GIB, COG, AWI, DOW, WCC, EAC, CIM, FIW, EQIX, HIT, USG, ETH, AYR, UEPS, AAWW

Predicted losers, actual losers: WWW, ATI, CP, CRBC, ECA, WHR, TRW, DISCA, GGG, USTR, GWR, CBI, DISH, CLF, ALB, CRS, AYI, AMSG, FFIV, CHL, ARW, DVR, TER, CHRS, GLS, CBT, ACXM, GLBL, UBS, DSI, GT, TTMI, CVC, ANF, TWI, FNM, CKP, ADPT, TSO, CLR, GPI, ANAD, AVP, CAM, GLF, ATPG, CBB, ACTG

Predicted losers, actual winners: GLT, TKTM, ENER, DEI, CBR, UTSI, AIR

Accuracy (58/97) : 60%
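For anyone who wants to check the numbers, the accuracy figures are simply the correct predictions (predicted winners that won plus predicted losers that lost) divided by all predictions. A small sketch, where the bucket sizes are counted by hand from the Tuesday, Wednesday and Friday lists above and the function name `accuracy` is just an illustrative choice:

```python
def accuracy(win_win, win_lose, lose_lose, lose_win):
    """Return (correct, total, ratio) for one day of predictions.

    win_win:   predicted winners that actually won
    win_lose:  predicted winners that actually lost
    lose_lose: predicted losers that actually lost
    lose_win:  predicted losers that actually won
    """
    correct = win_win + lose_lose
    total = win_win + win_lose + lose_lose + lose_win
    return correct, total, correct / total


# Bucket sizes counted from the ticker lists in this post.
days = {
    "Friday 15/5": (38, 69, 40, 14),
    "Wednesday 13/5": (0, 6, 43, 3),
    "Tuesday 12/5": (10, 32, 48, 7),
}

for day, buckets in days.items():
    correct, total, ratio = accuracy(*buckets)
    print(f"{day}: accuracy ({correct}/{total}) : {ratio:.0%}")
```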

Attempt on predicting stock for 15 of May 2009

May 14th, 2009

Finally Friday! Today’s classifier predictions seem to vary more than yesterday’s.

Prediction for 15 of May 2009

It’s now the 15th of May, 08:15 in the morning (GMT+1, Stockholm), and this is a prediction for today:

Some predicted winners (closing price > opening price):
GPI, ANAD, TXI, FWRD, KMT, ATPG, AVP, JBLU, AAWW, DD, ESL, USG, ALV, KBR, ANF, AYR, GEF, STAA, ADPT, TSO, LINTA, ME, GIB, WCC, BC, CTSH, ACTG, AIR, BGC, HIT, AKS, USTR, CBI, DRQ, AVT, EXPE, COG, FCG, ALB, ATI, ACXM, EAC, DOW, BWS, ETH, LTD, CBT, AYI, CHRS, CIM, SCHN, FNM, DWSN, ARW, TER, TKTM, AMSG, DUG, CEDC, FXP, AUO, TRW, ENER, GLBL, CHL, CONN, OI, BMS, JOE, GYMB, HXL, PETM, PLL, ADBE, CHU, AFFX, COL, ALOG, EMN, COH, APC, CPX, GGB, ADVNB, EEQ, AXE, TIE, ENG, AFL, AN, A, FO, X, WIT, JRCC, MDC, DY, AAI, KBW, CHRD, OIS, JOSB, IBI, ASPM, CAJ, CTX, TEX

Some predicted losers (closing price < opening price):
TTMI, UNT, UTSI, GT, ABT, WAB, AWI, WGOV, EQIX, CLR, CRBC, TWI, ID, CAM, UEPS, CLF, DCI, IP, DEI, UYM, CNTF, DVN, LDSH, GNK, WRES, UBS, UFPI, GLF, WHR, AAPL, DVR, EGN, NOA, CHK, JCI, CEO, WWW, CBR, FIW, GGG, DSI, ACV, XEC, COMS, UA, HTCH, ENZN, GWR, OC, AMR, CRS, ARQL, KWK, WIN
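The winner and loser labels used in these lists follow directly from the definitions in parentheses. As a minimal sketch, with a hypothetical function name and an assumed "unchanged" label for the case the definitions leave open (closing price equal to opening price):

```python
def label_day(open_price, close_price):
    """Label one trading day for one stock.

    "winner" means closing price > opening price,
    "loser" means closing price < opening price.
    The "unchanged" label is an assumption for the equal case.
    """
    if close_price > open_price:
        return "winner"
    if close_price < open_price:
        return "loser"
    return "unchanged"


# Made-up prices, only to show the call.
print(label_day(10.00, 10.25))
print(label_day(10.00, 9.80))
```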

Attempt on predicting stock for 14 of May 2009

May 13th, 2009

It turns out that all this manual work takes a lot of time, so I’ll automate the process this weekend. However, I will continue to post predictions. As soon as it has been automated I will post accuracies, which is going to be interesting =)

Again today it was hard to find predictions for stocks that may go up: out of the 200 most reliable classifiers, only 15 stocks were predicted as winners.

Prediction for 14 of May 2009

It’s now the 14th of May, 08:30 in the morning (GMT+1, Stockholm), and this is a prediction for today:

Some predicted winners (closing price > opening price):
ESL, FXP, MDC, DUG, PLL, PETM, ADPT, DVN, COG, LDSH, ACV, TSO, CRS, DLB, PTNR

Some predicted losers (closing price < opening price):
CPX, TTMI, UNT, FNM, HIT, UTSI, KMT, GT, WAB, AWI, GIB, GPI, DD, ME, ANAD, CHRD, TXI, WGOV, EQIX, ATPG, DOW, AIR, ALV, AVP, USG, CLR, CRBC, ACTG, ATI, DWSN, CIM, ACXM, TWI, CAM, ID, UEPS, COH, AYR, CLF, BWS, ALOG, CEDC, ADVNB, ANF, IP, DCI, GNK, UYM, AFL

Attempt on predicting stock for 13 of May 2009

May 12th, 2009

I am in a hurry and didn’t have time to predict all winners, so I’ll have to automate this =) I will get back with the results from yesterday later.

Prediction for 13 of May 2009

It’s now the 13th of May, 08:30 in the morning (GMT+1, Stockholm), and this is a prediction for today:

Some predicted winners (closing price > opening price), incomplete list:
HIT, FWRD, EGN, GIB, DWSN, CEDC, DLB

Some predicted losers (closing price < opening price):
TTMI, GT, UTSI, FNM, ESL, KMT, AWI, TXI, ANAD, ME, WGOV, GPI, ATPG, AVP, EQIX, DOW, AIR, ALV, CLR, USG, CRBC, ATI, ACTG, CIM, ADPT, ACXM, ID, TWI, CAM, AYR, CLF, BWS, ANF, USTR, FCG, EAC, DEI, WCC, EXPE, CBI, ALB, GLF, UFPI, WHR, AVT

Tomorrow I will report how many were correctly classified. Until then you can check for yourself at, for example, Yahoo! Finance.

Attempt on predicting stock for 12 of May 2009

May 11th, 2009

A couple of months ago I wrote about a project that would use classifiers to predict tomorrow’s stock market. I’ve now found some time to implement the very basics: a database with stock history, 3000+ classifiers trained on this data, and some simple classification scripts.

Instead of waiting, pondering and validating that everything works properly, I’ve decided to do this “live”, starting by predicting what may happen with some of the stocks today. Some corrections (bug fixes etc.) will probably be needed along the way before we can see whether this works or not.

I’ll select stocks that seem to have worked well for classification in the past, probably presenting about 100 of them at a time.

Prediction for 12 of May 2009

It’s now the 12th of May, 08:00 in the morning (GMT+1, Stockholm), and this is a prediction for today:

Some predicted winners (closing price > opening price):
DWSN, CEDC, DCI, AVT, FWRD, CTSH, UFPI, GEF, ALV, FXP, ESL, DUG, AUO, DRQ, EXPE, DLB, UNT, AKS, XEC, TXI, CONN, FCG, EGN, BWS, AAPL, GIB, COG, AWI, DOW, WCC, EAC, DD, CIM, FIW, EQIX, HIT, USG, ETH, AYR, UEPS, AAWW, COMS

Some predicted losers (closing price < opening price):
WWW, ATI, CP, CRBC, ECA, WHR, TRW, DISCA, GGG, USTR, GWR, CBI, DISH, CLF, ALB, CRS, GLT, AYI, TKTM, AMSG, FFIV, CHL, ARW, DVR, ENER, TER, DEI, CHRS, GLS, CBT, ACXM, GLBL, UBS, CBR, DSI, GT, TTMI, CVC, ANF, TWI, FNM, CKP, ADPT, TSO, CLR, UTSI, GPI, ANAD, AVP, CAM, GLF, ATPG, CBB, ACTG, AIR

Tomorrow I will report how many were correctly classified. Until then you can check for yourself at, for example, Yahoo! Finance.

uClassify in the press

March 28th, 2009

We’ve had hundreds of thousands of mentions throughout the blogosphere and I’m really thankful for this! I’ll try to update this post as we go along. Please comment if you can help me with this list!

Here are some I remember/managed to find:


CNBC

CNBC blogs about Mattias Östmar’s ingenious invention Typealyzer, 2008-03-25.


Business Week

Business Week writes about Typealyzer, 2009-03-22.


Din Side

Norwegian newspaper tests author recognition with uClassify, 2009-03-04.


Technology Review

Article about uClassify (only in paper version), 2009-01-01. Germans really seem to be interested in this kind of stuff!


ReadWriteWeb

Article about uClassify, 2008-12-07.


Kevin Kelly

Kevin Kelly mentions Typealyzer, 2008-12-05.


Doc Searls

Doc Searls mentions Typealyzer, 2008-11-30.

German Daily Taz
Another Genderanalyzer interview, 2008-11-12.


SuedDeutsche

Germany’s biggest daily newspaper, with a circulation of 450 000 copies. Genderanalyzer interview, 2008-11-10.


BoingBoing

Genderanalyzer is featured 2008-11-03.

The winner is Kelly and the prize is… gone?

February 22nd, 2009

Before Christmas LibraryThing announced a prize competition for uClassify integration, which I think was a brilliant idea. They wrote:

The Prize! So, LibraryThing calls on the book and library worlds to create something cool with uClassify by February 1, 2009 and post it here. The winner gets Toby Segaran’s Programming Collective Intelligence and a $100 gift certificate to Amazon or IndieBound.

Encouraged by this, Kelly Vista entered the competition and submitted his contribution, a classifier that he describes in a comment on LibraryThing:

My goal was to create a classifier that would automatically “tag” any book description based on actual LibraryThing tags. For example, if you paste the book description for “Truman” into UClassify, it should return to you LibraryThing tags that suit the book. This is one step more general than one of your ideas (fiction vs. non-fiction).

The classifier has been published and can be tested here.

Where did the prize go?

I just received a comment from Kelly explaining that he apparently was the only participant in the competition. Tim from LibraryThing also acknowledged Kelly as the winner by e-mail. However, Kelly never received the prestigious prize, or even a follow-up blog post. Kelly writes in the comment:

Unfortunately, the odd folks at LibraryThing decided to get all silent on me and I have not heard a peep from anyone since. In the meantime, they have found plenty of time to blog about other things. I spent 4-6 hours of my weekend time (which I could have spent with family) participating in the contest, assuming they were decent Internet citizens. And I won (as I mentioned, Tim e-mailed me so) — but then they never got back to me, not after repeated pings, and as a result, I was never awarded the prize they promised.

Tim, what happened here? :)