Bloggitik.se – Swedish political blogs classified on subject

Bloggitik, a new Swedish-language service based on uClassify, has been launched. It collects political blogs and automatically categorizes each post by subject, which makes it easier for users to find what they are looking for among all the blogs. The people at bloggitik.se have also made sure that the system learns as it is being used: when a post is classified into the wrong category, readers can correct it and the system will improve over time.

This is a really cool use of our service. Best of luck!

UrlAi.com – who are you?

We have created a new service called UrlAi.com. The basic concept is to run blog posts through a bunch of classifiers over time. To begin with we use Gender, Age, Mood and Tonality, but the system is dynamic so we can add new classifiers at any time. If you have created a classifier that would fit on urlai.com, let us know!

Some ideas

We have many ideas for how to develop this project further. For example, right now we only show a summary pie chart, and it would be nice to see posts over time. User feedback for online training and classifier improvement may be possible. Another thing we could do is make classified posts searchable, for example enabling users to see the mood of everyone who mentioned ‘Avatar’.

Some kudos

Just want to thank the people who have been involved in this project: Roger Karlsson for coding, Johanna Forsman for the awesome logo and Mattias Östmar for sharing his Tonality and Mood classifiers. Mattias has also contributed many ideas around this, being the idea fountain he is 😀

Connection pool bug fix

Some users have experienced the following error:

“Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.”

This happened because some database connections persisted longer than expected, and as a result the connection pool was drained. This became obvious with the release of the URL API. The bug should be fixed now.

I’m also really happy to see that we are getting a lot of new users since the introduction of the URL API!
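To illustrate the failure mode, here is a minimal Python sketch of a fixed-size pool. It is not the site's actual code: `TinyPool` and its methods are hypothetical names, and the "connections" are plain strings. It only shows how connections that are held too long exhaust the pool, and how returning them in a `finally` block avoids that.

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """Toy fixed-size pool (hypothetical); connections are plain strings."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self, timeout=0.1):
        # Wait up to `timeout` seconds for a free connection.
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError(
                "timeout expired before a pooled connection became available"
            ) from None

    def release(self, conn):
        self._free.put(conn)

    @contextmanager
    def connection(self, timeout=0.1):
        # Hand out a connection and always return it, even if the caller raises.
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            self.release(conn)


if __name__ == "__main__":
    pool = TinyPool(2)
    held = [pool.acquire(), pool.acquire()]  # connections that are never returned
    try:
        pool.acquire(timeout=0.01)
    except TimeoutError as exc:
        print("drained:", exc)
```

With the `connection()` context manager, each request gives its connection back as soon as it is done, so the pool is not drained by long-lived connections.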

URL classification API with JSON responses

Many people have contacted me asking for JSON support. I haven’t used it myself before, but it seems to be a widely used format, so I decided to implement support for it in our new URL API.

Just add the argument output=json to the end of a classification URL request.

Please let me know if there is anything that looks weird in the JSON API or if something should be redesigned.

Beer... is the word used more by men or women?

http://uclassify.com/browse/uClassify/GenderAnalyzer_v5/ClassifyText?readkey=YourReadKeyHere&text=beer&output=json

If you click the link you will be asked to save a file, because of the MIME content type application/json. This is how the response looks:

{
  "version" : "1.00",
  "success" : true,
  "statusCode" : 2000,
  "errorMessage" : "",
  "cls1" :
  {
    "female" : 0.15623,
    "male" : 0.84377
  }
}
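
As a rough sketch of how a client might consume such a response in Python, the snippet below parses the sample above and picks the most probable class. The helper name `top_class` is made up for illustration, and the field names (`success`, `errorMessage`, `cls1`) are taken from this one sample only; real responses may contain other fields.

```python
import json

# Sample response copied from the post (straight quotes, so it is valid JSON).
SAMPLE_RESPONSE = """
{
  "version": "1.00",
  "success": true,
  "statusCode": 2000,
  "errorMessage": "",
  "cls1": {
    "female": 0.15623,
    "male": 0.84377
  }
}
"""


def top_class(response_text):
    """Return (class_name, probability) for the most probable class.

    Hypothetical helper: assumes the layout shown in the sample response.
    """
    data = json.loads(response_text)
    if not data.get("success"):
        raise RuntimeError(data.get("errorMessage") or "classification failed")
    scores = data["cls1"]
    name = max(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    print(top_class(SAMPLE_RESPONSE))
```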

New API for classification

We are really happy to announce that we have just released a URL API to make it easier to access classifiers. This means that users can classify via a URL, for example to find out the language of this text snippet from Wikipedia:

“En apprentissage automatique, le terme de classifieur linéaire représente une famille d’algorithmes de classement statistique.”

http://www.uclassify.com/browse/uClassify/Text-Language/ClassifyText?readKey=YourReadKeyHere&text=En%20apprentissage%20automatique%2C%20le%20terme%20de%20classifieur%20lin%C3%A9aire%20repr%C3%A9sente%20une%20famille%20d’algorithmes%20de%20classement%20statistique.

You can also classify URLs directly. For example, this call confirms that Roger’s blog is written by a male:

http://www.uclassify.com/browse/uClassify/GenderAnalyzer_v5/ClassifyUrl?readKey=YourReadKeyHere&url=http%3A%2F%2Fwww.rogerkarlsson.com&removeHtml=1
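
Request URLs like the ones above need their arguments percent-encoded. The following Python sketch builds them with the standard library; `build_request_url` is a hypothetical helper, and the path layout and parameter names (`readKey`, `text`, `url`, `removeHtml`, `output`) are simply copied from the example URLs in these posts rather than from the API documentation.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://www.uclassify.com/browse"


def build_request_url(owner, classifier, action, read_key, **params):
    """Build a classification URL (hypothetical helper).

    `action` is "ClassifyText" or "ClassifyUrl", as seen in the examples.
    Extra keyword arguments become query parameters, in the order given.
    """
    query = [("readKey", read_key)] + list(params.items())
    path = "/".join(quote(part, safe="") for part in (owner, classifier, action))
    # quote_via=quote encodes spaces as %20 (not "+") and "/" as %2F.
    return f"{BASE_URL}/{path}?{urlencode(query, quote_via=quote)}"


if __name__ == "__main__":
    print(build_request_url(
        "uClassify", "GenderAnalyzer_v5", "ClassifyUrl", "YourReadKeyHere",
        url="http://www.rogerkarlsson.com", removeHtml="1",
    ))
    print(build_request_url(
        "uClassify", "Text-Language", "ClassifyText", "YourReadKeyHere",
        text="classifieur linéaire", output="json",
    ))
```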

The new API is documented here.

Faster site

I spent this Saturday rewriting the logging system. As it turned out, the old logs were not being pruned and had grown really big, and I needed to refactor parts of the logging architecture to fix this. While doing so I decided to nuke all old logs instead of wasting time converting them into the new tables. This is why the ‘calls’ count has been reset on all classifiers. I did keep the total number of classifications though, and we have now passed 6 million of them!

As a result this site is much faster to browse. The biggest difference can be seen when browsing the classifiers.

Keep the classifications going!!

Sorry for unstable server tonight

I had to restart the Windows server tonight (for the first time in more than 6 months), and while doing this I decided to install some Windows updates. This took more time than expected (2 hours). Everything should be up and running now though. Sorry for any inconvenience!

Language Detection

Just wanted to mention that we have a freely available language classifier, called Text Language. I believe this classifier works very well on texts. It’s constructed from the 4000 most common words of each language.

I recently added Hungarian so now it can detect any of these 33 languages: Arabic, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian and Vietnamese.
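
To give a feel for why common words carry so much signal, here is a toy Python sketch that guesses a language by counting how many tokens appear in a per-language list of common words. This is plain overlap counting, not how the Text Language classifier works internally, and the tiny word lists and the `guess_language` name are invented for illustration (the real classifier is built from roughly 4000 words per language).

```python
import re

# Tiny, hand-picked word lists for illustration only.
COMMON_WORDS = {
    "English": {"the", "and", "is", "of", "to", "in"},
    "French": {"le", "la", "de", "et", "une", "en"},
    "Swedish": {"och", "det", "att", "en", "som", "är"},
}


def guess_language(text, word_lists=COMMON_WORDS):
    """Return the language whose word list matches the most tokens, or None."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in word_lists.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(guess_language("Le terme de classifieur représente une famille"))
```

Note that a word such as "en" appears in both the French and Swedish lists, which is one reason a real classifier weighs many words instead of relying on a handful.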

Do you think a language is missing? Let me know.

Fidel or Franco?

Just wanted to share that another interesting uClassify application has been created. This application classifies Spanish web pages for Fidel or Franco alignment. (Note that the text has to be in Spanish to work properly).

Even though I think this page is intended for fun, I believe that such classifiers can be used for commercial purposes. Imagine a political party that wants to find bloggers with a right- or left-wing alignment to help drive its election campaign. It could run a huge number of blogs (collected from some data source) through the classifier and find the ones it is looking for.

Good work!

Mood classifier is helping to save the world

Recently Team Curious created a really interesting website, MDGActors, that uses a mix of APIs to recognize actors for the Millennium Development Goals (MDG). This is how they describe it:

“Tackling the toughest problems facing the world today is a big job. What if we could highlight the people that are making a big difference and call out the people that aren’t? We think that it would be inspiring—Umair Haque thinks it could change the world.” Read their complete manifesto

They scan relevant articles for personal names and classify the sentences around them with the Mood classifier (created by Mattias). They call it ‘Empathize’ and describe it as:

“If an author voices particularly strong sentiment in a sentence, an icon is added next to the sentence to indicate that feeling. Phrases with hearts are usually about people doing good, and phrases with condescending frowns usually signal more controversial topics.”

I find the idea very interesting, being able to pinpoint heroes and anti-heroes for a good cause. Great work!

Another interesting application from Team Curious is Spin, where you enter any keyword and it tells you the sentiment of the results.

Entering George Bush came back with 14 negative and 0 positive voices.

A negative voice about George Bush

Entering ‘Terminator Salvation’ showed 2 negative and 11 positive voices.

A positive voice about Terminator Salvation

Their project is part of Microsoft’s Imagine Cup competition. I really wish you guys the best of luck!!