Jon – Page 6 – uClassify blog

Feedback, anyone?

One of the most popular published classifiers is Language detection, which classified more than 800,000 texts last week. However, this is just the tip of the iceberg, as most classifiers are unpublished (about 500 classifiers). Not all of them are active, of course, but a good number are, which brings me to my question: how is it working for you? I’m not getting much feedback or many support requests, and I am not sure whether that is good or bad.

If you read this and are using uClassify for a project, feel free to contact me with any positive or negative feedback on any aspect (classifier performance, documentation, response times or anything that is unclear). You may leave a comment or e-mail me at this address: contact at uclassify dot com. <- Are spambots able to read this nowadays?

Over and out!

Connection pool bug fix

Some users have experienced this error:

“Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.”

This happened because some database connections persisted longer than expected, and as a result the connection pool was drained. This became obvious with the release of the URL API. The bug should be fixed now.
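For readers curious about how a pool gets drained, here is a minimal Python sketch. It is not the uClassify code: the `TinyPool` class, its size and the timeout are all made up for illustration. It shows that connections which are never handed back starve the next caller, and that releasing them in a `finally` block avoids this.

```python
import queue
from contextlib import contextmanager


class TinyPool:
    """A toy connection pool (hypothetical, for illustration only)."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")  # strings stand in for real connections

    def acquire(self, timeout=0.1):
        # Wait up to `timeout` seconds for a free connection.
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no free connection in the pool") from None

    def release(self, conn):
        self._free.put(conn)

    @contextmanager
    def connection(self, timeout=0.1):
        # Always hands the connection back, even if the caller raises.
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            self.release(conn)


def drain_demo():
    pool = TinyPool(size=2)
    # Two requests hold on to their connections and never release them.
    held = [pool.acquire(), pool.acquire()]
    try:
        pool.acquire()
        return "got a connection"
    except TimeoutError:
        return "timeout"


def safe_demo():
    pool = TinyPool(size=2)
    for _ in range(5):
        with pool.connection():
            pass  # do the work; the connection is returned on exit
    with pool.connection() as conn:
        return conn
```

In `drain_demo` the third request times out because both connections are still held, while `safe_demo` can serve any number of requests with the same two connections.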

I’m also really happy to see that we are getting a lot of new users since the introduction of the URL API!

URL classification API with JSON responses

Many people have contacted me asking for JSON support. I haven’t used it myself before, but it seems to be a widely used format, so I decided to implement support for it in our new URL API.

Just add the argument output=json to the end of a classification URL request.

Please let me know if there is anything that looks weird in the JSON API or if something should be redesigned.

Beer…. is the word more used by men or women?

http://uclassify.com/browse/uClassify/GenderAnalyzer_v5/ClassifyText?readkey=YourReadKeyHere&text=beer&output=json

If you click the link you will be asked to save a file because of the MIME content type application/json, but this is how the response looks:

{
  "version" : "1.00",
  "success" : true,
  "statusCode" : 2000,
  "errorMessage" : "",
  "cls1" :
  {
    "female" : 0.15623,
    "male" : 0.84377
  }
}
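As a rough sketch of how a client might read such a response, here is a small Python example. The `best_class` helper is hypothetical (it is not part of the API), and it assumes the class probabilities always sit under the `cls1` key, as in the sample above.

```python
import json

# Sample response copied from the post above (with plain ASCII quotes).
SAMPLE = """
{
  "version": "1.00",
  "success": true,
  "statusCode": 2000,
  "errorMessage": "",
  "cls1": {"female": 0.15623, "male": 0.84377}
}
"""


def best_class(response_text, result_key="cls1"):
    """Return (class_name, probability) for the most likely class."""
    data = json.loads(response_text)
    if not data.get("success"):
        raise RuntimeError(data.get("errorMessage") or "classification failed")
    scores = data[result_key]
    name = max(scores, key=scores.get)
    return name, scores[name]


print(best_class(SAMPLE))  # ('male', 0.84377)
```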

New API for classification

We are really happy to announce that we have just released a URL API to make it easier to access classifiers. This means that users can classify via a URL, for example to find out the language of this text snippet from Wiki:

“En apprentissage automatique, le terme de classifieur linéaire représente une famille d’algorithmes de classement statistique.”

http://www.uclassify.com/browse/uClassify/Text-Language/ClassifyText?readKey=YourReadKeyHere&text=En%20apprentissage%20automatique%2C%20le%20terme%20de%20classifieur%20lin%C3%A9aire%20repr%C3%A9sente%20une%20famille%20d’algorithmes%20de%20classement%20statistique.

You can also classify URLs directly. For example, this call confirms that Roger’s blog is written by a male:

http://www.uclassify.com/browse/uClassify/GenderAnalyzer_v5/ClassifyUrl?readKey=YourReadKeyHere&url=http%3A%2F%2Fwww.rogerkarlsson.com&removeHtml=1
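If you build these request URLs in code, the main thing to get right is the percent-encoding of the text or URL you pass in. Below is a minimal Python sketch that mirrors the two example URLs above. The helper names are made up, and the only parameters used are the ones shown in these posts (readKey, text, url, removeHtml, output); whether removeHtml accepts other values than 1 is not covered here, so the sketch simply leaves it out when it is not wanted.

```python
from urllib.parse import quote, urlencode

BASE = "http://www.uclassify.com/browse"


def classify_text_url(user, classifier, read_key, text, output=None):
    """Build a ClassifyText request URL (hypothetical helper)."""
    params = {"readKey": read_key, "text": text}
    if output:
        params["output"] = output  # e.g. "json"
    # quote_via=quote encodes spaces as %20, as in the examples above.
    query = urlencode(params, quote_via=quote)
    return f"{BASE}/{user}/{classifier}/ClassifyText?{query}"


def classify_url_url(user, classifier, read_key, url, remove_html=True):
    """Build a ClassifyUrl request URL (hypothetical helper)."""
    params = {"readKey": read_key, "url": url}
    if remove_html:
        params["removeHtml"] = 1
    query = urlencode(params, quote_via=quote)
    return f"{BASE}/{user}/{classifier}/ClassifyUrl?{query}"


print(classify_url_url("uClassify", "GenderAnalyzer_v5",
                       "YourReadKeyHere", "http://www.rogerkarlsson.com"))
```

The printed URL is the same as the ClassifyUrl example above.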

The new API is documented here.

Faster site

I spent this Saturday rewriting the logging system, as it turned out the old logs were not being pruned and had grown really big. I needed to refactor parts of the logging architecture in order to fix this. While doing so I decided to nuke all old logs instead of spending time converting them into the new tables. This is why the ‘calls’ count is reset on all classifiers. I did keep the total number of classifications though, and we have now passed 6 million of them!

As a result this site is much faster to browse. The biggest difference can be seen when browsing the classifiers.

Keep the classifications going!!

Typealyzer is back

Typealyzer has been down for quite some time. The reason was trouble with the domain name, but the inventor, Mattias Östmar, finally got it working with help from Annika Lidne. Good work!

Happy times!

Sorry for unstable server tonight

I had to restart the Windows server tonight (for the first time in more than 6 months). While doing this I decided to install some Windows updates, which took more time than expected (2 hours). Everything should be up and running now though. Sorry for any inconvenience!

Language Detection

Just wanted to mention that we have a freely available language classifier, called Text Language. I believe this classifier works very well on texts. It’s constructed from the 4000 most common words of each language.

I recently added Hungarian, so now it can detect any of these 33 languages: Arabic, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian and Vietnamese.

Do you think a language is missing? Let me know.
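To give a feel for why lists of common words are enough to tell languages apart, here is a toy Python sketch. It is a simplification and not how the Text Language classifier is implemented: it just counts how many words of a text appear in each language's list, and the three tiny word lists are made up for the example (the real classifier is built from about 4000 words per language).

```python
import re

# Tiny stand-in word lists, invented for this example.
COMMON_WORDS = {
    "English": {"the", "and", "is", "of", "to", "in", "a", "that"},
    "French": {"le", "la", "de", "et", "est", "une", "un", "les"},
    "Swedish": {"och", "det", "att", "en", "som", "är", "på", "för"},
}


def guess_language(text):
    """Return the language whose common words occur most often, or None."""
    words = re.findall(r"\w+", text.lower())
    counts = {
        lang: sum(1 for w in words if w in vocab)
        for lang, vocab in COMMON_WORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


print(guess_language("le terme de classifieur linéaire représente une famille"))
```

For the French snippet this prints French, since "le", "de" and "une" are all in the French list and none of the words are in the other two.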

Fidel or Franco?

Just wanted to share that another interesting uClassify application has been created. This application classifies Spanish web pages for Fidel or Franco alignment. (Note that the text has to be in Spanish for it to work properly.)

Even though I think this page is intended for fun, I believe that such classifiers can be used for commercial purposes. Imagine a political party that wants to find bloggers with a right or left wing alignment to help drive its election campaign. It could run a huge number of blogs (collected from some data source) through the classifier and find the ones it is looking for.

Good work!

Mood classifier is helping to save the world

Recently Team Curious created a really interesting website, MDGActors, that uses a mix of APIs to recognize actors for the Millennium Development Goals (MDG). This is how they describe it:

“Tackling the toughest problems facing the world today is a big job. What if we could highlight the people that are making a big difference and call out the people that aren’t? We think that it would be inspiring—Umair Haque thinks it could change the world.” Read their complete manifesto

They scan relevant articles for personal names and classify the sentences around them with the Mood classifier (created by Mattias). They call it ‘Empathize’ and describe it as:

“If an author voices particularly strong sentiment in a sentence, an icon is added next to the sentence to indicate that feeling. Phrases with hearts are usually about people doing good, and phrases with condescending frowns usually signal more controversial topics.”

I find the idea very interesting, being able to pinpoint heroes and anti-heroes for a good cause. Great work!

Another interesting application Team Curious provides is Spin, where you enter any keyword and it tells you the sentiment of the results.

Entering George Bush came back with 14 negative and 0 positive voices.

A negative voice about George Bush

Entering ‘Terminator Salvation’ showed 2 negative and 11 positive voices.

A positive voice about Terminator Salvation

Their project is part of Microsoft’s Imagine Cup competition. I really wish you guys the best of luck!!