Language Detection

We have a freely available language classifier called Text Language. It is built from the 4000 most common words of each language, and I believe it works very well on text.
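To give a feel for how a classifier built from common-word lists can work, here is a minimal sketch in Python. It is not the actual Text Language implementation: the `COMMON_WORDS` table, the `detect_language` function, and the simple hit-counting score are all assumptions made for illustration, and the tiny word sets stand in for the 4000-word lists mentioned above.

```python
import re
from collections import Counter

# Hypothetical toy word lists. A real classifier of this kind would hold
# thousands of common words per language, not eight.
COMMON_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "swedish": {"och", "att", "det", "är", "som", "en", "på", "inte"},
    "german": {"der", "die", "und", "das", "ist", "nicht", "ein", "zu"},
}


def detect_language(text):
    """Return the language whose common-word list matches the most tokens.

    Returns None when no token matches any list.
    """
    # Lowercase and split into word tokens (\w is Unicode-aware in Python 3).
    tokens = re.findall(r"\w+", text.lower())

    # Count one hit per token for every language that lists that token.
    scores = Counter()
    for token in tokens:
        for language, words in COMMON_WORDS.items():
            if token in words:
                scores[language] += 1

    if not scores:
        return None
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ("The cat is in the house",
                   "Det är inte en katt",
                   "Das ist nicht ein Hund"):
        print(sample, "->", detect_language(sample))
```

Plain hit counting is the simplest possible scoring rule. It struggles with very short inputs and with closely related languages that share many common words, which is where a statistical weighting of each word would likely do better.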

I recently added Hungarian, so it can now detect any of these 33 languages: Arabic, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian and Vietnamese.

Do you think a language is missing? Let me know.

3 Responses to “Language Detection”

  1. Tolga Erkal Says:

    Turkish is missing!

  2. Jon Says:

    Thanks Tolga! I will look into it!

  3. Jon Says:

    We have now added Turkish to our language detection classifier!
