Language Detection
I just wanted to mention that we have a freely available language classifier called Text Language. I believe it works very well on ordinary text. It’s built from the 4000 most common words of each language.
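To give a rough idea of how a classifier built from common-word lists can work, here is a minimal sketch in Python. It is not the actual Text Language implementation: the tiny word sets, the `COMMON_WORDS` table, and the `detect_language` function are all hypothetical stand-ins, and scoring by simply counting matches is an assumption (the real classifier uses about 4000 words per language and may weigh them differently).

```python
import re
from collections import Counter

# Hypothetical, tiny stand-ins for the real lists of ~4000 common words
# per language. Only three languages are shown to keep the sketch short.
COMMON_WORDS = {
    "English": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "German": {"der", "die", "und", "ist", "das", "nicht", "ich", "ein"},
    "Spanish": {"el", "la", "y", "es", "que", "de", "los", "una"},
}


def detect_language(text):
    """Return the language whose word list matches the most tokens.

    Returns None when no token matches any list. Plain match counting
    is an assumption made for this sketch, not the real scoring method.
    """
    # Lowercase the text and split it into word tokens.
    tokens = re.findall(r"\w+", text.lower())

    # Count, per language, how many tokens appear in its common-word list.
    scores = Counter()
    for token in tokens:
        for language, words in COMMON_WORDS.items():
            if token in words:
                scores[language] += 1

    if not scores:
        return None
    # Pick the language with the highest match count.
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    print(detect_language("The cat is in the house and it is happy"))
    print(detect_language("Der Hund ist nicht das Problem und ich"))
```

With lists this small, short or mixed-language inputs are easily misclassified; longer word lists make the counts far more reliable, which is presumably why the real classifier uses thousands of words per language.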
I recently added Hungarian, so it can now detect any of these 33 languages: Arabic, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian and Vietnamese.
Do you think a language is missing? Let me know.
Tags: classifier, detection, free, language

June 12th, 2009 at 12:47 pm
Turkish is missing!
June 14th, 2009 at 8:16 am
Thanks Tolga! I will look into it!
June 29th, 2009 at 10:18 am
We have now added Turkish to our language detection classifier!