Just wanted to mention that we have a freely available language classifier called Text Language. I believe it works very well on text. It's built from the 4,000 most common words in each language.
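To illustrate the common-word idea, here is a minimal sketch, not Text Language's actual implementation. It assumes a text is scored by counting how many of its words appear in each language's common-word list, and that the language with the highest count wins. The real scoring method isn't described here. The tiny word lists and the `classify` function are hypothetical stand-ins for the real 4,000-word lists.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical, heavily truncated stand-ins for the per-language lists
# of the 4,000 most common words (assumption, not the real data).
COMMON_WORDS = {
    "English": {"the", "and", "is", "of", "to", "in", "it", "that"},
    "Dutch": {"de", "het", "en", "is", "van", "een", "dat", "niet"},
    "Swedish": {"och", "att", "det", "som", "en", "är", "på", "inte"},
}


def classify(text: str) -> Optional[str]:
    """Return the language whose common-word list matches the most tokens.

    Returns None when no token matches any list.
    """
    # Lowercase and split into word tokens (\w is Unicode-aware, so "är" stays whole).
    tokens = re.findall(r"\w+", text.lower())
    scores = Counter()
    for lang, words in COMMON_WORDS.items():
        # Count every token occurrence that appears in this language's list.
        scores[lang] = sum(1 for t in tokens if t in words)
    best, score = scores.most_common(1)[0]
    return best if score > 0 else None


print(classify("Het huis is van de man en niet van mij"))
```

Words shared between languages (such as "is" or "en" above) count toward every list they appear in. Longer texts therefore tend to separate languages more reliably than very short ones.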
I recently added Hungarian, so it can now detect any of these 33 languages: Arabic, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian and Vietnamese.
Do you think a language is missing? Let me know!
Turkish is missing!
Thanks, Tolga! I'll look into it.
We have now added Turkish to our language detection classifier!