Gender Text Analysis

Do males and females express themselves differently in text? Yes is the answer if we look at the research carried out at the University of Texas, in the article “Effects on age and Gender on Blogging” [1] it’s found that author gender can be determined with an accuracy of 80% by looking at a text. This is achieved with a classifier, trained on 37478 blogs written by males and females at

Gender stereotypes in the blogosphere

The research also shows the most discriminating terms for males of females (using information gain).

Male favorite words

– linux
– microsoft
– gaming
– server

– software
– gb
– programming
– google
– data
– graphics
– india
– nations
– democracy

– users
– economic

Female favorite words

– shopping
– mom
– cried
– freaked
– pink

– cute
– gosh
– kisses
– yummy
– mommy
– boyfriend
– skirt
– adorable
– husband
– hubby

They conclude “Male bloggers of all ages write more about politics, technology and money than do their female cohorts. Female bloggers discuss their personal lives – and use more personal writing style – much more than males do.”

Try it on your blog uses the same approach as described in the article, they have collected 2000 blogs from written by men and woman. They also have a poll which allows us to see how well it’s working, as we speak it has an accuracy of 70%.

Trying this blog in the analyzer gives us the correct answer

We think is written by a man.

[1] J. Schler, Moshe Koppel, S. Argamon and J. Pennebaker (2006), Effects of Age and Gender on Blogging, in Proc. of AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs, March 2006. PDF