We have just released ageanalyzer.com, a site that reads a blog and guesses the age of the author!
Background
Our writing style reflects us in many ways. For example, a text written in anger probably differs from one written in joy. Reading a text intuitively gives us clues about the author, and we start forming a picture in our heads. Sometimes it's easy to pinpoint where that picture came from; at other times it's harder.
We wanted to know if we could give computers the same intuition. In this particular project we are interested in finding out whether a computer can tell the age of an author given only a text.
To run this experiment we collected 7000 blogs that had age information in the author's profile and split them into six age groups: 13-17, 18-25, 26-35, 36-50, 51-65 and 65+. We then created a classifier on uClassify and fed it the training data. Voilà!
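For the curious, here is a rough idea of what "feeding a classifier with training data" can look like. This is only a minimal sketch of a word-count (naive Bayes) text classifier in plain Python. It is not uClassify's actual implementation or API, and the function names and the four toy texts are made up purely for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical helper names; this is NOT the uClassify API.

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def train(examples):
    """Count words per age group from (text, age_group) pairs."""
    word_counts = defaultdict(Counter)  # age group -> word frequencies
    doc_counts = Counter()              # age group -> number of texts
    for text, group in examples:
        doc_counts[group] += 1
        word_counts[group].update(tokenize(text))
    return word_counts, doc_counts

def classify(text, model):
    """Return the age group with the highest log-probability for the text."""
    word_counts, doc_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_group, best_score = None, -math.inf
    for group in doc_counts:
        total_words = sum(word_counts[group].values())
        score = math.log(doc_counts[group] / total_docs)  # prior
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = word_counts[group][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_group, best_score = group, score
    return best_group

# Invented toy texts, only two of the six groups, just to show the flow.
toy_examples = [
    ("omg homework is so boring lol", "13-17"),
    ("school exam tomorrow lol", "13-17"),
    ("my grandchildren visited the garden today", "65+"),
    ("retirement gives me time for the garden", "65+"),
]
model = train(toy_examples)
print(classify("lol so much homework", model))  # prints 13-17
```

A real training set would of course hold thousands of full blog texts per group rather than a handful of one-liners.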
Expected results
After running tests on the training data (10-fold cross-validation), it was clear that our classifier was able to find differences between the six age groups. We expect the proportion of correctly classified blogs to be around 30%, compared with a baseline of about 17% (one in six), which is what we would expect if the classifier were guessing at random.
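To make the testing procedure a bit more concrete: in 10-fold cross-validation the data is split into ten parts, and each part in turn is held out for testing while the classifier is trained on the other nine. The sketch below shows the general idea in plain Python. The names `k_fold_indices`, `cross_validate`, `train_fn` and `classify_fn` are hypothetical, and this is not necessarily how our tests were implemented. The last lines show where the 17% baseline comes from, assuming a uniform random guess among six groups.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle the item indices and deal them into k roughly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(examples, train_fn, classify_fn, k=10):
    """Return the fraction of (text, label) examples classified correctly.

    train_fn(training_examples) returns a model;
    classify_fn(text, model) returns a predicted label.
    """
    correct = 0
    for held_out in k_fold_indices(len(examples), k):
        held_out_set = set(held_out)
        # Train on everything except the held-out fold.
        training = [ex for i, ex in enumerate(examples) if i not in held_out_set]
        model = train_fn(training)
        # Test only on the held-out fold.
        correct += sum(
            classify_fn(examples[i][0], model) == examples[i][1]
            for i in held_out
        )
    return correct / len(examples)

# Baseline: guessing uniformly at random among six age groups.
NUM_GROUPS = 6
random_baseline = 1 / NUM_GROUPS
print(f"{random_baseline:.0%}")  # prints 17%
```

Because every blog is tested exactly once by a model that never saw it during training, the resulting accuracy is a fairer estimate than testing on the same texts the classifier learned from.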
We have added a poll to the site to help us see how well (or poorly) it works!