Artificial Intelligence to determine an author’s age


We have just released ageanalyzer.com, a site that reads a blog and guesses the age of the author!

Background

Our writing style reflects us in many ways; for example, text written in anger probably differs from text written in joy. Reading a text gives us an intuitive clue about the author as we start forming a picture in our heads. Sometimes it’s easy to pinpoint where this picture came from, and at other times it’s harder.

We wanted to know if we could give computers the same intuition. In this particular project we are interested in finding out whether a computer can tell the age of an author given only a text.

To run this experiment we collected 7000 blogs that had age information in the profile and split them into 6 age groups: 13-17, 18-25, 26-35, 36-50, 51-65 and 65+. We then created a classifier on uClassify and fed it the training data. Voilà!
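As a rough illustration of the labelling step, here is a minimal Python sketch that maps a profile age to one of the six groups. The names AGE_GROUPS and age_group are made up for this example and are not part of uClassify. Since the list names both 51-65 and 65+, the sketch assumes that age 65 goes into 51-65 and that 65+ starts at 66.

```python
from typing import Optional

# Age groups as listed in the post. Assumption: because both "51-65" and
# "65+" are listed, age 65 is placed in "51-65" and "65+" starts at 66.
AGE_GROUPS = [
    (13, 17, "13-17"),
    (18, 25, "18-25"),
    (26, 35, "26-35"),
    (36, 50, "36-50"),
    (51, 65, "51-65"),
]


def age_group(age: int) -> Optional[str]:
    """Return the group label for `age`, or None if the age is below 13."""
    if age < 13:
        return None
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    # Anything above the last bounded range falls into the open-ended group.
    return "65+"
```

Each blog would then be stored as a (text, label) pair, with the label taken from the age in the author's profile.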

Expected results

After running tests on the training data (10-fold cross-validation), it was clear that our classifier was able to find differences between the six age groups. We expect the proportion of correctly classified blogs to be around 30%, compared to a baseline of about 17% (1 in 6), which is what we would expect if the classifier were guessing at random.
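For readers unfamiliar with 10-fold cross-validation, the idea can be sketched in a few lines of Python: the data is cut into ten parts, and each part is held out once for testing while the classifier is trained on the other nine. This is only an illustration under our own assumptions, not the evaluation code used on uClassify. The names k_fold_accuracy and train_majority are hypothetical, and train_majority is a toy stand-in for a real text classifier.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (blog text, age-group label)


def k_fold_accuracy(
    data: Sequence[Example],
    train: Callable[[List[Example]], Callable[[str], str]],
    k: int = 10,
    seed: int = 0,
) -> float:
    """Fraction of examples classified correctly when each is held out once."""
    items = list(data)
    random.Random(seed).shuffle(items)
    correct = 0
    for i in range(k):
        # Fold i holds every k-th item starting at offset i; the rest is training data.
        test_fold = items[i::k]
        train_fold = [ex for j, ex in enumerate(items) if j % k != i]
        predict = train(train_fold)
        correct += sum(predict(text) == label for text, label in test_fold)
    return correct / len(items)


def train_majority(examples: List[Example]) -> Callable[[str], str]:
    """Toy stand-in for a classifier: always predicts the most common label."""
    most_common = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda text: most_common


# Chance level when picking one of six groups uniformly at random.
baseline = 1 / 6  # roughly 0.167, i.e. the 17% mentioned in the text
```

Passing a real training function in place of train_majority would give an accuracy figure that can be compared directly against the baseline value.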

We have added a poll to the site to help us see how well (or poorly) it works!

Try AgeAnalyzer out here!

2 thoughts on “Artificial Intelligence to determine an author’s age”

  1. Hi, this is a very neat idea…
    I was wondering if you made any attempt to map the author’s age for a blog post to the publication date?
    For example, I’m 25 years old but I’ve been blogging since 2003. Ironically this puts all my blog data in the 18-25 age bucket, but I could easily have spanned 3 buckets simply by having blogged for slightly longer.
    I’m sure my blog post writing has changed a lot over the years. I personally think there could be a big difference between an 18-year-old’s and a 25-year-old’s writing style.
    It would be interesting to see if you can do some kind of ML clustering to pick the age buckets. There might be some surprises…

    James

  2. Hi James!

    Thanks for your comment! Yes, classifying over time is very interesting and we are currently sketching another service that will do that. I also believe there can be a considerable difference between an 18-year-old’s and a 25-year-old’s writing style, and it would be VERY interesting to cluster out the classes. If I get time I may try this to see what it would suggest. Good idea! Right now the categories are based on what I thought might be a good split.
