Up until now, the development of urlai.com has mainly been an exercise in writing efficient SQL. The database itself has been filled with, if I recall correctly, millions of blog posts, and for each post 12 different classifications. Everything is queried from the database and, if it is not in the database, obtained through our classifiers. Now that we have fixed most of the bottlenecks in the SQL queries, we have decided to start playing a bit with the data.
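The lookup order described above (database first, classifiers only as a fallback) could be sketched roughly like this. This is a minimal illustration and not our actual code: the `classifications` table, its columns, and the toy `classify_post` function are all made up for the example, and SQLite stands in for the real database.

```python
import sqlite3

def classify_post(text):
    """Hypothetical stand-in for the real classifiers (assumption, not urlai code)."""
    return {"mood": "happy" if "great" in text.lower() else "upset"}

def get_classification(conn, url, text):
    """Return the stored classification for url, classifying only on a miss."""
    row = conn.execute(
        "SELECT mood FROM classifications WHERE url = ?", (url,)
    ).fetchone()
    if row is not None:
        # Already in the database: no classifier call needed.
        return {"mood": row[0]}
    # Not in the database: run the classifier and store the result.
    result = classify_post(text)
    conn.execute(
        "INSERT INTO classifications (url, mood) VALUES (?, ?)",
        (url, result["mood"]),
    )
    conn.commit()
    return result

# Illustrative in-memory schema (assumed, one column instead of 12 classifiers).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE classifications (url TEXT PRIMARY KEY, mood TEXT)")
```

The second request for the same URL is answered from the table, so the classifier only runs once per post.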
Our first feature was to add blog ranking. We started with a rank based on mood (upset or happy); once this is stable and scalable, we will add gender ranking as well (most manly/feminine bloggers).
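One way a mood ranking like this might work is to rank each blog by the share of its posts that carry a given mood label. The sketch below assumes a simplified `posts` table with `blog` and `mood` columns and uses SQLite with made-up sample rows; the real schema and scoring may well differ.

```python
import sqlite3

def rank_blogs_by_mood(conn, mood, limit=10):
    """Rank blogs by the fraction of their posts classified with the given mood."""
    query = """
        SELECT blog,
               AVG(CASE WHEN mood = ? THEN 1.0 ELSE 0.0 END) AS share,
               COUNT(*) AS n_posts
        FROM posts
        GROUP BY blog
        ORDER BY share DESC, n_posts DESC, blog ASC
        LIMIT ?
    """
    return conn.execute(query, (mood, limit)).fetchall()

# Illustrative data only (assumed schema, invented blogs).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (blog TEXT, mood TEXT)")
conn.executemany(
    "INSERT INTO posts (blog, mood) VALUES (?, ?)",
    [
        ("a.example", "happy"),
        ("a.example", "happy"),
        ("b.example", "happy"),
        ("b.example", "upset"),
        ("c.example", "upset"),
    ],
)

happiest = rank_blogs_by_mood(conn, "happy")
```

Ties on the share are broken by post count and then by blog name, so the ordering stays stable between runs. A gender ranking could reuse the same query shape with a different column.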
We have some ideas for how to move urlai further in the future. As we are gathering a lot of classified data, we imagine there should be some value in a text search across all the posts, with the results plotted against the classifiers.
Also, to improve our classifiers, we are looking into user-reinforced training. For example, when a blog is shown, users get an opportunity to leave classifier feedback: “Hey, I’m not 60-100 years old!”
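To make the feedback idea a bit more concrete, here is a rough sketch of how corrections could be collected and turned into training labels. Everything in it is hypothetical: the `FeedbackStore` class, the `min_votes` threshold, and the age buckets other than 60-100 are invented for illustration, and nothing like this is built yet.

```python
from collections import Counter, defaultdict

class FeedbackStore:
    """Collects user corrections per (post, classifier) pair (hypothetical design)."""

    def __init__(self):
        self._votes = defaultdict(Counter)

    def record(self, post_id, classifier, predicted, corrected):
        """Store a vote only when the user disagrees with the prediction."""
        if corrected != predicted:
            self._votes[(post_id, classifier)][corrected] += 1

    def training_labels(self, min_votes=2):
        """Return corrected labels that at least min_votes users agreed on."""
        labels = {}
        for key, counter in self._votes.items():
            label, count = counter.most_common(1)[0]
            if count >= min_votes:
                labels[key] = label
        return labels

store = FeedbackStore()
# Two users say the age classifier got post 1 wrong; one user disputes post 2.
store.record(1, "age", predicted="60-100", corrected="20-30")
store.record(1, "age", predicted="60-100", corrected="20-30")
store.record(2, "age", predicted="60-100", corrected="30-40")
labels = store.training_labels()
```

Requiring more than one agreeing vote is one simple way to keep a single joker from polluting the training data; the right threshold would have to be found by experiment.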
We are definitely going to add more classifiers as well.
We would also like to add some cool visualization of the classifiers, perhaps one that even makes it possible to zoom in from search set -> blogger -> posts -> specific words. Perhaps with the cool GapMinder tool?