TrollGuard – protects your blog from spam comments

Roger and I have just finished TrollGuard, an anti-spam plugin for WordPress 2.7 or later.

The plugin is in beta and we are aware that some features are missing. Still, we would greatly appreciate it if someone out there wanted to do some testing for us and come back with feedback!

This has been a small side project we did during our Christmas holidays using the uClassify API. We think it’s really cool that in less than a week we were able to set up a new Akismet-style service. Previous uClassify web applications have mostly been for entertainment, but this plugin will actually do something helpful: protect blogs from spam comments.

We are also confident in the accuracy of TrollGuard as similar classification technology has been used in Cactus Spam Filter since 2004.

Well, now it’s up to you to test it! What isn’t working? What features are missing? Let us know!

Check TrollGuard out!

Spam, huh?

We are currently working on a prototype to identify spam blogs – splogs. Spam blogs can be really tricky to identify even to the human eye, as i-trepreneur.com writes in a recent post:

Why? These Splogs are user friendly. They were not made for search engines but for real visitors. There’s excellent design, well organized sections, working RSS feed. All the information on such Splogs is manually selected from the most popular resources on the net and is properly referenced. Only fresh content is used so it is not identified as duplicate instantly.

The post points out that madconomist dot com and business-opportunities dot biz are two well-made splogs that people are commenting on and linking to. I can’t tell just by looking at them with my bare eyes, so is it spam, huh? A later post will cover that philosophical aspect!

A prototype

We have set up a prototype to identify spam blogs. Right now it’s really rudimentary, but it shows potential. In the future, by using clusters of classifiers hosted here at uClassify, we think we can create a sufficiently good splog classifier.

Check out the project at www.spamhuh.com. Remember that it’s only an early prototype!

Of the two hard-to-detect spam blogs above, spamhuh.com is able to correctly identify one :)

Try it out and let us know what you think!!

Donate your spam!

We are evaluating our next move and are running preliminary tests on spam comments (spaments?). We only have a few corpora to test on, and the results look good on those (I’ll get back with exact performance later).

We want your blog comments for a good cause

Following our own guidelines, we are looking for more data to test on. If you have a WordPress installation, you can help us out by following these steps:

  1. Log into phpMyAdmin
  2. Select your WordPress database
  3. Click on the table ‘wp_comments’
  4. Click on ‘Export’
  5. Select the XML format
  6. Check ‘Save to file’ and click ‘Run’
  7. Attach the exported XML file to an e-mail and send it to contact AT uclassify DOT com
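If you would like to peek at the export before sending it, here is a minimal Python sketch that counts the comments by their moderation status. It is only an illustration under assumptions: we assume the layout that recent phpMyAdmin versions produce (one `table` element per row, with `column` children named after the database columns), and the function name `summarize_comments` is made up for this example. Older phpMyAdmin versions may write a different layout, in which case the tag names would need adjusting.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_comments(xml_text):
    """Count exported comments per 'comment_approved' value.

    Assumes each row is a <table name="wp_comments"> element whose
    <column name="..."> children hold the field values (an assumption
    about the export layout, not something phpMyAdmin guarantees).
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    for row in root.iter("table"):
        if row.get("name") != "wp_comments":
            continue  # skip rows that belong to other tables
        columns = {col.get("name"): (col.text or "") for col in row.findall("column")}
        counts[columns.get("comment_approved", "unknown")] += 1
    return counts


# A tiny hand-written sample in the assumed layout.
SAMPLE = """
<pma_xml_export>
  <database name="wordpress">
    <table name="wp_comments">
      <column name="comment_ID">1</column>
      <column name="comment_content">Nice post!</column>
      <column name="comment_approved">1</column>
    </table>
    <table name="wp_comments">
      <column name="comment_ID">2</column>
      <column name="comment_content">Cheap pills here</column>
      <column name="comment_approved">spam</column>
    </table>
    <table name="wp_comments">
      <column name="comment_ID">3</column>
      <column name="comment_content">Win money now</column>
      <column name="comment_approved">spam</column>
    </table>
  </database>
</pma_xml_export>
"""

if __name__ == "__main__":
    for status, count in summarize_comments(SAMPLE).items():
        print(status, count)
```

To run it on a real export, read the file into a string and pass that to `summarize_comments` instead of `SAMPLE`. In WordPress, a `comment_approved` value of `spam` marks comments flagged as spam and `1` marks approved ones.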

We will not publish any comments without asking you for permission first. Also, you will be credited with your name and blog when we return with the classifier results for your comments.

Thank you!

Developing the development

Since we released the beta version a couple of weeks ago, we have seen a few websites pop up that build on the uClassify technology. This is very encouraging for us! Right now we are trying to reach out to more users who want to use our classifier API.

We have spent a lot of time developing our service: making it parallel, robust, low on memory, fast and so on. This is what we are really good at. The remaining part, which is just as important, is reaching out to users, advertising ourselves and being seen in the right places, and that is not our sharpest skill.

Besides writing this blog and posting the uClassify link on a couple of sites, we haven’t done much to show our muscles yet! We thought that we would perhaps use our own API ourselves, which is probably an easier way to create some buzz. We have a couple of ideas to get us noticed (feel free to use these ideas yourself):

Build an Anti Spam Comment Plugin for WordPress?

We are quite confident that we could do really well, as the classifier engine has shown really good results in Cactus Spam Filter. This would compete with, or be a good complement to, Akismet, Defensio and similar services. Is there anyone who needs another blog spam comment filter?

Build a Spam Blog Filter?

This seems to be a problem for many blog communities, so building a splog (spam blog) filter could get us some good attention. What would be really nice is if somebody could provide us with dynamic training data on splogs and blogs. Then we could automate the training process and find the undetected spam! Anyone who wants to donate their spam? :)

Implement a JSON API for uClassify?

Building a JSON API would not only broaden our API (it is XML only right now), it would also let users use our classification service via Yahoo! Pipes. Yahoo! Pipes lets you combine different RSS feeds into one and use external web services (via JSON), which is madly cool.

Language Detection – talar du svenska?

We already have a language detection classifier (not published yet) that only needs training data refinement (removal of noise such as English words in the Filipino class). It supports 40 languages. This would be fairly simple and could give us some buzz.
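To give a feel for what that noise removal could look like, here is a rough Python sketch. It is not our actual pipeline, just one simple heuristic: drop every training line in which too many words come from a small list of common English words. The word list, the 0.3 threshold and the names `foreign_ratio` and `remove_noise` are all assumptions made up for this example.

```python
import re

# A tiny, hand-picked list of very common English words (illustrative only).
ENGLISH_STOPWORDS = {"the", "and", "is", "of", "to", "in", "that", "for", "with", "this"}


def foreign_ratio(line, foreign_words):
    """Return the share of words in `line` that appear in `foreign_words`."""
    tokens = re.findall(r"[^\W\d_]+", line.lower())  # runs of letters only
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in foreign_words)
    return hits / len(tokens)


def remove_noise(lines, foreign_words=ENGLISH_STOPWORDS, max_ratio=0.3):
    """Keep only the lines whose share of foreign words is at most `max_ratio`."""
    return [line for line in lines if foreign_ratio(line, foreign_words) <= max_ratio]


if __name__ == "__main__":
    filipino_training_lines = [
        "Magandang umaga sa inyong lahat",
        "This is the best deal for the money",
        "Salamat po sa tulong",
    ]
    for line in remove_noise(filipino_training_lines):
        print(line)
```

A line-level filter like this is crude. Languages borrow words from each other all the time, so a real cleanup would probably need a larger word list and some manual review of what gets thrown away.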

Ideas, anyone?

Do you have any ideas? Let us know, or use the uClassify API to create your own classifier (spam filter, language detection or whatever comes to your mind).