uClassify blog – uClassify machine learning news and development

uClassify on Integromat

Integromat is an online service that allows you to easily connect different services, a bit like Lego for APIs. We have added uClassify to their library of public APIs. For example, you can take some tweets from Twitter, run them through uClassify and store the result in Google Drive without doing any programming.

The first version exposes a module that classifies texts with a number of predefined classifiers such as Sentiment, Topics and IAB Taxonomy. In addition, there is a module to translate text from Spanish, French and Swedish to English. You can also create and train classifiers by using the Generic API module.

Sentiment API in French

Most of our most requested classifiers are now available in several languages. Sentiment, one of the most popular, is now available in French. This classifier determines whether a text can be categorized as positive or negative by analyzing its use of language. Click here to test Sentiment.

For topic categorization, we also offer IAB Taxonomy V2 and all the Topics classifiers.

Our APIs are free to use up to 500 requests per day. Beyond that threshold we offer several plans, starting at 9€ per month (5000 requests/day).

Download invoices

We have released a simple invoice system. It allows you to view and print invoices from your account. If you are a subscribing user you will find the invoices under the new ‘Payments’ tab.

Invoices are only generated for new payments.

If you have a Company/VAT number you may enter it under your profile settings.

To download an invoice as a PDF, click on the ‘Print’ button and then select ‘Print to pdf…’.

IAB Taxonomy v2 for French, Spanish, Swedish and English.

We have just enabled multi-language support for our IAB Taxonomy V2 and our Topics classifiers. The new languages are French, Spanish and Swedish. You can try them out in our GUI or via our API.

If you are already using the API for English, you don’t need to make any changes.

Sentiment API for Swedish

Many of our classifiers are now available in several languages. One of the most popular, Sentiment, is now also available in Swedish. It determines whether a text is positive or negative by analyzing the use of language.

For topic categorization, IAB Taxonomy V2 and all Topics classifiers are also available in several languages, including Swedish.

Our API is free to use up to 500 calls/day. Beyond that there are different pricing tiers, starting at 9€ per month (5000 calls/day).

This post announces that many of our classifiers are available in Swedish.

Happy new 2018

Here is a short summary of 2017 and some glimpses into 2018.

Last year was a good year for uClassify. The main theme was to offer classifiers in multiple languages (English, Spanish, French and Swedish). The task was non-trivial and we decided to keep it in ‘beta’ for a long time to make sure it works and scales as intended. Now we feel confident enough to move out of beta and start promoting the service.

We created a few new classifiers for our users. The most popular are the IAB Taxonomy V2 and Language Detection classifiers (I am particularly proud of the latter's capability to detect 370 different languages!).

For the second half of 2017 I went on parental leave. During this time I mostly monitored uClassify, answered emails and pushed a few fixes.

As a hobby project I created a site with tons of generated number sequences, sequencedb.net, if you are into that kind of thing.

Thoughts about 2018

At the beginning of 2018 we will add more classifiers in different languages, move out of beta and do some promotion.

As for the next big features, we are not entirely sure. There is strong demand for URL batching. For various reasons we have been dodging this in the past, but it deserves reconsideration.

During parental leave I played a lot with numeric, image and time series classification (as opposed to text). This is something I think might find its way into the platform, although I am not sure in what form.

Another thing we should do is publish API clients in different languages (Java, Python, C# etc).

During the coming month (my last month on parental leave) I’ll start with some of the tasks and set a plan for the rest of the year.

Happy new year, everyone!

Jon

Discourse classifier

We have added a new classifier that can determine the discourse of a text. It can, for example, distinguish questions from answers and tell whether an answer is an agreement or a disagreement. It even tries to see if there is humor in the text. The classes are listed below.

Agreement
Announcement
Answer
Appreciation
Disagreement
Elaboration
Humor
Negative_reaction
Other
Question

Since long texts often have mixed discourse (containing questions, answers, elaborations, humor and so on), it may make sense to split the text and pass single sentences or phrases for classification.
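As a rough illustration of splitting a text before classification, here is a minimal Python sketch. The function name `split_into_sentences` and the punctuation-based rule are assumptions made for this example, not part of uClassify. A real sentence splitter would need to handle abbreviations, quotes and similar cases.

```python
import re

def split_into_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace (naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty pieces, e.g. when the input is an empty string
    return [part for part in parts if part]

comment = "Is this a bug? I think so. Haha, nice one!"
for sentence in split_into_sentences(comment):
    # Each sentence could then be sent to the classifier on its own
    print(sentence)
```

Each piece can then be classified separately, so a question and a humorous remark in the same comment do not blur into one result.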

It’s based on the dataset from the paper “Characterizing Online Discussion Using Coarse Discourse Sequences (ICWSM ’17)”. The dataset is built from annotated Reddit comments.

Spanish, French and Swedish classifier languages

During the last half year the Sentiment classifier has been beta enabled for Spanish, French and Swedish. The test period has been very successful and we have decided to expand multi-language support to more popular classifiers such as the Gender Analyzer, Mood and Myers Briggs classifiers.

Classifiers that support multiple languages are displayed with flag icons. From the GUI you can test them by clicking the flag first. From the API you simply add the language code (/es, /fr, /sv) to the request URL. For more information, see the documentation.
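A minimal Python sketch of adding a language code to a request URL might look like the following. The helper name `with_language` and the URL `https://api.example.com/classify` are placeholders invented for this example. The real request URL format is described in the documentation.

```python
# Language codes mentioned in this post; English needs no code
SUPPORTED_LANGUAGES = {"es", "fr", "sv"}

def with_language(request_url, language_code):
    """Append a language code such as 'fr' to a request URL."""
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError("unsupported language code: " + language_code)
    # Avoid a double slash if the URL already ends with '/'
    return request_url.rstrip("/") + "/" + language_code

# Placeholder URL, not the real uClassify endpoint
print(with_language("https://api.example.com/classify", "fr"))
```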

The service is still in beta, as we still need to make sure it scales when more users start to use it. The API will probably not change.

New xml text element

Our XML API has been around since the release of uClassify back in 2008. It’s very flexible and powerful. Previously, to avoid breaking the XML, all texts passed had to be base64 encoded in the <textBase64> element. With this release we introduce the element <text>, which doesn’t require base64 encoding. The <textBase64> element is of course still supported.

The new <text> element can take plain text. This saves some bandwidth, improves performance and makes the API easier to use. The string needs to be XML encoded so it doesn’t break the XML. Most languages have support functions for this; look for “escape XML” or similar. Basically it replaces 5 characters (<, >, &, ' and ") with their encoding (&lt; etc.).

<text>I love new features &amp; would like to see more in the future</text>

<textBase64>SSBsb3ZlIG5ldyBmZWF0dXJlcyAmIHdvdWxkIGxpa2UgdG8gc2VlIG1vcmUgaW4gdGhlIGZ1dHVyZQ==</textBase64>
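For readers who want to produce both forms themselves, here is a small Python sketch using the standard library. The helper names `text_element` and `text_base64_element` are made up for this example, and it only builds the two elements, not a complete API request.

```python
import base64
from xml.sax.saxutils import escape

def text_element(text):
    """Wrap XML-escaped plain text in a <text> element."""
    # escape() handles &, < and > by default; the extra mapping covers quotes
    escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
    return "<text>" + escaped + "</text>"

def text_base64_element(text):
    """Wrap base64-encoded UTF-8 text in a <textBase64> element."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "<textBase64>" + encoded + "</textBase64>"

sample = "I love new features & would like to see more in the future"
print(text_element(sample))
print(text_base64_element(sample))
```

For this sample the escaped form is noticeably shorter than the base64 form, which is where the bandwidth saving comes from.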

The new <text> element makes the implementation of our next big feature easier… 😉

IAB Taxonomy V2

The Interactive Advertising Bureau (IAB) has released version 2 of their taxonomy as of the first of March 2017. The new taxonomy contains more topics than the old one and has gone through a general overhaul to make it clearer.

We have built a new classifier, IAB Taxonomy V2, that conforms to the latest standard.

The new ‘Content’ category has been left out but you can get content language by calling our Language Detector.

Any feedback is appreciated and we may add more training data if necessary.

Class name format

The new taxonomy has up to 4 tiers, which is reflected in the class names. The format of the class names is level1_leaf_id1_id2_id3_id4, where the ids correspond to the IAB codes and are integers.
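To separate the ids from the rest of a class name, something along these lines could work in Python. This is only a sketch: the function name `split_class_name` and the sample class name `sports_soccer_483_533` are invented for illustration. It also assumes that the ids are the trailing underscore-separated integer tokens and that the name part does not itself end in a number.

```python
def split_class_name(class_name):
    """Return (name_part, ids) for a class name ending in integer ids."""
    tokens = class_name.split("_")
    ids = []
    # Collect integer tokens from the right; these are taken to be the IAB ids
    while tokens and tokens[-1].isdigit():
        ids.insert(0, int(tokens.pop()))
    return "_".join(tokens), ids

# Hypothetical class name, not taken from the real taxonomy
name, ids = split_class_name("sports_soccer_483_533")
print(name, ids)
```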

You can read more about the taxonomy on their homepage, where you can also find the complete id mapping.