Sentiment API in French

Most of our most requested classifiers are now available in several languages. Sentiment, one of the most popular, is now available in French. This classifier determines whether a text can be categorized as positive or negative by analyzing its use of language. Click here to try Sentiment.

For topic categorization, we also offer IAB taxonomy v2 and all the Topics classifiers.

Using our API is free up to 500 requests per day. Beyond that, we offer several plans starting at 9€ per month (5000 requests/day).

Download invoices

We have released a simple invoice system. It allows you to view and print invoices from your account. If you are a subscribing user you will find the invoices under the new ‘Payments’ tab.

Invoices are only generated for new payments.

If you have a Company/VAT number you may enter it under your profile settings.

To download an invoice as a PDF, click the ‘Print’ button and then select ‘Print to pdf…’.

Sentiment API in Swedish

Many of our classifiers are now available in several languages. One of the most popular, now also available in Swedish, is Sentiment. It determines whether a text is positive or negative by analyzing the use of language.

For topic categorization, IAB taxonomy v2 and all the Topics classifiers are also available in several languages, including Swedish.

Using our API is free up to 500 calls/day; beyond that there are several pricing tiers starting at 9€ per month (5000 calls/day).

This post announces that many of our classifiers are available in Swedish.

Happy New Year 2018

Here is a short summary of 2017 and some glimpses into 2018.

Last year was a good year for uClassify. The main theme was offering classifiers in multiple languages (English, Spanish, French and Swedish). The task was non-trivial, and we decided to keep it in ‘beta’ for a long time to make sure it works and scales as intended. Now we feel confident enough to move out of beta and start promoting the service.

We created a few new classifiers for our users; the most popular are the IAB Taxonomy V2 and Language Detection classifiers (I am particularly proud of the Language Detector’s ability to detect 370 different languages!).

During the second half of 2017 I was on parental leave. During this time I mostly monitored uClassify, answered emails and pushed a few fixes.

As a hobby project I created a site with tons of generated number sequences, sequencedb.net, if you are into that kind of thing.

Thoughts about 2018

At the beginning of 2018 we will add more classifiers in different languages, move out of beta and do some promotion.

As for the next big features, we are not entirely sure. There is strong demand for URL batching; for various reasons we’ve been dodging it in the past, but it deserves reconsideration.

During parental leave I experimented a lot with numeric, image and time series classification (as opposed to text). I think this might find its way into the platform, although I’m not sure in what form.

Another thing we should do is publish API clients in different languages (Java, Python, C#, etc.).

During the coming month (my last month on parental leave) I’ll start on some of these tasks and set a plan for the rest of the year.

Happy New Year, everyone!

Jon

Discourse classifier

We have added a new classifier that can determine the discourse of a text. For example, it can distinguish questions from answers, and tell whether an answer expresses agreement or disagreement. It even tries to detect humor in the text. The classes are listed below.

  • Agreement
  • Announcement
  • Answer
  • Appreciation
  • Disagreement
  • Elaboration
  • Humor
  • Negative_reaction
  • Other
  • Question

Since long texts often contain mixed discourse (questions, answers, elaborations, humor and so on), it may make sense to split the text and classify single sentences or phrases.
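If you want to try this, a minimal Python sketch of the splitting step might look like the following. It uses a naive regular expression (split after ., ! or ? followed by whitespace) purely as an assumption for illustration; it says nothing about how the classifier works internally, and a proper sentence tokenizer would handle abbreviations and quotes better. `split_sentences` is a hypothetical helper name.

```python
import re

# Naive heuristic (assumption): a sentence ends at '.', '!' or '?'
# followed by whitespace. Abbreviations such as "e.g. " will be split wrongly.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences and drop empty fragments."""
    parts = _SENTENCE_END.split(text.strip())
    return [p for p in parts if p]


if __name__ == "__main__":
    comment = "Why does this happen? I think it is a caching bug. Great catch, thanks!"
    # Each sentence could then be sent to the Discourse classifier separately.
    for sentence in split_sentences(comment):
        print(sentence)
```

Classifying each piece separately lets a single comment come back as, say, a question followed by an elaboration, instead of one blended label.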

It’s based on the dataset from the paper “Characterizing Online Discussion Using Coarse Discourse Sequences” (ICWSM ’17). The dataset is built from annotated Reddit comments.

Spanish, French and Swedish classifier languages

During the last half year the Sentiment classifier has been in beta for Spanish, French and Swedish. The test period has been very successful, and we have decided to expand multi-language support to more popular classifiers such as the Gender Analyzer, Mood and Myers Briggs classifiers.

Classifiers that support multiple languages display flag icons. From the GUI you can test them by clicking a flag first; from the API you simply add the language code (/es, /fr, /sv) to the request URL. For more information, see the documentation.
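As a rough sketch, assuming the language code is inserted as its own path segment just before the action (for example .../Sentiment/fr/classify), a request URL could be built like this. The base URL, the path layout, the English-by-default behavior and the helper name `classify_url` are all assumptions for illustration; the documentation is authoritative for the exact format.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical base URL and path layout for this sketch; check the
# documentation for the exact request format.
BASE_URL = "https://api.uclassify.com/v1"

# Language codes mentioned in the post; English is assumed to be the
# default when no code is given.
LANGUAGE_CODES = {"es", "fr", "sv"}


def classify_url(user: str, classifier: str,
                 language: Optional[str] = None,
                 action: str = "classify",
                 base: str = BASE_URL) -> str:
    """Build {base}/{user}/{classifier}[/{language}]/{action}."""
    segments = [user, classifier]
    if language is not None:
        if language not in LANGUAGE_CODES:
            raise ValueError(f"unsupported language code: {language!r}")
        segments.append(language)
    segments.append(action)
    # Percent-encode each segment so names with spaces stay valid in the path.
    return base.rstrip("/") + "/" + "/".join(quote(s, safe="") for s in segments)


if __name__ == "__main__":
    print(classify_url("uClassify", "Sentiment", "fr"))
```

Rejecting unknown codes up front gives a clear local error instead of a failed request.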

The service is still in beta, as we need to make sure it scales when more users start using it. The API will probably not change.

New xml text element

Our XML API has been around since uClassify was released back in 2008. It’s very flexible and powerful. Previously, to avoid breaking the XML, all texts had to be base64 encoded in the <textBase64> element. With this release we introduce the <text> element, which doesn’t require base64 encoding. The <textBase64> element is of course still supported.

The new <text> element takes plain text. This saves bandwidth and processing time and makes the API easier to use. The string needs to be XML escaped so it doesn’t break the XML. Most languages have support functions for this; look for “escape XML” or similar. Basically, it replaces five characters (<, >, &, ' and ") with their entity encodings (&lt;, &gt;, &amp;, &apos; and &quot;).

<text>I love new features &amp; would like to see more in the future</text>

<textBase64>SSBsb3ZlIG5ldyBmZWF0dXJlcyAmIHdvdWxkIGxpa2UgdG8gc2VlIG1vcmUgaW4gdGhlIGZ1dHVyZQ==</textBase64>
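The two examples above can be produced with Python’s standard library: `xml.sax.saxutils.escape` for the <text> element and the `base64` module for <textBase64>. This is a minimal sketch; the helper names are hypothetical, and encoding the text as UTF-8 before base64 is an assumption.

```python
import base64
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; this extra map also covers ' and ".
_QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def text_element(text: str) -> str:
    """Wrap plain text in a <text> element, escaping the five XML special characters."""
    return "<text>" + escape(text, _QUOTE_ENTITIES) + "</text>"


def text_base64_element(text: str) -> str:
    """Wrap text, UTF-8 encoded and then base64 encoded, in a <textBase64> element."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "<textBase64>" + encoded + "</textBase64>"


if __name__ == "__main__":
    sample = "I love new features & would like to see more in the future"
    print(text_element(sample))
    print(text_base64_element(sample))
```

The escaped form is shorter than the base64 form (base64 grows the payload by about a third) and stays human readable.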

The new <text> element makes the implementation of our next big feature easier… 😉

IAB Taxonomy V2

As of the first of March 2017, the Interactive Advertising Bureau (IAB) has released version 2 of its taxonomy. The new taxonomy contains more topics than the old one and has gone through a general overhaul to make it clearer.

We have built a new classifier, IAB Taxonomy V2, that conforms to the latest standard.

The new ‘Content’ category has been left out, but you can get the content language by calling our Language Detector.

Any feedback is appreciated and we may add more training data if necessary.

Class name format

The new taxonomy has up to 4 tiers, and this is reflected in the class names. Class names have the format level1_leaf_id1_id2_id3_id4, where the ids are integers corresponding to the IAB codes.
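A class name can be split back into its parts with a few lines of Python. This sketch assumes the ids are the trailing all-digit tokens (one per tier, so between one and four) and that the level-1 name contains no underscore; `parse_iab_class` and the example names used with it are hypothetical and do not use real IAB codes.

```python
from dataclasses import dataclass


@dataclass
class IabClass:
    level1: str
    leaf: str
    ids: list[int]  # one IAB code per tier, up to 4


def parse_iab_class(name: str) -> IabClass:
    """Parse a class name of the form level1_leaf_id1[_id2[_id3[_id4]]].

    Assumptions: the ids are the trailing all-digit tokens, and the level-1
    name contains no underscore (the leaf name may).
    """
    tokens = name.split("_")
    ids: list[int] = []
    # Peel integer tokens off the end, keeping their original order.
    while tokens and tokens[-1].isdecimal() and len(ids) < 4:
        ids.insert(0, int(tokens.pop()))
    if not ids or len(tokens) < 2:
        raise ValueError(f"unexpected class name: {name!r}")
    return IabClass(level1=tokens[0], leaf="_".join(tokens[1:]), ids=ids)


if __name__ == "__main__":
    # Made-up name for illustration; real ids come from the IAB mapping.
    print(parse_iab_class("sports_soccer_1_2"))
```

Parsing from the right keeps the ids unambiguous even if a leaf name happens to contain spaces or underscores.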

You can read more about the taxonomy at their homepage where you also can find the complete id mapping.

Classifier accuracy improvements

We have updated some of our most popular classifiers to give better results.

Our most popular sentiment classifier has been updated to give better performance. The major difference is that the data has gone through a cleaning pass that removed non-English texts (noise). With a slightly improved feature extractor and the optimized data, we expect better accuracy.

We’ve also updated the following popular classifiers to use a new feature extractor. The result is better accuracy.

We might update more classifiers in the future.