Download invoices

We have released a simple invoice system. It allows you to view and print invoices from your account. If you are a subscribing user you will find the invoices under the new ‘Payments’ tab.

Invoices will only be generated for any new payments.

If you have a Company/VAT number you may enter it under your profile settings.

To download the invoices as a pdf, click on the ‘Print’ button and then select ‘Print to pdf…’.

Sentiment API för svenska

Många av våra klassificerare finns nu tillgängliga på flera språk. En av de populäraste som nu även finns på svenska är Sentiment. Den avgör om en text är positiv eller negativ genom att analysera språkbruket.

För ämneskategorisering finns även IAB taxonomy v2 och samtliga Topics klassificerare på flera språk inklusive svenska.

Det är gratis att använda vårt API upp till 500 anrop/dag, efter det finns det olika kostnadsnivåer från 9€ per månad (5000 anrop/dag).

This post announces that many of our classifiers are available in Swedish.

Happy new 2018

Here is a short summary of 2017 and some glimpses into 2018.

Last year was a good year for uClassify. The main theme was to offer classifiers in multiple languages (English, Spanish, French and Swedish). The task was non trivial and we decided to keep it in ‘beta’ for a long to make sure it works and scales as intended. Now we feel confident to move out of beta and start to promote the service.

We created a few new classifiers for our users, the most popular are the IAB Taxonomy V2 and Language Detection classifiers (I am particularly proud of its capability to detect 370 different languages!) .

For the second half of 2017 I went on parental leave, during this time I mostly monitored uClassify, answered emails and pushed a few fixes.

As a hobby project I created a site with tons of generated number sequences, sequencedb.net, if you are into that kind of thing.

Thoughts about 2018

In the beginning of 2018 we will add more classifiers in different languages and move out of beta and do some promoting.

As for the next big features we are not entirely sure, there is a big request for URL batching, for different reasons we’ve been dodging this in the past, but it deserves a reconsideration.

During parental leave I played a lot with numeric, images and time series classification (as opposed to text). This is something I’m thinking of might find it’s way into the platform, although not sure in what form.

Another thing we should do is to publish api clients in different languages (Java, Python, C# etc).

During the coming month (my last month on parental leave) I’ll start with some of the tasks and set a plan for the rest of the year.

Happy new years everyone!

Jon

Discourse classifier

We have added a new classifier that can determine the discourse of a text. It can for example distinguish questions from answers, if the answer is an agreement or disagreement. It even tries to see if there is humor in the text. The classes are listed below.

  • Agreement
  • Announcement
  • Answer
  • Appreciation
  • Disagreement
  • Elaboration
  • Humor
  • Negative_reaction
  • Other
  • Question

 

 

Since long texts often has mixed discourse, containing questions, answers, elaborations, humor an so on – it may make sense to pass single sentences or phrases for classification (split the text).

It’s based on the dataset from the paper “Characterizing Online Discussion Using Coarse Discourse Sequences (ICWSM ’17)” The dataset is built from annotated reddit comments.

New xml text element

Our XML API has been around since the release of uClassify back in 2008. It’s very flexible and powerful. Previously, to avoid breaking the XML all texts passed needed to be base64 encoded in the <textBase64> element. With this release we introduce the element <text> that doesn’t require base64 encoding. The <textBase64> is of course still supported.

The new <text> element can take plain text. This saves some bandwidth, performance and makes it easier to use. The string needs to be XML encoded so it doesn’t break the XML. Most languages have support functions for this, look for “escape XML” or similar. Basically it replaces 5 characters (<,>,&,’ and “) with their encoding (&lt; etc.).

<text>I love new features &amp; would like to see more in the future</text>

<textBase64>SSBsb3ZlIG5ldyBmZWF0dXJlcyAmIHdvdWxkIGxpa2UgdG8gc2VlIG1vcmUgaW4gdGhlIGZ1dHVyZQ==</textBase64>

The new <text> element makes the implementation of our next big feature easier… 😉

Classifier accuracy improvements

We have updated some of our most popular classifiers to give better results.

Our most popular classifier for sentiment has been updated to give better performance. The major difference is that the data has gone through a cleaning pass, removing non english texts (noise). And with a slightly improved feature extractor and optimized data we can expect better accuracy.

We’ve also updated the following popular classifiers to use a new feature extractor. The result is better accuracy.

We might update more classifiers in the future.

New account limits

We have updated the daily quota limits for the different accounts. If you already (before 3rd of March 2017) are subscribing on an account this will not affect you.

The adjustments are made after looking at the statistics of our users. Before the Indie and Professional account had the same rate limits but at different prices. The new model is a ladder from Indie to Professional accounts, with growing discount the higher you get.

This also affected our translation api, which now gives you 40 characters per call instead of 10.

The new pricing can be found here. We will see how it works out the coming weeks, we might have to make some adjustments.

The feature extractor update

With the latest version of uClassify we have abandoned the old feature extractor for user trained classifiers. This means that you will get better performance for all new classifier created after 2017-03-26. All already created classifiers won’t be affected.

Background

In the beginning, around 2007, I thought it would be best to allow the users to do preprocessing and use a really simple feature extractor on the server. This feature extractor only separated words by the space character (32 decimal). The idea was that users could preprocess their texts to allow more delimiters (e.g. exchange ! to space on their side). You can also generate your own bigrams by combining them with underscore or something.

Classifiers made by uClassify have used other feature extractors, e.g. the sentiment classifier uses unigrams, bigrams, some stemming and converting to lower case. This improves the performance significantly.

Now, it’s not easy for someone who is new to machine learning and text classification to guess which delimiters to use. Also it can be very counter intuitive. For that reason, I decided to replace the old unigram feature extractor with a new high performance general purpose extractor.

The new feature extractor

The new feature extractor has been found heuristically by running extensive tests over a large set of corpora. The datasets are a part of our internal testing suite that contains over 80 different test sets for a wide range of problems. Therefore I am very confident that the new feature extractor will do really well.

In short here is what it does:

  • convert text to lower case
  • separate words on white spaces, exclamation mark, parentheses, period and slash.
  • generate uni grams
  • generate bi grams
  • generate parts of long unigrams

This is similar to what the Sentiment classifier has been using which allows it to differentiate between “I don’t like” and “I like”.

Let me know if you have any feedback, you can reach me at contact AT uclassify DOT com or @jonkagstrom on twitter!

Translation API

We get a lot of requests for classifiers in different languages and as a next step we are building a translation API. The idea is to have an affordable in-house machine translation service that can quickly translate requests to the classifier language, classify the request and send back the response. Since the majority of classifiers are in English, the primary focus will be to target English.

Initially we support French, Spanish and Swedish to English translations.

Translation demo

The API is accessible with your ordinary API read key and a GET/POST REST protocol.

You can test and read all about the translation API here.

Hello world

Please don’t hesitate to report any weirdness to me!

IAB taxonomy classifier

Upon popular request I’ve built a new topics classifier based on the IAB taxonomy. EDIT: We also support the IAB Taxonomy V2 now.

The classifier has two levels of depth, a main category (sports, science…) and a sub category (soccer, physics…). In total there are about 360 different classes following the IAB Quality Assurance Guidelines (QAG) Taxonomy specification.

uClassify interface to IAB
uClassify interface to IAB

You can try the online demo here.

Class name format

The class names are composed of 4 parts separated by an underscore, with the following structure:

main topic_sub topic_main id_sub id
home and garden_flowers_5_4
sports_climbing_17_3
sports_volleyball_17_7

The last two ids are the IAB ids, this will make it easier for users tho map and integrate the result.

With a free uClassify account you can make 1000 free calls per day, if you need more there are affordable options from 9€ per month. You can sign up here.

List of topics

IAB12 News and IAB24 Uncategorized is not supported.

IAB1 Arts & Entertainment
IAB1-1 Books & Literature
IAB1-2 Celebrity Fan/Gossip
IAB1-3 Fine Art
IAB1-4 Humor
IAB1-5 Movies
IAB1-6 Music
IAB1-7 Television

IAB2 Automotive
IAB2-1 Auto Parts
IAB2-2 Auto Repair
IAB2-3 Buying/Selling Cars
IAB2-4 Car Culture
IAB2-5 Certified Pre-Owned
IAB2-6 Convertible
IAB2-7 Coupe
IAB2-8 Crossover
IAB2-9 Diesel
IAB2-10 Electric Vehicle
IAB2-11 Hatchback
IAB2-12 Hybrid
IAB2-13 Luxury
IAB2-14 Minivan
IAB2-15 Motorcycles
IAB2-16 Off-Road Vehicles
IAB2-17 Performance Vehicles
IAB2-18 Pickup
IAB2-19 Road-Side Assistance
IAB2-20 Sedan
IAB2-21 Trucks & Accessories
IAB2-22 Vintage Cars
IAB2-23 Wagon

IAB3 Business
IAB3-1 Advertising
IAB3-2 Agriculture
IAB3-3 Biotech/Biomedical
IAB3-4 Business Software
IAB3-5 Construction
IAB3-6 Forestry
IAB3-7 Government
IAB3-8 Green Solutions
IAB3-9 Human Resources
IAB3-10 Logistics
IAB3-11 Marketing
IAB3-12 Metals

IAB4 Careers
IAB4-1 Career Planning
IAB4-2 College
IAB4-3 Financial Aid
IAB4-4 Job Fairs
IAB4-5 Job Search
IAB4-6 Resume Writing/Advice
IAB4-7 Nursing
IAB4-8 Scholarships
IAB4-9 Telecommuting
IAB4-10 U.S. Military
IAB4-11 Career Advice

IAB5 Education
IAB5-1 7-12 Education
IAB5-2 Adult Education
IAB5-3 Art History
IAB5-4 College Administration
IAB5-5 College Life
IAB5-6 Distance Learning
IAB5-7 English as a 2nd Language
IAB5-8 Language Learning
IAB5-9 Graduate School
IAB5-10 Homeschooling
IAB5-11 Homework/Study Tips
IAB5-12 K-6 Educators
IAB5-13 Private School
IAB5-14 Special Education
IAB5-15 Studying Business

IAB6 Family & Parenting
IAB6-1 Adoption
IAB6-2 Babies & Toddlers
IAB6-3 Daycare/Pre School
IAB6-4 Family Internet
IAB6-5 Parenting – K-6 Kids
IAB6-6 Parenting teens
IAB6-7 Pregnancy
IAB6-8 Special Needs Kids
IAB6-9 Eldercare

IAB7 Health & Fitness
IAB7-1 Exercise
IAB7-2 ADD
IAB7-3 AIDS/HIV
IAB7-4 Allergies
IAB7-5 Alternative Medicine
IAB7-6 Arthritis
IAB7-7 Asthma
IAB7-8 Autism/PDD
IAB7-9 Bipolar Disorder
IAB7-10 Brain Tumor
IAB7-11 Cancer
IAB7-12 Cholesterol
IAB7-13 Chronic Fatigue Syndrome
IAB7-14 Chronic Pain
IAB7-15 Cold & Flu
IAB7-16 Deafness
IAB7-17 Dental Care
IAB7-18 Depression
IAB7-19 Dermatology
IAB7-20 Diabetes
IAB7-21 Epilepsy
IAB7-22 GERD/Acid Reflux
IAB7-23 Headaches/Migraines
IAB7-24 Heart Disease
IAB7-25 Herbs for Health
IAB7-26 Holistic Healing
IAB7-27 IBS/Crohn’s Disease
IAB7-28 Incest/Abuse Support
IAB7-29 Incontinence
IAB7-30 Infertility
IAB7-31 Men’s Health
IAB7-32 Nutrition
IAB7-33 Orthopedics
IAB7-34 Panic/Anxiety Disorders
IAB7-35 Pediatrics
IAB7-36 Physical Therapy
IAB7-37 Psychology/Psychiatry
IAB7-38 Senior Health
IAB7-39 Sexuality
IAB7-40 Sleep Disorders
IAB7-41 Smoking Cessation
IAB7-42 Substance Abuse
IAB7-43 Thyroid Disease
IAB7-44 Weight Loss
IAB7-45 Women’s Health

IAB8 Food & Drink
IAB8-1 American Cuisine
IAB8-2 Barbecues & Grilling
IAB8-3 Cajun/Creole
IAB8-4 Chinese Cuisine
IAB8-5 Cocktails/Beer
IAB8-6 Coffee/Tea
IAB8-7 Cuisine-Specific
IAB8-8 Desserts & Baking
IAB8-9 Dining Out
IAB8-10 Food Allergies
IAB8-11 French Cuisine
IAB8-12 Health/Low-Fat Cooking
IAB8-13 Italian Cuisine
IAB8-14 Japanese Cuisine
IAB8-15 Mexican Cuisine
IAB8-16 Vegan
IAB8-17 Vegetarian
IAB8-18 Wine

IAB9 Hobbies & Interests
IAB9-1 Art/Technology
IAB9-2 Arts & Crafts
IAB9-3 Beadwork
IAB9-4 Bird-Watching
IAB9-5 Board Games/Puzzles
IAB9-6 Candle & Soap Making
IAB9-7 Card Games
IAB9-8 Chess
IAB9-9 Cigars
IAB9-10 Collecting
IAB9-11 Comic Books
IAB9-12 Drawing/Sketching
IAB9-13 Freelance Writing
IAB9-14 Genealogy
IAB9-15 Getting Published
IAB9-16 Guitar
IAB9-17 Home Recording
IAB9-18 Investors & Patents
IAB9-19 Jewelry Making
IAB9-20 Magic & Illusion
IAB9-21 Needlework
IAB9-22 Painting
IAB9-23 Photography
IAB9-24 Radio
IAB9-25 Roleplaying Games
IAB9-26 Sci-Fi & Fantasy
IAB9-27 Scrapbooking
IAB9-28 Screenwriting
IAB9-29 Stamps & Coins
IAB9-30 Video & Computer Games
IAB9-31 Woodworking

IAB10 Home & Garden
IAB10-1 Appliances
IAB10-2 Entertaining
IAB10-3 Environmental Safety
IAB10-4 Gardening
IAB10-5 Home Repair
IAB10-6 Home Theater
IAB10-7 Interior Decorating
IAB10-8 Landscaping
IAB10-9 Remodeling & Construction

IAB11 Law, Government, & Politics
IAB11-1 Immigration
IAB11-2 Legal Issues
IAB11-3 U.S. Government Resources
IAB11-4 Politics
IAB11-5 Commentary

IAB12 News*
IAB12-1 International News
IAB12-2 National News
IAB12-3 Local News

IAB13 Personal Finance
IAB13-1 Beginning Investing
IAB13-2 Credit/Debt & Loans
IAB13-3 Financial News
IAB13-4 Financial Planning
IAB13-5 Hedge Fund
IAB13-6 Insurance
IAB13-7 Investing
IAB13-8 Mutual Funds
IAB13-9 Options
IAB13-10 Retirement Planning
IAB13-11 Stocks
IAB13-12 Tax Planning

IAB14 Society
IAB14-1 Dating
IAB14-2 Divorce Support
IAB14-3 Gay Life
IAB14-4 Marriage
IAB14-5 Senior Living
IAB14-6 Teens
IAB14-7 Weddings
IAB14-8 Ethnic Specific

IAB15 Science
IAB15-1 Astrology
IAB15-2 Biology
IAB15-3 Chemistry
IAB15-4 Geology
IAB15-5 Paranormal Phenomena
IAB15-6 Physics
IAB15-7 Space/Astronomy
IAB15-8 Geography
IAB15-9 Botany
IAB15-10 Weather

IAB16 Pets
IAB16-1 Aquariums
IAB16-2 Birds
IAB16-3 Cats
IAB16-4 Dogs
IAB16-5 Large Animals
IAB16-6 Reptiles
IAB16-7 Veterinary Medicine

IAB17 Sports
IAB17-1 Auto Racing
IAB17-2 Baseball
IAB17-3 Bicycling
IAB17-4 Bodybuilding
IAB17-5 Boxing
IAB17-6 Canoeing/Kayaking
IAB17-7 Cheerleading
IAB17-8 Climbing
IAB17-9 Cricket
IAB17-10 Figure Skating
IAB17-11 Fly Fishing
IAB17-12 Football
IAB17-13 Freshwater Fishing
IAB17-14 Game & Fish
IAB17-15 Golf
IAB17-16 Horse Racing
IAB17-17 Horses
IAB17-18 Hunting/Shooting
IAB17-19 Inline Skating
IAB17-20 Martial Arts
IAB17-21 Mountain Biking
IAB17-22 NASCAR Racing
IAB17-23 Olympics
IAB17-24 Paintball
IAB17-25 Power & Motorcycles
IAB17-26 Pro Basketball
IAB17-27 Pro Ice Hockey
IAB17-28 Rodeo
IAB17-29 Rugby
IAB17-30 Running/Jogging
IAB17-31 Sailing
IAB17-32 Saltwater Fishing
IAB17-33 Scuba Diving
IAB17-34 Skateboarding
IAB17-35 Skiing
IAB17-36 Snowboarding
IAB17-37 Surfing/Body-Boarding
IAB17-38 Swimming
IAB17-39 Table Tennis/Ping-Pong
IAB17-40 Tennis
IAB17-41 Volleyball
IAB17-42 Walking
IAB17-43 Waterski/Wakeboard
IAB17-44 World Soccer

IAB18 Style & Fashion
IAB18-1 Beauty
IAB18-2 Body Art
IAB18-3 Fashion
IAB18-4 Jewelry
IAB18-5 Clothing
IAB18-6 Accessories

IAB19 Technology & Computing
IAB19-1 3-D Graphics
IAB19-2 Animation
IAB19-3 Antivirus Software
IAB19-4 C/C++
IAB19-5 Cameras & Camcorders
IAB19-6 Cell Phones
IAB19-7 Computer Certification
IAB19-8 Computer Networking
IAB19-9 Computer Peripherals
IAB19-10 Computer Reviews
IAB19-11 Data Centers
IAB19-12 Databases
IAB19-13 Desktop Publishing
IAB19-14 Desktop Video
IAB19-15 Email
IAB19-16 Graphics Software
IAB19-17 Home Video/DVD
IAB19-18 Internet Technology
IAB19-19 Java
IAB19-20 JavaScript
IAB19-21 Mac Support
IAB19-22 MP3/MIDI
IAB19-23 Net Conferencing
IAB19-24 Net for Beginners
IAB19-25 Network Security
IAB19-26 Palmtops/PDAs
IAB19-27 PC Support
IAB19-28 Portable
IAB19-29 Entertainment
IAB19-30 Shareware/Freeware
IAB19-31 Unix
IAB19-32 Visual Basic
IAB19-33 Web Clip Art
IAB19-34 Web Design/HTML
IAB19-35 Web Search
IAB19-36 Windows

IAB20 Travel
IAB20-1 Adventure Travel
IAB20-2 Africa
IAB20-3 Air Travel
IAB20-4 Australia & New Zealand
IAB20-5 Bed & Breakfasts
IAB20-6 Budget Travel
IAB20-7 Business Travel
IAB20-8 By US Locale
IAB20-9 Camping
IAB20-10 Canada
IAB20-11 Caribbean
IAB20-12 Cruises
IAB20-13 Eastern Europe
IAB20-14 Europe
IAB20-15 France
IAB20-16 Greece
IAB20-17 Honeymoons/Getaways
IAB20-18 Hotels
IAB20-19 Italy
IAB20-20 Japan
IAB20-21 Mexico & Central America
IAB20-22 National Parks
IAB20-23 South America
IAB20-24 Spas
IAB20-25 Theme Parks
IAB20-26 Traveling with Kids
IAB20-27 United Kingdom

IAB21 Real Estate
IAB21-1 Apartments
IAB21-2 Architects
IAB21-3 Buying/Selling Homes

IAB22 Shopping
IAB22-1 Contests & Freebies
IAB22-2 Couponing
IAB22-3 Comparison
IAB22-4 Engines
IAB23 Religion & Spirituality
IAB23-1 Alternative Religions
IAB23-2 Atheism/Agnosticism
IAB23-3 Buddhism
IAB23-4 Catholicism
IAB23-5 Christianity
IAB23-6 Hinduism
IAB23-7 Islam
IAB23-8 Judaism
IAB23-9 Latter-Day Saints
IAB23-10 Pagan/Wiccan

IAB24 Uncategorized*

IAB25 Non-Standard Content
IAB25-1 Unmoderated UGC
IAB25-2 Extreme Graphic/Explicit Violence
IAB25-3 Pornography
IAB25-4 Profane Content
IAB25-5 Hate Content
IAB25-6 Under Construction
IAB25-7 Incentivized

IAB26 Illegal Content
IAB26-1 Illegal Content
IAB26-2 Warez
IAB26-3 Spyware/Malware
IAB26-4 Copyright Infringement

*  IAB12 News and IAB24 Uncategorized is not supported.