IAB Taxonomy V2

The Interactive Advertising Bureau (IAB) released version 2 of its taxonomy on March 1, 2017. The new taxonomy contains more topics than the old one and has been given a general overhaul to make it clearer.

We have built a new classifier, IAB Taxonomy V2, that conforms to the latest standard.

The new ‘Content’ category has been left out, but you can get the content language by calling our Language Detector.

Any feedback is appreciated and we may add more training data if necessary.

Class name format

The new taxonomy has up to 4 tiers, and this is reflected in the class names. The format of the class names is level1_leaf_id1_id2_id3_id4, where the ids are integers that correspond to the IAB codes.

You can read more about the taxonomy on the IAB homepage, where you can also find the complete id mapping.
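
If you want to split these class names in code, here is a minimal Python sketch. It assumes that the ids are the trailing all-digit parts of the name, that fewer than four ids may be present for shallower topics, and that everything between the first part and the ids is the leaf name. The function name and the example class name are made up for illustration and are not real IAB ids.

```python
from typing import List, NamedTuple, Tuple


class IabV2Class(NamedTuple):
    level1: str            # top tier topic name
    leaf: str              # name of the most specific topic
    ids: Tuple[int, ...]   # IAB ids, one per tier (assumed 1 to 4 of them)


def parse_iab_v2_class(class_name: str) -> IabV2Class:
    """Split 'level1_leaf_id1_..._idN' into names and integer ids."""
    tokens = class_name.split("_")
    collected: List[int] = []
    # Assumption: ids are the trailing all-digit tokens, at most four.
    while len(tokens) > 2 and tokens[-1].isdecimal() and len(collected) < 4:
        collected.append(int(tokens.pop()))
    if len(tokens) < 2 or not collected:
        raise ValueError(f"unexpected class name format: {class_name!r}")
    level1 = tokens[0]
    leaf = "_".join(tokens[1:])
    # Ids were collected right to left, so restore the original order.
    return IabV2Class(level1, leaf, tuple(reversed(collected)))


# Hypothetical class name, only to show the shape of the result.
parsed = parse_iab_v2_class("level one_leaf name_1_2_3")
print(parsed.level1, "|", parsed.leaf, "|", parsed.ids)
```

Collecting the ids from the right keeps the parser tolerant of topic names that contain spaces, which the names themselves may well do.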

Language Detector for 370+ major and rare languages

We have built a language detector that covers about 374 languages.

It can detect both living and extinct languages (e.g. English and Tupi), identify ancient and constructed languages (e.g. Latin and Klingon) and even distinguish different dialects.

Each language class has been named with its English name followed by an underscore and the corresponding ISO 639-3 three letter code. E.g.

  • Swedish_swe
  • English_eng
  • Chinese_zho
  • Mesopotamian Arabic_acm
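
Since the English name can contain spaces, the safest way to split a class name is on the last underscore. A small Python sketch (the function name is our own choice for illustration):

```python
from typing import Tuple


def split_language_class(class_name: str) -> Tuple[str, str]:
    """Split 'English name_iso' on the last underscore."""
    name, separator, code = class_name.rpartition("_")
    # ISO 639-3 codes are three letters long.
    if not separator or not name or len(code) != 3:
        raise ValueError(f"unexpected class name format: {class_name!r}")
    return name, code


for class_name in ["Swedish_swe", "Mesopotamian Arabic_acm"]:
    print(split_language_class(class_name))
```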

You can try it here. It needs a few words to make accurate detections.

Some of the rare languages (about 30) may have insufficient training data. The idea is to improve the classifier as more documents are gathered. We may also add more languages in the future, so make sure your code can handle that.
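
One way to stay robust when new languages appear is to never assume a fixed set of classes. The Python sketch below assumes the detector result has already been decoded into a dictionary that maps class names to probabilities. The set of handled codes and the "other" fallback are hypothetical choices for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical: the languages this particular application treats specially.
HANDLED_CODES: Set[str] = {"eng", "swe", "zho"}


def best_language(probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Return the ISO 639-3 code and probability of the most likely class."""
    if not probabilities:
        raise ValueError("empty classification result")
    class_name = max(probabilities, key=probabilities.get)
    code = class_name.rpartition("_")[2]
    return code, probabilities[class_name]


def route(probabilities: Dict[str, float], fallback: str = "other") -> str:
    """Map a result to a handled code, or to a fallback for anything else."""
    code, _ = best_language(probabilities)
    # Unknown or newly added languages end up in the fallback bucket
    # instead of raising an error.
    return code if code in HANDLED_CODES else fallback


print(route({"Swedish_swe": 0.9, "English_eng": 0.1}))
print(route({"Some New Language_xxx": 0.8, "English_eng": 0.2}))
```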

Here is the full list of supported languages

Language Name ISO 639-3 Type
Abkhazian abk living
Achinese ace living
Adyghe ady living
Afrihili afh constructed
Afrikaans afr living
Ainu ain living
Akan aka living
Albanian sqi living
Algerian Arabic arq living
Amharic amh living
Ancient Greek grc historical
Arabic ara living
Aragonese arg living
Armenian hye living
Arpitan frp living
Assamese asm living
Assyrian Neo-Aramaic aii living
Asturian ast living
Avaric ava living
Awadhi awa living
Aymara aym living
Azerbaijani aze living
Balinese ban living
Bambara bam living
Banjar bjn living
Bashkir bak living
Basque eus living
Bavarian bar living
Baybayanon bvy living
Belarusian bel living
Bengali ben living
Berber ber living
Bhojpuri bho living
Bishnupriya bpy living
Bislama bis living
Bodo brx living
Bosnian bos living
Breton bre living
Bulgarian bul living
Buriat bua living
Burmese mya living
Catalan cat living
Cebuano ceb living
Central Bikol bcl living
Central Huasteca Nahuatl nch living
Central Khmer khm living
Central Kurdish ckb living
Central Mnong cmo living
Chamorro cha living
Chavacano cbk living
Chechen che living
Cherokee chr living
Chinese zho living
Choctaw cho living
Chukot ckt living
Church Slavic chu ancient
Chuvash chv living
Coastal Kadazan kzj living
Cornish cor living
Corsican cos living
Cree cre living
Crimean Tatar crh living
Croatian hrv living
Cuyonon cyo living
Czech ces living
Danish dan living
Dhivehi div living
Dimli diq living
Dungan dng living
Dutch nld living
Dutton World Speedwords dws constructed
Dzongkha dzo living
Eastern Mari mhr living
Egyptian Arabic arz living
Emilian egl living
English eng living
Erzya myv living
Esperanto epo constructed
Estonian est living
Ewe ewe living
Extremaduran ext living
Faroese fao living
Fiji Hindi hif living
Finnish fin living
French fra living
Friulian fur living
Fulah ful living
Gagauz gag living
Galician glg living
Gan Chinese gan living
Ganda lug living
Garhwali gbm living
Georgian kat living
German deu living
Gilaki glk living
Gilbertese gil living
Goan Konkani gom living
Gothic got ancient
Guarani grn living
Guerrero Nahuatl ngu living
Gujarati guj living
Gulf Arabic afb living
Haitian hat living
Hakka Chinese hak living
Hausa hau living
Hawaiian haw living
Hebrew heb living
Hiligaynon hil living
Hindi hin living
Hmong Daw mww living
Hmong Njua hnj living
Ho hoc living
Hungarian hun living
Iban iba living
Icelandic isl living
Ido ido constructed
Igbo ibo living
Iloko ilo living
Indonesian ind living
Ingrian izh living
Interlingua ina constructed
Interlingue ile constructed
Iranian Persian pes living
Irish gle living
Italian ita living
Jamaican Creole English jam living
Japanese jpn living
Javanese jav living
Jinyu Chinese cjy living
Judeo-Tat jdt living
K’iche’ quc living
Kabardian kbd living
Kabyle kab living
Kadazan Dusun dtp living
Kalaallisut kal living
Kalmyk xal living
Kamba kam living
Kannada kan living
Kara-Kalpak kaa living
Karachay-Balkar krc living
Karelian krl living
Kashmiri kas living
Kashubian csb living
Kazakh kaz living
Kekchí kek living
Keningau Murut kxi living
Khakas kjh living
Khasi kha living
Kinyarwanda kin living
Kirghiz kir living
Klingon tlh constructed
Kölsch ksh living
Komi kom living
Komi-Permyak koi living
Komi-Zyrian kpv living
Kongo kon living
Korean kor living
Kotava avk constructed
Kumyk kum living
Kurdish kur living
Ladin lld living
Ladino lad living
Lakota lkt living
Lao lao living
Latgalian ltg living
Latin lat ancient
Latvian lav living
Laz lzz living
Lezghian lez living
Láadan ldn constructed
Ligurian lij living
Lingala lin living
Lingua Franca Nova lfn constructed
Literary Chinese lzh historical
Lithuanian lit living
Liv liv living
Livvi olo living
Lojban jbo constructed
Lombard lmo living
Louisiana Creole lou living
Low German nds living
Lower Sorbian dsb living
Luxembourgish ltz living
Macedonian mkd living
Madurese mad living
Maithili mai living
Malagasy mlg living
Malay zlm living
Malay msa living
Malayalam mal living
Maltese mlt living
Mambae mgm living
Mandarin Chinese cmn living
Manx glv living
Maori mri living
Marathi mar living
Marshallese mah living
Mazanderani mzn living
Mesopotamian Arabic acm living
Mi’kmaq mic living
Middle English enm historical
Middle French frm historical
Min Nan Chinese nan living
Minangkabau min living
Mingrelian xmf living
Mirandese mwl living
Modern Greek ell living
Mohawk moh living
Moksha mdf living
Mon mnw living
Mongolian mon living
Morisyen mfe living
Moroccan Arabic ary living
Na nbt living
Narom nrm living
Nauru nau living
Navajo nav living
Neapolitan nap living
Nepali npi living
Nepali nep living
Newari new living
Ngeq ngt living
Nigerian Fulfulde fuv living
Niuean niu living
Nogai nog living
North Levantine Arabic apc living
North Moluccan Malay max living
Northern Frisian frr living
Northern Luri lrc living
Northern Sami sme living
Norwegian nor living
Norwegian Bokmål nob living
Norwegian Nynorsk nno living
Novial nov constructed
Nyanja nya living
Occitan oci living
Official Aramaic arc ancient
Ojibwa oji living
Old Aramaic oar ancient
Old English ang historical
Old Norse non historical
Old Russian orv historical
Old Saxon osx historical
Oriya ori living
Orizaba Nahuatl nlv living
Oromo orm living
Ossetian oss living
Ottoman Turkish ota historical
Palauan pau living
Pampanga pam living
Pangasinan pag living
Panjabi pan living
Papiamento pap living
Pedi nso living
Pennsylvania German pdc living
Persian fas living
Pfaelzisch pfl living
Picard pcd living
Piemontese pms living
Pipil ppl living
Pitcairn-Norfolk pih living
Polish pol living
Pontic pnt living
Portuguese por living
Prussian prg living
Pulaar fuc living
Pushto pus living
Quechua que living
Quenya qya constructed
Romanian ron living
Romansh roh living
Romany rom living
Rundi run living
Russia Buriat bxr living
Russian rus living
Rusyn rue living
Samoan smo living
Samogitian sgs living
Sango sag living
Sanskrit san ancient
Sardinian srd living
Saterfriesisch stq living
Scots sco living
Scottish Gaelic gla living
Serbian srp living
Serbo-Croatian hbs living
Seselwa Creole French crs living
Shona sna living
Shuswap shs living
Sicilian scn living
Silesian szl living
Sindarin sjn constructed
Sindhi snd living
Sinhala sin living
Slovak slk living
Slovenian slv living
Somali som living
South Azerbaijani azb living
Southern Sami sma living
Southern Sotho sot living
Spanish spa living
Sranan Tongo srn living
Standard Latvian lvs living
Standard Malay zsm living
Sumerian sux ancient
Sundanese sun living
Swabian swg living
Swahili swa living
Swahili swh living
Swati ssw living
Swedish swe living
Swiss German gsw living
Tagal Murut mvv living
Tagalog tgl living
Tahitian tah living
Tajik tgk living
Talossan tzl constructed
Talysh tly living
Tamil tam living
Tarifit rif living
Tase Naga nst living
Tatar tat living
Telugu tel living
Temuan tmw living
Tetum tet living
Thai tha living
Tibetan bod living
Tigrinya tir living
Tok Pisin tpi living
Tokelau tkl living
Tonga ton living
Tosk Albanian als living
Tsonga tso living
Tswana tsn living
Tulu tcy living
Tupí tpw extinct
Turkish tur living
Turkmen tuk living
Tuvalu tvl living
Tuvinian tyv living
Udmurt udm living
Uighur uig living
Ukrainian ukr living
Umbundu umb living
Upper Sorbian hsb living
Urdu urd living
Urhobo urh living
Uzbek uzb living
Venda ven living
Venetian vec living
Veps vep living
Vietnamese vie living
Vlaams vls living
Vlax Romani rmy living
Volapük vol constructed
Võro vro living
Walloon wln living
Waray war living
Welsh cym living
Western Frisian fry living
Western Mari mrj living
Western Panjabi pnb living
Wolof wol living
Wu Chinese wuu living
Xhosa xho living
Xiang Chinese hsn living
Yakut sah living
Yiddish yid living
Yoruba yor living
Yue Chinese yue living
Zaza zza living
Zeeuws zea living
Zhuang zha living
Zulu zul living

Attribution

The classifier has been trained by reading texts in many different languages. Finding high-quality, noise-free texts is really difficult. Many thanks to

  1. Wikipedia, which exists in so many languages
  2. Tatoeba, which is a great resource for clean sentences in many languages

Analyze your uClassify results with Excel

Did you know there’s a way to classify texts without having to leave Excel? We have paired up with SeoTools for Excel, a Swiss army knife Excel-plugin, which offers a tailored “Connector” for all uClassify users.

In this blog post, we will show how SeoTools lets you classify lists of texts or URLs with the classifiers of your choice and have the results ready for analysis in a matter of seconds.

Don’t be worried if your Excel spreadsheet doesn’t look like the example above. The extra ribbon tab “SeoTools” is added when SeoTools for Excel is installed. At the end of this post you will find all the links necessary to set up your uClassify account.

Selecting a classifier

The uClassify Connector is, as the name suggests, connected to the uClassify library. Clicking on “Select” opens a window listing all available classifiers. You can also choose the input type (Text or URL) and whether the results include the classification and the probability.

When you are satisfied with your settings, click “Insert”, and SeoTools will generate the data in columns A and onwards.

Save time and automate the process

Exporting and filtering Excel data from web-based platforms takes time, especially if it’s required on a daily or weekly basis. Filtering standardized files by hand is also prone to human error. SeoTools solves this by saving and loading “Configurations”:

Next time, just load a previous configuration and you will get classifications based on the same settings as last time.

Use Formula Mode to supercharge your classification

The beauty of combining uClassify with Excel is the ability to create large numbers of requests automatically. Instead of populating cells with values, select “Formula” before inserting the results:

Next, you can change the formula to reference a cell and the uClassify Connector will generate results based on the value or text in that cell.

In the following example, company A has been mentioned 100 times on Twitter in the last week and we want to determine the text Language and Sentiment for these tweets.

First, select the Text Language classifier and enter a random character in the Input field (we will change this in the formula to reference the tweets). Also, don’t forget to select “Exclude headers in result”, since we only want the values for each row.

When the formula has been inserted in cell C2, change the input “y” to B2, and SeoTools will return the language with the highest probability. Repeat the same steps for the Sentiment classifier, but insert it in cell D2. It should look like this:

To get the results for all rows, select cells C2 and D2 and drag the formula down, and SeoTools will generate the classifications for all tweets. In the example below, we’ve started on row 16 to illustrate the results:

Do you want to try it with your uClassify account?

⦁ Sign up for a 14-Day Trial and follow the instructions to download and install the latest version of SeoTools.

⦁ Register your access key under “Upgrade to Pro” and access uClassify in the Connectors menu:

⦁ Next, go to API keys in the top menu of your uClassify account and copy the Read key

⦁ Finally, copy your API-key and paste it in the “Options” menu:

The complete documentation of the uClassify Connector features can be found here.

If you have any questions, feedback, or suggestions about ways to improve the Connector, please contact victor@seotoolsforexcel.com.

Translation API

We get a lot of requests for classifiers in different languages and as a next step we are building a translation API. The idea is to have an affordable in-house machine translation service that can quickly translate requests to the classifier language, classify the request and send back the response. Since the majority of classifiers are in English, the primary focus will be to target English.

Initially we support French, Spanish and Swedish to English translations.
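
The flow described above (translate to the classifier language, then classify) can be sketched in a few lines of Python. This is only an illustration of the idea and not the actual service: the `translate` and `classify` callables are hypothetical stand-ins for whatever client code you use, and identifying languages by their English names is our own simplification.

```python
from typing import Callable, Dict

# The source languages mentioned above; the target is always English.
SUPPORTED_SOURCES = {"French", "Spanish", "Swedish"}


def classify_via_english(
    text: str,
    source_language: str,
    translate: Callable[[str, str], str],
    classify: Callable[[str], Dict[str, float]],
) -> Dict[str, float]:
    """Translate the text to English if needed, then classify it."""
    if source_language == "English":
        english_text = text
    elif source_language in SUPPORTED_SOURCES:
        english_text = translate(text, source_language)
    else:
        raise ValueError(f"no translation available for {source_language!r}")
    return classify(english_text)


# Stub implementations, only to show how the pieces fit together.
def fake_translate(text: str, source_language: str) -> str:
    return "I am so happy today"


def fake_classify(text: str) -> Dict[str, float]:
    return {"positive": 0.9, "negative": 0.1}


print(classify_via_english("Jag är så glad idag", "Swedish", fake_translate, fake_classify))
```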

Translation demo

The API is accessible with your ordinary API read key through GET or POST REST requests.

You can test and read all about the translation API here.

Hello world

Please don’t hesitate to report any weirdness to me!

IAB taxonomy classifier

By popular request I’ve built a new topics classifier based on the IAB taxonomy. EDIT: We also support the IAB Taxonomy V2 now.

The classifier has two levels of depth, a main category (sports, science…) and a sub category (soccer, physics…). In total there are about 360 different classes following the IAB Quality Assurance Guidelines (QAG) Taxonomy specification.

uClassify interface to IAB

You can try the online demo here.

Class name format

The class names are composed of 4 parts separated by an underscore, with the following structure:

main topic_sub topic_main id_sub id
home and garden_flowers_5_4
sports_climbing_17_3
sports_volleyball_17_7

The last two ids are the IAB ids, which makes it easier for users to map and integrate the results.
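
As a rough Python sketch of that mapping, the class name can be split from the right into its four parts. The helper names are our own, and joining the two ids into an "IAB<main>-<sub>" string is an assumption based on how the codes are written in the topic list.

```python
from typing import NamedTuple


class IabClass(NamedTuple):
    main_topic: str
    sub_topic: str
    main_id: int
    sub_id: int

    @property
    def iab_code(self) -> str:
        # Assumed convention, mirroring codes written like "IAB1-1".
        return f"IAB{self.main_id}-{self.sub_id}"


def parse_iab_class(class_name: str) -> IabClass:
    """Split 'main topic_sub topic_main id_sub id' into its four parts."""
    parts = class_name.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected class name format: {class_name!r}")
    main_topic, sub_topic, main_id, sub_id = parts
    return IabClass(main_topic, sub_topic, int(main_id), int(sub_id))


parsed = parse_iab_class("sports_climbing_17_3")
print(parsed, parsed.iab_code)
```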

With a free uClassify account you can make 1000 free calls per day. If you need more, there are affordable options from 9€ per month. You can sign up here.

List of topics

IAB12 News and IAB24 Uncategorized are not supported.

IAB1 Arts & Entertainment
IAB1-1 Books & Literature
IAB1-2 Celebrity Fan/Gossip
IAB1-3 Fine Art
IAB1-4 Humor
IAB1-5 Movies
IAB1-6 Music
IAB1-7 Television

IAB2 Automotive
IAB2-1 Auto Parts
IAB2-2 Auto Repair
IAB2-3 Buying/Selling Cars
IAB2-4 Car Culture
IAB2-5 Certified Pre-Owned
IAB2-6 Convertible
IAB2-7 Coupe
IAB2-8 Crossover
IAB2-9 Diesel
IAB2-10 Electric Vehicle
IAB2-11 Hatchback
IAB2-12 Hybrid
IAB2-13 Luxury
IAB2-14 Minivan
IAB2-15 Motorcycles
IAB2-16 Off-Road Vehicles
IAB2-17 Performance Vehicles
IAB2-18 Pickup
IAB2-19 Road-Side Assistance
IAB2-20 Sedan
IAB2-21 Trucks & Accessories
IAB2-22 Vintage Cars
IAB2-23 Wagon

IAB3 Business
IAB3-1 Advertising
IAB3-2 Agriculture
IAB3-3 Biotech/Biomedical
IAB3-4 Business Software
IAB3-5 Construction
IAB3-6 Forestry
IAB3-7 Government
IAB3-8 Green Solutions
IAB3-9 Human Resources
IAB3-10 Logistics
IAB3-11 Marketing
IAB3-12 Metals

IAB4 Careers
IAB4-1 Career Planning
IAB4-2 College
IAB4-3 Financial Aid
IAB4-4 Job Fairs
IAB4-5 Job Search
IAB4-6 Resume Writing/Advice
IAB4-7 Nursing
IAB4-8 Scholarships
IAB4-9 Telecommuting
IAB4-10 U.S. Military
IAB4-11 Career Advice

IAB5 Education
IAB5-1 7-12 Education
IAB5-2 Adult Education
IAB5-3 Art History
IAB5-4 College Administration
IAB5-5 College Life
IAB5-6 Distance Learning
IAB5-7 English as a 2nd Language
IAB5-8 Language Learning
IAB5-9 Graduate School
IAB5-10 Homeschooling
IAB5-11 Homework/Study Tips
IAB5-12 K-6 Educators
IAB5-13 Private School
IAB5-14 Special Education
IAB5-15 Studying Business

IAB6 Family & Parenting
IAB6-1 Adoption
IAB6-2 Babies & Toddlers
IAB6-3 Daycare/Pre School
IAB6-4 Family Internet
IAB6-5 Parenting – K-6 Kids
IAB6-6 Parenting teens
IAB6-7 Pregnancy
IAB6-8 Special Needs Kids
IAB6-9 Eldercare

IAB7 Health & Fitness
IAB7-1 Exercise
IAB7-2 ADD
IAB7-3 AIDS/HIV
IAB7-4 Allergies
IAB7-5 Alternative Medicine
IAB7-6 Arthritis
IAB7-7 Asthma
IAB7-8 Autism/PDD
IAB7-9 Bipolar Disorder
IAB7-10 Brain Tumor
IAB7-11 Cancer
IAB7-12 Cholesterol
IAB7-13 Chronic Fatigue Syndrome
IAB7-14 Chronic Pain
IAB7-15 Cold & Flu
IAB7-16 Deafness
IAB7-17 Dental Care
IAB7-18 Depression
IAB7-19 Dermatology
IAB7-20 Diabetes
IAB7-21 Epilepsy
IAB7-22 GERD/Acid Reflux
IAB7-23 Headaches/Migraines
IAB7-24 Heart Disease
IAB7-25 Herbs for Health
IAB7-26 Holistic Healing
IAB7-27 IBS/Crohn’s Disease
IAB7-28 Incest/Abuse Support
IAB7-29 Incontinence
IAB7-30 Infertility
IAB7-31 Men’s Health
IAB7-32 Nutrition
IAB7-33 Orthopedics
IAB7-34 Panic/Anxiety Disorders
IAB7-35 Pediatrics
IAB7-36 Physical Therapy
IAB7-37 Psychology/Psychiatry
IAB7-38 Senior Health
IAB7-39 Sexuality
IAB7-40 Sleep Disorders
IAB7-41 Smoking Cessation
IAB7-42 Substance Abuse
IAB7-43 Thyroid Disease
IAB7-44 Weight Loss
IAB7-45 Women’s Health

IAB8 Food & Drink
IAB8-1 American Cuisine
IAB8-2 Barbecues & Grilling
IAB8-3 Cajun/Creole
IAB8-4 Chinese Cuisine
IAB8-5 Cocktails/Beer
IAB8-6 Coffee/Tea
IAB8-7 Cuisine-Specific
IAB8-8 Desserts & Baking
IAB8-9 Dining Out
IAB8-10 Food Allergies
IAB8-11 French Cuisine
IAB8-12 Health/Low-Fat Cooking
IAB8-13 Italian Cuisine
IAB8-14 Japanese Cuisine
IAB8-15 Mexican Cuisine
IAB8-16 Vegan
IAB8-17 Vegetarian
IAB8-18 Wine

IAB9 Hobbies & Interests
IAB9-1 Art/Technology
IAB9-2 Arts & Crafts
IAB9-3 Beadwork
IAB9-4 Bird-Watching
IAB9-5 Board Games/Puzzles
IAB9-6 Candle & Soap Making
IAB9-7 Card Games
IAB9-8 Chess
IAB9-9 Cigars
IAB9-10 Collecting
IAB9-11 Comic Books
IAB9-12 Drawing/Sketching
IAB9-13 Freelance Writing
IAB9-14 Genealogy
IAB9-15 Getting Published
IAB9-16 Guitar
IAB9-17 Home Recording
IAB9-18 Investors & Patents
IAB9-19 Jewelry Making
IAB9-20 Magic & Illusion
IAB9-21 Needlework
IAB9-22 Painting
IAB9-23 Photography
IAB9-24 Radio
IAB9-25 Roleplaying Games
IAB9-26 Sci-Fi & Fantasy
IAB9-27 Scrapbooking
IAB9-28 Screenwriting
IAB9-29 Stamps & Coins
IAB9-30 Video & Computer Games
IAB9-31 Woodworking

IAB10 Home & Garden
IAB10-1 Appliances
IAB10-2 Entertaining
IAB10-3 Environmental Safety
IAB10-4 Gardening
IAB10-5 Home Repair
IAB10-6 Home Theater
IAB10-7 Interior Decorating
IAB10-8 Landscaping
IAB10-9 Remodeling & Construction

IAB11 Law, Government, & Politics
IAB11-1 Immigration
IAB11-2 Legal Issues
IAB11-3 U.S. Government Resources
IAB11-4 Politics
IAB11-5 Commentary

IAB12 News*
IAB12-1 International News
IAB12-2 National News
IAB12-3 Local News

IAB13 Personal Finance
IAB13-1 Beginning Investing
IAB13-2 Credit/Debt & Loans
IAB13-3 Financial News
IAB13-4 Financial Planning
IAB13-5 Hedge Fund
IAB13-6 Insurance
IAB13-7 Investing
IAB13-8 Mutual Funds
IAB13-9 Options
IAB13-10 Retirement Planning
IAB13-11 Stocks
IAB13-12 Tax Planning

IAB14 Society
IAB14-1 Dating
IAB14-2 Divorce Support
IAB14-3 Gay Life
IAB14-4 Marriage
IAB14-5 Senior Living
IAB14-6 Teens
IAB14-7 Weddings
IAB14-8 Ethnic Specific

IAB15 Science
IAB15-1 Astrology
IAB15-2 Biology
IAB15-3 Chemistry
IAB15-4 Geology
IAB15-5 Paranormal Phenomena
IAB15-6 Physics
IAB15-7 Space/Astronomy
IAB15-8 Geography
IAB15-9 Botany
IAB15-10 Weather

IAB16 Pets
IAB16-1 Aquariums
IAB16-2 Birds
IAB16-3 Cats
IAB16-4 Dogs
IAB16-5 Large Animals
IAB16-6 Reptiles
IAB16-7 Veterinary Medicine

IAB17 Sports
IAB17-1 Auto Racing
IAB17-2 Baseball
IAB17-3 Bicycling
IAB17-4 Bodybuilding
IAB17-5 Boxing
IAB17-6 Canoeing/Kayaking
IAB17-7 Cheerleading
IAB17-8 Climbing
IAB17-9 Cricket
IAB17-10 Figure Skating
IAB17-11 Fly Fishing
IAB17-12 Football
IAB17-13 Freshwater Fishing
IAB17-14 Game & Fish
IAB17-15 Golf
IAB17-16 Horse Racing
IAB17-17 Horses
IAB17-18 Hunting/Shooting
IAB17-19 Inline Skating
IAB17-20 Martial Arts
IAB17-21 Mountain Biking
IAB17-22 NASCAR Racing
IAB17-23 Olympics
IAB17-24 Paintball
IAB17-25 Power & Motorcycles
IAB17-26 Pro Basketball
IAB17-27 Pro Ice Hockey
IAB17-28 Rodeo
IAB17-29 Rugby
IAB17-30 Running/Jogging
IAB17-31 Sailing
IAB17-32 Saltwater Fishing
IAB17-33 Scuba Diving
IAB17-34 Skateboarding
IAB17-35 Skiing
IAB17-36 Snowboarding
IAB17-37 Surfing/Body-Boarding
IAB17-38 Swimming
IAB17-39 Table Tennis/Ping-Pong
IAB17-40 Tennis
IAB17-41 Volleyball
IAB17-42 Walking
IAB17-43 Waterski/Wakeboard
IAB17-44 World Soccer

IAB18 Style & Fashion
IAB18-1 Beauty
IAB18-2 Body Art
IAB18-3 Fashion
IAB18-4 Jewelry
IAB18-5 Clothing
IAB18-6 Accessories

IAB19 Technology & Computing
IAB19-1 3-D Graphics
IAB19-2 Animation
IAB19-3 Antivirus Software
IAB19-4 C/C++
IAB19-5 Cameras & Camcorders
IAB19-6 Cell Phones
IAB19-7 Computer Certification
IAB19-8 Computer Networking
IAB19-9 Computer Peripherals
IAB19-10 Computer Reviews
IAB19-11 Data Centers
IAB19-12 Databases
IAB19-13 Desktop Publishing
IAB19-14 Desktop Video
IAB19-15 Email
IAB19-16 Graphics Software
IAB19-17 Home Video/DVD
IAB19-18 Internet Technology
IAB19-19 Java
IAB19-20 JavaScript
IAB19-21 Mac Support
IAB19-22 MP3/MIDI
IAB19-23 Net Conferencing
IAB19-24 Net for Beginners
IAB19-25 Network Security
IAB19-26 Palmtops/PDAs
IAB19-27 PC Support
IAB19-28 Portable
IAB19-29 Entertainment
IAB19-30 Shareware/Freeware
IAB19-31 Unix
IAB19-32 Visual Basic
IAB19-33 Web Clip Art
IAB19-34 Web Design/HTML
IAB19-35 Web Search
IAB19-36 Windows

IAB20 Travel
IAB20-1 Adventure Travel
IAB20-2 Africa
IAB20-3 Air Travel
IAB20-4 Australia & New Zealand
IAB20-5 Bed & Breakfasts
IAB20-6 Budget Travel
IAB20-7 Business Travel
IAB20-8 By US Locale
IAB20-9 Camping
IAB20-10 Canada
IAB20-11 Caribbean
IAB20-12 Cruises
IAB20-13 Eastern Europe
IAB20-14 Europe
IAB20-15 France
IAB20-16 Greece
IAB20-17 Honeymoons/Getaways
IAB20-18 Hotels
IAB20-19 Italy
IAB20-20 Japan
IAB20-21 Mexico & Central America
IAB20-22 National Parks
IAB20-23 South America
IAB20-24 Spas
IAB20-25 Theme Parks
IAB20-26 Traveling with Kids
IAB20-27 United Kingdom

IAB21 Real Estate
IAB21-1 Apartments
IAB21-2 Architects
IAB21-3 Buying/Selling Homes

IAB22 Shopping
IAB22-1 Contests & Freebies
IAB22-2 Couponing
IAB22-3 Comparison
IAB22-4 Engines

IAB23 Religion & Spirituality
IAB23-1 Alternative Religions
IAB23-2 Atheism/Agnosticism
IAB23-3 Buddhism
IAB23-4 Catholicism
IAB23-5 Christianity
IAB23-6 Hinduism
IAB23-7 Islam
IAB23-8 Judaism
IAB23-9 Latter-Day Saints
IAB23-10 Pagan/Wiccan

IAB24 Uncategorized*

IAB25 Non-Standard Content
IAB25-1 Unmoderated UGC
IAB25-2 Extreme Graphic/Explicit Violence
IAB25-3 Pornography
IAB25-4 Profane Content
IAB25-5 Hate Content
IAB25-6 Under Construction
IAB25-7 Incentivized

IAB26 Illegal Content
IAB26-1 Illegal Content
IAB26-2 Warez
IAB26-3 Spyware/Malware
IAB26-4 Copyright Infringement

*  IAB12 News and IAB24 Uncategorized are not supported.

New URL REST API

The new URL REST API is our simplest API to use. You can paste the API URL into the browser and get the result. The read API key and the text are passed as parameters in the URL.

Here is an example:
https://api.uclassify.com/v1/uClassify/Sentiment/classify/?readKey=YOUR_READ_API_KEY_HERE&text=I+am+so+happy+today

The result is simply a JSON dictionary with class=>probabilities:

{
"negative": 0.133639,
"positive": 0.866361
}

The only thing you need to do is to sign up for a free account (allows you 1000 calls per day) and replace ‘YOUR_READ_API_KEY_HERE’ with your read api key (found after you log in).
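
If you would rather call the URL from code than from the browser, here is a minimal Python sketch using only the standard library. The URL layout is taken from the example above. The function names, the default arguments and the timeout are our own choices, and the `classify` function needs a valid read key and network access to actually run.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.uclassify.com/v1"


def build_classify_url(
    read_key: str,
    text: str,
    username: str = "uClassify",
    classifier: str = "Sentiment",
) -> str:
    """Build a classify URL with the read key and text as query parameters."""
    query = urlencode({"readKey": read_key, "text": text})
    return f"{API_ROOT}/{username}/{classifier}/classify/?{query}"


def classify(read_key: str, text: str) -> Dict[str, float]:
    """Fetch the class => probability dictionary (performs a network call)."""
    with urlopen(build_classify_url(read_key, text), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def top_class(result: Dict[str, float]) -> str:
    """Return the class with the highest probability."""
    return max(result, key=result.get)


print(build_classify_url("YOUR_READ_API_KEY_HERE", "I am so happy today"))
print(top_class({"negative": 0.133639, "positive": 0.866361}))
```

`urlencode` takes care of escaping the text, so spaces become `+` just like in the example URL.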

Here is the documentation for the API. The API is a simplified subset of our standard JSON REST API; you can read more about the uClassify API differences here.

Happy classifying!

Improved classifier accuracy

I am very happy to announce this performance update, which gives classifications better accuracy than before.

When I was building a new topic classifier based on the IAB taxonomy, I noticed some weird behaviour for classes with much less training data than the others. As I started to investigate this, I came to understand how classification could be improved overall, not only for classes with little training data. After weeks of testing different implementations I found a few improvements that gave significantly better results on the test datasets.

In short, classifiers are now much more robust and less sensitive to imbalanced data.

This update doesn’t affect any API endpoints; it will only give you better probabilities.

I might write a short post on the technicalities of this update.

JSON REST API

Since uClassify was launched back in 2008 we have seen many technological changes. Last year I modernised the site to use Bootstrap as a foundation. Now it’s time to move the API to a more modern format.

Initially the uClassify API only had an XML endpoint. Over the years, however, JSON has become more common and I have been getting more and more requests for REST endpoints with JSON format. The graph below shows Google Trends for ‘json api’ (red) vs ‘xml api’ (blue).

XML API VS JSON API

Today I have launched a beta of the JSON REST API. Changes may still occur, but it will hopefully be finalised during March 2016.

You can find the documentation here, please feel free to leave feedback.

The old XML and URL API endpoints will of course continue to work as before.

Update with limit changes

The last major update has been running very smoothly. This is the first patch since!

Max request size limit increased

After feedback from the community I’ve increased the maximum allowed request size from 1MB to 3MB. I will monitor the servers and make sure this works fine. Maybe it’s possible to increase it further.

Max query string length increased

After the last update, when I updated the IIS server, the default maximum URL query string length was lower than before. Thanks to Liz, who noticed this. I’ve now set the max size to 65 KB.
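
If you send large texts, it can be worth checking the sizes on your side before making a call. The Python sketch below is only an illustration: it assumes the limits are counted in bytes with binary units (1 KB = 1024 bytes) and that the 65 KB limit applies to the whole URL, neither of which is specified above.

```python
# Limits from this post; the exact byte counts are an assumption.
MAX_REQUEST_BYTES = 3 * 1024 * 1024   # "3MB" request size limit
MAX_URL_BYTES = 65 * 1024             # "65 KB" query string limit


def fits_in_url(url: str) -> bool:
    """True if the encoded URL stays within the assumed URL limit."""
    return len(url.encode("utf-8")) <= MAX_URL_BYTES


def fits_in_request(body: bytes) -> bool:
    """True if a request body stays within the assumed size limit."""
    return len(body) <= MAX_REQUEST_BYTES


print(fits_in_url("https://api.uclassify.com/v1/uClassify/Sentiment/classify/?text=hello"))
print(fits_in_request(b"x" * (4 * 1024 * 1024)))
```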

Max free calls per day decreased

When I looked at the call statistics it didn’t make much sense to offer 5000 free calls per day. Most people aren’t even close to this, so by lowering it to 1000 calls per day only a few users will be affected and most will not notice anything. This is also motivated by looking at competitors’ free limits, and 1000 calls per day is still very generous. Let me know if you have any questions about this.

Bugs

Besides fixing some typos (thanks to everyone who reported them), I’ve made it so you can’t publish untrained classifiers and fixed the front page buttons so they work better on small displays. I’ll also unpublish previously published classifiers that are untrained.

Future

I am extremely happy with the performance of the new Sentiment classifier. It uses a new version of the classifier that looks at combinations of words, among other things. Tests show that this type of classifier improves performance on all tested data sets, so I am trying to figure out how to use it for all new classifiers, but it does require some work.

Let me know if you have any questions.

@jonkagstrom

Sentiment Analysis API

A sentiment analyzer tells you whether a text is positive or negative. For example “I love the new Mad Max Fury road” (positive) or “i am not impressed by the bike” (negative). The Sentiment classifier hosted by uClassify is very popular, so I decided to spend some time on improving it.

The goal was to improve the classification accuracy, especially for short texts such as Twitter messages, Facebook statuses or other snippets while maintaining high quality results on texts with more information.

The old Sentiment classifier was trained on 40k Amazon product reviews. The straightforward way to improve a classifier is to add more data. Thanks to the Internet we were able to find multiple data sources to train our classifier on. In fact, it’s now trained on 2.8 million documents!

The results are very good: accuracy on large documents (reviews) went from about 75% to 83%, and on tweets from 63% to about 77%.

You can play with it here. There is also an API available (free to use).

The datasets used are from Sentiment140 (Twitter), Amazon product reviews and Rotten Tomatoes.

Image by Anna Gathu