Sentiment Analysis for Arabizi: A multilingual jargon in Social Media
Mr Taha Tobaili

This event took place on 25th May 2017 at 12:00pm (11:00 GMT)
Knowledge Media Institute, Berrill Building, The Open University, Milton Keynes, United Kingdom, MK7 6AA

Arabizi is a portmanteau of the words Arabic and Englizi (meaning English). It is a linguistic phenomenon in which native Arabic speakers write their dialectal mother tongue in Latin script, using alphanumeric characters to represent Arabic phonemes that do not exist in Latin script. An example is the term 7abibi (my darling), where the number 7 transcribes a voiceless fricative Arabic letter that sounds like a soft 'h'. Several researchers working in Arabic NLP have filtered Arabizi text out of their datasets because of the challenges associated with the nature of this texting language. In this talk, I will outline the challenges that make sentiment analysis for Arabizi a non-trivial task. I will discuss a pilot case study on the percentage of Arabizi use on Twitter across 2 countries. I will demonstrate a method that we created to detect Arabizi within multilingual streams of data. Finally, I will present the results of a lexicon-based sentiment analysis approach using SenZi, a novel Arabizi lexicon.
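
To make the idea of lexicon-based sentiment analysis concrete, the following is a minimal Python sketch of the general technique, not the speaker's implementation. It assumes a tiny hand-made lexicon (TOY_LEXICON) whose entries are illustrative and are not taken from SenZi; the only term drawn from the abstract is 7abibi. The function names (tokenize, score, label) are likewise hypothetical. Note that the tokenizer keeps digits inside words, since numerals such as 7 and 3 act as letters in Arabizi.

```python
import re

# Hypothetical toy lexicon for illustration only; NOT the SenZi lexicon.
# Keys are Arabizi spellings, values are polarity scores (+1 / -1).
TOY_LEXICON = {
    "7abibi": 1,    # "my darling" (example from the abstract)
    "7ilo": 1,      # assumed illustrative entry: "nice"
    "mabsout": 1,   # assumed illustrative entry: "happy"
    "z3lan": -1,    # assumed illustrative entry: "upset"
    "bashi3": -1,   # assumed illustrative entry: "ugly"
}

# Digits must stay inside tokens because they stand for Arabic phonemes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def score(text, lexicon=TOY_LEXICON):
    """Sum the polarity of every token found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def label(text, lexicon=TOY_LEXICON):
    """Map the summed score to a coarse sentiment label."""
    total = score(text, lexicon)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

A call such as label("ana z3lan") looks up each token and sums the matches. A real system would also need to handle the spelling variation typical of Arabizi, which this exact-match sketch ignores.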


The webcast was open to 300 users