Mubarak et al. Abuse in Arabic Social Media Dataset

Dataset 1 contains offensive Arabic tweets sampled in March 2014 using obscene keywords and hashtags associated with pornographic pages (the word list is available as a .txt file). Dataset 2 contains deleted comments from an Arabic social media platform (Al Jazeera). Each item was labelled by three annotators on CrowdFlower, with agreement of 85% and 87% respectively.

Data and Resources

Additional Info

Field Value
Paper Authors Mubarak, H., Darwish, K. and Magdy, W.
Author contact email
Publication / paper reference Mubarak, H., Darwish, K. and Magdy, W., 2017. Abusive Language Detection on Arabic Social Media. In: Proceedings of the First Workshop on Abusive Language Online. Vancouver, Canada: Association for Computational Linguistics, pp. 52-56.
Publication / paper link
Publication Year
Dataset about page
Language(s) covered Arabic
Source data platform(s) Twitter, Al Jazeera
Phenomena annotated Incivility
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 1,100; 32,000
Proportion of positive/abusive instances 0.59; 0.81
Submitter Laila Sprejer
Submitter Email
State active
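
The totals and proportions listed above can be combined to estimate how many abusive instances each dataset holds. The sketch below is illustrative only: it assumes the two semicolon-separated values in each field follow the order Dataset 1 (Twitter), Dataset 2 (Al Jazeera), and the dictionary keys and the helper name approx_positive_count are made up for this example. Because the listed totals and proportions appear to be rounded, the results are approximations.

```python
# Figures copied from the metadata fields above. Assumption: the first value
# in each field refers to Dataset 1 (Twitter), the second to Dataset 2
# (Al Jazeera).
datasets = {
    "Dataset 1 (Twitter)": {"total": 1_100, "positive_share": 0.59},
    "Dataset 2 (Al Jazeera)": {"total": 32_000, "positive_share": 0.81},
}


def approx_positive_count(total, positive_share):
    """Estimate the number of abusive instances from a total and a proportion."""
    return round(total * positive_share)


# Hypothetical summary structure: dataset name -> estimated abusive count.
estimates = {
    name: approx_positive_count(**figures) for name, figures in datasets.items()
}

for name, count in estimates.items():
    print(f"{name}: roughly {count} abusive instances")
```

Both datasets lean towards the abusive class, which is worth keeping in mind when choosing evaluation metrics or sampling a balanced subset.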