Mubarak et al. Abuse in Arabic Social Media Dataset
Data and Resources
- Abuse in Arabic Twitter (XLSX)
  Dataset of 1,100 labelled tweets.
- Abuse in Arabic Al Jazeera (XLSX)
  Dataset of 32,000 labelled comments.
- Obscene Words (TXT)
  Obscene words extracted from tweets, manually assessed.
Additional Info
Field | Value |
---|---|
Paper Authors | Mubarak, H., Darwish, K. and Magdy, W. |
Author contact email | Mubarak, H., Darwish, K. and Magdy, W. |
Publication / paper reference | Mubarak, H., Darwish, K. and Magdy, W., 2017. Abusive Language Detection on Arabic Social Media. In: Proceedings of the First Workshop on Abusive Language Online. Vancouver, Canada: Association for Computational Linguistics, pp. 52-56. |
Publication / paper link | https://www.aclweb.org/anthology/W17-3008 |
Publication Year | 2017 |
Dataset about page | https://alt.qcri.org/~hmubarak/offensive/ |
Approved | |
Language(s) covered | Arabic |
Source data platform(s) | Twitter, Al Jazeera |
Phenomena annotated | Incivility |
Level of instances | Single comment / post |
Data statement link | |
Total number of instances in dataset | 1,100; 32,000 |
Proportion of positive/abusive instances | 0.59; 0.81 |
Submitter | Laila Sprejer |
Submitter Email | sprejerlaila@gmail.com |
State | active |