Ousidhoum et al. Multilingual and Multi-Aspect Hate Speech Analysis on Twitter

Dataset of multi-aspect hate speech posts sampled from Twitter and labelled through crowd-sourcing (Amazon Mechanical Turk). Tweets were sampled using common slurs and demeaning expressions, along with expressions derived from discussions around controversial topics. Inter-annotator agreement, measured by average Krippendorff's alpha, is 0.153, 0.244 and 0.202 for English, French and Arabic, respectively.
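
For readers unfamiliar with the agreement measure quoted above, the following is a minimal sketch of Krippendorff's alpha for nominal labels, built from a coincidence matrix. It is an illustration only: the function name `krippendorff_alpha_nominal` and the toy labels are hypothetical, and this entry does not state which variant or averaging scheme the authors used to obtain the reported figures.

```python
from collections import Counter
from itertools import permutations
import math


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of items; each item is the list of labels that
    different annotators assigned to it (missing ratings simply omitted).
    This is a hypothetical helper, not the authors' implementation.
    """
    # Coincidence matrix: every ordered pair of labels given by two
    # different annotators to the same item, weighted by 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single rating cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the total number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n < 2:
        return math.nan  # no pairable ratings at all

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return math.nan  # only one label ever used: alpha is undefined
    return 1.0 - observed / expected


# Toy example: three posts, two annotators each (labels are made up).
toy_units = [["hateful", "normal"], ["hateful", "hateful"], ["normal", "normal"]]
toy_alpha = krippendorff_alpha_nominal(toy_units)
```

An alpha of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so the values reported for this dataset sit at the low end of the scale.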

Additional Info

Field Value
Authors Ousidhoum, N., Lin, Z., Zhang, H., Song, Y. and Yeung, D.
Author contact email N/A
Publication / paper reference Ousidhoum, N., Lin, Z., Zhang, H., Song, Y. and Yeung, D., 2019. Multilingual and Multi-Aspect Hate Speech Analysis. arXiv:1908.11049.
Publication / paper link https://arxiv.org/pdf/1908.11049.pdf
Dataset about page https://github.com/HKUST-KnowComp/MLMA_hate_speech
Language(s) covered Arabic, French, English
Source data platform(s) Twitter
Annotation schema description multiple, 3 or more classes
Phenomena annotated group-directed gender, sexual orientation, religion, disability
Level of instances Single comment / post
Data statement link N/A
Total number of instances in dataset English: 5647, French: 4014, Arabic: 3353
Proportion of positive/abusive instances English: 0.76, French: 0.72, Arabic: 0.64
Submitter Philine Zeinert
Submitter Email phze@itu.dk
State active
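
The instance totals and positive/abusive proportions listed in the table above imply approximate per-language counts of abusive posts. The sketch below works through that arithmetic; the variable names are illustrative, and the results are estimates only, since the proportions are given to two decimal places.

```python
# Figures copied from the table above.
total_instances = {"English": 5647, "French": 4014, "Arabic": 3353}
abusive_proportion = {"English": 0.76, "French": 0.72, "Arabic": 0.64}

# Estimated number of abusive posts per language, rounded to whole posts.
estimated_abusive = {
    language: round(total_instances[language] * abusive_proportion[language])
    for language in total_instances
}

# Size of the full dataset across the three languages.
dataset_size = sum(total_instances.values())

for language, count in estimated_abusive.items():
    print(f"{language}: about {count} of {total_instances[language]} posts")
print(f"All languages: {dataset_size} posts")
```

Exact class counts should be taken from the released files on the dataset about page rather than from these rounded estimates.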