Sanguinetti et al. Italian Corpus of Hate Speech against Immigrants

Dataset of hateful tweets against immigrants, Roma and Muslims, sampled using keywords. Experts annotated 3,154 tweets (2 annotators per tweet); a further 2,855 tweets were then annotated on CrowdFlower (CF), with 3 or more annotators per tweet. CrowdFlower annotators were assessed for minimum reliability (65% required).

Inter-annotator agreement, experts (Cohen's kappa): hate speech (hs) 0.45, aggression 0.45, irony 0.32, stereotypes 0.41, intensity 0.21.
Inter-annotator agreement, CF (Krippendorff's alpha): hate speech (hs) 0.38, aggression 0.25, irony 0.12, stereotypes 0.20, intensity 0.31.
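For readers unfamiliar with the expert agreement statistic, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators. The function name cohen_kappa and the toy label lists are hypothetical and are not taken from the corpus or from the authors' code; the sketch only illustrates the standard formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy labels (hypothetical, NOT from the corpus): 1 = hate speech, 0 = not.
expert_1 = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
expert_2 = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
kappa = cohen_kappa(expert_1, expert_2)
print(round(kappa, 3))
```

On these toy labels the two annotators agree on 8 of 10 items, yet kappa comes out at about 0.52, because much of that raw agreement is expected by chance when most items are labelled 0. This is one reason chance-corrected scores for a skewed label such as hate speech can look low even when annotators mostly agree.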

Data and Resources

Additional Info

Field Value
Paper Authors Sanguinetti, M., Poletto, F., Bosco, C., Patti, V. and Stranisci, M.
Author contact email Sanguinetti, M., Poletto, F., Bosco, C., Patti, V. and Stranisci, M.
Publication / paper reference Sanguinetti, M., Poletto, F., Bosco, C., Patti, V. and Stranisci, M., 2018. An Italian Twitter Corpus of Hate Speech against Immigrants. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan: European Language Resources Association (ELRA).
Publication / paper link https://www.aclweb.org/anthology/L18-1443
Publication Year 2018
Dataset about page https://github.com/msang/hate-speech-corpus
Approved
Language(s) covered Italian
Source data platform(s) Twitter
Phenomena annotated Hate
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 6,009 Tweets
Proportion of positive/abusive instances 0.13
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active