Sanguinetti et al. Italian Corpus of Hate Speech against Immigrants

Dataset of hateful tweets against immigrants, Roma and Muslims, sampled using keywords. 3,154 tweets were annotated by experts (2 per tweet), and a further 2,855 were annotated by CrowdFlower (CF) contributors (3+ per tweet). CrowdFlower annotators were assessed for minimum reliability (65% required).

Experts (Cohen's kappa): hs (0.45), aggression (0.45), irony (0.32), stereotypes (0.41), intensity (0.21)

CF (Krippendorff's alpha): hs (0.38), aggression (0.25), irony (0.12), stereotypes (0.20), intensity (0.31)
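For readers unfamiliar with the expert agreement figures, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators. It is not the authors' code: the function name `cohen_kappa` and the toy label lists are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used the same single label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy labels (hypothetical, not from the corpus): 1 = hate speech, 0 = not.
expert_1 = [1, 0, 0, 1, 0, 0, 1, 0]
expert_2 = [1, 0, 1, 1, 0, 0, 0, 0]
kappa = cohen_kappa(expert_1, expert_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the annotators agree on 6 of 8 items (0.75), but chance agreement is about 0.53, so kappa is well below the raw agreement rate. This is why the values reported above can look low even when annotators agree on most tweets.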

Data and Resources

Additional Info

Field Value
Authors Sanguinetti, M., Poletto, F., Bosco, C., Patti, V. and Stranisci, M.
Author contact email Sanguinetti, M., Poletto, F., Bosco, C., Patti, V. and Stranisci, M.
Publication / paper reference Sanguinetti, M., Poletto, F., Bosco, C., Patti, V. and Stranisci, M., 2018. An Italian Twitter Corpus of Hate Speech against Immigrants. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan: European Language Resources Association (ELRA).
Publication / paper link https://www.aclweb.org/anthology/L18-1443
Dataset about page https://github.com/msang/hate-speech-corpus
Language(s) covered Italian
Source data platform(s) Twitter
Annotation schema description Binary target: Immigrants, Roma and Muslims [combined]. Multi-thematic categories are identified. Hate is also measured by intensity, making it hierarchical (Hate: no/yes, Aggressiveness: no/weak/strong, Offensiveness: no/weak/strong, Irony: no/yes, Stereotype: no/yes, Incitement degree: 0-4)
Phenomena annotated Hate
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 6,009 Tweets
Proportion of positive/abusive instances 0.13
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active