de Gibert et al. Hate Speech from a White Supremacy Forum Dataset

A hate speech dataset composed of thousands of sentences extracted from Stormfront, a white supremacist forum, and manually labelled by experts. In the first annotation round (1,144 sentences), annotator agreement was 91%, with an average pairwise Cohen's kappa of 0.614 and a Fleiss' kappa of 0.607. In the second annotation round (1,018 sentences), agreement was 91%, with an average pairwise Cohen's kappa of 0.627 and a Fleiss' kappa of 0.632.
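For readers unfamiliar with the agreement statistics quoted above, the following is a minimal Python sketch of how an average pairwise Cohen's kappa and a Fleiss' kappa can be computed. The three annotator label lists are hypothetical toy data, not taken from the dataset, and the function names are illustrative. This is not the authors' evaluation code.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label lists of equal length.

    Assumes the annotators do not both use one identical label throughout
    (otherwise chance agreement is 1 and kappa is undefined).
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label rates.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


def fleiss_kappa(rows):
    """Fleiss' kappa; each row holds the labels all annotators gave one item."""
    n_items = len(rows)
    n_raters = len(rows[0])
    totals = Counter()
    agreement_sum = 0.0
    for row in rows:
        counts = Counter(row)
        totals.update(counts)
        # Share of annotator pairs that agree on this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / n_items
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy labels from three annotators on six sentences.
ann1 = ["hate", "not", "not", "hate", "not", "not"]
ann2 = ["hate", "not", "not", "not", "not", "not"]
ann3 = ["hate", "not", "hate", "hate", "not", "not"]

pairs = list(combinations([ann1, ann2, ann3], 2))
avg_cohen = sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
fleiss = fleiss_kappa(list(zip(ann1, ann2, ann3)))

print(round(avg_cohen, 3), round(fleiss, 3))
```

On this toy data the average pairwise Cohen's kappa is about 0.524 and Fleiss' kappa is 0.5. The two statistics differ because Cohen's kappa estimates chance agreement from each annotator's own label distribution, whereas Fleiss' kappa pools the label distribution across all annotators.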

Additional Info

Field Value
Authors de Gibert, O., Perez, N., Garcia-Pablos, A. and Cuadros, M.
Author contact email de Gibert, O., Perez, N., Garcia-Pablos, A. and Cuadros, M.
Publication / paper reference de Gibert, O., Perez, N., Garcia-Pablos, A. and Cuadros, M., 2018. Hate Speech Dataset from a White Supremacy Forum. arXiv:1809.04444.
Publication / paper link https://arxiv.org/pdf/1809.04444.pdf
Dataset about page https://github.com/Vicomtech/hate-speech-dataset
Language(s) covered English
Source data platform(s) Stormfront
Annotation schema description Ternary (Hate, Relation, Not)
Phenomena annotated Hate (defined at a general level to include all targeted groups/identities)
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 9,916 sentences
Proportion of positive/abusive instances 0.11
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active