Waseem Racism and Sexism on Twitter Dataset

Dataset of racist and sexist tweets sampled from Twitter, labelled first by experts (including feminist and anti-racist activists) and then re-annotated by amateur annotators recruited through CrowdFlower (CF). No inter-rater reliability (IRR) test is reported for the experts, as they are treated 'as a single entity' (p. 139). Agreement among the CF amateur annotators is a kappa of 0.57.
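For readers unfamiliar with kappa, the sketch below shows how a chance-corrected agreement score can be computed. It is only an illustration: this record does not say which kappa variant the paper uses, so Cohen's kappa for two annotators is assumed here, and the function name `cohen_kappa` and the toy labels are hypothetical, not taken from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels (made up for illustration, NOT drawn from the dataset).
annotator_1 = ["racist", "sexist", "neither", "neither", "both", "neither"]
annotator_2 = ["racist", "neither", "neither", "neither", "both", "sexist"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, so the reported 0.57 sits between the two.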

Data and Resources

Additional Info

Field Value
Authors Waseem, Z.
Author contact email Waseem, Z.
Publication / paper reference Waseem, Z., 2016. Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter. In: Proceedings of the 2016 EMNLP Workshop on Natural Language Processing and Computational Social Science. Austin, Texas: Association for Computational Linguistics, pp.138-142.
Publication / paper link https://pdfs.semanticscholar.org/3eeb/b7907a9b94f8d65f969f63b76ff5f643f6d3.pdf
Dataset about page https://github.com/ZeerakW/hatespeech
Language(s) covered English
Source data platform(s) Twitter
Annotation schema description Multi-topic (sexist, racist, neither, both)
Phenomena annotated Group-directed racism, sexism
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 6,901 tweets (4,033 new, the remainder from the prior Waseem and Hovy paper)
Proportion of positive/abusive instances 0.16
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active
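
The figures in this record imply some rough counts, which the sketch below works out. It is a back-of-the-envelope calculation only: the variable names are illustrative, and because the proportion of positive instances is given rounded to two decimals (0.16), the derived counts are approximations rather than values reported by the author.

```python
# Figures taken from the record above.
total_tweets = 6901       # total number of instances
new_tweets = 4033         # tweets newly collected for this dataset
positive_share = 0.16     # proportion of positive/abusive instances (rounded)

# Tweets carried over from the prior paper.
reused_tweets = total_tweets - new_tweets

# Approximate class sizes; only as precise as the rounded proportion allows.
approx_positive = round(total_tweets * positive_share)
approx_negative = total_tweets - approx_positive

print(f"reused tweets:      {reused_tweets}")
print(f"approx. abusive:    {approx_positive}")
print(f"approx. non-abusive: {approx_negative}")
```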