Waseem and Hovy Racism and Sexism on Twitter Dataset

Dataset of tweets labelled as racist, sexist, or neither, sampled from Twitter and annotated by a mix of expert annotators and activists. Tweets were sampled in 2016 over two months using keywords. Inter-annotator agreement is a kappa of 0.84.

Data and Resources

Additional Info

Field Value
Authors Waseem, Z. and Hovy, D.
Author contact email Waseem, Z. and Hovy, D.
Publication / paper reference Waseem, Z. and Hovy, D., 2016. Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. In: Proceedings of the NAACL Student Research Workshop. San Diego, California: Association for Computational Linguistics, pp. 88-93.
Publication / paper link https://www.aclweb.org/anthology/N16-2013
Dataset about page https://github.com/ZeerakW/hatespeech
Language(s) covered English
Source data platform(s) Twitter
Annotation schema description 3-topic (Sexist, Racist, Not)
Phenomena annotated Racism, Sexism
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 16,914 Tweets
Proportion of positive/abusive instances 0.32
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active