Davidson et al. Crowd-sourced Hate Speech On Twitter Dataset

Dataset of tweets sampled from Twitter using hate speech keywords. Labelled by CrowdFlower workers: at least three people annotated each tweet, and the majority label was taken. Reported annotator agreement is 92%.
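The majority-vote step can be illustrated with a short sketch. It assumes each tweet's annotations are available as a plain list of per-annotator labels (a hypothetical input format, not necessarily how the released files store them). The simple per-tweet agreement fraction computed here is an illustration only and is not necessarily how CrowdFlower computes its reported 92% agreement score.

```python
from collections import Counter

# Category names follow the annotation schema listed in this entry.
LABELS = ("hate", "offensive", "neither")


def majority_label(votes):
    """Return (label, agreement) for one tweet.

    `votes` is a list of per-annotator labels (assumed input format).
    `agreement` is the fraction of annotators who chose the winning label.
    Ties go to the label encountered first, which is how
    Counter.most_common orders equal counts.
    """
    if len(votes) < 3:
        raise ValueError("each tweet is expected to have at least 3 annotations")
    unknown = set(votes) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def mean_agreement(all_votes):
    """Average per-tweet agreement over a collection of tweets."""
    return sum(majority_label(v)[1] for v in all_votes) / len(all_votes)


if __name__ == "__main__":
    print(majority_label(["offensive", "offensive", "hate"]))
    print(mean_agreement([["neither"] * 3, ["hate", "hate", "offensive"]]))
```

A tweet with two "offensive" votes and one "hate" vote is assigned "offensive" with an agreement of 2/3.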


Additional Info

Field Value
Authors Davidson, T., Warmsley, D., Macy, M. and Weber, I.
Author contact email Not listed
Publication / paper reference Davidson, T., Warmsley, D., Macy, M. and Weber, I., 2017. Automated Hate Speech Detection and the Problem of Offensive Language. arXiv preprint arXiv:1703.04009.
Publication / paper link https://arxiv.org/pdf/1703.04009.pdf
Dataset about page https://github.com/t-davidson/hate-speech-and-offensive-language
Language(s) covered English
Source data platform(s) Twitter
Annotation schema description Hierarchy (Hate, Offensive, Neither)
Phenomena annotated Hate speech, offensive language
Level of instances Single comment / post
Data statement link https://github.com/t-davidson/hate-speech-and-offensive-language/tree/master/data
Total number of instances in dataset 24,802 tweets
Proportion of positive/abusive instances 0.06
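A proportion like this can be recomputed from the labelled data with the standard library. The sketch below assumes a CSV with a `class` column coded 0 = hate, 1 = offensive, 2 = neither, and a file name such as `labeled_data.csv`; both the column coding and the file name are assumptions to check against the repository's data statement before use. Whether offensive tweets also count as "positive" changes the result, so the positive classes are a parameter.

```python
import csv
import io

# Assumed coding of the "class" column: "0" = hate, "1" = offensive, "2" = neither.
HATE = "0"


def positive_proportion(rows, positive_classes=(HATE,)):
    """Fraction of rows whose "class" value is in positive_classes."""
    rows = list(rows)
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if row["class"] in positive_classes)
    return hits / len(rows)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file (hypothetical rows).
    sample = io.StringIO("class,tweet\n0,a\n1,b\n1,c\n2,d\n")
    print(positive_proportion(csv.DictReader(sample)))  # 0.25

    # With the real file (name assumed):
    # with open("labeled_data.csv", newline="", encoding="utf-8") as f:
    #     print(round(positive_proportion(csv.DictReader(f)), 2))
```

Passing `positive_classes=("0", "1")` would instead treat both hate and offensive tweets as positive.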
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active