Qian et al.: Benchmark Dataset for Learning to Intervene in Online Hate Speech from Gab and Reddit

Dataset of hateful and non-hateful posts from Reddit and Gab, presented in the context of the conversations in which they appeared. The data was annotated through crowdsourcing on Amazon Mechanical Turk by annotators from English-speaking countries (926 distinct workers). A 2/3 agreement among annotators was required to label a post as hate speech. The content was sampled by selecting relevant subreddits and filtering entries by hateful keywords. Conversations around the filtered comments were then reconstructed with up to 20 comments in total.
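The agreement rule described above can be sketched as follows. This is a minimal illustration, not the authors' annotation pipeline: it assumes three binary votes per post, and the function name `is_hate_speech`, the `min_agree` parameter, and the example votes are all hypothetical.

```python
def is_hate_speech(votes, min_agree=2):
    """Return True when at least `min_agree` annotators flagged the post.

    Assumption: `votes` holds one boolean per annotator (three per post here),
    so min_agree=2 corresponds to a 2/3 agreement.
    """
    return sum(1 for vote in votes if vote) >= min_agree


# Hypothetical annotator votes, for illustration only (not taken from the dataset).
example_votes = {
    "post_a": [True, True, False],
    "post_b": [False, True, False],
    "post_c": [True, True, True],
}

labels = {post_id: is_hate_speech(votes) for post_id, votes in example_votes.items()}
print(labels)
```

Under these assumptions, a post flagged by only one of three annotators stays labeled as not hateful.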

Additional Info

Field Value
Authors Qian, J., Bethke, A., Belding, E. and Yang Wang, W.
Author contact email Qian, J., Bethke, A., Belding, E. and Yang Wang, W.
Publication / paper reference Qian, J., Bethke, A., Belding, E. and Yang Wang, W., 2019. A Benchmark Dataset for Learning to Intervene in Online Hate Speech. arXiv:1909.04251.
Publication / paper link https://arxiv.org/pdf/1909.04251.pdf
Dataset about page https://github.com/jing-qian/A-Benchmark-Dataset-for-Learning-to-Intervene-in-Online-Hate-Speech
Language(s) covered English
Source data platform(s) Reddit, Gab
Annotation schema description Binary (hate/not)
Phenomena annotated Hate Speech
Level of instances Conversation thread
Data statement link N/A
Total number of instances in dataset Gab: 33,776 posts / 11,825 conversations; Reddit: 22,324 posts / 5,020 conversations
Proportion of positive/abusive instances Gab: 0.43, Reddit: 0.24
Submitter Philine Zeinert
Submitter Email phze@itu.dk
State active