Wulczyn et al. Personal Attacks on Wikipedia Dataset

A dataset of hateful Wikipedia comments. The sample combines randomly selected comments with an oversample of banned comments. Annotation was crowdsourced, with each comment labelled by 10 annotators. Inter-annotator agreement, measured by Krippendorff's alpha, was 0.45.
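To make the agreement figure concrete, the following is a minimal sketch of Krippendorff's alpha for nominal labels, written with only the Python standard library. It is an illustration, not the authors' code: the function name `krippendorff_alpha_nominal` and the input layout (one list of labels per comment) are assumptions, and the page does not state which distance metric was used for the reported 0.45.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists: one inner list of labels per comment
    (hypothetical layout, e.g. 10 labels per comment for this dataset).
    """
    # Coincidence matrix: every ordered pair of labels given to the same
    # comment counts 1 / (m - 1), where m is the number of labels on it.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a comment with a single label cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label value and overall number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one comment with two or more labels")

    # Observed and expected disagreement (both scaled by the same factor n).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label value ever used: no disagreement possible
    return 1 - observed / expected


# Toy usage with two annotators and binary labels (made-up data).
toy = [[0, 1], [1, 1], [0, 1], [0, 0], [0, 0],
       [0, 1], [0, 0], [0, 0], [1, 0], [0, 0]]
alpha = krippendorff_alpha_nominal(toy)
```

Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance, so 0.45 reflects moderate agreement on a subjective labelling task.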

Data and Resources

Additional Info

Field: Value
Paper Authors: Wulczyn, E., Thain, N. and Dixon, L.
Author contact email: Wulczyn, E., Thain, N. and Dixon, L.
Publication / paper reference: Wulczyn, E., Thain, N. and Dixon, L., 2017. Ex Machina: Personal Attacks Seen at Scale. arXiv:1610.08914.
Publication / paper link: https://arxiv.org/pdf/1610.08914
Publication Year: 2017
Dataset about page: https://meta.wikimedia.org/wiki/Research:Detox/Data_Release
Approved:
Language(s) covered: English
Source data platform(s): Wikipedia
Phenomena annotated: Person-directed attacks, toxicity, aggression
Level of instances: Single comment / post
Data statement link:
Total number of instances in dataset: 115,737 comments; 100,000; 160,000
Proportion of positive/abusive instances: 0.12; NA; NA
Submitter: Laila Sprejer
Submitter Email: sprejerlaila@gmail.com
State: active
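
Because each comment carries several crowd labels, a single per-comment label has to be derived before a proportion of positive instances can be computed. The sketch below assumes a simple majority vote over binary labels; the page does not say how the reported proportion of 0.12 was obtained, and the function name `positive_proportion` and the `(comment_id, label)` input layout are hypothetical.

```python
from collections import defaultdict


def positive_proportion(annotations, threshold=0.5):
    """Share of comments whose positive-vote fraction exceeds `threshold`.

    `annotations` is an iterable of (comment_id, label) pairs, where label
    is 1 for abusive and 0 otherwise (assumed layout, not the release's).
    """
    votes = defaultdict(list)
    for comment_id, label in annotations:
        votes[comment_id].append(label)
    if not votes:
        raise ValueError("no annotations given")

    # A comment counts as positive only on a strict majority of 1-votes.
    positives = sum(
        1 for labels in votes.values()
        if sum(labels) / len(labels) > threshold
    )
    return positives / len(votes)


# Toy usage with three annotators per comment (made-up data).
toy_annotations = [
    ("a", 1), ("a", 1), ("a", 0),
    ("b", 0), ("b", 0), ("b", 1),
    ("c", 0), ("c", 0), ("c", 0),
    ("d", 1), ("d", 1), ("d", 1),
]
share = positive_proportion(toy_annotations)
```

With an even number of annotators, such as the 10 used here, ties are possible; the strict comparison treats a tie as not positive, which is one of several reasonable choices.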