Wulczyn et al. Personal Attacks on Wikipedia Dataset

Dataset of Wikipedia comments annotated for personal attacks, toxicity and aggression. Comments were sampled using a combination of random sampling and oversampling of comments by banned users. Annotation was crowdsourced, with each comment labelled by 10 annotators; inter-annotator agreement was 0.45 (Krippendorff's alpha).
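Krippendorff's alpha corrects observed disagreement for agreement expected by chance: alpha = 1 - D_o / D_e, where 1 means perfect agreement and 0 means chance-level agreement. For binary labels such as "personal attack / not", it can be computed from a coincidence matrix as in the minimal sketch below. It assumes each comment's labels are given as a plain list, with missing ratings simply omitted. The function name `krippendorff_alpha_nominal` is hypothetical and not part of the dataset release, and this is not necessarily the procedure the authors used to obtain the reported 0.45.

```python
import math
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal (e.g. binary) labels.

    units: one list of labels per comment; missing ratings are simply omitted.
    """
    # Coincidence matrix: every ordered pair of labels from two different
    # annotators of the same comment contributes 1 / (m_u - 1).
    o = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a comment with fewer than two labels is not pairable
        for c, k in permutations(labels, 2):
            o[(c, k)] += 1.0 / (m - 1)

    # Marginal totals n_c and grand total n of pairable values.
    n_c = Counter()
    for (c, _k), weight in o.items():
        n_c[c] += weight
    n = sum(n_c.values())

    # Nominal distance: 1 if the two labels differ, 0 otherwise.
    observed = sum(w for (c, k), w in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        return math.nan  # undefined when only one category ever occurs
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n (n - 1))
    return 1.0 - (n - 1) * observed / expected
```

For example, three comments labelled `[[1, 1, 1], [0, 0, 1], [0, 0, 0]]` give an alpha of 0.6. With 10 annotators per comment, each inner list would simply hold up to 10 labels.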


Additional Info

Authors: Wulczyn, E., Thain, N. and Dixon, L.
Author contact email:
Publication / paper reference: Wulczyn, E., Thain, N. and Dixon, L., 2017. Ex Machina: Personal Attacks Seen at Scale. arXiv preprint arXiv:1610.08914.
Publication / paper link: https://arxiv.org/pdf/1610.08914
Dataset about page: https://meta.wikimedia.org/wiki/Research:Detox/Data_Release
Language(s) covered: English
Source data platform(s): Wikipedia
Annotation schema description: Binary (personal attack / not a personal attack); toxicity/healthiness judgement on a 5-point scale (-2 = very toxic, 0 = neutral, 2 = very healthy); aggression/friendliness judgement on a 5-point scale (-2 = very aggressive, 0 = neutral, 3 = very friendly). Aggression also includes passive aggression.
Phenomena annotated: Person-directed attacks, toxicity, aggression
Level of instances: Single comment / post
Data statement link:
Total number of instances in dataset: 115,737 comments; 100,000; 160,000
Proportion of positive/abusive instances: 0.12; NA; NA
Submitter: Laila Sprejer
Submitter email: sprejerlaila@gmail.com
State: active
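A figure such as the proportion of positive/abusive instances depends on how the roughly 10 labels per comment are collapsed into a single label. The minimal sketch below makes several assumptions: annotations arrive as `(comment_id, label)` pairs, and a comment counts as positive when strictly more than half of its annotators flagged it. This majority-vote rule and the function names `aggregate_binary`, `positive_proportion` and `mean_scores` are hypothetical and chosen for illustration. It is not necessarily how the 0.12 figure was derived. The ordinal toxicity and aggression judgements are averaged per comment instead.

```python
from collections import defaultdict


def aggregate_binary(annotations, threshold=0.5):
    """Collapse per-annotator 0/1 labels into one label per comment.

    annotations: iterable of (comment_id, label) pairs (assumed layout).
    A comment is labelled 1 when the share of positive labels exceeds
    `threshold` (strict majority by default); ties count as 0.
    """
    counts = defaultdict(lambda: [0, 0])  # comment_id -> [positives, total]
    for comment_id, label in annotations:
        counts[comment_id][0] += label
        counts[comment_id][1] += 1
    return {cid: int(pos / total > threshold) for cid, (pos, total) in counts.items()}


def positive_proportion(labels):
    """Share of comments whose aggregated label is 1."""
    return sum(labels.values()) / len(labels) if labels else 0.0


def mean_scores(annotations):
    """Average an ordinal score (e.g. -2..2 toxicity) per comment."""
    totals = defaultdict(lambda: [0.0, 0])  # comment_id -> [sum, count]
    for comment_id, score in annotations:
        totals[comment_id][0] += score
        totals[comment_id][1] += 1
    return {cid: s / n for cid, (s, n) in totals.items()}
```

Raising `threshold` makes the positive class stricter, which lowers the resulting proportion. Any reported proportion is therefore only comparable across datasets that use the same aggregation rule.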