Wulczyn et al. Personal Attacks on Wikipedia Dataset

Dataset of hateful Wikipedia comments. The data was sampled by combining random sampling with oversampling of banned comments. Annotation was crowdsourced, with each comment labelled by 10 annotators. Inter-annotator agreement was 0.45 (Krippendorff's alpha).

Authors Wulczyn, E., Thain, N. and Dixon, L.
Author contact email Wulczyn, E., Thain, N. and Dixon, L
Publication / paper reference Wulczyn, E., Thain, N. and Dixon, L., 2017. Ex Machina: Personal Attacks Seen at Scale. arXiv:1610.08914.
Publication / paper link https://arxiv.org/pdf/1610.08914
Dataset about page https://meta.wikimedia.org/wiki/Research:Detox/Data_Release
Language(s) covered English
Source data platform(s) Wikipedia
Annotation schema description Binary (Personal attack, Not); Toxicity/healthiness judgement on a 5-point scale (-2 == very toxic, 0 == neutral, 2 == very healthy); Aggression/friendliness judgement on a 5-point scale (-2 == very aggressive, 0 == neutral, 2 == very friendly). Aggression also includes 'passive aggression'.
Phenomena annotated Person-directed attacks, toxicity, aggression
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 115,737 comments; 100,000; 160,000
Proportion of positive/abusive instances 0.12; NA; NA
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active
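
Because each comment carries judgements from several annotators, users of the data usually need to collapse them into one value per comment. The sketch below shows one possible way to do that for the schema described above. It is not taken from the paper or the data release: the function name `aggregate_comment`, the strict-majority rule, and the example votes are all assumptions made for illustration.

```python
def aggregate_comment(attack_votes, toxicity_scores):
    """Collapse one comment's per-annotator judgements into comment-level values.

    attack_votes: list of 0/1 flags (1 == personal attack), one per annotator.
    toxicity_scores: list of integers in [-2, 2], one per annotator.
    """
    if not attack_votes or not toxicity_scores:
        raise ValueError("each comment needs at least one judgement")
    if any(v not in (0, 1) for v in attack_votes):
        raise ValueError("attack votes must be 0 or 1")
    if any(not -2 <= s <= 2 for s in toxicity_scores):
        raise ValueError("toxicity scores must lie in [-2, 2]")

    attack_fraction = sum(attack_votes) / len(attack_votes)
    return {
        # Assumption: strict majority vote; a tie counts as "not an attack".
        "is_attack": attack_fraction > 0.5,
        "attack_fraction": attack_fraction,
        "mean_toxicity": sum(toxicity_scores) / len(toxicity_scores),
    }


# Hypothetical judgements from 10 annotators for a single comment.
example = aggregate_comment(
    attack_votes=[1, 1, 1, 0, 1, 1, 0, 1, 1, 0],
    toxicity_scores=[-2, -1, -2, 0, -1, -2, 0, -1, -1, 0],
)
```

Keeping `attack_fraction` alongside the binary label preserves how contested a comment was, which may matter given the moderate agreement reported for this dataset.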