- Wulczyn et al. Personal Attacks on Wikipedia Dataset
Dataset of hateful Wikipedia comments. The data combines random sampling with oversampling of banned comments. Annotation was crowdsourced, and each comment was...
- Caselli et al. Implicit/Explicit Expansion on OLID
This dataset extends OLID/OffensEval (the Offensive Language Identification Dataset; Zampieri et al., 2019a) by annotating the explicitness of each message. The OLID data was...
- Ljubešić et al. Slovene Moderated News Comments
Dataset of moderated news comments from Slovene RTV MCC. Comments were labelled by expert annotators according to the type of inappropriate content. Note that this data is encrypted.
- Ljubešić et al. Croatian Moderated News Comments
Dataset of moderated news comments from Croatian 24sata. Comments were labelled by expert annotators according to the type of inappropriate content. Note that this data is encrypted.
- Jha and Mamidi Sexism on Twitter Dataset
Dataset of sexist tweets, sampled using benevolent-sexism key phrases, from which 712 tweets were manually selected by the authors and validated by three non-activist...
- Sanguinetti et al. Italian Corpus of Hate Speech against Immigrants
Dataset of hateful tweets against immigrants, Roma and Muslims, sampled using keywords. 3,154 tweets were annotated by experts (2 per tweet) and then 2,855 were annotated by CF (3+...
- Turkish OffensEval
The Turkish dataset used in OffensEval 2020.