-
Kurrek et al. Towards a Comprehensive Taxonomy and Large-Scale Annotated Corp...
The dataset addresses discrimination based on sexuality, ethnicity, and gender. Posts were sampled by slur ("f*ggot", "n*gger", "tr*nny") from several subreddits from October... -
Wulczyn et al. Personal Attacks on Wikipedia Dataset
Dataset of hateful Wikipedia comments. The data was sampled by combining random sampling with oversampling of banned comments. Annotation was crowdsourced, and each comment was... -
Founta et al. Hate and Abusive Speech on Twitter
Dataset of tweets collected from 30th March 2017 to 9th April 2017 with a boosted random sampling technique, by using text analysis and preliminary crowdsourcing rounds to... -
Multilingual List of Dirty, Naughty, Obscene, and Otherwise Bad Words from Shutte...
The repo contains a list of words that Shutterstock uses to filter results from its autocomplete server and recommendation engine. It can be installed in an npm project by: npm... -
YouTube Blacklist Words
The YouTube Blacklist Words List includes a list of unacceptable words, inappropriate words, swear words, offensive words, curse words, insulting words, all cuss words,... -
WordPress Comment Blacklist Words
WordPress Comment Blacklist Words, WordPress Comment Moderation, and WordPress Comment Spam. The WordPress Comment Blacklist Words/Phrases include a list of swears, unacceptable... -
Davidson et al. Crowd-sourced Hate Speech On Twitter Dataset
Dataset of hateful tweets sampled from Twitter using keywords. Labelled through CrowdFlower, with 3+ people annotating each tweet. The majority decision was taken, with 92% annotator agreement. -
Caselli et al. Implicit/Explicit Expansion on OLID
This dataset expands OLID/OffensEval (the Offensive Language Identification Dataset; Zampieri et al., 2019a) by annotating the explicitness of each message. The OLID data was... -
Breitfeller et al. Microaggressions Dataset
Dataset of self-reported microaggressions from microaggressions.com. 2,934 posts were collected, targeting gender (1,314 posts), race (1,278 posts), sexuality (461 posts),... -
Bretschneider and Peters Cyberbullying on WoW and LoL Forum Dataset
Dataset collected from the World of Warcraft (dataset 1) and League of Legends (dataset 2) forums. 20 topics were selected for each dataset based on offensive terms from... -
Jha and Mamidi Sexism on Twitter Dataset
Dataset of sexist tweets sampled using benevolent sexist key phrases, from which 712 tweets were manually selected by the authors and validated by three non-activist... -
Waseem Racism and Sexism on Twitter Dataset
Dataset of racist and sexist tweets sampled from Twitter and labelled first by experts (including feminist and anti-racist activists), and then by amateur CrowdFlower annotators who... -
Gao and Huang Hate Speech on Fox News Dataset
Dataset of 1,528 annotated comments from the Fox News website, taken from 10 news articles. Comments were labelled by experts in two stages, with an annotator agreement of Cohen's... -
CONAN: Multilingual Dataset of Responses to Fight Online Hate Speech
Dataset of pairs of Islamophobic hate speech and counter-responses, with 3 types of metadata: expert demographics, hate speech sub-topic, and counter-narrative type. The dataset is... -
Qian et al. Dataset for Learning to Intervene in Online Hate Speech from Gab and...
Dataset of hateful/not hateful posts in the context of conversations from Reddit and Gab. The data is annotated through crowd-sourcing on Amazon Mechanical Turk with... -
Waseem and Hovy Racism and Sexism on Twitter Dataset
Dataset of racist and sexist tweets sampled from Twitter and labelled by a mix of expert annotators and activists. Tweets were sampled in 2016 over 2 months using keywords.... -
HASOC 2019: Hate Speech and Offensive Content Identification in Indo-European...
Hate speech dataset for Hindi, German and English. Three datasets collected from Twitter and Facebook, sampled by topics, hashtags, other keywords and the timeline of users (last... -
de Gibert et al. Hate Speech from a White Supremacy Forum Dataset
Hate speech dataset composed of thousands of sentences extracted from Stormfront, a white supremacist forum, manually labelled by experts. Annotator agreement for the 1st round... -
Ousidhoum et al. Multilingual and Multi-Aspect Hate Speech Analysis on Twitter
Dataset of multi-aspect hate speech posts sampled from Twitter and labelled through crowd-sourcing (Amazon Mechanical Turk). Tweets were sampled by common slurs and demeaning... -
ElSherief et al. Hate Speech Instigators and Their Targets Dataset from Twitter
Dataset of hate speech and targets from Twitter collected through a multi-step classification process and annotated through CrowdFlower. 92.8% agreement among the annotators for... -
Parikh multi-label sexism
10 annotators initially annotated 20,000 entries from the Everyday Sexism Project. At least two annotators labelled every entry. Average Cohen's kappa for the per-category pairs...