-
Kurrek et al. Towards a Comprehensive Taxonomy and Large-Scale Annotated Corp...
The dataset addresses discrimination across sexuality, ethnicity, and gender. Posts are sampled by slur ("f*ggot", "n*gger", "tr*nny") from several subreddits from October... -
Founta et al. Hate and Abusive Speech on Twitter
Dataset of tweets collected from 30th March 2017 to 9th April 2017 with a boosted random sampling technique, by using text analysis and preliminary crowdsourcing rounds to... -
Davidson et al. Crowd-sourced Hate Speech On Twitter Dataset
Dataset of hateful tweets sampled from Twitter using keywords. Tweets were labelled through CrowdFlower by at least 3 annotators each, and the majority decision was taken, with 92% annotator agreement. -
Fernando Hate Speech Dataset in Sinhalese from Twitter
Datasets contain racism and sexism in Sinhalese from Twitter. The data was sampled using pre-identified keywords from surveys and experts. The data was annotated by experts... -
Breitfeller et al. Microaggressions Dataset
Dataset of self-reported microaggressions from microaggressions.com. 2,934 posts were collected, targeting gender (1,314 posts), race (1,278 posts), sexuality (461 posts),... -
Alfina et al. Hate Speech Detection in the Indonesian Language from Twitter
Dataset of hate speech in Indonesian, including hatred for religion, race, ethnicity, and gender. Posts from Twitter are sampled using relevant hashtags to contentious political... -
Mathur et al. Hinglish Sexism on Twitter Dataset
Dataset of Hinglish sexist Tweets sampled by crawling popular hashtags and well-known people. Tweets were labelled by experts, with an average Cohen's kappa of 0.83. -
Fortuna et al. A Hierarchically-Labeled Portuguese Hate Speech Dataset From Tw...
Dataset of Portuguese hate speech sampled from Twitter, with 81 categories. The dataset is manually annotated for hate speech using a hierarchical structure of classes.... -
Mubarak et al. Abuse in Arabic Social Media Dataset
Dataset 1 includes offensive Arabic tweets sampled in March 2014 using obscene keywords and hashtags used for pornographic pages (available as a .txt file word list). Dataset 2... -
Bretschneider and Peters Cyberbullying on WoW and LoL Forum Dataset
Dataset collected from the World of Warcraft (dataset 1) and League of Legends (dataset 2) forums. 20 topics were selected for each dataset based on offensive terms from... -
Mulki et al. A Levantine Twitter Dataset for Hate Speech and Abusive Language...
Dataset of hate speech and abusive language. It was streamed from Twitter from March 2018 to February 2019 using relevant terms and from specified users, which have 100k... -
Bretschneider and Peters Prejudice on Facebook Dataset
Dataset of Facebook posts and comments published in response to them from the Facebook pages “Pegida” (dataset 1), “Ich bin Patriot, aber kein Nazi” (“I’m a patriot, not a... -
Albadi et al. Arabic Religious Hate on Twitter
Dataset of Arabic religious hate tweets sampled using neutral religious names as keywords. Annotation was crowdsourced using CrowdFlower, with a minimum of 3 annotations per... -
Jha and Mamidi Sexism on Twitter Dataset
Dataset of sexist tweets sampled using benevolent sexist key phrases, from which 712 tweets were manually selected by the authors and validated by three non-activist... -
Waseem Racism and Sexism on Twitter Dataset
Dataset of racist and sexist tweets sampled from Twitter and labelled first by experts (including feminist and anti-racist activists), and then by amateur CrowdFlower (CF) annotators who... -
Gao and Huang Hate Speech on Fox News Dataset
Dataset of 1,528 annotated comments from the Fox News website, taken from 10 news articles. Comments were labelled by experts in two stages, with an annotator agreement of Cohen's... -
de Pelle and Moreira Offensive Comments in the Brazilian Web from News Platform
Dataset of offensive comments collected on the Brazilian Web from g1.globo.com, the most accessed news site in Brazil. Sampled from comments about politics and sports... -
CONAN: Multilingual Dataset of Responses to Fight Online Hate Speech
Dataset of pairs of Islamophobic hate speech and counter-responses, with 3 types of metadata: expert demographics, hate speech sub-topic, and counter-narrative type. The dataset is... -
Ousidhoum et al. Multilingual and Multi-Aspect Hate Speech Analysis on Twitter
Dataset of multi-aspect hate speech posts sampled from Twitter and labelled through crowd-sourcing (Amazon Mechanical Turk). Tweets were sampled by common slurs and demeaning... -
ElSherief et al. Hate Speech Instigators and Their Targets Dataset from Twitter
Dataset of hate speech and targets from Twitter collected through a multi-step classification process and annotated through CrowdFlower. 92.8% agreement among the annotators for... -
Parikh Multi-Label Sexism
10 annotators initially annotated 20,000 entries from the Everyday Sexism Project. At least two annotators labelled every entry. Average Cohen's kappa for the per-category pairs... -
Turkish OffensEval
The Turkish dataset used in OffensEval 2020.
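
Several entries above report a majority decision across annotators or an agreement score such as Cohen's kappa. As a rough illustration of what those figures mean, the following Python sketch computes both for toy labels. The function names and the example labels are hypothetical and are not taken from any of the listed datasets; the tie-handling rule (returning `None`) is an assumption, since each dataset's authors may resolve ties differently.

```python
from collections import Counter


def majority_label(labels):
    """Return the most common label, or None if the top two labels tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # assumption: ties are left unresolved
    return counts[0][0]


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    # Observed agreement: share of items where both annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: computed from each annotator's label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    print(majority_label(["hate", "hate", "none"]))
    print(cohens_kappa(["hate", "none", "hate", "none"],
                       ["hate", "none", "none", "none"]))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why a reported kappa of 0.83 and a reported raw agreement of 92% are not directly comparable.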