-
ViHSD - Vietnamese Hate Speech Detection on Social Media Texts
A large-scale dataset for Vietnamese hate speech detection on social media texts. The dataset was crawled from Facebook and YouTube and manually annotated by human annotators. -
Founta et al. Hate and Abusive Speech on Twitter
Dataset of tweets collected from 30th March 2017 to 9th April 2017 with a boosted random sampling technique, by using text analysis and preliminary crowdsourcing rounds to... -
Davidson et al. Crowd-sourced Hate Speech On Twitter Dataset
Dataset of hateful tweets sampled from Twitter using keywords. Labelled via CrowdFlower, with three or more annotators per tweet. The majority decision was taken, with 92% annotator agreement. -
Breitfeller et al. Microaggressions Dataset
Dataset of self-reported microaggressions from microaggressions.com. 2,934 posts were collected targeted towards gender (1,314 posts), race (1,278 posts), sexuality (461 posts),... -
Fortuna et al. A Hierarchically-Labeled Portuguese Hate Speech Dataset From Tw...
Dataset contains hate speech in Portuguese sampled from Twitter with 81 categories. The dataset is manually annotated for Hate Speech using a hierarchical structure of classes.... -
Ibrohim and Budi Abuse in Indonesian Twitter Dataset
Dataset of abusive tweets sampled with offensive terms. Tweets were annotated by 20 volunteer annotators and labelled by at least 3 people each. Only tweets with 100% annotators... -
Ibrohim and Budi Multi-label Hate Speech and Abusive Language Detection in In...
Dataset of hate speech and abusive language sampled from Twitter by using keywords and keyphrases. The dataset includes posts from March 2018 until September 2018 and integrated... -
Bretschneider and Peters Prejudice on Facebook Dataset
Dataset of Facebook posts and comments published in response to them from the Facebook pages “Pegida” (dataset 1), “Ich bin Patriot, aber kein Nazi” (“I’m a patriot, not a... -
Albadi et al. Arabic Religious Hate on Twitter
Dataset of Arabic religious hate tweets sampled using neutral religious names as keywords. Annotation was crowdsourced using CrowdFlower, with a minimum of 3 annotations per... -
Waseem Racism and Sexism on Twitter Dataset
Dataset of racist and sexist tweets sampled from Twitter and labelled first by experts (including feminist and anti-racist activists), and then by amateur CrowdFlower (CF) annotators who... -
Ross et al. Hate Speech Against Refugees
Annotated corpus of German tweets regarding refugees in Germany. Tweets were sampled using 10 hateful hashtags and labelled by experts with 2 annotators per tweet.... -
Qian et al. from Dataset for Learning Intervene in Online Hate Speech Gab and...
Dataset of hateful/not hateful posts in the context of conversations from Reddit and Gab. The data is annotated through crowdsourcing on Amazon Mechanical Turk with... -
Waseem and Hovy Racism and Sexism on Twitter Dataset
Dataset of racist and sexist tweets sampled from Twitter and labelled by a mix of expert annotators and activists. Tweets were sampled in 2016 over 2 months using keywords.... -
DKhate: Danish Hate Speech & Abusive Language data
Task description: Branching structure of tasks: Binary (Offensive, Not), Within Offensive (Target, Not), Within Target (Individual, Group, Other) Details of task:...