-
Founta et al. Hate and Abusive Speech on Twitter
Dataset of tweets collected from 30th March 2017 to 9th April 2017 with a boosted random sampling technique, by using text analysis and preliminary crowdsourcing rounds to... -
Davidson et al. Crowd-sourced Hate Speech On Twitter Dataset
Dataset of hateful tweets sampled from Twitter using keywords. Tweets were labelled through CrowdFlower, with 3+ people annotating each tweet. The majority decision was taken, with 92% annotator agreement. -
Fernando Hate Speech Dataset in Sinhalese from Twitter
Datasets of racist and sexist content in Sinhalese from Twitter. The data was sampled using pre-identified keywords from surveys and experts. The data was annotated by experts... -
Caselli et al. Implicit/Explicit Expansion on OLID
This dataset expands OLID/OffensEval (the Offensive Language Identification Dataset; Zampieri et al., 2019a) by adding a label for the explicitness of the message. The OLID data was... -
Alfina et al. Hate Speech Detection in the Indonesian Language from Twitter
Dataset of hate speech in Indonesian, including hatred for religion, race, ethnicity, and gender. Posts from Twitter were sampled using hashtags relevant to contentious political... -
Mathur et al. Hinglish Sexism on Twitter Dataset
Dataset of Hinglish sexist Tweets sampled by crawling popular hashtags and well-known people. Tweets were labelled by experts, with an average Cohen's kappa of 0.83. -
Fortuna et al. A Hierarchically-Labeled Portuguese Hate Speech Dataset From Tw...
Dataset contains hate speech in Portuguese sampled from Twitter with 81 categories. The dataset is manually annotated for Hate Speech using a hierarchical structure of classes.... -
Mubarak et al. Abuse in Arabic Social Media Dataset
Dataset 1 includes offensive Arabic tweets sampled in March 2014 using obscene keywords and hashtags used for pornographic pages (available as a .txt file word list). Dataset 2... -
Ibrohim and Budi Abuse in Indonesian Twitter Dataset
Dataset of abusive tweets sampled with offensive terms. Tweets were annotated by 20 volunteer annotators and labelled by at least 3 people each. Only tweets with 100% annotators... -
Ibrohim and Budi Multi-label Hate Speech and Abusive Language Detection in In...
Dataset of hate speech and abusive language sampled from Twitter by using keywords and keyphrases. The dataset includes posts from March 2018 until September 2018 and integrated... -
Mulki et al. A Levantine Twitter Dataset for Hate Speech and Abusive Language...
Dataset of Hate Speech and Abusive language. It is streamed from Twitter from March 2018 to February 2019 using relevant terms and from specified users, which have 100k... -
Albadi et al. Arabic Religious Hate on Twitter
Dataset of Arabic religious hate tweets sampled using neutral religious names as keywords. Annotation was crowdsourced using CrowdFlower, with a minimum of 3 annotations per... -
Jha and Mamidi Sexism on Twitter Dataset
Dataset of sexist tweets sampled using benevolent sexist key phrases, from which 712 tweets were manually selected by the authors and validated by three non-activist... -
Waseem Racism and Sexism on Twitter Dataset
Dataset of racist and sexist tweets sampled from Twitter and labelled first by experts (including feminist and anti-racist activists), and then by CrowdFlower (CF) amateur annotators who... -
Ross et al. Hate Speech Against Refugees
Annotated German corpus of tweets regarding refugees in Germany. Tweets were sampled using 10 hateful hashtags and labelled by experts with 2 annotators per tweet.... -
Sanguinetti et al. Italian Corpus of Hate Speech against Immigrants
Dataset of hateful tweets against immigrants, Roma and Muslims, sampled using keywords. 3,154 tweets were annotated by experts (2 per tweet) and then 2,855 annotated by CF (3+... -
Waseem and Hovy Racism and Sexism on Twitter Dataset
Dataset of racist and sexist tweets sampled from Twitter and labelled by a mix of expert annotators and activists. Tweets were sampled in 2016 over 2 months using keywords.... -
HASOC 2019: Hate Speech and Offensive Content Identification in Indo-European...
Hate Speech Dataset for Hindi, German and English. Three datasets sampled from Twitter and Facebook by topics, hashtags, other keywords and the timeline of users (last... -
Ousidhoum et al. Multilingual and Multi-Aspect Hate Speech Analysis on Twitter
Dataset of multi-aspect hate speech posts sampled from Twitter and labelled through crowd-sourcing (Amazon Mechanical Turk). Tweets were sampled by common slurs and demeaning... -
ElSherief et al. Hate Speech Instigators and Their Targets Dataset from Twitter
Dataset of hate speech and targets from Twitter collected through a multi-step classification process and annotated through CrowdFlower. 92.8% agreement among the annotators for... -
DKhate: Danish Hate Speech & Abusive Language data
Task description: Branching structure of tasks: Binary (Offensive, Not); Within Offensive (Target, Not); Within Target (Individual, Group, Other). Details of task:... -
Turkish OffensEval
The Turkish dataset used in OffensEval 2020
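
Several entries above report a majority decision over 3+ annotators or an inter-annotator agreement score such as Cohen's kappa. As a rough illustration of what those figures measure, the following minimal Python sketch computes a majority label and Cohen's kappa for two annotators. The function names and the toy labels are hypothetical and are not taken from any of the listed datasets; each dataset's authors may have computed agreement differently.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label, or None if the top two labels tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no strict majority winner
    return counts[0][0]


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels, invented for illustration only.
annotator_1 = ["hate", "hate", "none", "none"]
annotator_2 = ["hate", "none", "none", "none"]

print(majority_label(["hate", "hate", "none"]))  # hate
print(cohen_kappa(annotator_1, annotator_2))     # 0.5
```

In the toy example the two annotators agree on 3 of 4 items (75%), but half of that agreement is expected by chance, so kappa is 0.5. This is why a percentage agreement and a kappa value reported for different datasets are not directly comparable.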