-
Founta et al. Hate and Abusive Speech on Twitter
Dataset of tweets collected from 30th March 2017 to 9th April 2017 with a boosted random sampling technique, by using text analysis and preliminary crowdsourcing rounds to... -
Davidson et al. Crowd-sourced Hate Speech On Twitter Dataset
Dataset of hateful tweets sampled from Twitter using keywords. Labelled through CrowdFlower, with at least three annotators per tweet. The majority label was taken, with 92% annotator agreement. -
Fernando Hate Speech Dataset in Sinhalese from Twitter
Datasets contain racism and sexism in Sinhalese from Twitter. The data was sampled using pre-identified keywords from surveys and experts. The data was annotated by experts... -
Alfina et al. Hate Speech Detection in the Indonesian Language from Twitter
Dataset of hate speech in Indonesian, including hatred for religion, race, ethnicity, and gender. Posts from Twitter are sampled using hashtags relevant to contentious political... -
Mathur et al. Hinglish Sexism on Twitter Dataset
Dataset of Hinglish sexist tweets sampled by crawling popular hashtags and well-known people. Tweets were labelled by experts, with an average Cohen's kappa of 0.83. -
Fortuna et al. A Hierarchically-Labeled Portuguese Hate Speech Dataset From Tw...
Dataset of hate speech in Portuguese sampled from Twitter, covering 81 categories. The dataset is manually annotated for hate speech using a hierarchical structure of classes.... -
Mubarak et al. Abuse in Arabic Social Media Dataset
Dataset 1 includes offensive Arabic tweets sampled in March 2014 using obscene keywords and hashtags used for pornographic pages (available as a .txt file word list). Dataset 2... -
Mulki et al. A Levantine Twitter Dataset for Hate Speech and Abusive Language...
Dataset of hate speech and abusive language, streamed from Twitter from March 2018 to February 2019 using relevant terms and from specified users, which have 100k... -
Albadi et al. Arabic Religious Hate on Twitter
Dataset of Arabic religious hate tweets sampled using neutral religious names as keywords. Annotation was crowdsourced using CrowdFlower, with a minimum of 3 annotations per... -
Jha and Mamidi Sexism on Twitter Dataset
Dataset of sexist tweets sampled using benevolent sexist key phrases, from which 712 tweets were manually selected by the authors and validated by three non-activist... -
Waseem Racism and Sexism on Twitter Dataset
Dataset of racist and sexist tweets sampled from Twitter and labelled first by experts (including feminist and anti-racist activists), and then by amateur CrowdFlower annotators who... -
Ousidhoum et al. Multilingual and Multi-Aspect Hate Speech Analysis on Twitter
Dataset of multi-aspect hate speech posts sampled from Twitter and labelled through crowd-sourcing (Amazon Mechanical Turk). Tweets were sampled by common slurs and demeaning... -
ElSherief et al. Hate Speech Instigators and Their Targets Dataset from Twitter
Dataset of hate speech and targets from Twitter collected through a multi-step classification process and annotated through CrowdFlower. 92.8% agreement among the annotators for... -
Turkish OffensEval
The Turkish dataset used in OffensEval 2020