Founta et al. Hate and Abusive Speech on Twitter

Dataset of tweets collected from 30th March 2017 to 9th April 2017 using a boosted random sampling technique: text analysis and preliminary crowdsourcing rounds were used to design a model that pre-selects the tweets of the boosted set. Tweets were labeled by 20 crowdworkers.

Additional Info

Field Value
Authors Antigoni-Maria Founta, Constantinos Djouvas, Despoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn, Gianluca Stringhini, Athena Vakali, Michael Sirivianos, Nicolas Kourtellis
Author contact email Antigoni-Maria Founta, Constantinos Djouvas, Despoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn, Gianluca Stringhini, Athena Vakali, Michael Sirivianos, Nicolas Kourtellis
Publication / paper reference
Publication / paper link https://arxiv.org/pdf/1802.00393.pdf
Dataset about page https://github.com/ENCASEH2020/hatespeech-twitter/
Language(s) covered English
Source data platform(s) Twitter
Annotation schema description Multi-topic (abusive, hateful, normal, spam)
Phenomena annotated Abuse
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 100,000
Proportion of positive/abusive instances Abusive (11%), hateful (7.5%), normal (59%), spam (22.5%)
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active
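
The class proportions listed above can be turned into approximate per-label counts. The following is a minimal sketch that only does arithmetic on the figures reported in this entry (100,000 instances; 11% abusive, 7.5% hateful, 59% normal, 22.5% spam). The names `TOTAL_INSTANCES`, `LABEL_PERCENTAGES`, and `approximate_label_counts` are hypothetical, and the results are estimates derived from rounded percentages, not counts read from the released data.

```python
# Figures copied from this catalog entry; the names are illustrative only.
TOTAL_INSTANCES = 100_000
LABEL_PERCENTAGES = {
    "abusive": 11.0,
    "hateful": 7.5,
    "normal": 59.0,
    "spam": 22.5,
}


def approximate_label_counts(total, percentages):
    """Estimate the number of instances per label from rounded percentages."""
    return {label: round(total * pct / 100) for label, pct in percentages.items()}


counts = approximate_label_counts(TOTAL_INSTANCES, LABEL_PERCENTAGES)
for label, count in counts.items():
    print(f"{label}: ~{count}")
```

Under these assumptions the estimate shows a strong class imbalance: the hateful class is the smallest, so a classifier trained on this data would likely need per-class metrics rather than plain accuracy.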