Caselli et al. Implicit/Explicit Expansion on OLID

This dataset extends the OLID/OffensEval dataset (Offensive Language Identification Dataset; Zampieri et al., 2019a) by annotating the explicitness of each message. The OLID data was retrieved through the Twitter API using a keyword-based approach, and the annotation was conducted via Figure Eight, a crowdsourcing platform.

Additional Info

Field: Value
Authors: Tommaso Caselli, Valerio Basile, Jelena Mitrovic, Inga Kartoziya, Michael Granitzer
Author contact email: Tommaso Caselli, Valerio Basile, Jelena Mitrovic, Inga Kartoziya, Michael Granitzer
Publication / paper reference: Caselli, T., Basile, V., Mitrovic, J., Kartoziya, I., & Granitzer, M. (2020). I Feel Offended, Don't Be Abusive! Implicit/Explicit Messages in Offensive and Abusive Language. LREC.
Publication / paper link: https://www.aclweb.org/anthology/2020.lrec-1.760.pdf
Dataset about page: https://github.com/tommasoc80/AbuseEval
Language(s) covered: English
Source data platform(s): Twitter
Annotation schema description: Ternary (explicit, implicit, not abusive)
Phenomena annotated: Offensive language
Level of instances: Single comment / post
Data statement link:
Total number of instances in dataset: 4640
Proportion of positive/abusive instances: Explicit: 0.66
Submitter: Laila Sprejer
Submitter Email: sprejerlaila@gmail.com
State: active
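
As an illustration of how a per-class proportion like the one listed above could be computed under the ternary schema, here is a minimal Python sketch. The label strings are taken from the schema description on this page and are an assumption; the released files may encode the classes differently. The helper name `label_proportions` and the toy annotations are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Label names follow the schema description on this page (assumption:
# the released annotation files may use different strings).
LABELS = ("explicit", "implicit", "not abusive")


def label_proportions(labels):
    """Return the share of each ternary label in a list of annotations."""
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(labels)
    if total == 0:
        # No annotations: report zero for every class instead of dividing by zero.
        return {name: 0.0 for name in LABELS}
    counts = Counter(labels)
    return {name: counts[name] / total for name in LABELS}


# Toy annotations invented for illustration only (not dataset content).
toy = ["explicit", "not abusive", "implicit", "not abusive"]
proportions = label_proportions(toy)
```

Raising an error on unexpected labels keeps a typo in the annotation column from being silently excluded from the proportions.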