Mathur et al. Hinglish Sexism on Twitter Dataset

Dataset of Hinglish sexist Tweets sampled by crawling popular hashtags and well-known people. Tweets were labelled by experts, with an average Cohen's kappa of 0.83.
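For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of Cohen's kappa for two annotators, computed as (observed agreement - chance agreement) / (1 - chance agreement). The function name `cohen_kappa` and the toy label lists are hypothetical illustrations, not data from the dataset, and this entry does not say how the reported average of 0.83 was taken across annotators.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical), using two categories from the annotation schema.
annotator_1 = ["Hate", "Hate", "Not Offensive", "Not Offensive"]
annotator_2 = ["Hate", "Not Offensive", "Not Offensive", "Not Offensive"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

In this toy case the annotators agree on 3 of 4 items (0.75) while chance agreement is 0.5, which gives a kappa of 0.5.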

Additional Info

Field: Value
Authors: Mathur, P., Sawhney, R., Ayyar, M. and Shah, R.
Author contact email: Mathur, P., Sawhney, R., Ayyar, M. and Shah, R.
Publication / paper reference: Mathur, P., Sawhney, R., Ayyar, M. and Shah, R., 2018. Did you offend me? Classification of Offensive Tweets in Hinglish Language. In: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2). Brussels, Belgium: Association for Computational Linguistics, pp.138-148.
Publication / paper link: https://www.aclweb.org/anthology/W18-5118
Dataset about page: https://github.com/pmathur5k10/Hinglish-Offensive-Text-Classification
Language(s) covered: Hinglish
Source data platform(s): Twitter
Annotation schema description: Hierarchy (Not Offensive, Abusive, Hate)
Phenomena annotated: Group-directed Sexism
Level of instances: Single comment / post
Data statement link:
Total number of instances in dataset: 3,189 Tweets
Proportion of positive/abusive instances: 0.65
Submitter: Laila Sprejer
Submitter Email: sprejerlaila@gmail.com
State: active
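
The reported total and positive proportion imply an approximate class split. The sketch below works through that arithmetic; the variable names are illustrative, and because the proportion is given to two decimal places, the result is an estimate with a range rather than an exact count from the dataset.

```python
import math

total_instances = 3189        # total Tweets reported in this entry
positive_proportion = 0.65    # reported proportion of positive/abusive instances

# Point estimate of the class split.
approx_positive = round(total_instances * positive_proportion)
approx_negative = total_instances - approx_positive

# Assumption: 0.65 is rounded to two decimals, so the true proportion
# lies somewhere between 0.645 and 0.655.
min_positive = math.ceil(total_instances * 0.645)
max_positive = math.floor(total_instances * 0.655)

print(approx_positive, approx_negative)
print(min_positive, max_positive)
```

Under that rounding assumption, roughly 2,073 Tweets are positive/abusive and about 1,116 are not, with the positive count falling somewhere between 2,057 and 2,088.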