Mathur et al. Hinglish Sexism on Twitter Dataset

Dataset of Hinglish Tweets annotated for sexism, sampled by crawling popular hashtags and Tweets relating to well-known people. The Tweets were labelled by experts, with an average Cohen's kappa of 0.83.
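
For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of how Cohen's kappa is computed for one pair of annotators. The function name and the label lists are hypothetical and purely illustrative; they are not taken from the dataset. The reported 0.83 is an average, presumably over annotator pairs, which this sketch does not reproduce.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal label frequencies, summed over categories.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels (1 = sexist, 0 = not sexist), NOT real dataset rows.
annotator_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # observed 0.9, chance 0.5, so kappa is about 0.8
```

Kappa discounts the agreement two annotators would reach by chance given how often each uses each label, which is why it is usually preferred over raw percentage agreement for skewed label distributions.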

Additional Info

Field Value
Paper Authors Mathur, P., Sawhney, R., Ayyar, M. and Shah, R.
Author contact email Mathur, P., Sawhney, R., Ayyar, M. and Shah, R
Publication / paper reference Mathur, P., Sawhney, R., Ayyar, M. and Shah, R., 2018. Did you offend me? Classification of Offensive Tweets in Hinglish Language. In: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2). Brussels, Belgium: Association for Computational Linguistics, pp.138-148.
Publication / paper link https://www.aclweb.org/anthology/W18-5118
Publication Year 2018
Dataset about page https://github.com/pmathur5k10/Hinglish-Offensive-Text-Classification
Approved
Language(s) covered Hinglish
Source data platform(s) Twitter
Phenomena annotated Group-directed Sexism
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 3,189 Tweets
Proportion of positive/abusive instances 0.65
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active