Mathur et al. Hinglish Sexism on Twitter Dataset

Dataset of Hinglish sexist Tweets sampled by crawling popular hashtags and well-known people. Tweets were labelled by experts, with an average inter-annotator agreement (Cohen's kappa) of 0.83.

Additional Info

Field Value
Authors Mathur, P., Sawhney, R., Ayyar, M. and Shah, R.
Author contact email Mathur, P., Sawhney, R., Ayyar, M. and Shah, R
Publication / paper reference Mathur, P., Sawhney, R., Ayyar, M. and Shah, R., 2018. Did you offend me? Classification of Offensive Tweets in Hinglish Language. In: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2). Brussels, Belgium: Association for Computational Linguistics, pp.138-148.
Publication / paper link
Dataset about page
Language(s) covered Hinglish
Source data platform(s) Twitter
Annotation schema description Hierarchy (Not Offensive, Abusive, Hate)
Phenomena annotated Group-directed Sexism
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 3,189 Tweets
Proportion of positive/abusive instances 0.65
Submitter Laila Sprejer
Submitter Email
State active