Parikh multi-label sexism

Ten annotators initially labelled 20,000 entries from the Everyday Sexism Project, with at least two annotators labelling every entry. The average Cohen's kappa for the per-category pairs is 0.58. Cases where annotators disagreed were then sent for further review. Entries with fewer than 7 words were excluded, and after cleaning the final dataset comprises 13,023 entries. The 23 categories originally labelled were merged into 14 categories for machine learning.
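
The agreement figure and the length filter described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the dictionary layout of the annotations, the whitespace tokenisation, and the choice to average kappa over categories for a single annotator pair are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two annotators' labels over the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum((list(a).count(l) / n) * (list(b).count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def mean_kappa_per_category(
    ann_a: Dict[str, List[int]], ann_b: Dict[str, List[int]]
) -> float:
    """Average kappa over categories (hypothetical layout: category -> 0/1 labels)."""
    kappas = [cohen_kappa(ann_a[c], ann_b[c]) for c in ann_a]
    return sum(kappas) / len(kappas)


def has_enough_words(text: str, min_words: int = 7) -> bool:
    """Keep entries with at least min_words whitespace-separated tokens (assumed tokenisation)."""
    return len(text.split()) >= min_words
```

Treating each category as a separate binary labelling task is one natural reading of "per-category" agreement for multi-label data; how the paper aggregates across annotator pairs is not stated here.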

Additional Info

Field Value
Paper Authors Pulkit Parikh, Harika Abburi, Pinkesh Badjatiya, Radhika Krishnan, Niyati Chhaya, Manish Gupta, Vasudeva Varma
Author contact email Pulkit Parikh, Harika Abburi, Pinkesh Badjatiya, Radhika Krishnan, Niyati Chhaya, Manish Gupta, Vasudeva Varma
Publication / paper reference Parikh et al. (2019), "Multi-label Categorization of Accounts of Sexism using a Neural Framework", Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 1642-1652).
Publication / paper link https://doi.org/10.18653/v1/D19-1174
Publication Year 2019
Dataset about page None
Approved
Language(s) covered English
Source data platform(s) Everyday Sexism Project
Phenomena annotated sexism
Level of instances Single comment / post
Data statement link None
Total number of instances in dataset 13,023
Proportion of positive/abusive instances 1
Submitter Bertie Vidgen
Submitter Email bvidgen@turing.ac.uk
State active