Fernando Hate Speech Dataset in Sinhalese from Twitter

The dataset contains racist and sexist posts in Sinhalese from Twitter. The data was sampled using keywords pre-identified through surveys and by experts. It was annotated by experts (graduates of the Department of Political Science, Faculty of Social Science, University of Kelaniya: K A D Thusitha Pradeep and D M M Ruwan Kumara).

Data and Resources

Additional Info

Field Value
Authors Renuka Piyumal Fernando
Author contact email Renuka Piyumal Fernando
Publication / paper reference N/A
Publication / paper link N/A
Dataset about page https://github.com/renuka-fernando/sinhalese_language_racism_detection
Language(s) covered Sinhalese
Source data platform(s) Twitter
Annotation schema description Ternary
Phenomena annotated sexism and racism
Level of instances Single comment / post
Data statement link N/A
Total number of instances in dataset 1,411
Proportion of positive/abusive instances 0.23
Submitter Philine Zeinert
Submitter Email phze@itu.dk
State active