Alfina et al. Hate Speech Detection in the Indonesian Language from Twitter

A dataset of hate speech in Indonesian, covering hatred based on religion, race, ethnicity, and gender. Tweets were sampled using hashtags related to a contentious political event, the 2017 Jakarta gubernatorial election. 40,000 tweets were collected from February until April 2017; removing duplicates left 1,100. The data was annotated by experts. Each tweet was labelled by 3 annotators drawn from a group of 30 volunteers with diverse profiles in terms of religion, race, and gender. Only tweets on which all three annotators agreed were kept in the dataset.
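The filtering steps described above (duplicate removal, then keeping only tweets with total annotator agreement) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the whitespace and case normalisation used to detect duplicates, the label strings "hate" and "not", and the placeholder tweets are all assumptions made for illustration.

```python
def deduplicate(tweets):
    """Drop duplicate tweets, keeping the first occurrence of each.

    Assumption: two tweets count as duplicates when they match after
    lowercasing and collapsing whitespace. The source does not say how
    duplicates were actually identified.
    """
    seen = set()
    unique = []
    for text in tweets:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def keep_unanimous(annotations):
    """Keep only tweets whose annotators all chose the same label.

    `annotations` maps tweet text to the list of labels it received.
    """
    kept = {}
    for text, labels in annotations.items():
        if labels and len(set(labels)) == 1:
            kept[text] = labels[0]
    return kept


def positive_proportion(labelled, positive="hate"):
    """Share of kept tweets carrying the positive (hate) label."""
    if not labelled:
        return 0.0
    hits = sum(1 for label in labelled.values() if label == positive)
    return hits / len(labelled)


if __name__ == "__main__":
    # Placeholder tweets and votes, not taken from the dataset.
    raw = ["contoh tweet A", "Contoh  tweet A", "contoh tweet B", "contoh tweet C"]
    votes = {
        "contoh tweet A": ["hate", "hate", "hate"],
        "contoh tweet B": ["hate", "not", "hate"],
        "contoh tweet C": ["not", "not", "not"],
    }
    unique = deduplicate(raw)
    final = keep_unanimous({text: votes[text] for text in unique})
    print(final)
    print(positive_proportion(final))
```

Requiring unanimous agreement trades dataset size for label reliability: any tweet with a single dissenting vote is discarded rather than resolved by majority.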

Data and Resources

Additional Info

Field Value
Authors Alfina, I., Mulia, R., Fanany, M. and Ekanata, Y.
Author contact email Alfina, I., Mulia, R., Fanany, M. and Ekanata, Y.
Publication / paper reference Alfina, I., Mulia, R., Fanany, M. and Ekanata, Y., 2017. Hate Speech Detection in the Indonesian Language: A Dataset and Preliminary Study. In: International Conference on Advanced Computer Science and Information Systems, pp. 233-238.
Publication / paper link https://ieeexplore.ieee.org/document/8355039
Dataset about page https://github.com/ialfina/id-hatespeech-detection
Language(s) covered Indonesian
Source data platform(s) Twitter
Annotation schema description Binary (Hate, Not)
Phenomena annotated group-directed hate
Level of instances Single comment / post
Data statement link N/A
Total number of instances in dataset 713
Proportion of positive/abusive instances 0.36
Submitter Philine Zeinert
Submitter Email phze@itu.dk
State active