Ibrohim and Budi Abuse in Indonesian Twitter Dataset

Dataset of Indonesian tweets sampled using offensive terms and annotated for abusive language. A pool of 20 volunteer annotators labelled the tweets, with each tweet labelled by at least 3 people. Only tweets with 100% annotator agreement were kept, which amounted to 80.6% of the data.
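The agreement filter described above can be illustrated with a minimal sketch. This is not the authors' code: the tweet ids, label strings, and the helper `unanimous_label` are hypothetical and only show what "keep tweets with 100% agreement among at least 3 annotators" could look like.

```python
# Hypothetical toy annotations: tweet id -> labels given by its annotators.
annotations = {
    "t1": ["abusive", "abusive", "abusive"],
    "t2": ["not_abusive", "abusive", "not_abusive"],
    "t3": ["not_abusive", "not_abusive", "not_abusive"],
}


def unanimous_label(labels, min_annotators=3):
    """Return the shared label if enough annotators all agree, else None."""
    if len(labels) < min_annotators:
        return None
    return labels[0] if len(set(labels)) == 1 else None


# Keep only tweets whose annotators agree completely.
kept = {}
for tweet_id, labels in annotations.items():
    label = unanimous_label(labels)
    if label is not None:
        kept[tweet_id] = label

# Share of annotated tweets that survive the filter (80.6% in the real dataset).
retention = len(kept) / len(annotations)
```

In this toy example, `t2` is dropped because its annotators disagree, so two of the three tweets are retained.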

Data and Resources

Additional Info

Field Value
Authors Ibrohim, M. and Budi, I.
Author contact email Ibrohim, M. and Budi, I.
Publication / paper reference Ibrohim, M. and Budi, I., 2018. A Dataset and Preliminaries Study for Abusive Language Detection in Indonesian Social Media. Procedia Computer Science, 135, pp.222-229.
Publication / paper link https://www.sciencedirect.com/science/article/pii/S1877050918314583
Dataset about page https://github.com/okkyibrohim/id-abusive-language-detection
Language(s) covered Indonesian
Source data platform(s) Twitter
Annotation schema description Hierarchical (Not abusive, Abusive but not offensive, Offensive)
Phenomena annotated Abuse + offensive language
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 2,016 Tweets
Proportion of positive/abusive instances 0.54
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active
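As a rough worked example, the instance count and the positive proportion listed above imply an approximate class split. The figures below are estimates only, because the proportion is reported rounded to two decimals; the variable names are illustrative.

```python
total_instances = 2016       # total tweets, as listed in this entry
positive_proportion = 0.54   # rounded proportion of abusive instances

# Approximate counts; the exact split may differ slightly due to rounding.
approx_abusive = round(total_instances * positive_proportion)
approx_not_abusive = total_instances - approx_abusive

print(approx_abusive, approx_not_abusive)
```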