Ljubešić et al. Croatian Moderated News Comments

Dataset of moderated news comments from the Croatian news site 24sata. Expert annotators labelled the comments by type of inappropriate content. Note that this data is encrypted.

Data and Resources

Additional Info

Field Value
Authors Ljubešić, N., Erjavec, T. and Fišer, D.
Author contact email Ljubešić, N., Erjavec, T. and Fišer, D.
Publication / paper reference Ljubešić, N., Erjavec, T. and Fišer, D., 2018. Datasets of Slovene and Croatian Moderated News Comments. In: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2). Brussels, Belgium: Association for Computational Linguistics, pp.124-131.
Publication / paper link https://www.aclweb.org/anthology/W18-5116
Dataset about page http://hdl.handle.net/11356/1202
Language(s) covered Croatian
Source data platform(s) 24sata.hr
Annotation schema description Binary (deleted, not deleted); flagged content categories (calumniation, discrimination, disrespect, hooliganism, insult, irony, swearing, threat and other)
Phenomena annotated Deleted + inappropriate content
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 17042965
Proportion of positive/abusive instances 0.02
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active
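
The instance count and positive proportion listed above can be combined into a rough estimate of how many comments are positive/abusive. The following is a minimal sketch rather than part of the dataset's tooling: the function name `estimate_positive_count` is hypothetical, and because the proportion in the record is rounded to two decimals, the result is only an approximation.

```python
# Values copied from the record above.
TOTAL_INSTANCES = 17_042_965
POSITIVE_PROPORTION = 0.02  # rounded in the record, so the estimate is approximate


def estimate_positive_count(total: int, proportion: float) -> int:
    """Return the approximate number of positive instances (hypothetical helper)."""
    return round(total * proportion)


estimated = estimate_positive_count(TOTAL_INSTANCES, POSITIVE_PROPORTION)
print(f"Approximately {estimated:,} positive/abusive instances")
```

The estimate is useful mainly for judging class imbalance before sampling or training; the exact figure should be taken from the dataset itself once it is decrypted.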