Ljubešić et al. Slovene Moderated News Comments

Dataset of moderated news comments from the Slovene RTV MMC news portal (rtvslo.si). Comments were labelled by expert annotators according to the type of inappropriate content. Note that this data is encrypted.


Additional Info

Field Value
Authors Ljubešić, N., Erjavec, T. and Fišer, D.
Author contact email Ljubešić, N., Erjavec, T. and Fišer, D.
Publication / paper reference Ljubešić, Nikola; Erjavec, Tomaž and Fišer, Darja, 2018, Dataset and baseline model of moderated content FRENK-MMC-RTV 1.0, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1201.
Publication / paper link https://www.aclweb.org/anthology/W18-5116
Dataset about page https://www.clarin.si/repository/xmlui/handle/11356/1201
Language(s) covered Slovene
Source data platform(s) rtvslo.si
Annotation schema description Binary (deleted, not deleted); flagged content type (calumniation, discrimination, disrespect, hooliganism, insult, irony, swearing, threat, other)
Phenomena annotated Deleted + inappropriate content
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 7597560
Proportion of positive/abusive instances 0.08
Submitter Laila Sprejer
Submitter Email sprejerlaila@gmail.com
State active
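
The record gives a total instance count and a proportion of positive instances, but no absolute count of positive instances. The sketch below shows the arithmetic that the two figures imply. It assumes that the proportion 0.08 is rounded to two decimal places and that it applies to the full total. The function names are illustrative only and are not part of the dataset or its tooling.

```python
from fractions import Fraction
from math import ceil, floor

# Figures copied from the record above.
TOTAL_INSTANCES = 7_597_560
POSITIVE_PROPORTION = "0.08"  # kept as a string so it converts to an exact fraction


def estimate_positive_count(total: int, proportion: str) -> int:
    """Approximate number of positive instances implied by the proportion."""
    return round(total * Fraction(proportion))


def rounding_bounds(total: int, proportion: str, decimals: int = 2) -> tuple[int, int]:
    """Range of counts consistent with a proportion rounded to `decimals` places.

    Assumption: the published proportion was rounded to the nearest
    10**-decimals, so the true value lies within half a step of it.
    """
    p = Fraction(proportion)
    half_step = Fraction(1, 2 * 10**decimals)
    low = ceil(total * (p - half_step))
    high = floor(total * (p + half_step))
    return low, high


if __name__ == "__main__":
    print(estimate_positive_count(TOTAL_INSTANCES, POSITIVE_PROPORTION))
    print(rounding_bounds(TOTAL_INSTANCES, POSITIVE_PROPORTION))
```

Because the proportion carries only two decimal places, the point estimate is a rough guide. The exact count should be taken from the dataset itself or from the referenced paper.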