DKhate: Danish Hate Speech & Abusive Language data

  • Task description: Branching structure of tasks: Binary (Offensive, Not), Within Offensive (Target, Not), Within Target (Individual, Group, Other)
  • Details of task: Group-directed + Person-directed
  • Size of dataset: 3,600
  • Percentage abusive: 12% (proportion 0.12)
  • Language: Danish
  • Level of annotation: Posts
  • Platform: Twitter, Reddit, newspaper comments
  • Medium: Text
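The branching task structure in the summary above can be sketched as a small validity check. This is only an illustration: the class and field names (`Annotation`, `offensive`, `targeted`, `target_type`) are hypothetical and are not taken from the dataset files, whose actual column names and label strings may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Third-level categories as listed in the task description.
TARGET_TYPES = {"Individual", "Group", "Other"}


@dataclass(frozen=True)
class Annotation:
    """Hypothetical container for one post's labels (names are assumptions)."""
    offensive: bool                     # level 1: Offensive vs. Not
    targeted: Optional[bool] = None     # level 2: only set for offensive posts
    target_type: Optional[str] = None   # level 3: only set for targeted posts


def is_valid(a: Annotation) -> bool:
    """Check that deeper labels appear only under the branch that allows them."""
    if not a.offensive:
        # Non-offensive posts carry no further labels.
        return a.targeted is None and a.target_type is None
    if a.targeted is None:
        # Offensive posts must say whether they are targeted.
        return False
    if not a.targeted:
        # Untargeted offensive posts have no target type.
        return a.target_type is None
    # Targeted posts need one of the three target types.
    return a.target_type in TARGET_TYPES
```

For example, `Annotation(True, True, "Group")` passes the check, while `Annotation(False, True, "Group")` does not, because a target label under a non-offensive post breaks the branching.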


Additional Info

Field: Value
Paper Authors: Leon Derczynski, Gudbjartur Sigurbergsson
Author contact email: Leon Derczynski, Gudbjartur Sigurbergsson
Publication / paper reference: Offensive Language and Hate Speech Detection for Danish, LREC 2020
Publication / paper link: https://www.aclweb.org/anthology/2020.lrec-1.430/
Publication Year:
Dataset about page: https://figshare.com/articles/dataset/Danish_Hate_Speech_Abusive_Language_data/12220805
Approved:
Language(s) covered: Danish
Source data platform(s): Twitter, Reddit, News comments
Phenomena annotated: Group-directed, Person-directed
Level of instances: Single comment / post
Data statement link:
Total number of instances in dataset: 3600
Proportion of positive/abusive instances: 0.12
Submitter:
Submitter Email:
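
The size and proportion listed above imply an approximate count of abusive instances. The short calculation below is a sketch that assumes the reported 0.12 is a rounded figure, so the result is an estimate and not a count taken from the dataset itself; the variable names are illustrative.

```python
# Figures reported in the metadata above.
total_instances = 3600
abusive_proportion = 0.12  # assumed to be rounded to two decimals

# Estimated class sizes implied by those two figures.
approx_abusive = round(total_instances * abusive_proportion)
approx_non_abusive = total_instances - approx_abusive

print(f"abusive: about {approx_abusive}, non-abusive: about {approx_non_abusive}")
```

This works out to roughly 432 abusive posts against roughly 3,168 non-abusive ones, a class imbalance worth keeping in mind when choosing evaluation metrics.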