Alakrot et al. Dataset Construction for the Detection of Anti-Social Behaviour from YouTube

The dataset contains offensive comments from YouTube. The sampled comments date from 2015 to 2017 and were collected in July 2017. Sampling targeted channels with controversial comments about celebrities. The authors collected 167,549 comments and reduced these to 16,000 for labelling, keeping those that came from one of 9 highly offensive channels. Three annotators labelled the data, and the final label was decided by majority vote. Inter-annotator agreement is 71%, with pairwise Kappa values of 0.698, 0.579 and 0.512.
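The labelling procedure described above (three annotators, majority vote, pairwise Kappa) can be illustrated with a minimal sketch. It assumes binary labels and that the reported Kappa is Cohen's kappa computed per annotator pair. The annotator names, the toy labels and the helper functions `majority_label` and `cohen_kappa` are hypothetical and are not taken from the dataset or the paper.

```python
from collections import Counter
from itertools import combinations


def majority_label(votes):
    """Return the label chosen by most annotators (odd number of votes assumed)."""
    return Counter(votes).most_common(1)[0][0]


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long label sequences.

    Undefined (division by zero) when chance agreement equals 1,
    e.g. when both annotators use a single identical label throughout.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy labels (1 = offensive, 0 = not offensive), invented for illustration only.
annotations = {
    "ann1": [1, 0, 1, 1, 0, 0],
    "ann2": [1, 0, 0, 1, 0, 1],
    "ann3": [1, 1, 1, 1, 0, 0],
}

# One final label per comment, decided by majority vote across annotators.
final_labels = [majority_label(votes) for votes in zip(*annotations.values())]

# One kappa value per annotator pair, giving three values for three annotators.
pairwise_kappa = {
    (p, q): cohen_kappa(annotations[p], annotations[q])
    for p, q in combinations(annotations, 2)
}
```

With three annotators there are exactly three annotator pairs, which matches the three Kappa values reported for this dataset.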

Data and Resources

Additional Info

Field Value
Authors Alakrot, A., Murray, L. and Nikolov, N.
Author contact email Alakrot, A., Murray, L. and Nikolov, N.
Publication / paper reference Alakrot, A., Murray, L. and Nikolov, N., 2018. Dataset Construction for the Detection of Anti-Social Behaviour in Online Communication in Arabic. Procedia Computer Science, 142, pp.174-181.
Publication / paper link https://www.sciencedirect.com/science/article/pii/S1877050918321756
Dataset about page https://onedrive.live.com/?authkey=!ACDXj_ZNcZPqzy0&id=6EF6951FBF8217F9!105&cid=6EF6951FBF8217F9
Language(s) covered Arabic
Source data platform(s) YouTube
Annotation schema description Binary
Phenomena annotated Incivility
Level of instances Single comment / post
Data statement link N/A
Total number of instances in dataset 15,050
Proportion of positive/abusive instances 0.39
Submitter Philine Zeinert
Submitter Email phze@itu.dk
State active