Alakrot et al. Dataset Construction for the Detection of Anti-Social Behaviour from YouTube

The dataset contains offensive comments from YouTube. The comments were posted between 2015 and 2017 and collected in July 2017. Sampling drew on channels with controversial comments about celebrities. Of the 167,549 comments collected, 16,000 were kept for labelling, each coming from one of 9 highly offensive channels. Three annotators labelled the comments, and the final label was decided by majority vote. Inter-annotator agreement is 71%, with pairwise Kappa values of 0.698, 0.579 and 0.512.
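
The labelling procedure described above (three annotators, majority vote, pairwise Kappa) can be illustrated with a minimal sketch. It assumes binary labels and that the pairwise Kappa is Cohen's kappa; the annotator names and toy labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from itertools import combinations


def majority_label(votes):
    """Return the label given by most annotators (no ties with 3 binary votes)."""
    return Counter(votes).most_common(1)[0][0]


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    n = len(a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's own label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations: 1 = offensive, 0 = not offensive.
annotations = {
    "ann1": [1, 0, 1, 1, 0, 0],
    "ann2": [1, 0, 0, 1, 0, 1],
    "ann3": [1, 1, 1, 1, 0, 0],
}

# Final label per comment by majority vote across the three annotators.
final_labels = [majority_label(votes) for votes in zip(*annotations.values())]

# Kappa for every pair of annotators.
pairwise_kappa = {
    (p, q): cohen_kappa(annotations[p], annotations[q])
    for p, q in combinations(annotations, 2)
}
```

With three annotators there are three annotator pairs, which matches the three Kappa values reported in the description.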

Additional Info

Field Value
Paper Authors Alakrot, A., Murray, L. and Nikolov, N.
Author contact email Alakrot, A., Murray, L. and Nikolov, N.
Publication / paper reference Alakrot, A., Murray, L. and Nikolov, N., 2018. Dataset Construction for the Detection of Anti-Social Behaviour in Online Communication in Arabic. Procedia Computer Science, 142, pp.174-181.
Publication / paper link https://www.sciencedirect.com/science/article/pii/S1877050918321756
Publication Year 2018
Dataset about page https://onedrive.live.com/?authkey=!ACDXj_ZNcZPqzy0&id=6EF6951FBF8217F9!105&cid=6EF6951FBF8217F9
Approved
Language(s) covered Arabic
Source data platform(s) YouTube
Phenomena annotated Incivility
Level of instances Single comment / post
Data statement link N/A
Total number of instances in dataset 15,050
Proportion of positive/abusive instances 0.39
Submitter Philine Zeinert
Submitter Email phze@itu.dk
State active