ViHSD - Vietnamese Hate Speech Detection on Social Media Texts

A large-scale dataset for Vietnamese hate speech detection on social media texts. The dataset was crawled from Facebook and YouTube and manually annotated by humans.

Data and Resources

Additional Info

Field Value
Authors Son T. Luu; Kiet Van Nguyen; Ngan Luu-Thuy Nguyen
Author contact email Son T. Luu; Kiet Van Nguyen; Ngan Luu-Thuy Nguyen
Publication / paper reference Son T. Luu, Kiet Van Nguyen and Ngan Luu-Thuy Nguyen, A Large-scale Dataset for Hate Speech Detection on Vietnamese Social Media Texts, ArXiv
Publication / paper link https://arxiv.org/abs/2103.11528
Dataset about page
Language(s) covered Vietnamese
Source data platform(s) Facebook, YouTube
Annotation schema description 3-categories (clean, offensive, hate)
Phenomena annotated
Level of instances Single comment / post
Data statement link
Total number of instances in dataset 33,400
Proportion of positive/abusive instances
Submitter Son T. Luu
Submitter Email sonlt@uit.edu.vn
State active