Improving Classification Model Performances using an Active Learning Method to Detect Hate Speech in Twitter (Peningkatan Kinerja Model Klasifikasi dengan Pembelajaran Aktif dalam Mendeteksi Ujaran Kebencian di Twitter)

Muhammad Ilham Abidin
Khairil Anwar Notodiputro
Bagus Sartono

Abstract

Police efforts to address hate speech on social media such as Twitter cannot rely solely on manual checks. It is therefore necessary to use statistical modelling, such as classification models, to detect hate speech automatically. Classification is a type of predictive modelling that learns from labelled data to predict the class of new observations. The available data are usually unlabelled, so a labelling process must be carried out beforehand. Data labelling is time-consuming, costly, and often yields incorrect labels. This research aims to improve the performance of classification models by adding a small amount of data through the so-called active learning method. The results showed no significant difference between the performance of the logistic regression and naïve Bayes classification models in detecting hate speech. However, the results also showed that adding data through the active learning method substantially improved the performance of logistic regression in detecting hate speech compared with adding data by simple random sampling. Therefore, the performance of classification models in detecting hate speech on Twitter could be improved by using an active learning method.
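
The contrast drawn in the abstract, between adding data chosen by active learning and adding data by simple random sampling, can be illustrated with a minimal sketch. The abstract does not state which query strategy was used, so the sketch assumes pool-based uncertainty sampling, in which the items whose predicted probability lies closest to 0.5 are sent for labelling. All function names here are hypothetical, the `oracle` callable stands in for a human annotator, and the one-dimensional numeric features are toy stand-ins for features extracted from tweets.

```python
import math


def predict_proba(w, x):
    """P(y = 1 | x) under a logistic model; w[-1] is the intercept."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Fit a logistic regression by plain batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err  # intercept gradient
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def select_uncertain(probs, k):
    """Indices of the k probabilities closest to 0.5 (least confident)."""
    return sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))[:k]


def select_random(n, k, rng):
    """Baseline: k pool indices by simple random sampling (rng: random.Random)."""
    return rng.sample(range(n), k)


def active_learning_round(X_lab, y_lab, X_pool, oracle, k):
    """Fit on the labelled set, query the k most uncertain pool items,
    and return the enlarged labelled set plus the remaining pool."""
    w = fit_logistic(X_lab, y_lab)
    probs = [predict_proba(w, x) for x in X_pool]
    picked = set(select_uncertain(probs, k))
    X_new = X_lab + [X_pool[i] for i in sorted(picked)]
    y_new = y_lab + [oracle(X_pool[i]) for i in sorted(picked)]
    remaining = [x for i, x in enumerate(X_pool) if i not in picked]
    return X_new, y_new, remaining


# Toy usage: the assumed "true" rule is y = 1 when the feature is positive.
X_lab, y_lab = [[-2.0], [2.0]], [0, 1]
X_pool = [[-1.5], [-0.1], [0.2], [1.8]]
X_lab, y_lab, X_pool = active_learning_round(
    X_lab, y_lab, X_pool, oracle=lambda x: 1 if x[0] > 0 else 0, k=2
)
print(X_lab, y_lab, X_pool)
```

Under this assumption, the queried items are those nearest the current decision boundary, which is where an extra label is most likely to change the fitted model. Swapping `select_uncertain` for `select_random` gives the simple random sampling baseline against which the abstract compares.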

How to Cite
Abidin MI, Notodiputro KA, Sartono B. Improving Classification Model Performances using an Active Learning Method to Detect Hate Speech in Twitter: Peningkatan Kinerja Model Klasifikasi dengan Pembelajaran Aktif dalam Mendeteksi Ujaran Kebencian di Twitter. IJSA [Internet]. 2021 Mar. 31 [cited 2025 Nov. 30];5(1):26-38. Available from: https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/699
Section
Articles

