Comparison of Chi-Square Automatic Interaction Detector (CHAID) and Random Forest Methods in the Classification of Household Poverty Status in Central Java

Perbandingan Metode Chi-Square Automatic Interaction Detector (CHAID) dan Random Forest dalam Klasifikasi Status Kemiskinan Rumah Tangga di Jawa Tengah

Authors

  • Fatkhul Izzati Department of Statistics, IPB University, Indonesia
  • Mohammad Masjkur Department of Statistics, IPB University, Indonesia
  • Farit Mochamad Afendi Department of Statistics, IPB University, Indonesia

DOI:

https://doi.org/10.29244/ijsa.v8i1p1-13

Keywords:

CHAID, poverty, random forest, SMOTE

Abstract

Central Java ranked second among Indonesian provinces in the number of poor people in March 2020. Poverty alleviation programs have been carried out, but many have not reached their intended targets. This study aimed to model the classification of household poverty status in Central Java using the CHAID and random forest methods and to compare the two methods. The data come from the 2020 National Socioeconomic Survey (SUSENAS) for Central Java, conducted by the Central Bureau of Statistics (BPS). Because poor households are far fewer than non-poor households, the Synthetic Minority Oversampling Technique (SMOTE) was applied to handle the class imbalance. The random forest method achieved better classification performance than CHAID, with an accuracy of 93.95%, a sensitivity of 98.43%, a specificity of 89.92%, and an AUC of 0.9417. The most important variables in the random forest model were the floor area of the house, the age of the household head, the cooking fuel, the final disposal site for feces, and the ownership of defecation facilities.
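The abstract names two computational steps without showing them: SMOTE oversampling of the minority (poor) class and the confusion-matrix metrics used to compare the models. The Python sketch below illustrates both under stated assumptions; it is not the authors' code or pipeline. It assumes purely numeric features (the SUSENAS predictors are largely categorical, which would call for a variant such as SMOTE-NC), treats label 1 as the positive "poor" class, and uses the hypothetical function names `smote` and `classification_metrics`. In SMOTE, each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbors; unlike the original SMOTE paper, which generates a fixed number of points per minority sample, this sketch draws the base sample at random.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Assumption: all features are numeric. Categorical survey variables would
    need an adaptation such as SMOTE-NC, which is not shown here.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        # Pick a base minority sample at random.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Find its k nearest minority neighbors by Euclidean distance,
        # excluding the sample itself.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbor = rng.choice(others[:k])
        # Interpolate at a random point between x and the chosen neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) from hard class labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # share of positive (poor) cases detected
    specificity = tn / (tn + fp)  # share of negative (non-poor) cases detected
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Toy minority-class data (hypothetical, two numeric features).
    poor = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
    print(smote(poor, n_new=3, k=2))
    # Toy labels: 1 = poor, 0 = non-poor.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    # accuracy 0.75, sensitivity 2/3, specificity 0.8
    print(classification_metrics(y_true, y_pred))
```

Sensitivity and specificity are reported alongside accuracy because, on imbalanced data, accuracy alone can look high even when the minority class is poorly detected.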

Published

11-06-2024

How to Cite

Izzati, F., Masjkur, M., & Afendi, F. M. (2024). Comparison of Chi-Square Automatic Interaction Detector (CHAID) and Random Forest Methods in the Classification of Household Poverty Status in Central Java: Perbandingan Metode Chi-Square Automatic Interaction Detector (CHAID) dan Random Forest dalam Klasifikasi Status Kemiskinan Rumah Tangga di Jawa Tengah. Indonesian Journal of Statistics and Its Applications, 8(1), 1–13. https://doi.org/10.29244/ijsa.v8i1p1-13

Section

Articles
