Comparing Self-Paced Ensemble and RUSBoost for Imbalanced Poverty Classification in West Java

Nur Andi Setiabudi; Bagus Sartono; Utami Dyah Syafitri; Komang Budi Aryasa

doi:10.29244/ijsa.v9i2p218-229


Published: Dec 30, 2025


Keywords:

ensemble learning; imbalance classification; RUSBoost; Self-Paced Ensemble; undersampling

Nur Andi Setiabudi

Study Program on Statistics and Data Science, IPB University, Indonesia

Bagus Sartono

Study Program on Statistics and Data Science, IPB University, Indonesia

Utami Dyah Syafitri

Study Program on Statistics and Data Science, IPB University, Indonesia

Komang Budi Aryasa

Digital Business and Technology Division, Telkom Indonesia

Abstract

Class imbalance remains a major challenge in classification modelling and frequently leads to biased predictive models. This study compared two undersampling-based ensemble techniques, Self-Paced Ensemble and RUSBoost, for handling imbalanced classification in the identification of poor households in West Java. The results suggested that RUSBoost consistently outperformed Self-Paced Ensemble across the most critical metrics and produced more balanced classification outcomes. When the objective is to maximize the identification of poor households, the default threshold in the RUSBoost model is preferred. If, on the other hand, precision is prioritized because resources are limited, the Youden Index threshold offers a better alternative. Based on the overall evaluation metrics, RUSBoost with the default threshold is suggested as the most reliable and well-balanced of the compared models for classifying poor households in West Java under imbalanced data conditions.
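The threshold trade-off described in the abstract can be illustrated with a minimal sketch. It assumes a classifier has already produced a score per household and compares a default cut-off of 0.5 with the cut-off that maximizes Youden's J (sensitivity + specificity - 1). The labels and scores below are invented toy values, not data or results from the study, and the function names are hypothetical helpers written for this illustration only.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), predicting class 1 (poor) when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def recall_precision(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[float, float]:
    """Recall and precision of the positive (poor) class at a given threshold."""
    tp, fp, _tn, fn = confusion_counts(y_true, scores, threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


def youden_threshold(y_true: Sequence[int],
                     scores: Sequence[float]) -> Tuple[float, float]:
    """Scan every observed score as a candidate cut-off and keep the one with
    the largest Youden's J = sensitivity + specificity - 1."""
    best_t, best_j = 0.5, float("-inf")
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(y_true, scores, t)
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (assumption): 4 poor households (1) and 8 non-poor households (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.51, 0.6, 0.58, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]

default_recall, default_precision = recall_precision(y_true, scores, 0.5)
best_t, best_j = youden_threshold(y_true, scores)
youden_recall, youden_precision = recall_precision(y_true, scores, best_t)

print(f"default 0.5 : recall={default_recall:.3f} precision={default_precision:.3f}")
print(f"Youden {best_t}: recall={youden_recall:.3f} precision={youden_precision:.3f} J={best_j:.3f}")
```

On this toy data the default threshold captures every poor household at the cost of several false positives, while the Youden cut-off misses one poor household but flags no non-poor ones. Whether the same pattern holds on real data depends on the score distribution of the fitted model.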


How to Cite


Setiabudi NA, Sartono B, Syafitri UD, Aryasa KB. Comparing Self-Paced Ensemble and RUSBoost for Imbalanced Poverty Classification in West Java. IJSA [Internet]. 2025 Dec. 30 [cited 2026 Jan. 9];9(2):218-29. Available from: https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1333

Issue

Vol. 9 No. 2 (2025)

Section

Articles

