Comparing Self-Paced Ensemble and RUSBoost for Imbalanced Poverty Classification in West Java
Abstract
Class imbalance remains a major challenge in classification modelling and frequently leads to biased predictive models. This study compared two undersampling-based ensemble techniques, Self-Paced Ensemble and RUSBoost, for handling imbalanced classification in poverty identification in West Java. The results suggested that RUSBoost consistently outperformed Self-Paced Ensemble on the most critical metrics and produced more balanced classification outcomes. When the objective is to maximize the identification of poor households, the default threshold in the RUSBoost model is preferred. If, on the other hand, precision is prioritized because resources are limited, the Youden Index threshold offers a better alternative. Given the overall evaluation metrics, RUSBoost with the default threshold is suggested as the most reliable and well-balanced option among the compared models for classifying poor households in West Java under imbalanced data conditions.
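For readers unfamiliar with the Youden Index threshold mentioned in the abstract, the following is a minimal sketch of how such a cut-off is commonly chosen: it is the score threshold that maximizes J = sensitivity + specificity - 1. The function name `youden_threshold` and the toy labels and scores are illustrative assumptions. They are not the study's data, model outputs, or implementation, and the sketch does not reproduce the reported results.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) that maximizes Youden's J statistic.

    J = sensitivity + specificity - 1, evaluated at every distinct score.
    A case is predicted positive when its score is >= the threshold.
    Hypothetical helper for illustration only.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        # True positives: positive cases scored at or above the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        # True negatives: negative cases scored below the threshold.
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Made-up example: 1 = poor household (minority class), 0 = non-poor.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.40, 0.55, 0.90]

threshold, j = youden_threshold(y_true, scores)
print(f"Youden threshold = {threshold:.2f}, J = {j:.3f}")
```

In practice the scores would be predicted probabilities from a fitted classifier, and the threshold would be chosen on validation data rather than on the data used for the final evaluation.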