Handling Unbalanced Data with SMOTE Algorithm for Unemployment Classification in Lima Puluh Kota Regency Using CART Method
DOI:
https://doi.org/10.29244/ijsa.v8i2p166-177
Keywords:
AUC, CART, Lima Puluh Kota Regency, SMOTE, Unemployment
Abstract
Unemployment is a labor force problem, and high unemployment is attributed to low workforce skills. Lima Puluh Kota Regency is one region in West Sumatera that still faces this problem: unemployment there is caused by the low competence of human resources relative to job market requirements. According to the August 2023 Sakernas survey, the employed labor force in Lima Puluh Kota Regency is larger than the unemployed labor force, which makes the data unbalanced. One method for handling unbalanced data is the Synthetic Minority Oversampling Technique (SMOTE), which adds synthetic observations to the minority class until the class proportions are balanced. Handling the imbalance is needed to improve the performance of the classification model. Classification and Regression Trees (CART) is a decision tree method for classification that can identify the characteristics of each class. The purpose of this research is to compare the CART model before and after applying SMOTE, using the Area Under Curve (AUC) as the performance measure, where a higher AUC indicates the better model. The AUC of the CART model is 62.1% before SMOTE is applied and 70.2% after SMOTE is applied. It can therefore be concluded that CART classification after applying SMOTE performs better than CART classification without SMOTE.
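
The oversampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote`, its parameters, and the toy data are hypothetical, and it assumes purely numeric features with Euclidean distance. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbors.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbors (Euclidean distance).

    Hypothetical helper for illustration; assumes numeric feature tuples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbors among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Interpolate: base + gap * (neighbor - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy, made-up data (not the Sakernas survey): 8 majority vs 3 minority points.
majority = [(float(x), float(x % 3)) for x in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Add enough synthetic minority points to match the majority class size.
new_points = smote(minority, len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

In a comparison like the one in the abstract, oversampling of this kind would typically be applied to the training data only, so that the AUC is still measured on untouched observations. Survey data with categorical predictors would need a variant of SMOTE designed for such features, which this sketch does not cover.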