Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa <p><strong>Indonesian Journal of Statistics and Its Applications (<a href="https://issn.brin.go.id/terbit/detail/1510202061" target="_blank" rel="noopener">eISSN:2599-0802</a>) (formerly named <a href="https://journal.ipb.ac.id/index.php/statistika" target="_blank" rel="noopener">Forum Statistika dan Komputasi</a>), </strong><strong>established in 2017</strong><strong>, </strong>publishes scientific papers in the area of statistical science and its applications. Published papers should be research papers on topics including, but not limited to, the following: experimental design and analysis, survey methods and analysis, operations research, data mining, statistical modeling, computational statistics, time series and econometrics, and statistics education. All papers are reviewed by peer reviewers consisting of experts and academicians across universities and agencies. This journal is <strong>nationally accredited (SINTA 3)</strong> by the Directorate General of Research and Development Strengthening (DGRDS), Ministry of Research, Technology and Higher Education of the Republic of Indonesia No.: <a href="https://stat.ipb.ac.id/main/wp-content/uploads/2024/08/Surat_Pemberitahuan_Hasil_Akreditasi_Jurnal_Ilmiah_Elektronik_Periode_III_Tahun_2019_dan_Lampiran.pdf" target="_blank" rel="noopener">14/E/KPT/2019, dated 10 May 2019</a>. 
</p> <p><strong>Indonesian Journal of Statistics and Its Applications</strong> is a scientific journal managed by the <strong>Department of Statistics, IPB University</strong>, in collaboration with the <strong>Forum Pendidikan Tinggi Statistika Indonesia</strong> (<a href="https://forstat.org/jurnal/" target="_blank" rel="noopener">FORSTAT</a>) and the <strong>Ikatan Statistisi Indonesia</strong> (<a href="https://isi-indonesia.org/isi/frontend/web/jurnal-ilmiah" target="_blank" rel="noopener">ISI</a>).</p> <p><strong>FORSTAT</strong> Decision Letter: [<a href="https://stat.ipb.ac.id/main/wp-content/uploads/2024/08/SK-Jurnal-Bekerja-Sama-FORSTAT.pdf" target="_blank" rel="noopener">Link to the Decision Letter</a>]</p> <p><strong>Scope:</strong><br />Indonesian Journal of Statistics and Its Applications is a refereed journal committed to Statistics and its applications.</p> <p><strong>Issues</strong> are released in June/July (Issue No. 1), October/November (Issue No. 2), and any Special Issues if applicable.</p> Department of Statistics, IPB University, with Forum Perguruan Tinggi Statistika (FORSTAT) en-US Indonesian Journal of Statistics and Its Applications 2599-0802 Sentiment Classification on the 2024 Indonesian Presidential Candidate Dataset Using Deep Learning Approaches https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1259 <p>This study compares the performance of three deep learning models (LSTM, BiLSTM, and GRU) on sentiment classification for the 2024 Indonesian Presidential Candidate dataset, focusing specifically on the case of Prabowo Subianto. The dataset comprises posts from the social media platform X, sourced from Kaggle, and the analysis investigates how effectively different recurrent neural network architectures identify public sentiment. The models were evaluated on accuracy and F1 score. 
The results demonstrate that BiLSTM outperformed both the LSTM and GRU models on all metrics, achieving a testing accuracy of 80.70% and an F1 score of 86.86%, whereas LSTM and GRU both achieved a testing accuracy of 72.56% and an F1 score of approximately 84%. The higher performance of BiLSTM is attributed to its ability to capture bidirectional context within the text, which lets it model complex sentiment patterns more effectively. The LSTM and GRU models performed similarly to each other, leaving BiLSTM as the best of the three models for this dataset. These results indicate that BiLSTM is especially well suited for analyzing public sentiment towards political figures like Prabowo Subianto, offering significant insights into public discussions surrounding the 2024 Indonesian Presidential Election. This study recommends exploring transformer-based models like BERT or GPT variants to enhance sentiment classification accuracy in this domain.</p> Cici Suhaeni Hari Wijayanto Anang Kurnia Copyright (c) 2024 Indonesian Journal of Statistics and Its Applications 2024-12-31 2024-12-31 8 2 83 94 10.29244/ijsa.v8i2p83-94 Comparison Between SARIMA and DeepAR with Optuna Hyperparameter Optimization for Estimating Rice Production Data in Indonesia https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1213 <p>Forecasting, the prediction of future events, plays a significant role in society, especially for time-sensitive issues such as food availability. Food is a critical aspect in ensuring people's welfare, especially in a country like Indonesia with a large population. The availability of and access to rice are vital needs for the people of Indonesia. Rice is not only the main source of carbohydrates but also has a central role in the cultural and social aspects of Indonesian society. Forecasting can be a strategy to anticipate fluctuations in food demand and supply. 
Forecasting can be an important instrument for the government and stakeholders to make sound and effective decisions. Because the growing period of rice is heavily influenced by seasonality, DeepAR and SARIMA are well suited to this problem. Both methods can address features of rice production such as trends, seasonality, and anomaly effects. This study demonstrates that DeepAR, especially when optimized with Optuna, outperforms SARIMA in forecasting rice production in Indonesia, as evidenced by superior performance in key evaluation metrics such as Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE).</p> Muhammad Farhan Zahid Anwar Fitrianto Pika Silvianti Aam Alamudi Copyright (c) 2024 Indonesian Journal of Statistics and Its Applications 2024-12-31 2024-12-31 8 2 95 111 10.29244/ijsa.v8i2p95-111 Acne Severity Classification Study Using Convolutional Neural Network Algorithm with MobileNetV2 Architecture https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1211 <p>Data classification is a key technique in machine learning that maps patterns and features of input data into a target class. Significant developments in data classification have occurred in deep learning with neural networks and Convolutional Neural Networks (CNN), which can extract image features automatically. A CNN can classify the level of a condition based on image data, one example being the severity of acne. Acne (acne vulgaris) is a common skin disease with varying severity. This study applies the CNN MobileNetV2 model to classify acne severity based on acne input images. The data consist of 1457 acne images at 4 severity levels, divided into 80% training data and 20% test data. MobileNetV2 was used as a feature extractor through transfer learning. Fine-tuning and classification were performed using fully connected layers with ReLU and softmax activation functions. 
The model was evaluated with a confusion matrix and classification report. The best model, with a batch size of 16 and a learning rate of 0.00001, achieved 87.29% accuracy with 89% precision, 84% recall, and an 86% F1 score for classifying acne severity.</p> Faadiyah Ramadhani Septian Rahardiantoro Mohammad Masjkur Copyright (c) 2024 Indonesian Journal of Statistics and Its Applications 2024-12-31 2024-12-31 8 2 112 128 10.29244/ijsa.v8i2p112-128 Multi-Objective Optimization by Ratio Analysis (MOORA) Method for Decision Support System in Selecting the Best Electric Car https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1208 <p>The <em>Multi-Objective Optimization by Ratio Analysis</em> (MOORA) method was successfully implemented to select the best electric car. The results show that the MOORA implementation ranked 10 electric car models on 8 criteria: battery capacity, battery charging speed, comfort features, safety features, driving range, top speed, price, and power. The MOORA algorithm was applied in 4 stages: determining the criterion values, constructing the decision matrix, normalizing and optimizing the attributes, and determining the ranking. Applying the MOORA method ranked the 10 electric car models in the following order: Toyota BZ 4X, Hyundai ionic 5 2022, Cherry omodo E5 2024, Wuling cloud EV, Vinvost VF5, Nissan leaf 2021, Kia EV5 2023, BYD Dolphin, Wuling binguo EV, Wuling air EV 2022. Adding or removing criteria changed the ranking. The ranking of the best electric cars is displayed on a website built with JavaScript and PHP that contains a dashboard page, a criteria page, a data page, and a ranking page. The calculations in the website system were validated against Excel, yielding 100% accuracy.</p> Zakiyah Humaira M. 
Irfan Ariandi Arie Qur’ania Teguh Puja Negara Copyright (c) 2024 Indonesian Journal of Statistics and Its Applications 2024-12-31 2024-12-31 8 2 129 131 10.29244/ijsa.v8i2p129-131 Energy Sector Stock Price Forecasting with Time Series Clustering Approach https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1155 <p>Stock investment promises higher returns but carries high risks because of unpredictable price fluctuations. The energy sector shows potential, having recorded the highest sectoral index growth in 2022. However, this does not mean that stock price increases occur evenly among all issuers. Therefore, it is necessary to cluster issuers based on the similarity of their stock price movements and to use the clusters for forecasting stock prices at the cluster level. This study aims to evaluate the performance of clustering energy sector issuers using autocorrelation-based distance and dynamic time warping (DTW), and to forecast stock prices at the cluster level. The data consist of weekly closing stock prices. The clustering used the hierarchical average linkage method. The stock price forecast for each cluster used an ARIMA model, and its performance was evaluated using rolling cross-validation. The results showed that DTW distance had the best clustering performance. Energy sector issuers were grouped into four clusters categorized as strong, indicated by a silhouette coefficient &gt;0.71. The ARIMA models for each cluster produced MAPE values between 10% and 20%, categorizing them as good forecasting models. Clusters A and D were recommended for investors because they have the highest potential for capital gain based on forecasted stock prices. 
These clusters also consisted of companies with strong fundamentals and dividend policies.</p> Linda Sakinah Rahma Anisa I Made Sumertajaya Copyright (c) 2024 Indonesian Journal of Statistics and Its Applications 2024-12-31 2024-12-31 8 2 132 142 10.29244/ijsa.v8i2p132-142 Ordinal Logistic Regression Model of Micro, Small, and Medium-Sized Enterprises Income: A Case Study of Micro, Small and Medium-Sized Enterprises in Surabaya https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1111 <p>Micro, Small, and Medium Enterprises (MSMEs) are a business sector that can make a significant contribution to economic recovery in Indonesia. In Surabaya, there are many MSMEs in various fields, in both the food and non-food sectors, including services, trade, etc. MSMEs have great potential to boost the economic growth of the people of Surabaya. Especially during the COVID-19 pandemic, MSME owners must be able to strategize to keep their income stable or even increase it. Therefore, it is very important to know which factors can boost MSME income in Surabaya. This study examines which factors affect the income of MSMEs in Surabaya. The method used is Ordinal Logistic Regression, which determines which independent variables or factors affect the dependent variable, in this case MSME income. 
The analysis shows that the variables affecting MSME income are MSME Location, MSME Activities, and MSME Outreach.</p> <p><strong>Keywords</strong>: ordinal logistic regression, MSMEs, income.</p> Amalia Nur Alifah Almira Ivah Edina Mawanda Almuhayar Copyright (c) 2024 Indonesian Journal of Statistics and Its Applications 2024-12-31 2024-12-31 8 2 143 154 10.29244/ijsa.v8i2p143-154 Statistical Downscaling Model with Jackknife Ridge Regression and Modified Jackknife Ridge Regression to Forecast Rainfall https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/936 <p>Statistical downscaling (SD) is a transfer function that connects local-scale rainfall data with global-scale rainfall. Global-scale rainfall can be obtained from the Global Circulation Model (GCM) output. GCM simulates climate variables in the form of large-scale grids, causing a high correlation between the grids (multicollinearity). The methods used in SD modeling to overcome multicollinearity are Jackknife Ridge Regression (JRR) and Modified Jackknife Ridge Regression (MJR). Both methods are developments of the Ridge Regression (RR) method. This study aims to predict local rainfall data in Pangkep Regency (response variables) based on local scale GCM output rainfall data (predictor variables) with the JRR and MJR approaches. In addition, the K-means clustering technique is used to determine dummy variables to overcome heterogeneity in the model's residual variance. Results using training data (1990-2017 period) show that the MJR method is better at explaining the variability of the data, based on a higher R2 value (68%) and a lower Root Mean Square Error (RMSE) value (165.57) than the JRR method (R2 of 67% and RMSE of 167.72). Model validation using testing data (2018 period) shows the same result, namely that MJR is better than JRR. 
In addition, including dummy variables improves the accuracy of the model in estimating rainfall data. Adding a dummy variable to the model results in a high R2 (between 94% and 95%) with a lower RMSE value (between 66.60 and 67.69).</p> Sitti Sahriman Dewi Upa Copyright (c) 2024 Indonesian Journal of Statistics and Its Applications 2024-12-31 2024-12-31 8 2 155 165 10.29244/ijsa.v8i2p155-165 Handling Unbalanced Data with SMOTE Algorithm for Unemployment Classification in Lima Puluh Kota Regency Using CART Method https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1253 <p>Unemployment is a problem in the labor force, with high unemployment caused by the low ability of the labor force. One region in West Sumatra that is still experiencing unemployment problems is Lima Puluh Kota Regency. Unemployment in Lima Puluh Kota Regency is caused by the low competence of human resources to fulfill employment market requirements. According to the August 2023 Sakernas survey, Lima Puluh Kota Regency has more employed than unemployed members of the labor force, which results in unbalanced data. A method that can overcome unbalanced data is the Synthetic Minority Oversampling Technique (SMOTE). SMOTE adds synthetic data to the minority class so that the class proportions are balanced. Data imbalance needs to be handled in order to improve the performance of the classification model. Classification and Regression Trees (CART) is a decision tree classification technique that can obtain the characteristics of a classification. The purpose of this research is to compare the CART model before and after applying SMOTE, with the better model identified by the higher Area Under Curve (AUC) value. The AUC of the CART method is 62.1% before SMOTE is applied and 70.2% after SMOTE is applied. 
Therefore, it can be concluded that the CART classification analysis performs better after SMOTE is applied than before SMOTE is applied.</p> Aldwi Riandhoko Nonong Amalita Dodi Vionanda Admi Salma Copyright (c) 2024 Indonesian Journal of Statistics and Its Applications 2024-12-31 2024-12-31 8 2 166 177 10.29244/ijsa.v8i2p166-177 Implementation of Fuzzy C-Means Algorithm for Clustering Provinces in Indonesia Based on Micro and Small Industry Ratio in Village Areas https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1254 <p>Since the economic crisis, micro and small industries have contributed the most labor compared to other industries. Regional development based on micro and small industries is a strategic force in developing a country because the development of these industries promotes equitable welfare and reduces income inequality. Development in village areas is an important factor for regional development, reducing inequality between regions, and alleviating poverty. However, based on the 2018 PODES survey, there are regional imbalances in Indonesia in micro and small industries, which are concentrated on Java Island. Therefore, the provinces were clustered and characterized based on the PODES survey of the micro and small industry sector. This research uses the Fuzzy C-Means algorithm to cluster 34 provinces in Indonesia based on the ratio of micro and small industries in village areas in 2021, in order to assess the development of micro and small industries in village areas in each province in Indonesia. Fuzzy C-Means is a data clustering technique that uses a fuzzy clustering model, where cluster formation is based on a membership degree value that varies between 0 and 1. 
The Fuzzy C-Means algorithm generates 4 clusters: clusters 1 and 2 represent provinces with high and very high micro and small industry development in village areas, and clusters 3 and 4 represent provinces with medium and low micro and small industry development in village areas. The Fuzzy C-Means algorithm produces a good cluster structure with a silhouette coefficient value of 0.6406.</p> Frandito Rahmanesta Zamahsary Martha Dodi Vionanda Zilrahmi Zilrahmi Copyright (c) 2024 Indonesian Journal of Statistics and Its Applications 2024-12-31 2024-12-31 8 2 178 190 10.29244/ijsa.v8i2p178-190 Comparison of K-Means and K-Medoids in Clustering Regency/City in West Sumatra Province Based on Environmental Indicators https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1255 <p>The Environmental Quality Index is an index that describes the results of environmental management nationally, generalized from all regencies/cities and provinces in Indonesia. Although the Environmental Quality Index of West Sumatra Province has increased, some regencies/cities in the province still have a decreasing Environmental Quality Index. Therefore, further analysis is necessary, one form of which is grouping regencies/cities according to their similarities or characteristics. This study compares the K-Means and K-Medoids methods in grouping regencies/cities in West Sumatra Province based on environmental quality indicators in 2023. The data used in this research are secondary data from the Central Bureau of Statistics publication Sumatera Barat Dalam Angka 2024. The study concludes that K-Means performs better than K-Medoids based on the DB index, with three clusters. 
The first cluster has 12 regencies/cities with a high average air quality index, the second cluster has 6 regencies/cities that have small amounts of waste, and the third cluster has 1 city with a high average water quality index and land quality index, but a large amount of waste.</p> <p><strong>Keywords</strong>: Cluster, Comparison, Environmental, K-Means, K-Medoids</p> Silfi Robiati Dina Fitria Dodi Vionanda Dwi Sulistiowati Copyright (c) 2024 Indonesian Journal of Statistics and Its Applications 2024-12-31 2024-12-31 8 2 191 201 10.29244/ijsa.v8i2p191-201 Classification of Drinking Water Source Suitability in West Java Using XGBoost and Cluster Analysis Based on SHAP Values https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1265 <p>Water is essential for meeting the basic needs of living organisms. In Indonesia, ensuring safe and quality drinking water is crucial for public health. However, in some regions, particularly in West Java Province, people still rely on unsuitable water sources, which can negatively impact health. The classification of water source suitability can be achieved using machine learning, such as the Extreme Gradient Boosting (XGBoost) model. XGBoost with feature selection is effective in improving prediction accuracy and minimizing overfitting. This study evaluates the performance of the XGBoost model in classifying household drinking water sources in West Java and uses the K-Means algorithm to cluster SHAP values in order to identify key characteristics of households with safe drinking water. The results show that the XGBoost model, with an accuracy of 77.43% and an F1-Score of 80.17%, classified 4187 households, with 2349 having safe drinking water and 1838 having unsuitable sources. SHAP value analysis identified location, water collection time, and monthly per capita expenditure as significant factors influencing water source suitability. 
Households with water sources inside the house's fence, a short water collection time, and high monthly per capita expenditure tend to have safe drinking water sources. Four clusters were formed: clusters 1 and 3 need immediate improvement in the quality of their drinking water sources, with cluster 2 serving as an indicator of success. Cluster 4 consists of households with high expenditure, marking them as potential targets for government water quality improvements.</p> Annisa Permata Sari Billy Denanda Aufadlan Tsaqif Bagus Sartono Aulia Rizki Firdawanti Copyright (c) 2024 Indonesian Journal of Statistics and Its Applications 2024-12-31 2024-12-31 8 2 202 214 10.29244/ijsa.v8i2p202-214