https://journal-stats.ipb.ac.id/index.php/ijsa/issue/feed Indonesian Journal of Statistics and Its Applications 2025-06-30T00:44:00+07:00 Sachnaz Desta Oktarina sachnazdes@apps.ipb.ac.id Open Journal Systems <p><strong>Indonesian Journal of Statistics and Its Applications (<a href="https://issn.brin.go.id/terbit/detail/1510202061" target="_blank" rel="noopener">eISSN:2599-0802</a>) (formerly named <a href="https://journal.ipb.ac.id/index.php/statistika" target="_blank" rel="noopener">Forum Statistika dan Komputasi</a>), </strong><strong>established in 2017</strong><strong>, </strong>publishes scientific papers in the area of statistical science and its applications. The published papers should be research papers on, but not limited to, the following topics: experimental design and analysis, survey methods and analysis, operations research, data mining, statistical modeling, computational statistics, time series and econometrics, and statistics education. All papers are reviewed by peer reviewers consisting of experts and academicians across universities and agencies. This journal is <strong>nationally accredited (SINTA 3)</strong> by the Directorate General of Research and Development Strengthening (DGRDS), Ministry of Research, Technology and Higher Education of the Republic of Indonesia, No.: <a href="https://stat.ipb.ac.id/main/wp-content/uploads/2024/08/Surat_Pemberitahuan_Hasil_Akreditasi_Jurnal_Ilmiah_Elektronik_Periode_III_Tahun_2019_dan_Lampiran.pdf" target="_blank" rel="noopener">14/E/KPT/2019, dated 10 May 2019</a>. 
</p> <p><strong>Indonesian Journal of Statistics and Its Applications</strong> is a scientific journal managed by the <strong>Department of Statistics, IPB University</strong>, in collaboration with the <strong>Forum Pendidikan Tinggi Statistika Indonesia</strong> (<a href="https://forstat.org/jurnal/" target="_blank" rel="noopener">FORSTAT</a>) and the <strong>Ikatan Statistisi Indonesia</strong> (<a href="https://isi-indonesia.org/isi/frontend/web/jurnal-ilmiah" target="_blank" rel="noopener">ISI</a>).</p> <p><strong>FORSTAT</strong> Decision Letter: [<a href="https://stat.ipb.ac.id/main/wp-content/uploads/2024/08/SK-Jurnal-Bekerja-Sama-FORSTAT.pdf" target="_blank" rel="noopener">Link to the Decision Letter</a>]</p> <p><strong>Scope:</strong><br />Indonesian Journal of Statistics and Its Applications is a refereed journal committed to Statistics and its applications.</p> <p><strong>Issues</strong> are released in June/July (Issue No. 1), October/November (Issue No. 2), and any Special Issues if applicable.</p> https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1268 Classification of Rice Growth Phase Using Regression Logistic Multinomial Model and K-Nearest Neighbors Imputation on Satellite Data 2024-12-24T17:24:53+07:00 Fayyadh Ghaly fayyadhsaja@gmail.com Yenni Kurniawati yennikurniawati@fmipa.unp.ac.id Nonong Amalita yennikurniawati@fmipa.unp.ac.id Dina Fitria yennikurniawati@fmipa.unp.ac.id <p>One of the efforts made by the government to maintain food security is to provide statistical data on rice production through accurate calculation of harvest areas using the area sampling framework approach. Although area sampling framework surveys produce accurate estimates, the costs required are quite high when applying this method. To overcome this problem, one solution that can be applied is to utilize satellite imagery to monitor the greenness index of plants using the enhanced vegetation index. 
However, in real conditions, the Landsat-8 optical satellite is susceptible to cloud cover, which results in missing data. This study aims to model the growth phase of rice plants using the multinomial logistic regression model on Landsat-8 satellite data, with k-nearest neighbors imputation to handle missing data. The results showed that the model had varying performance in each phase, with an average balanced accuracy of 66.45%. This figure shows that the model can classify the area sampling framework data imputed using the k-nearest neighbors imputation method well. The model shows optimal performance in the late vegetative and generative phases but is less effective in detecting the harvest, puso, and non-rice paddy phases.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1247 Application of Univariate and Multivariate Long Short Term Memory for World Crude Palm Oil Price Prediction 2024-12-06T11:09:42+07:00 Nabil Izzany nabil.izzany15@gmail.com Mohammad Masjkur masjkur@apps.ipb.ac.id Akbar Rizki akbar.ritzki@apps.ipb.ac.id <p>Time series analysis is essential for predicting economic and other important factors; it can be done univariately or multivariately. Technological developments have produced long short term memory networks, which can handle vanishing gradients and long-term dependencies. This research predicts the world price of crude palm oil because Indonesia, as the world's largest crude palm oil producer, is strongly influenced by the world crude palm oil price. This study uses monthly data on crude palm oil, soybean oil, and crude oil prices from January 2002 to May 2024 obtained from the World Bank Commodity Price Data. This research applies univariate and multivariate long short term memory to predict crude palm oil prices. Long short term memory is used because the data show nonlinear elements and high volatility. 
The input used for univariate long short term memory is the crude palm oil price, while multivariate long short term memory uses crude palm oil, soybean oil, and crude oil prices. The univariate long short term memory proved to be more effective in the case of world crude palm oil price prediction. This is shown by its lower mean absolute percentage error of 6.574%, compared to 6.689% for the multivariate long short term memory. This univariate long short term memory uses the following combination of hyperparameters: 32 neurons, 100 epochs, 1 time step, a batch size of 64, and a learning rate of 0.01.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1199 Missing Value Estimation Using Fuzzy C-Means in Classification of Chronic Kidney Disease 2024-12-07T09:53:15+07:00 Raisa Nida Eria eriaraisanida@gmail.com Aam Alamudi aamalamudi@apps.ipb.ac.id Itasia Dina Sulvianti itasiasu@apps.ipb.ac.id Pika Silvianti aamalamudi@apps.ipb.ac.id Septian Rahardiantoro aamalamudi@apps.ipb.ac.id <p>According to the World Health Organization (WHO), Chronic Kidney Disease (CKD) ranked 10th among causes of death worldwide in 2020. CKD needs to be prevented early. This research uses history data to identify individuals predisposed to CKD. The data contain missing values, so the Fuzzy C-Means (FCM) method is used to estimate them. Clustering CKD with the FCM method gives an error rate of 20.25% and a balanced accuracy of 84.80%. Classification using Classification and Regression Trees (CART) gives an accuracy of 97.50%, a sensitivity of 100.00%, and a specificity of 92.86%. 
An individual is considered to suffer from CKD if any of the following holds: (1) hemoglobin greater than or equal to 13; <em>specific gravity</em> of 1.020 or 1.025; <em>serum creatinine</em> less than 1.3; <em>albumin</em> of 1, 2, 3, 4, or 5; and <em>sugar</em> of 0, 2, 3, 4, or 5, (2) hemoglobin greater than or equal to 13; <em>specific gravity</em> of 1.020 or 1.025; and <em>serum creatinine</em> greater than or equal to 1.3, (3) hemoglobin greater than or equal to 13 and <em>specific gravity</em> of 1.005, 1.010, or 1.015, (4) <em>hemoglobin</em> less than 13 and <em>red blood cell count</em> less than 5.5.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1289 Early Preeclampsia Detection Using XGBoost-Cox Proportional Hazard Model 2025-06-11T16:07:38+07:00 Arya Wira Syahdwinata arya.wira@ui.ac.id Sarini Abdullah sarini@sci.ui.ac.id <p>Various prognostic models based on survival analysis methods have been proposed to predict the risk of preeclampsia (PE). To develop a more accurate yet interpretable prediction approach, we utilized clinical data from pregnant women collected at a hospital in Jakarta and applied the XGBoost-Cox Proportional Hazard Model (XGB-Cox). This model integrates the predictive power of the XGBoost machine learning algorithm with the Cox Proportional Hazard (Cox-PH) model, which estimates the effect of covariates on event time. Our results show that the XGB-Cox model outperforms the traditional Cox-PH model based on four evaluation metrics: log-likelihood, log-rank test, concordance index (C-index), and Brier score. The XGB-Cox model achieved a higher C-index of 0.8908 compared to 0.7548 for Cox-PH, indicating improved risk discrimination. Kaplan-Meier curves suggest that XGB-Cox provides better separation across risk quartiles. While XGB-Cox generally yields lower Brier Scores, its performance declines at later gestational weeks. 
The Cox-PH model remains superior in interpretability, offering clear hazard ratios, while XGB-Cox enhances model fit and still provides meaningful insights into feature importance. Additionally, sensitivity analysis underscores the need to carefully determine the proportion of censored data, as excessive censoring affects model stability. These findings suggest that XGB-Cox provides a robust predictive framework for early PE risk assessment, supporting its potential application in clinical decision-making for maternal healthcare.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1231 Winsorization for Outliers in Clustering Non-Cyclical Stocks with K-Means and K-Medoids 2025-01-31T12:26:37+07:00 Naura Tirza Ardhani nauraatirza@apps.ipb.ac.id Khairil Anwar Notodiputro khairil@apps.ipb.ac.id Sachnaz Desta Oktarina sachnazdes@apps.ipb.ac.id <p>Non-cyclical consumer sector stocks are often chosen by investors because the products in this sector are essential products that are always in demand by society. Therefore, the demand for these products tends to be stable and defensive, or less affected by economic shocks. However, this does not guarantee that every stock in this sector performs well, so it is necessary to group stocks based on their fundamental indicators in the form of financial ratios. This research aims to identify the best method by considering outliers and determining the clusters with the best fundamental performance as a recommendation for investors to make the right investment decisions. The data used in this study is secondary data with observations in the form of 50 non-cyclical consumer sector stocks. The variables used are Earnings per Share, Return on Equity, Return on Assets, Debt to Equity Ratio, Price to Earnings Ratio, and Price to Book Value. 
The clustering results indicated that K-Medoids is the best clustering method, both on the data before and after handling extreme outliers with the winsorization approach. However, the optimum number of clusters differs before and after winsorization, at 3 and 6 clusters, respectively. Considering the influence of extreme outliers and to obtain a more informative clustering result, the clustering result after applying the winsorization technique was chosen, which resulted in 6 clusters. Cluster 1, which consists of AALI, GGRM, INDF, and SGRO, can be recommended because it has excellent fundamental performance, especially in terms of Earnings per Share in 2022.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1252 Comparison of The Singular Spectrum Analysis and SARIMA for Forecasting Rainfall in Padang Panjang City 2024-12-06T11:01:33+07:00 Fadhira Vitasha Putri fadhiravitashaputri@gmail.com Fadhilah Fitri fadhilahfitri@fmipa.unp.ac.id Yenni Kurniawati fadhilahfitri@fmipa.unp.ac.id Zilrahmi Zilrahmi fadhilahfitri@fmipa.unp.ac.id <p>Indonesia has a tropical climate, so it has two seasons, namely the rainy season and the dry season. The rainy season lasts from November to March, and during this period rainfall tends to be high in several areas. Padang Panjang City is one of the smallest cities by area in West Sumatra Province and has the nickname Rain City. This is because Padang Panjang has cool air, with a maximum air temperature of 26.1 °C and a minimum of 21.8 °C, so the city has fairly high rainfall, averaging 300 to 400 mm/year. This article discusses rainfall forecasting for Padang Panjang City by comparing the Singular Spectrum Analysis and Seasonal Autoregressive Integrated Moving Average methods. The data used spans 8 years, from January 2016 to December 2023. 
Forecasting results are obtained from the best method, selected based on the smallest Mean Absolute Percentage Error value. The Singular Spectrum Analysis method has a Mean Absolute Percentage Error value of 5.59%, and the Seasonal Autoregressive Integrated Moving Average method has a value of 7.43%. The best forecasting method is therefore the Singular Spectrum Analysis method.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1270 Application of Singular Spectrum Analysis in Predicting Rupiah Exchange Yuan 2024-12-24T17:19:22+07:00 Muhammad Hendrawan m.hendrawan01012001@gmail.com Zilrahmi Zilrahmi zilrahmi@fmipa.unp.ac.id Yenni Kurniawati zilrahmi@fmipa.unp.ac.id Dina Fitria zilrahmi@fmipa.unp.ac.id <p>The exchange rate between two countries is the price of the currency used by residents of these countries to trade with each other. The relationship between the Rupiah exchange rate and the Yuan is one of the important aspects of the dynamics of international trade. Therefore, forecasting the exchange rate of the Rupiah against the Yuan is important. The method used for forecasting is Singular Spectrum Analysis, which consists of decomposition and reconstruction. The accuracy of the resulting forecast is measured using the Mean Absolute Percentage Error criterion. With a window length of 23, the forecast achieves a Mean Absolute Percentage Error value of 2.15%, which indicates that the forecasting results are accurate and effective. 
Forecasting is said to be accurate if the Mean Absolute Percentage Error value is lower than 10% and close to 0%.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1261 Digital Newsworthiness Scores Model Using a Combination of Unsupervised and Supervised Learning Approaches 2024-12-18T15:01:12+07:00 Reza Felix Citra rezafelix@gmail.com Aji Hamim Wigena rezafelix@gmail.com Bagus Sartono bagusco@apps.ipb.ac.id <p>The rapid evolution of digital technology has transformed the media landscape, making news more accessible while also introducing challenges related to content quality and accuracy. The rise of misinformation and fake news has diminished public trust in traditional media. This study proposes a method for evaluating the quality and potential impact of news articles prior to publication. By adapting credit risk scoring principles, a model was used to predict the suitability of news content based on factors such as title length, number of images, news category, and publication timing. A target variable was first formed using three clustering methods: K-Means, K-Modes, and K-Medoids. The results indicated that K-Means outperformed the other methods, leading us to use its outcomes for determining publication suitability. Subsequently, stepwise logistic regression was applied to implement the credit risk scoring approach, allowing for variable selection and assessment of importance. Ultimately, ten variables were identified to generate a newsworthiness score, with minimum and maximum scores of 997 and 1407, respectively. The average scores for articles deemed publishable and not publishable were 1137 and 1110, respectively. A cutoff score of 1123 was established based on these averages, categorizing 6708 articles (57.9%) as suitable for publication. 
These findings aim to assist media organizations in refining their content curation processes, thereby enhancing the overall quality of news consumption.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1302 Exploring a Large Language Model on the ChatGPT Platform for Indonesian Text Preprocessing Tasks 2025-06-24T09:09:29+07:00 Cici Suhaeni cici_suhaeni@apps.ipb.ac.id Sabrina Adnin Kamila sabrinaadnin@apps.ipb.ac.id Fani Fahira fanifahira@apps.ipb.ac.id Muhammad Yusran muhammadyusran@apps.ipb.ac.id Gerry Alfa Dito gerrydito@apps.ipb.ac.id <p>Preprocessing is a crucial step in Natural Language Processing, especially for informal languages like Indonesian, which contain complex morphology, slang, abbreviations, and non-standard expressions. Traditional rule-based tools such as regex, IndoNLP, and Sastrawi are commonly used but often fall short in handling noisy, user-generated text. This study explores the capability of a Large Language Model, specifically ChatGPT-o3, in performing Indonesian text preprocessing tasks, namely text cleaning, normalization, stopword removal, and stemming/lemmatization, and compares it to conventional rule-based approaches. Both preprocessing methods were applied to and evaluated on two datasets: a small example dataset of five manually constructed sentences and a real-world dataset of 100 tweets about the Indonesian “Makan Bergizi Gratis” program. Results show that ChatGPT-o3 performs equally well in text cleaning and significantly better in normalization. However, rule-based methods like IndoNLP and Sastrawi still outperform ChatGPT-o3 in stopword removal and stemming. These findings indicate that while ChatGPT-o3 demonstrates strong contextual understanding and linguistic flexibility, it may underperform in rigid, token-based operations without fine-tuning. 
This study provides initial insights into using Large Language Models as an alternative preprocessing engine for Indonesian text and highlights the need for hybrid approaches or improved prompt design in future applications.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1246 K-Prototypes Algorithm for School Indexing in Report Card-Based Student Admissions 2025-06-20T12:11:38+07:00 Ervina Dwi Anggrahini ervinadwi@apps.ipb.ac.id Mohammad Masjkur masjkur@apps.ipb.ac.id Utami Dyah Syafitri utamids@apps.ipb.ac.id <p>Institut Pertanian Bogor, also known as IPB University, is a state university that was ranked first as the best university in Indonesia by the Ministry of Research and Technology in 2020. It has three main channels in its new student admission selection system. One of them, “Seleksi Nasional Berdasarkan Prestasi”, is an admission pathway at IPB University based on report cards without a test. Selecting new students based on report cards requires creating a school index to assess the quality and commitment of each school by grouping schools among “Seleksi Nasional Berdasarkan Prestasi” applicants. One method that can be used is the K-Prototypes algorithm. K-Prototypes can be used to cluster large and mixed-type data (numeric and categorical) by combining distance measures from two non-hierarchical methods, namely the K-Means and K-Modes algorithms. Based on the analysis, the K-Prototypes algorithm yields three optimal clusters, each with distinct characteristics. Cluster 1 is the lowest cluster because it comprises schools with the lowest quality and commitment to new student admissions at IPB University, as indicated by the report card. Cluster 2 is of lower quality than Cluster 3 but of higher quality than Cluster 1. 
Cluster 3 is the best cluster because it consists of schools that have high quality and commitment to new student admissions at IPB University through the report card route.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1296 Performance Evaluation of ARDL Model Stacked with Boosted Ridge Regression on Time Series Data with Multicollinearity 2025-06-30T00:44:00+07:00 Amir Abduljabbar Dalimunthe amirdalimunthe@apps.ipb.ac.id Agus Mohamad Soleh agusms@apps.ipb.ac.id Farit Mochamad Afendi fmafendi@apps.ipb.ac <p>Time series data play a vital role in financial and economic studies. Two commonly applied models for such data are Vector Autoregression (VAR) and Autoregressive Distributed Lags (ARDL). Nonetheless, interdependence among explanatory variables often leads to multicollinearity, posing challenges for model reliability. This study investigates the effectiveness of the ARDL model integrated with boosted ridge regression as a method to mitigate multicollinearity. Due to limitations in available empirical data, simulated data are generated to support the analysis. The research consists of two stages: synthetic data generation and analysis of the simulated data. Results suggest that ARDL performs well under various multicollinearity conditions, particularly when the training set is sufficiently large and the model structure is correctly specified. For smaller training sets, the ARDL Ridge variant demonstrates improved predictive performance.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1227 Forecasting Nonlinear Time Series with ARIMA, ANN, and Hybrid Models: A Case Study on Inflation Rate in Sri Lanka 2024-12-12T11:32:44+07:00 W. M. 
Sudarshana Bandara bandarasudarshana009@gmail.com Withanage Ajith Raveendra De Mel ajith@maths.ruh.ac.lk <p>In time series forecasting, hybrid models combining autoregressive integrated moving average (ARIMA) and artificial neural networks (ANNs) have gained prominence due to their ability to capture both linear and nonlinear patterns within data. ARIMA models are effective at modeling linear relationships, while ANNs are adept at handling complex nonlinear structures. However, each model has its limitations when used independently. This study presents a hybrid model that integrates the strengths of both ARIMA and ANN to forecast the monthly inflation rate in Sri Lanka using historical data from 1988 to 2018. Our findings demonstrate that the proposed hybrid model outperforms the standalone ARIMA and ANN models, particularly in terms of Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE). By leveraging the complementary strengths of ARIMA and ANN, this hybrid approach provides a robust forecasting framework for handling the diverse structural complexities of time series data.</p> 2025-06-24T00:00:00+07:00 Copyright (c) 2025 Indonesian Journal of Statistics and Its Applications