Digital Newsworthiness Scores Model Using a Combination of Unsupervised and Supervised Learning Approaches: Pemodelan Skor Kelayakan Berita Digital dengan Pendekatan Kombinasi Unsupervised dan Supervised Learning

Reza Felix Citra; Aji Hamim Wigena; Bagus Sartono

doi:10.29244/ijsa.v9i1p86-99


Published: Jun 24, 2025


Keywords:

accuracy; credit risk scoring; k-means; logistic regression; silhouette

Reza Felix Citra

Study Program on Statistics and Data Science, IPB University, Indonesia

Aji Hamim Wigena

Study Program on Statistics and Data Science, IPB University, Indonesia

Bagus Sartono

Study Program on Statistics and Data Science, IPB University, Indonesia

Abstract

The rapid evolution of digital technology has transformed the media landscape, making news more accessible while also introducing challenges related to content quality and accuracy. The rise of misinformation and fake news has diminished public trust in traditional media. This study proposes a method for evaluating the quality and potential impact of news articles prior to publication. By adapting credit risk scoring principles, a model was built to predict the suitability of news content based on factors such as title length, number of images, news category, and publication timing. A target variable was first constructed using three clustering methods: K-Means, K-Modes, and K-Medoids. The results indicated that K-Means outperformed the other methods, so its cluster assignments were used to label publication suitability. Subsequently, stepwise logistic regression was applied to implement the credit risk scoring approach, allowing for variable selection and assessment of variable importance. Ultimately, ten variables were identified to generate a newsworthiness score, with minimum and maximum scores of 997 and 1407, respectively. The average scores for articles deemed publishable and not publishable were 1137 and 1110, respectively. A cutoff score of 1123 was established based on these averages, categorizing 6708 articles (57.9%) as suitable for publication. These findings aim to assist media organizations in refining their content curation processes, thereby enhancing the overall quality of news consumption.
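The scoring and cutoff steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the logistic regression output was scaled into a score or exactly how the cutoff was derived from the two averages, so everything below is an assumption: the conventional credit scorecard scaling (score = offset + factor * ln(odds)), the hypothetical parameters base_score, base_odds, and pdo (points to double the odds), and the reading of the cutoff as the midpoint of the two group means. Only the means 1137 and 1110 come from the abstract.

```python
import math


def scorecard_scaling(base_score, base_odds, pdo):
    """Return (offset, factor) so that score = offset + factor * ln(odds).

    Conventional credit scorecard scaling; the parameter values used
    below are hypothetical and are NOT taken from the paper.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset, factor


def score_from_probability(p, offset, factor):
    """Map a predicted probability (e.g. from logistic regression) to a score."""
    odds = p / (1.0 - p)
    return offset + factor * math.log(odds)


def midpoint_cutoff(mean_publishable, mean_not_publishable):
    """Assumed cutoff rule: midpoint between the two group mean scores."""
    return (mean_publishable + mean_not_publishable) / 2.0


# Hypothetical scaling: odds of 1:1 map to 1000 points, and every
# doubling of the odds adds 20 points.
offset, factor = scorecard_scaling(base_score=1000, base_odds=1.0, pdo=20)

score_even = score_from_probability(0.5, offset, factor)      # odds = 1
score_double = score_from_probability(2 / 3, offset, factor)  # odds = 2

# Group means reported in the abstract (rounded values).
cutoff = midpoint_cutoff(1137, 1110)
```

Under this assumed rule the midpoint of the rounded means is 1123.5, which is close to the reported cutoff of 1123; the small difference may come from rounding of the published averages or from a different rule used by the authors.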


How to Cite

Citra RF, Wigena AH, Sartono B. Digital Newsworthiness Scores Model Using a Combination of Unsupervised and Supervised Learning Approaches: Pemodelan Skor Kelayakan Berita Digital dengan Pendekatan Kombinasi Unsupervised dan Supervised Learning. IJSA [Internet]. 2025 Jun. 24 [cited 2026 Jan. 11];9(1):86-99. Available from: https://journal-stats.ipb.ac.id/index.php/ijsa/article/view/1261

Issue

Vol. 9 No. 1 (2025)

Section

Articles
