Digital Newsworthiness Scores Model Using a Combination of Unsupervised and Supervised Learning Approaches (Indonesian title: Pemodelan Skor Kelayakan Berita Digital dengan Pendekatan Kombinasi Unsupervised dan Supervised Learning)
Abstract
The rapid evolution of digital technology has transformed the media landscape, making news more accessible while also introducing challenges related to content quality and accuracy. The rise of misinformation and fake news has diminished public trust in traditional media. This study proposes a method for evaluating the quality and potential impact of news articles prior to publication. By adapting credit risk scoring principles, a model was built to predict the suitability of news content based on factors such as title length, number of images, news category, and publication timing. A target variable was first constructed using three clustering methods: K-Means, K-Modes, and K-Medoids. K-Means outperformed the other two methods, so its cluster assignments were used to label publication suitability. Stepwise logistic regression was then applied to implement the credit risk scoring approach, allowing for variable selection and assessment of variable importance. Ultimately, ten variables were identified to generate a newsworthiness score, with minimum and maximum scores of 997 and 1407, respectively. The average scores for articles deemed publishable and not publishable were 1137 and 1110, respectively. A cutoff score of 1123 was established based on these averages, categorizing 6708 articles (57.9%) as suitable for publication. These findings are intended to help media organizations refine their content curation processes, thereby enhancing the overall quality of news consumption.
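The scoring step described in the abstract can be illustrated with a minimal sketch, assuming the common scorecard convention in which a logistic regression's log-odds are mapped linearly to points. The abstract does not give the scaling constants, the fitted coefficients, or the exact cutoff rule, so `PDO`, `BASE_SCORE`, `BASE_ODDS`, the coefficient values, and the feature names below are all hypothetical. The cutoff is taken as the midpoint of the two reported group means, which gives 1123.5 and is close to, but not necessarily how the authors obtained, the reported 1123.

```python
import math

# Assumed scorecard scaling (not from the paper): score = OFFSET + FACTOR * log_odds
PDO = 20            # hypothetical: points that double the odds of "publishable"
BASE_SCORE = 1100   # hypothetical: score assigned at BASE_ODDS
BASE_ODDS = 1.0     # hypothetical: odds of "publishable" at BASE_SCORE

FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)

# Hypothetical coefficients for a few predictors of the kind named in the
# abstract; real values would come from the fitted stepwise logistic regression.
INTERCEPT = -0.5
COEFFICIENTS = {"title_length": 0.01, "num_images": 0.30, "is_morning": 0.40}


def log_odds(article):
    """Linear predictor of the logistic model for one article."""
    return INTERCEPT + sum(COEFFICIENTS[k] * article[k] for k in COEFFICIENTS)


def score_from_log_odds(z):
    """Map log-odds to scorecard points."""
    return OFFSET + FACTOR * z


def score(article):
    """Newsworthiness score of one article under the assumed scaling."""
    return score_from_log_odds(log_odds(article))


def midpoint_cutoff(mean_publishable, mean_not_publishable):
    """Assumed cutoff rule: midpoint of the two group mean scores."""
    return (mean_publishable + mean_not_publishable) / 2


def is_publishable(article, cutoff):
    """Flag an article whose score reaches the cutoff."""
    return score(article) >= cutoff


# Group means 1137 and 1110 are the values reported in the abstract.
cutoff = midpoint_cutoff(1137, 1110)

# Hypothetical article used only to exercise the functions.
example = {"title_length": 60, "num_images": 2, "is_morning": 1}
print(cutoff, round(score(example), 1), is_publishable(example, cutoff))
```

Under this convention, each increase of `PDO` points corresponds to a doubling of the modeled odds that an article is publishable, which is what makes the score interpretable to editors in the same way a credit score is to lenders.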