Predicting Healthcare Service Utilization Using Supervised Machine Learning Methods: Model Comparison and Feature Selection

Toy, Ahmet; Ebrahim, Endris; Mohammed, Abdelah

doi:10.1155/int/5502261

Toy A., Ebrahim E. A., Mohammed A. A.

International Journal of Intelligent Systems, vol. 2026, no. 1, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 2026 Issue: 1
Publication Date: 2026
DOI: 10.1155/int/5502261
Journal Name: International Journal of Intelligent Systems
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Applied Science & Technology Source, Compendex, INSPEC, zbMATH, Engineering Source (EBSCO), Materials Science & Engineering Collection (ProQuest), Technology Collection (ProQuest)
Keywords: classification, feature selection, health services, machine learning, model performance, prediction
Ondokuz Mayıs University Affiliation: Yes

Abstract

Machine learning (ML) techniques offer a powerful way to examine the factors that influence healthcare service utilization, including the use of community-based health insurance (CBHI), and thus provide useful information for health policy. Suitable statistical performance measures are among the most important components of an ML workflow: without them, researchers cannot objectively evaluate a model's performance. Statistical inference and prediction of healthcare service utilization, together with identification of the associated key factors, are essential for policy action and planning among CBHI user and nonuser community groups. The study compared seven supervised ML methods: random forests, decision trees, support vector machines, k-nearest neighbors, logistic regression, naive Bayes, and gradient boosting. Feature selection was used to identify the most important factors influencing healthcare service use. The predictive performance of the seven classification models was evaluated using accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve (AUC-ROC), along with the relative importance of all variables. On this basis, we identified the best ML classification method for predicting health service utilization. The study also identified four key features influencing community utilization of healthcare services: CBHI use, chronic illness, under-five children, and the wealth index were all significant and influential. The fitted models achieved accuracy, precision, recall, and F1 scores ranging from 0.444 to 0.795 and AUC values ranging from 0.642 to 0.745, suggesting reasonably balanced performance across metrics. Gradient boosting achieved the highest prediction accuracy and consistently outperformed the other methods, while random forests and logistic regression also showed strong classification performance.
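The five evaluation metrics named above can be computed directly from a model's predicted probabilities. The following is a minimal, dependency-free sketch, not the authors' code. It assumes a binary outcome (1 = used health services, 0 = did not) and a 0.5 decision threshold. The held-out labels, the two models' scores, and the names `evaluate` and `roc_auc` are all hypothetical. AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded 1 (used services) / 0 (did not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 at `threshold`, plus threshold-free AUC."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc(y_true, scores),
    }


# Hypothetical held-out labels and predicted probabilities for two of the seven models.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
model_scores = {
    "gradient_boosting": [0.9, 0.2, 0.7, 0.45, 0.55, 0.1, 0.8, 0.3],
    "naive_bayes": [0.6, 0.5, 0.4, 0.7, 0.6, 0.2, 0.5, 0.3],
}

results = {name: evaluate(y_test, s) for name, s in model_scores.items()}
best = max(results, key=lambda name: results[name]["auc"])  # select by AUC
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
print("best by AUC:", best)
```

Selecting the best model by AUC is only one possible criterion; because the abstract reports all five metrics, ranking by F1 or accuracy instead could order the models differently.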
Researchers should apply statistical concepts from ML to identify the best predictive model while accounting for the relative influence of each feature, to support better decision-making. Healthcare service utilization was higher among community members who used CBHI than among households that did not. We therefore recommend that health-sector managers and leaders in the study area intensify their efforts to expand CBHI community membership.
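The abstract does not say how the relative importance of features was measured. One common, model-agnostic option is permutation importance: shuffle one feature's column, re-score the model, and record the mean drop in the metric. The sketch below assumes a fitted model exposed as a plain `predict` function. The rule-based `toy_predict`, the feature encodings, the `noise` column, and the generated data are all hypothetical stand-ins chosen to echo the four reported features. They are not the authors' model or data.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], int],
    X: List[Row],
    y: Sequence[int],
    features: Sequence[str],
    n_repeats: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean drop in accuracy when each feature's column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = {}
    for f in features:
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only the shuffled feature.
            X_perm = [{**row, f: v} for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(r) for r in X_perm]))
        importances[f] = sum(drops) / n_repeats
    return importances


# Hypothetical stand-in for a fitted classifier: predicts service use (1)
# when at least two of four binary risk/enabling indicators are present.
def toy_predict(row: Row) -> int:
    score = row["cbhi"] + row["chronic_illness"] + row["under_five"] + (1 if row["wealth_index"] >= 4 else 0)
    return 1 if score >= 2 else 0


data_rng = random.Random(42)
X = [
    {
        "cbhi": data_rng.randint(0, 1),
        "chronic_illness": data_rng.randint(0, 1),
        "under_five": data_rng.randint(0, 1),
        "wealth_index": data_rng.randint(1, 5),  # hypothetical quintile coding
        "noise": data_rng.random(),  # ignored by toy_predict
    }
    for _ in range(200)
]
y = [toy_predict(row) for row in X]  # labels generated by the toy rule itself

features = ["cbhi", "chronic_illness", "under_five", "wealth_index", "noise"]
importances = permutation_importance(toy_predict, X, y, features)
for name, drop in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {drop:.3f}")
```

Because the labels here come from the toy rule, the unused `noise` feature scores exactly zero. On real survey data, correlated features can share or mask importance, so permutation results are best read as a ranking aid rather than evidence of causal influence.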