The AI-AGE team presented the paper “Interpretable ML for Diabetes and Prediabetes Screening Using Self-Reported Health Indicators” by S. Lazić, S. Cakić, I. Rubežić Lukić, N. Popović, and T. Popović at the 30th annual conference on information technologies, IT 2026. The presentation was part of the team's mentoring activities and its efforts to support the development of young researchers.

ABSTRACT – Early identification of type 2 diabetes (T2D) and prediabetes enables timely interventions, yet screening often relies on self-reported data rather than laboratory testing. This work compares lightweight Machine Learning (ML) models: Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Multilayer Perceptron (MLP) trained on 21 self-reported indicators from the 2015 Behavioral Risk Factor Surveillance System (BRFSS) dataset for three-class classification (no diabetes, prediabetes, diabetes). We propose a screening-oriented evaluation where a probability threshold is selected to achieve a target sensitivity (recall) of 0.80. LightGBM achieves balanced accuracy of 0.52 and precision of 0.33 at the target sensitivity, with 38% of cases flagged. Tree SHapley Additive exPlanations (TreeSHAP) highlight general health status, age category, body mass index (BMI), and hypertension as dominant predictors. A FastAPI web application provides individual risk estimates and instance-level explanations. The pipeline demonstrates feasibility of interpretable, calibrated screening from non-laboratory data.
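The screening-oriented evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes the three classes are collapsed into a binary "at risk" label (prediabetes or diabetes = 1), that the model outputs one risk score per person, and the function names `threshold_for_sensitivity` and `screening_metrics` as well as the toy data are hypothetical.

```python
import math


def threshold_for_sensitivity(y_true, scores, target=0.80):
    """Return the highest score threshold whose sensitivity is >= target.

    Assumption: y_true holds 0/1 labels and a case is flagged
    when its score is >= the threshold.
    """
    positives = sorted(
        (s for y, s in zip(y_true, scores) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("at least one positive case is required")
    # Flagging the k highest-scoring positives yields sensitivity k / n_pos.
    # The small epsilon guards against floating-point noise in the product.
    k = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[k - 1]


def screening_metrics(y_true, scores, threshold):
    """Sensitivity, precision, and fraction of cases flagged at a threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(y_true, flagged) if f and y == 1)
    fp = sum(1 for y, f in zip(y_true, flagged) if f and y == 0)
    n_pos = sum(1 for y in y_true if y == 1)
    return {
        "sensitivity": tp / n_pos,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "flagged_fraction": sum(flagged) / len(scores),
    }


# Toy data for illustration only (not taken from BRFSS or the paper).
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.65, 0.5, 0.4, 0.3, 0.1]

toy_threshold = threshold_for_sensitivity(toy_labels, toy_scores, target=0.80)
toy_metrics = screening_metrics(toy_labels, toy_scores, toy_threshold)
print(toy_threshold, toy_metrics)
```

In practice the threshold would be chosen on a validation split and the metrics reported on a separate test split, so that the reported precision and flagged fraction are not optimistic.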


