Hybrid decision tree-based machine learning models for diabetes prediction

Volume 8, Issue 1, February 2024     |     PP. 1-30      |     Pub. Date: January 9, 2024
DOI: 10.54647/isss120327


Efijemue Oghenekome Paul, Department of Computer Science, Austin Peay State University, Clarksville, USA

Because the incidence of diabetes continues to rise, effective screening strategies are needed for early diagnosis and intervention. This study proposes a novel approach that uses artificial intelligence (AI) to predict diabetes risk. Using machine learning techniques and a database with demographic, clinical, and lifestyle variables, the proposed model achieves the best accuracy in predicting the probability of developing diabetes. The prediction model uses advanced feature selection and cross-validation techniques to improve reliability and generalizability. Integrating AI into diabetes prediction paves the way for earlier care, enabling personalized intervention and ultimately reducing the burden of diabetes on individuals and healthcare systems.
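
The abstract does not say which feature selection or cross-validation procedure was used, so the following is only a minimal sketch of the general idea, not the paper's method. It fits a depth-1 decision tree (a "stump") and scores it with k-fold cross-validation. The data are synthetic, and the names `fit_stump`, `k_fold_accuracy`, and the two made-up features are assumptions introduced purely for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels):
    """Most common 0/1 label (ties resolve to 1)."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(X, y):
    """Fit a depth-1 tree: choose the (feature, threshold) split with the
    lowest weighted Gini impurity. Returns (feature, threshold, left, right)."""
    n = len(y)
    best = None  # (impurity, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority class
        label = majority(y)
        return (0, float("inf"), label, label)
    return best[1:]


def predict(stump, row):
    """Predict a 0/1 label for one row using a fitted stump."""
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def k_fold_accuracy(X, y, k=5, seed=0):
    """Return the held-out accuracy of the stump on each of k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        stump = fit_stump([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(stump, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Synthetic data (NOT the study's database): one informative feature
# and one pure-noise feature, both hypothetical.
rng = random.Random(42)
X, y = [], []
for _ in range(100):
    label = rng.randint(0, 1)
    informative = rng.gauss(140 if label else 100, 10)
    noise = rng.gauss(0, 1)
    X.append([informative, noise])
    y.append(label)

scores = k_fold_accuracy(X, y, k=5)
mean_accuracy = sum(scores) / len(scores)
print("fold accuracies:", scores)
print("mean accuracy:", mean_accuracy)
```

Because every fold is scored on rows the stump never saw during fitting, the mean of the fold accuracies is a less optimistic estimate than accuracy on the training data. The stump's choice of a single split feature is also a very crude stand-in for feature selection.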

Keywords: Diabetes prediction, Machine learning, Incidence of diabetes, Integrating AI into diabetes

Cite this paper
Efijemue Oghenekome Paul, Hybrid decision tree-based machine learning models for diabetes prediction, SCIREA Journal of Information Science and Systems Science. Volume 8, Issue 1, February 2024 | PP. 1-30. DOI: 10.54647/isss120327

