Int J Performability Eng ›› 2021, Vol. 17 ›› Issue (11): 926-937.

### A Highly Robust Heterogenous Deep Ensemble Assisted Multi-Feature Learning Model for Diabetic Mellitus Prediction

Sandeep Honnurappa and Bevoor Krishnappa Raghavendra

1. BGS Institute of Technology, Mandya, Visvesvaraya Technological University, Belagavi, 571448, India
• Contact: sandeep.h.edu@gmail.com

Abstract: This work proposes a novel heterogeneous deep ensemble based multi-feature learning model for diabetes mellitus prediction. The model is designed to address three key problems: class imbalance in the data, low accuracy, and lack of consensus. To this end, a multi-level enhancement approach is adopted. To address class imbalance, data sampling with a 95% confidence interval was performed using different sampling approaches: random sampling, down-sampling, and the synthetic minority oversampling technique (SMOTE). After sampling, feature selection was performed using several algorithms: the Wilcoxon significance test, also called the significant predictor test (SPR), univariate logistic regression based feature selection (ULOGR), cross-correlation analysis (CRA), principal component analysis (PCA), Gini score based significant feature selection (GSFR), and information gain based feature selection (IGFR). The purpose of applying different feature selection methods was to retain the most suitable features, giving high accuracy at low computational cost. In the subsequent phase, we designed a first-of-its-kind heterogeneous deep ensemble model with Decision Tree (DT), Artificial Neural Network (ANN) with Radial Basis Function (RBF) and Levenberg-Marquardt (LM) learning methods, Probabilistic Neural Network (PNN), and Support Vector Machine (SVM) algorithms as the base classifiers. For the ensemble decision, Maximum Voting Ensemble (MVE) and Best Trained Ensemble (BTE) were applied for two-class classification, labelling each sample of the Pima Indian dataset as diabetic or non-diabetic. A simulation-based performance comparison in terms of accuracy (91.56%), F-measure (0.91), and AUC (0.91) confirmed the superiority of the proposed system over major existing approaches.
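As an illustration of the ensemble decision step, the following is a minimal sketch of majority (maximum) voting over the outputs of several base classifiers, in the spirit of the Maximum Voting Ensemble (MVE) named in the abstract. It is not the authors' implementation: the function name `max_vote`, the label encoding (1 = diabetic, 0 = non-diabetic), and the tie-breaking rule are assumptions made for this sketch.

```python
from collections import Counter
from typing import List, Sequence


def max_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier label lists by majority vote.

    predictions[i][j] is the label that base classifier i assigned to
    sample j (assumed encoding: 1 = diabetic, 0 = non-diabetic).
    Ties go to the tied label voted for by the earliest-listed
    classifier; this rule is an assumption of the sketch.
    """
    if not predictions:
        raise ValueError("need at least one base classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")

    combined = []
    # zip(*predictions) yields, for each sample, one vote per classifier.
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # First label, in classifier order, that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined
```

For example, with hypothetical predictions from five base classifiers on four samples, `max_vote([[1, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0], [1, 0, 0, 1]])` returns `[1, 0, 1, 0]`. The Best Trained Ensemble (BTE) would, by contrast, select a single best-performing base model rather than pool the votes; how the authors define "best" is not stated in the abstract.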