Int J Performability Eng, 2021, Vol. 17, Issue (8): 733-740. doi: 10.23940/ijpe.21.08.p9.733740

K-means Under-Sampling for Hypertension Prediction using NHANES Dataset

Kajal Dwivedi, Ramanathan Lakshmanan, and Rajeshkannan Regunathan   

  1. School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, 632014, India
  • Contact: * E-mail address: kajal.dwivedi2019@vitstudent.ac.in

Abstract: Supervised machine learning algorithms are widely used across sectors to extract useful patterns from large datasets. In real-life applications such as medical datasets, the positive class commonly has fewer instances than the negative class. This imbalance biases machine learning algorithms towards the negative class, so positive-class instances are often misclassified. The problem becomes worse when the classes also overlap. In this paper, a novel framework, K-means under-sampling (KUS), is proposed to address class imbalance and class overlap together and thereby improve classifier performance for the diagnosis of hypertension using the NHANES dataset. KUS improves the visibility of minority-class instances to the classifiers. The results show that KUS improved the performance of the classifiers and performed better than the other resampling algorithms it was compared with.

Key words: hypertension, class-imbalance, class-overlap, K-means, Silhouette score
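
The abstract does not spell out the individual steps of KUS, so the following is only a generic illustration of cluster-based under-sampling, not the authors' method. It clusters the majority (negative) class with a plain Lloyd's k-means and keeps, from each cluster, the points closest to the centroid until roughly as many majority points remain as there are minority points. The function names, the per-cluster quota rule, the fixed `k=3`, and the synthetic 2-D data are all assumptions made for this sketch; the NHANES data and the paper's use of the Silhouette score are not reproduced here.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def kmeans_undersample(majority, n_keep, k=3, seed=0):
    """Keep at most n_keep majority points, spread across k clusters.

    Hypothetical quota rule: each non-empty cluster contributes a share
    proportional to its size (at least one point), taking the points
    nearest to its centroid first.
    """
    centroids, labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        quota = max(1, round(n_keep * len(members) / len(majority)))
        members.sort(key=lambda p: _dist2(p, centroids[c]))
        kept.extend(members[:quota])
    return kept[:n_keep]


# Synthetic example (assumed data): 30 majority points in three blobs,
# 6 minority points in between.
rng = random.Random(1)
majority = [
    (rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(10)
]
minority = [(rng.gauss(2.5, 0.3), rng.gauss(2.5, 0.3)) for _ in range(6)]

reduced = kmeans_undersample(majority, n_keep=len(minority), k=3)
print(len(majority), "majority points reduced to", len(reduced))
```

Because the quotas are rounded per cluster, the reduced set can come out slightly smaller than `n_keep`; a production pipeline would more likely rely on an established library implementation and choose `k` with a validity index such as the Silhouette score named in the keywords.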