Int J Performability Eng, 2021, Vol. 17, Issue (3): 263-275. doi: 10.23940/ijpe.21.03.p2.263275

Original article

Arrhythmia Classification Algorithm based on SMOTE and Feature Selection

Tianhao Wang (a, b), Peng Chen (c), Tianjiazhi Bao (c), Jiaheng Li (d), and Xiaosheng Yu (c, *)

(a) Big Data Research Center of China Three Gorges University, Yichang, 443002, China
(b) Yichang Big Data Center, Yichang, 443000, China
(c) College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, China
(d) Three Gorges Navigation Authority, Yichang, 443000, China
Contact: Yu Xiaosheng, e-mail: yuxiaosheng@ctgu.edu.cn
Supported by: the National Key Research and Development Program of China (2016YFC0802500); NSFC-Xinjiang Joint Fund (U1703261)

Abstract:

Arrhythmia is commonly regarded as a life-threatening disease, and detecting its symptoms early benefits treatment. Current machine-learning research on arrhythmia classification relies mainly on data extracted from ECGs. However, arrhythmia data still suffer from defects such as class imbalance, strong correlation among features, and high dimensionality, all of which can reduce classification accuracy. To address these problems, this paper proposes an arrhythmia classification algorithm based on SMOTE and feature selection. First, SMOTE oversamples the dataset to eliminate class imbalance; then, K-part Lasso is used to screen out redundant features; finally, recursive feature elimination (RFE) is combined with random forest (RF) to form a feature selection method, RF-RFE, that selects the optimal features. The resulting feature subsets are used to evaluate and compare four classification algorithms. Experiments on the UCI arrhythmia dataset show that the proposed algorithm selects 89 of the 279 features in the raw data as the optimal feature subset, and that RF classification on this subset reaches an accuracy of 98.68%.
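The abstract does not give the SMOTE configuration used in the paper, so the following is only a minimal pure-Python sketch of standard SMOTE for illustration. It assumes Euclidean distance, a hypothetical `smote` helper with illustrative parameters `k` and `seed`, and that every minority class is oversampled up to the majority-class count (a common default, not confirmed by the abstract). Each synthetic sample is placed on the line segment between a minority sample and one of its k nearest same-class neighbours.

```python
import math
import random
from collections import Counter


def k_nearest(points, idx, k):
    """Hypothetical helper: indices of the k nearest neighbours of points[idx]
    by Euclidean distance, excluding idx itself."""
    dists = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return [j for _, j in dists[:k]]


def smote(X, y, k=5, seed=0):
    """Sketch of SMOTE: oversample every minority class to the majority count.

    A synthetic sample is x_new = x_i + gap * (x_nn - x_i), where x_nn is one
    of the k nearest same-class neighbours of x_i and gap ~ U(0, 1).
    Assumption: each minority class has at least 2 samples.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(row) for row in X]  # original samples are kept unchanged
    y_out = list(y)
    for label, n in counts.items():
        if n == target:
            continue  # majority class needs no oversampling
        if n < 2:
            raise ValueError(f"class {label!r} needs at least 2 samples for SMOTE")
        members = [list(row) for row, lab in zip(X, y) if lab == label]
        k_eff = min(k, n - 1)  # cannot use more neighbours than exist
        neighbours = [k_nearest(members, i, k_eff) for i in range(n)]
        for _ in range(target - n):
            i = rng.randrange(n)                      # random base sample
            nn = members[rng.choice(neighbours[i])]   # random near neighbour
            gap = rng.random()                        # interpolation factor
            X_out.append([a + gap * (b - a) for a, b in zip(members[i], nn)])
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [10.0, 10.0], [11.0, 11.0]]
    y = [0, 0, 0, 0, 1, 1]
    X_res, y_res = smote(X, y, k=1)
    print(dict(Counter(y_res)))  # → {0: 4, 1: 4}
```

Interpolating between neighbours, rather than duplicating rows, is what distinguishes SMOTE from random oversampling. In practice, the imbalanced-learn library provides `imblearn.over_sampling.SMOTE`, and scikit-learn provides `Lasso`, `RFE`, and `RandomForestClassifier` for the later stages; the specifics of the paper's K-part Lasso are not described in the abstract and are not sketched here.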

Key words: arrhythmia, class imbalance, SMOTE, Lasso regression, random forest