International Journal of Performance Analysis in Sport, 2021, 17(3): 263-275 doi: 10.23940/ijpe.21.03.p2.263275

Original article

Arrhythmia Classification Algorithm based on SMOTE and Feature Selection

Tianhao Wang a,b, Peng Chen c, Tianjiazhi Bao c, Jiaheng Li d, Xiaosheng Yu c,*

a Big Data Research Center of China Three Gorges University, Yichang, 443002, China

b Yichang Big Data Center, Yichang, 443000, China

c College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, China

d Three Gorges Navigation Authority, Yichang, 443000, China

*Corresponding author. E-mail address: yuxiaosheng@ctgu.edu.cn

Fund support: the National Key Research and Development Program of China (2016YFC0802500) and the NSFC-Xinjiang Joint Fund (U1703261).

Abstract

Arrhythmia is commonly regarded as a life-threatening disease, and detecting its symptoms early is beneficial for treatment. At present, machine learning research on arrhythmia classification mainly relies on data extracted from ECGs. However, arrhythmia data still suffer from defects such as class imbalance, strong correlation among features, and high dimensionality, all of which can degrade classification accuracy. To address these problems, an arrhythmia classification algorithm based on SMOTE and feature selection is proposed here. First, the dataset is oversampled by SMOTE to remove class imbalance; then, K-part Lasso is used to screen out redundant features; finally, recursive feature elimination (RFE) and random forest (RF) are combined into a feature selection method, RF-RFE, to select the optimal features. The feature subset thus obtained is used for classification evaluation and comparison across four classification algorithms. Experiments on the UCI arrhythmia dataset demonstrate that the proposed algorithm selects 89 of the 279 features in the raw data as the optimal feature subset, and the accuracy of RF classification on this subset reaches 98.68%.

Keywords: arrhythmia; class imbalance; SMOTE; Lasso regression; random forest


© 2021 Totem Publisher, Inc. All rights reserved.

1. Introduction

Among traditional cardiovascular diseases, arrhythmias are extremely common and do great harm to human life. For example, they may exacerbate other cardiovascular diseases and even induce sudden cardiac death in severe cases [1]. Effective monitoring of patients with arrhythmias therefore benefits disease detection. Nowadays, both diagnosis and detection of arrhythmias are primarily made by observing electrocardiogram (ECG) data of patients; however, manual reading of ECG data cannot avoid diagnostic errors. For this reason, effective computer-assisted ECG data analysis has become a hot topic in the field of arrhythmia diagnosis [2]. ECG data are still characterized by high dimensionality, small sample sizes, strong feature correlations, and so on. In this context, both domestic and overseas scholars attach great importance to classifying arrhythmias by using existing ECG data efficiently and accurately.

The UCI arrhythmia dataset was first used for classification in [3]. A back-propagation (BP) neural network-based arrhythmia classification algorithm was put forward by Raut et al. [4], who investigated the UCI arrhythmia data with an MLP neural network and the statistical analysis software XLSTAT; the corresponding classification accuracy reaches 88.24%. In [5], an arrhythmia classification algorithm based on multiple features and support vector machine (SVM) was proposed: non-linear features of ECG signals are first extracted by kernel independent component analysis (KICA) and wavelet analysis and then fused with time-frequency features, after which an error-correcting output code-support vector machine (ECOC-SVM) classifier performs the arrhythmia classification. Mitra et al. [6] presented an algorithm in which linear forward selection and search are used to select relevant features and a neural network serves as the classifier; over 100 simulation runs, its accuracy is 87.71%. Niazi et al. [7] combined a modified F-score with forward sequence search for feature selection, and the selected optimal feature subset was fed to both SVM and KNN for classification; with 148 and 60 features respectively, the accuracy of this method reaches 68.8% and 73.8%. A modified feature selection method was proposed in [8-9], with classification performed by SVM, reaching an accuracy of 92.07%.

All the above approaches improve the efficiency and accuracy of arrhythmia classification to a certain extent through feature selection and machine learning. However, they fail to investigate the defects inherent in arrhythmia data, such as class imbalance, excessively high dimensionality and strong feature correlations [10]. These defects are not only particularly common in arrhythmia data, but also key factors affecting classification accuracy and unbiasedness [11]. Targeting these problems, an arrhythmia classification algorithm based on SMOTE and feature selection is proposed here. After data oversampling by SMOTE, redundant features are screened out by K-part Lasso regression; finally, the RF-RFE feature selection method is adopted to select the optimal features. The resulting feature subset is used for classification comparison among four classifiers. Experimental results show that the proposed algorithm selects 89 of the 279 features, and the accuracy of RF-based classification reaches 98.68%.

The remainder of this paper is organized as follows. Section 2 reviews background knowledge on imbalanced data, Lasso regression, and random forest. Section 3 presents the proposed SMOTE and feature selection-based arrhythmia classification algorithm. Section 4 reports experiments on the UCI arrhythmia dataset that evaluate the proposed algorithm in terms of accuracy, error, and Kappa coefficient. Section 5 concludes the paper and outlines future research directions.

2. Literature Review

2.1. Imbalanced Data and SMOTE Algorithm

Imbalanced data concern the classification problem: the sample size of one or several classes far exceeds that of the other classes. This phenomenon can skew a classifier toward the classes with large sample sizes, which in turn lowers classification accuracy. In real data, the sample size of the positive class (normal heart rhythms in arrhythmia data) is significantly larger than that of the negative class (arrhythmias). Consequently, misclassifying the negative class as positive does far more harm, and costs far more, than the opposite mistake [12].

Consider the Euclidean-distance-based KNN algorithm shown in Figure 1. With k=10, the sample to be classified, denoted by "+", has 7 of its 10 nearest neighbors in Class A, so by the KNN decision rule it is assigned to Class A; in reality, however, it belongs to Class B. This misclassification occurs because Class B contains only three samples, i.e., it is a problem caused by class imbalance. Traditional classification algorithms usually assume that the classes contain roughly equal numbers of samples, yet real data are imbalanced in most cases; that is, one or more classes have sample sizes greater than the others [13]. In recent years, the problem of imbalanced data has attracted extensive attention from scholars at home and abroad. Solutions fall mainly into three families: feature selection, data distribution adjustment, and model training [14]. Here, oversampling, a distribution adjustment method, is used to resolve the imbalance of the arrhythmia data and, through training and adjustment, bring the data to a state of equilibrium.

SMOTE is an oversampling method in the resampling family. It adds synthetic minority-class samples to the data to balance the class distribution, reduce the risk of overfitting, and improve classifier performance on the test set [15]. Unlike random oversampling, which simply copies samples, the basic idea of SMOTE is to fill in samples by linear interpolation; this effectively alleviates the overfitting caused by sampling randomness. The specific procedure of SMOTE is as follows:

1) A sample x_i is randomly selected from the minority class, and its K nearest neighbors are found according to the Euclidean distance.

2) According to the sampling rate n, a sample x_j is randomly selected from the K neighbors of x_i; this is repeated n times, each time synthesizing a new sample x_new by Equation (1):

$$x_{new} = x_i + rand(0,1) \times (x_j - x_i) \qquad (1)$$

where rand(0,1) denotes a random number in the interval [0,1].
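To make the interpolation step concrete, below is a minimal NumPy sketch of this synthesis (the function name and signature are ours, not from the paper; in practice a library implementation such as imbalanced-learn's SMOTE would normally be used):

```python
import numpy as np

def smote_sample(X_min, i, k=5, rng=None):
    """Synthesize one SMOTE sample from minority-class matrix X_min by
    interpolating between sample i and a random one of its k nearest neighbors."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.linalg.norm(X_min - X_min[i], axis=1)   # Euclidean distances to x_i
    neighbors = np.argsort(d)[1:k + 1]             # k nearest, skipping x_i itself
    j = rng.choice(neighbors)                      # random neighbor x_j
    # Equation (1): x_new = x_i + rand(0,1) * (x_j - x_i)
    return X_min[i] + rng.random() * (X_min[j] - X_min[i])
```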

Figure 1.   Schematic diagram of class imbalance influence


2.2. Lasso Regression

Lasso regression was proposed by Tibshirani [16] in 1996. As a compression estimate, it is commonly used for high-dimensional feature selection problems [17]. Lasso regression is a modified form of ordinary linear regression: an L1-norm penalty imposes a constraint on the regression coefficients. The basic model can be expressed as Equation (2):

$$\hat{\beta} = \arg\min_{\beta}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{k}\beta_j x_{ij}\Big)^2 \quad \text{s.t.} \quad \sum_{j=1}^{k}|\beta_j| \le s \qquad (2)$$

where y_i is the response variable, x_i = (x_i1, x_i2, …, x_ik) is the k-dimensional independent variable, and β_j is the regression coefficient of the jth independent variable. Here, s is a constraint on the sum of the absolute values of the regression coefficients, with s ≥ 0. Lowering the value of s shrinks to 0 the coefficients of variables that are strongly correlated or have insignificant influence on the response variable; once a variable's coefficient drops to 0, that variable is deleted.

For Lasso regression, three methods exist for determining the value of s: cross validation, extended cross validation and Mallows's Cp [18]. In this study, the value of s is determined by cross validation, whose basic idea is the same as that of extended cross validation. Specifically, a sufficiently large k is first chosen; s is then given the values 0, k/n, 2k/n, 3k/n, and so on; the expected error at each candidate value of s is estimated by cross validation; finally, the s with the minimum expected error is selected [19].
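As a sketch of this selection process, scikit-learn's LassoCV can pick the penalty by cross validation. Note that it parameterizes the model by the Lagrangian penalty weight alpha rather than by the constraint bound s of Equation (2), but the two play the same role; the toy data below is a stand-in for the arrhythmia features:

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Toy data standing in for the arrhythmia features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=300)

# LassoCV searches the penalty weight alpha by k-fold cross validation;
# alpha is the Lagrangian counterpart of the constraint s in Equation (2).
lasso = LassoCV(cv=10).fit(X, y)
kept = np.flatnonzero(lasso.coef_ != 0)   # features whose coefficients survived
print(f"alpha={lasso.alpha_:.4f}, kept {kept.size} of {X.shape[1]} features")
```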

2.3. Random Forest

Random forest was put forward by Breiman [20] in 2001. It is an ensemble learning classification algorithm whose essence is to assemble multiple decision trees {h(X,θk), k=1,…}, where the {θk} are independent, identically distributed random variables and k indexes the decision trees. Given an input X, a voting algorithm fuses the classification results of the individual decision trees to produce the optimal classification. The implementation procedure of random forest is as follows:

1) From the raw dataset, K sample sets are drawn with replacement by Bootstrap sampling, each with the same size as the raw dataset; the samples not drawn constitute the out-of-bag data. K decision tree models are then constructed from these K sample sets.

2) The raw data are assumed to contain N features. For each decision tree, m features (m ≤ N) are randomly extracted from the N features; the optimal splitting feature is then selected by information entropy to perform node splitting and classification.

3) The classification results produced by all decision trees for new data are fused by a voting algorithm to obtain the optimal classification result.

Thanks to its capability of measuring feature importance, random forest has become an important feature selection approach [21]. Random forest measures feature importance in classification by three methods: permutation importance, Gini importance and z-score, of which permutation importance is the most widely applied. Its basic principle is as follows: if randomly permuting the values of a feature X_j increases the prediction error for Y, then X_j is a critical feature for predicting Y. Suppose Bootstrap sampling has been carried out, with bootstrap samples denoted L_s, s = 1, 2, 3, …, n_tree, and the corresponding out-of-bag data denoted OOB_s. The permutation importance metric I(X_j) is then computed under random forest by the following procedure [22]:

1) Set s = 1 and construct a decision tree T_s from the bootstrap sample L_s.

2) Use the decision tree T_s to classify OOB_s, and record the number of correctly classified samples as R_s.

3) Randomly permute the values of feature X_j in the out-of-bag data, yielding perturbed data OOB_s^j; use T_s to classify the perturbed data, and record the number of correctly classified samples as R_s^j.

4) Repeat steps (1)-(3) for s = 2, 3, …, n_tree.

5) The permutation importance I(X_j) of feature X_j is given by Equation (3):

$$I(X_j) = \frac{1}{n_{tree}}\sum_{s=1}^{n_{tree}}\frac{R_s - R_s^j}{|OOB_s|} \qquad (3)$$
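As an illustration, the sketch below implements steps (1)-(5) and Equation (3) from scratch with scikit-learn decision trees. The iris dataset is a stand-in for the arrhythmia features, and normalizing each tree's accuracy drop by its out-of-bag size follows the reconstruction of Equation (3) above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)      # stand-in for the arrhythmia data
rng = np.random.default_rng(0)
ntree, n = 100, X.shape[0]
importance = np.zeros(X.shape[1])

for s in range(ntree):
    boot = rng.integers(0, n, n)                  # bootstrap sample L_s
    oob = np.setdiff1d(np.arange(n), boot)        # out-of-bag indices OOB_s
    tree = DecisionTreeClassifier(random_state=s).fit(X[boot], y[boot])
    r_s = np.sum(tree.predict(X[oob]) == y[oob])  # correct count R_s
    for j in range(X.shape[1]):
        X_perm = X[oob].copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])   # perturb feature X_j
        r_sj = np.sum(tree.predict(X_perm) == y[oob])  # correct count R_s^j
        # Equation (3): average normalized drop in correct classifications
        importance[j] += (r_s - r_sj) / len(oob)

importance /= ntree
print(importance.round(3))
```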

3. An Arrhythmia Classification Algorithm based on SMOTE and Feature Selection

Arrhythmia data usually suffer from high dimensionality, class imbalance and strong feature correlations. Conventional arrhythmia classification algorithms make substantial improvements to feature selection and classification methods, but they do not investigate these problems in the data themselves. In this paper, an arrhythmia classification algorithm based on SMOTE and feature selection is proposed to address these defects. The UCI arrhythmia dataset is taken as the research object; missing values are filled, and the data are then oversampled by SMOTE, which effectively mitigates class imbalance. Regarding feature selection, Lasso regression can rapidly screen out redundant features, but due to its limitations it does not select well among the features that drive classification accuracy. Therefore, K-part Lasso feature selection is combined with RF-RFE to select the optimal features; the efficiency of RF-RFE is also substantially improved because the Lasso stage has already removed massive redundant features. With these two feature selection stages, 89 of the 279 features were retained and fed to random forest, and the results were compared with those of three other classifiers (logistic regression, KNN and SVM). The data pre-processing and feature selection procedure is shown in Figure 2 below.

Figure 2.   Flow chart of data pre-processing and feature selection


3.1. Feature Selection based on K-part Lasso

Because Lasso regression uses an L1-norm penalty, it shrinks to 0 the regression coefficients of correlated features and of features with insignificant influence; features whose coefficients equal 0 are then deleted, completing feature selection. However, the high dimensionality of the data can make Lasso-based feature selection computationally expensive, and a small sample size can also cause overfitting.

On the basis of [23], a K-part Lasso feature selection method is applied to arrhythmia classification; its flow chart is given in Figure 3. The fundamental principle is as follows: the feature set is uniformly and randomly divided into K subsets; Lasso feature selection is performed within each subset, and the features selected from the K subsets are combined; the combined feature set is then subjected to Lasso selection once more, yielding the optimal feature subset. K-part Lasso not only effectively lowers the computation cost of feature selection on arrhythmia data, but also reduces the risk of overfitting.
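A minimal sketch of this two-stage procedure follows; the function name, the use of LassoCV, and the choice of K are ours, and the class label is treated as the numeric response, as Lasso regression requires:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def k_part_lasso(X, y, K=4, rng=None):
    """Two-stage K-part Lasso: split the features into K random subsets,
    run Lasso within each, then run Lasso again on the union of survivors."""
    rng = np.random.default_rng() if rng is None else rng
    parts = np.array_split(rng.permutation(X.shape[1]), K)
    survivors = []
    for part in parts:                               # stage 1: per-subset Lasso
        coef = LassoCV(cv=5).fit(X[:, part], y).coef_
        survivors.extend(part[coef != 0])
    survivors = np.array(survivors)                  # assumes at least one survivor
    coef = LassoCV(cv=5).fit(X[:, survivors], y).coef_   # stage 2: combined Lasso
    return survivors[coef != 0]                      # final feature indices
```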

3.2. Feature Selection based on Recursive Feature Elimination and Random Forest

3.2.1. Recursive and Non-Recursive Feature Elimination

Presently, most feature selection approaches combined with random forest are non-recursive. Representative methods include backward elimination based on importance measurement [24] and backward elimination based on both importance measurement and error rate [25]. When dealing with data whose features are weakly correlated, non-recursive feature selection based on random forest is quite efficient and accurate; however, it can incur large errors on data with strongly correlated features. The features of arrhythmia datasets are frequently strongly correlated. Here, feature correlations have already been weakened by the Lasso-based selection stage, but owing to the limitations of Lasso regression it cannot be guaranteed that all correlations are eliminated; hence, non-recursive random forest feature selection could still produce a large deviation on arrhythmia data. To compare recursive and non-recursive feature elimination, the procedures of both algorithms are presented below, along with their behavior on a simple feature selection case.

Figure 3.   Feature selection based on K-part Lasso


With regard to non-recursive feature elimination, it consists of four major steps:

1) Calculating the importance of all features;

2) Training the classifier;

3) Eliminating the features with the minimum importance;

4) Repeating steps 2 and 3 until the accuracy of the classifier reaches the required value.

The major difference of recursive feature elimination from the non-recursive variant is that the importance of the remaining features is re-measured each time the classifier is retrained. The corresponding procedure is as follows:

1) Training the classifier;

2) Calculating the importance of all features;

3) Eliminating the features with the minimum importance;

4) Repeating steps 1-3 until all features are eliminated; the feature subset that yields the best classifier accuracy is then selected as the optimal subset.

A simple case given by Gregorutti et al. [21] demonstrates the advantage of recursive feature elimination when features are correlated. First, two correlated features (V1 and V2) are constructed, each with a correlation coefficient of 0.7 with the category label, and a correlation coefficient of 0.9 between themselves. Next, four independent features (V3-V6) are defined, each with a category label correlation coefficient of 0.6. Finally, three uncorrelated features (V7-V9) are added to ensure experimental validity. At the start of the algorithm, before any feature is eliminated, the importance of all features is shown in Figure 4(a): the importance of V1 and V2, which are highly correlated with the category label, is below that of the independent features V3-V6. Figure 4(b) shows the feature importance after the two uncorrelated features are deleted. After V2 is also eliminated, following the rule of removing the least important feature, the importance of the 5 remaining features is illustrated in Figure 4(c): these 5 features are uncorrelated, and the importance of V1 exceeds that of the other four independent features. Recursive feature elimination therefore identifies V1 as the most important feature. By contrast, under non-recursive feature selection, the most important feature V1 could be eliminated before the four independent features V3-V6, which greatly affects the outcome of feature selection.

Figure 4.   Feature importance during selection of nine features based on recursive feature elimination: (a) before any elimination; (b) after deleting the two uncorrelated features; (c) after further eliminating V2


3.2.2. Recursive Feature Elimination and Random Forest

Recursive feature elimination (RFE) was originally proposed as a feature selection method based on support vector machines (SVM) [26]. With SVM, the weights of the separating hyperplane along each dimension can be treated as feature importance metrics; features are ranked by these weights, and classification accuracy is compared while the lowest-weight features are repeatedly eliminated. This method is known as support vector machine-recursive feature elimination, SVM-RFE for short [27]. In random forest, the out-of-bag data of each decision tree are used to compute the permutation importance of features for that tree [28], and the permutation importance of each feature is then obtained by aggregation; by integrating the results of many weak classifiers (decision trees), this approach reduces the sample-induced bias of any single classifier. Since random forest can measure feature importance through permutation importance, recursive feature elimination is applicable to it as well. The basic procedure is as follows (a minimal code sketch is given after the list):

1) Training the random forest;

2) Calculating the permutation importance of each feature;

3) Eliminating the feature with the least permutation importance;

4) Repeating steps (1)-(3) until all features are eliminated;

5) Selecting the feature subset that produces the best random forest-based classification effect.
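A hedged sketch of this loop, assuming scikit-learn: RFECV automates the retrain-and-eliminate cycle with cross validation, though it ranks features by the forest's built-in Gini importance rather than by permutation importance, and the iris data is only a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=500, random_state=0)
# RFECV retrains the forest each round, drops the least important feature,
# and keeps the subset with the best cross-validated accuracy.
selector = RFECV(rf, step=1, cv=10, scoring="accuracy").fit(X, y)
print("optimal number of features:", selector.n_features_)
print("selected mask:", selector.support_)
```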

3.3. Classifier Selection

To ensure that the feature subsets obtained during feature selection are not biased toward any particular classifier, four classifiers (KNN, SVM, logistic regression and random forest) are adopted here and compared on arrhythmia classification.

KNN is a common classification algorithm built on the idea of neighbors: a sample is classified according to the features of its most similar neighbors. SVM, one of the commonly used traditional classification algorithms, separates samples with a hyperplane. Logistic regression is traditionally restricted to binary classification and would thus be inapplicable to arrhythmia classification; however, it can be extended to a multi-class classifier via softmax, making it suitable for arrhythmia data as well. Random forest was introduced in detail in Section 2.3. On the raw arrhythmia data, the performance of these four classifiers is listed in Table 1: the accuracy of random forest reaches 73.89% with a mean absolute error (MAE) of 1.8, while the other three classifiers achieve 66.59%, 60.00% and 61.88% respectively. As an ensemble-based classifier, random forest clearly outperforms the three mainstream classifiers on arrhythmia classification, and it is therefore selected as the principal classifier in this research.

Table 1.   Metrics comparison for classifiers

Metric          SVM     KNN     Logistic   RF
Accuracy (%)    66.59   60.00   61.88      73.89
MAE             2.19    2.62    2.52       1.80
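A comparison of this kind can be scripted in a few lines. The sketch below shows the pattern under scikit-learn, with a bundled dataset standing in for the arrhythmia features:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)   # stand-in for the arrhythmia features
models = {
    "SVM": SVC(kernel="linear"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic": LogisticRegression(max_iter=1000),  # softmax for multi-class
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: {acc:.4f}")
```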


4. Experimental Analysis

4.1. Datasets and Pre-Processing

4.1.1. Introduction to Datasets

The arrhythmia dataset used in this paper comes from the UCI database. It contains 452 samples divided into 16 classes: Class 01 is the normal class, while the other 15 classes represent arrhythmias of varying severity. As Table 2 shows, the sample sizes of Classes 11, 12 and 13 are all 0. Some classes also correspond to rarely seen arrhythmias and thus have very small sample sizes, which is a major reason why the arrhythmia data are imbalanced. The dataset has 279 features. The first four are the standard attributes of age, sex, height and weight; the remaining attributes are extracted from standard 12-lead recordings and include the QRS duration and the P, Q and T waves. The arrhythmia data therefore not only show severe class imbalance and contain many features, but the features are also strongly correlated by the nature of the data. Under such circumstances, conventional feature selection approaches and classification algorithms can produce rather large errors.

Table 2.   Arrhythmia dataset

Class No.   Arrhythmia type                                Sample size
01          Normal                                         245
02          Ischemic changes                               44
03          Old anterior myocardial infarction             15
04          Old inferior myocardial infarction             15
05          Sinus tachycardia                              13
06          Sinus bradycardia                              25
07          Ventricular premature contraction              3
08          Supraventricular premature contraction         2
09          Left bundle branch block                       9
10          Right bundle branch block                      50
11          First-degree atrioventricular (AV) block       0
12          Second-degree AV block                         0
13          Third-degree AV block                          0
14          Left ventricular hypertrophy                   4
15          Atrial fibrillation or flutter                 5
16          Others                                         22


4.1.2. Data Pre-Processing

Missing values in the dataset are filled by combining linear regression prediction with mean-value filling. Because the dataset has many features, a missing value can be predicted by linear regression on the remaining features. However, with a small training sample, the predicted value may be unreliable; therefore, when the prediction falls outside the interval formed by the minimum and maximum of the observed values of that feature, it is replaced with the feature mean.
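A minimal sketch of this filling rule follows; the function is our own, and it assumes, for simplicity, that an incomplete row is missing only the single column currently being filled:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fill_column(X, j):
    """Fill NaNs in column j: predict with linear regression on the other
    columns, but fall back to the column mean when the prediction leaves
    the observed [min, max] range of that column."""
    miss = np.isnan(X[:, j])
    if not miss.any():
        return X
    others = np.delete(X, j, axis=1)
    ok = ~np.isnan(X).any(axis=1)        # train only on fully observed rows
    model = LinearRegression().fit(others[ok], X[ok, j])
    pred = model.predict(others[miss])
    lo, hi = np.nanmin(X[:, j]), np.nanmax(X[:, j])
    pred = np.where((pred < lo) | (pred > hi), np.nanmean(X[:, j]), pred)
    X[miss, j] = pred
    return X
```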

Here, the SMOTE algorithm is used to oversample the minority classes. As Table 2 shows, the numbers of arrhythmia samples in Classes 07, 08, 14 and 15 are extremely small. Because of these tiny sample sizes, if SMOTE were applied directly to the entire dataset, the number of neighbors could only be set to 1; such a neighbor setting is unsound, as it also affects the minority classes that have larger sample sizes. A segmented SMOTE strategy is therefore adopted in this paper: the data of Classes 07, 08, 14 and 15 are separated out and oversampled on their own, and the remaining classes are then oversampled separately. This avoids the inaccurate data fitting that SMOTE suffers when too few neighbors are selected.
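One way to approximate this segmented strategy is a two-pass application of imbalanced-learn's SMOTE, sketched below on toy data (the class sizes mimic Table 2; the two-pass structure and parameter choices are our assumptions, not the paper's exact implementation):

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
# Toy stand-in: one large class plus the four tiny classes (sizes as in Table 2).
sizes = {"01": 245, "07": 3, "08": 2, "14": 4, "15": 5}
X = rng.normal(size=(sum(sizes.values()), 10))
y = np.repeat(list(sizes), list(sizes.values()))

rare = ["07", "08", "14", "15"]
target = Counter(y).most_common(1)[0][1]        # majority-class size

# Pass 1: oversample only the four tiny classes with k_neighbors=1,
# since their sample sizes are too small for the default of 5 neighbors.
X, y = SMOTE(sampling_strategy={c: target for c in rare},
             k_neighbors=1).fit_resample(X, y)

# Pass 2: oversample the remaining minority classes with the default
# 5 neighbors (a no-op in this toy example, where no such classes exist).
X, y = SMOTE(k_neighbors=5).fit_resample(X, y)
print(Counter(y))
```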

Because the arrhythmia features have widely different scales, min-max normalization is applied in pre-processing, as expressed in Equation (4), to alleviate the influence of feature scale on classification accuracy.

$$x^* = \frac{x - \min}{\max - \min} \qquad (4)$$

where x* refers to the normalized data and x to the raw data; min and max respectively represent the minimum and maximum values of the raw data.

4.2. Experimental Results

4.2.1. Evaluation Metrics

To verify the feasibility and effects of the proposed algorithm, it is evaluated according to three metrics: accuracy, Kappa coefficient and MAE. Accuracy is defined by Equation (5):

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (5)$$


where TP (true positive) is the number of correctly predicted patient samples; TN (true negative) the number of correctly predicted healthy samples; FP (false positive) the number of healthy samples erroneously predicted as patients; and FN (false negative) the number of patient samples erroneously predicted as healthy. The Kappa coefficient measures both consistency and classification accuracy. MAE, the mean absolute error, is the mean of the absolute errors and directly reflects the gap between the predicted and the true values, as written in Equation (6):

$$MAE = \frac{1}{m}\sum_{i=1}^{m}\left|z_i - y_i\right| \qquad (6)$$

where m is the total sample size, and z_i and y_i respectively stand for the predicted and the true class of sample i.
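All three metrics are available in scikit-learn; a minimal sketch with hypothetical label values:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, mean_absolute_error

# y_true, y_pred as integer class labels (hypothetical values for illustration)
y_true = [1, 2, 2, 3, 1, 4]
y_pred = [1, 2, 3, 3, 1, 4]
print("Accuracy:", accuracy_score(y_true, y_pred))        # Equation (5)
print("Kappa:   ", cohen_kappa_score(y_true, y_pred))
print("MAE:     ", mean_absolute_error(y_true, y_pred))   # Equation (6)
```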

4.2.2. Algorithm Effects

The two feature selection stages, K-part Lasso and RF-RFE, select 89 of the 279 features of the arrhythmia data; these 89 features achieve the highest classification accuracy. To ensure validity and accuracy, the experiment strictly follows 10-fold cross validation. As Table 3 shows, the performance of all four classification algorithms improves after feature selection and removal of class imbalance from the raw data. In particular, the accuracy and MAE of random forest reach 98.68% and 0.078 respectively, superior to the other three algorithms. Comparing Tables 1 and 3, the accuracy of random forest rises by about 25 percentage points, from 73.89% to 98.68%, after the SMOTE and feature selection algorithms are applied.

To set the experimental parameters, the number of decision trees in the random forest is chosen by the method of Liaw and Wiener [28]: a default value mtree is selected first; the classification effect at the default is then compared with that at mtree/2 and 2*mtree; and the value yielding the optimal classification effect is adopted as the random forest parameter. Here, the default is mtree=500; to guard against an excessively large default, control groups with mtree equal to 25, 50 and 100 are also compared. The results are given in Figure 5: random forest attains its maximum accuracy at mtree=500, so the number of decision trees is set to 500.
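Such a sweep over mtree can be scripted as below (a sketch with a stand-in dataset; mtree corresponds to scikit-learn's n_estimators):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)    # stand-in for the arrhythmia data
# Default 500 plus its half/double and the small control values 25, 50, 100.
for mtree in (25, 50, 100, 250, 500, 1000):
    rf = RandomForestClassifier(n_estimators=mtree, random_state=0)
    acc = cross_val_score(rf, X, y, cv=10).mean()
    print(f"mtree={mtree}: accuracy={acc:.4f}")
```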

Table 3.   Metrics comparison for the proposed algorithm (with SMOTE and feature selection)

Metric          SVM      KNN      Logistic   RF
Accuracy (%)    96.79    94.06    94.54      98.68
Cohen's kappa   0.9653   0.9438   0.9408     0.9857
MAE             0.22     0.31     0.44       0.078


Figure 5.   Classification accuracy of random forest under various parameter settings


SVM is a binary classification method whose performance here is second only to random forest. Multi-class SVM is implemented via one-vs-one (OVO) [30] or one-vs-rest (OVR) [31] schemes; in this study, SVM is applied to arrhythmia multi-classification with OVO. Three kernels are widely used in SVM: Poly, RBF and Linear. Their classification effects are compared in Table 4. As the table shows, linear SVM attains the highest accuracy at 96.79%, while RBF and Poly SVM reach only 78.94% and 63.44% respectively. Selecting a proper kernel can thus substantially improve SVM performance.

Table 4.   Accuracy comparison for SVM with diverse kernels

Kernel   Accuracy (%)
Poly     63.44
Linear   96.79
RBF      78.94


The accuracies of logistic regression and KNN are 94.54% and 94.06% respectively. The number of neighbors in KNN significantly influences its classification performance. Figure 6 presents the classification accuracy of KNN for K = 5, 10, 15, 20 and 25; based on this comparison, k is set to 5.

Figure 6.   Accuracy of KNN with different neighbor counts


To demonstrate the influence of class imbalance on classification algorithms, the confusion matrices of random forest are compared before and after application of the proposed algorithm. In Figure 7, where all values are normalized and ratios below 0.01 are not displayed, the horizontal and vertical axes represent the 13 classes of the arrhythmia data. Figure 7(a) clearly shows that, when class imbalance is left untreated, most samples are categorized into Class 01 (the normal class); that is, abnormal samples are misclassified as normal, and as shown in [12], the cost of such misclassification is huge. In Figure 7(b), class imbalance adjustment and feature selection yield both a large overall accuracy increase and improved classification of each class.

Figure 7.   Confusion matrices of random forest (a) before and (b) after adoption of SMOTE and feature selection


4.2.3. Effect Comparison

For comparison, features are also selected by two traditional algorithms: principal component analysis (PCA) and mutual information (MI) [29]. The number of selected features and the resulting classification accuracy in SVM and random forest are listed in Table 5. PCA selects 80 of the 279 features, which yield an accuracy of 70.89% with SVM; mutual information combined with random forest selects 65 features and reaches an accuracy of 77.02%. Compared with these two traditional feature selection methods, the accuracy of the proposed algorithm is higher by 27.79 and 21.66 percentage points respectively.

Table 5.   Comparison of the proposed algorithm and traditional feature selection approaches in accuracy and number of features

Algorithm   Features   Accuracy (%)
PCA+SVM     80         70.89
MI+RF       65         77.02
Proposed    89         98.68


Table 6.   Accuracy comparison between the proposed algorithm and other algorithms under study

Algorithm            Train-test split   Selected features   Accuracy (%)
Proposed             10-fold CV         89                  98.68
MLP NN               3-fold CV          n/a                 88.24
Filter+Wrapper+KNN   20-fold CV         60                  73.80
Filter+Wrapper+SVM   20-fold CV         148                 68.80
RF+SVM               90-10              94                  92.07
GA+C4.5              75-25              n/a                 78.76


To verify the classification effect of the proposed algorithm, its results are compared with those of other arrhythmia classification algorithms that also use the UCI dataset. As Table 6 shows, the MLP NN-based arrhythmia classification algorithm of [4] achieves an accuracy of 88.24%. In [7], an arrhythmia classification algorithm combining filter and wrapper feature selection is put forward: the modified F-score method first filters out useless features, and forward sequence search then finds an optimal feature subset for KNN (accuracy: 73.80%) and SVM (accuracy: 68.80%) classification. Mustaqeem et al. [9] propose an algorithm combining the wrapper method and SVM: features are first selected by random forest and z-score, followed by SVM classification, reaching an accuracy of 92.07%. Ayar et al. [32] perform arrhythmia feature selection and classification by combining GA and C4.5. Although arrhythmia classification is a multi-class problem, that work emphasizes two outcomes, diseased and healthy, and therefore builds two models: a binary classification model covering the diseased and healthy cases, with an accuracy of 86.96%, and a multi-class model covering all arrhythmia cases, with an accuracy of 78.76%. Compared with the method of [9] (accuracy: 92.07%) and the SVM-based method of [7], the accuracy of the algorithm proposed here is higher by 6.61 and 29.88 percentage points respectively.

4.2.4. Ablation Experiments

We performed ablation experiments to further check the efficacy of the proposed algorithm, dividing the experiments into four groups: (1) the original data group (no processing), (2) SMOTE+Lasso, (3) SMOTE+RF-RFE, and (4) the proposed algorithm. To ensure fairness, all groups use RF as the classification algorithm and 10-fold cross validation. As Table 7 shows, compared with the other three groups, the proposed algorithm uses the fewest features (89), obtains the highest classification accuracy (98.68%) and specificity (99.48%), and also retains a high degree of sensitivity.

Table 7.   Ablation comparison of the proposed algorithm

Algorithm      Features   Accuracy (%)   Sensitivity (%)   Specificity (%)
Origin         279        72.52          36.52             89.64
SMOTE+RF-RFE   122        98.59          98.46             99.48
SMOTE+Lasso    102        98.53          98.68             99.23
Proposed       89         98.68          98.61             99.48


Table 8 gives the time spent in the feature selection stage by the proposed algorithm and by traditional RF-RFE. The proposed algorithm takes around 1/5 of the time of the conventional method. This shows that combining the two feature selection methods significantly reduces the time the conventional wrapper RF-RFE needs to pick the best feature combination, while maintaining classification accuracy. In summary, the proposed approach both guarantees classification accuracy and selects the best features efficiently.

Table 8.   Time spent in the feature selection phase

Algorithm      Time (s)
SMOTE+RF-RFE   123
Proposed       24.19


5. Conclusion

To address the class imbalance and excessively high dimensionality present in arrhythmia classification, an arrhythmia classification algorithm based on SMOTE and feature selection is proposed in this paper. The influence of class imbalance on classification is clearly visible in Figure 7: under class imbalance, a large number of abnormal samples are misclassified into the normal class, while after SMOTE-based oversampling and feature selection the classification accuracy of every class improves. Additionally, because the features of arrhythmia data are strongly correlated, conventional feature selection approaches struggle to select an optimal feature subset accurately and rapidly; the two feature selection methods adopted here, Lasso and RF-RFE, are better suited to strongly correlated features. Finally, classification is carried out by random forest: the proposed algorithm selects 89 of the 279 features of the arrhythmia data and uses them for classification, and the classification accuracy of random forest reaches 98.68%.

Future research will continue to focus on the class imbalance problems associated with arrhythmia data, for which cost-sensitive learning and improved SMOTE methods may be adopted. If conditions permit, more real data and life-related features (e.g., living area and dietary habits) will be incorporated into empirical research, in the hope of further improving both the efficiency and the accuracy of classification.

References

[1] Yang B.F. and Benzhi C. Advances in the study of arrhythmogenic mechanisms. Journal of International Pharmaceutical Research, 37(2), pp.81-8, 2010.

[2] Deng L. and Fu R. Heartbeat-based end-to-end classification of arrhythmias. Journal of Southern Medical University, 39(9), pp.1071-1077, 2019.

[3] Guvenir H.A., Acar B., Demiroz G., and Cekin A. A supervised machine learning algorithm for arrhythmia analysis. In Computers in Cardiology 1997, pp.433-436, 1997.

[4] Raut R.D. and Dudul S.V. Arrhythmias classification with MLP neural network and statistical analysis. In 2008 First International Conference on Emerging Trends in Engineering and Technology, pp.553-558, 2008.

[5] Zhao Y., Hong W., and Sun S. Cardiac arrhythmia classification based on multi-features and support vector machines. Journal of Biomedical Engineering, 28(2), pp.292-295, 2011.

[6] Mitra M. and Samanta R.K. Cardiac arrhythmia classification using neural networks with selected features. Procedia Technology, 10, pp.76-84, 2013.

[7] Niazi K.A.K., Khan S.A., Shaukat A., and Akhtar M. Identifying best feature subset for cardiac arrhythmia classification. In 2015 Science and Information Conference (SAI), pp.494-499, 2015.

[8] Mustaqeem A., Anwar S.M., Majid M., and Khan A.R. Wrapper method for feature selection to classify cardiac arrhythmia. In 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp.3656-3659, 2017.

[9] Mustaqeem A., Anwar S.M., and Majid M. Multiclass classification of cardiac arrhythmia using improved feature selection and SVM invariants. Computational and Mathematical Methods in Medicine, 2018.

[10] Yang M., Xiao J., and Cai H. Multicollinearity and its treatment in multivariate analysis. Chinese Journal of Health Statistics, 29(4), pp.620-624, 2012.

[11] Zhao N., Zhang X.F., and Zhang L.J. Overview of imbalanced data classification. Computer Science, 45(S1), pp.22-27, 2018.

[12] Chawla N.V., Bowyer K.W., Hall L.O., and Kegelmeyer W.P. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, pp.321-357, 2002.

[13] Li Y.X., Chai Y., Hu Y.Q., and Yin H.P. Review of imbalanced data classification methods. Control and Decision, 34(4), pp.673-688, 2019.

[14] Chawla N.V., Japkowicz N., and Kotcz A. Special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter, 6(1), pp.1-6, 2004.

[15] Shi H.H., Chen Y.W., and Chen X. Summary of research on SMOTE oversampling and its improved algorithms. CAAI Transactions on Intelligent Systems, 14, pp.1073-1083, 2019.

[16] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), pp.267-288, 1996.

[17] Li Z., Du J., Nie B., Xiong W., Huang C., and Li H. Summary of feature selection methods. Computer Engineering and Applications, 55(24), pp.10-19, 2019.

[18] Mallows C.L. Some comments on Cp. Technometrics, 42(1), pp.87-94, 2000.

[19] Yao Y. and Cai S. Study on wine evaluation based on LASSO regression. Journal of Yunnan Agricultural University, 31(2), pp.294-302, 2016.

[20] Breiman L. Random forests. Machine Learning, 45(1), pp.5-32, 2001.

[21] Gregorutti B., Michel B., and Saint-Pierre P. Correlation and variable importance in random forests. Statistics and Computing, 27(3), pp.659-678, 2017.

[22] Yao D., Yang J., and Zhan X. Feature selection algorithm based on random forest. Journal of Jilin University (Engineering and Technology Edition), 44(1), pp.137-141, 2014.

[23] Shi W., Hu X., and Yu K. K-part Lasso based on feature selection algorithm for high-dimensional data. Computer Engineering and Applications, 48(1), pp.157-161, 2012.

[24] Svetnik V., Liaw A., Tong C., and Wang T. Application of Breiman's random forest to modeling structure-activity relationships of pharmaceutical molecules. In International Workshop on Multiple Classifier Systems, pp.334-343, 2004.

[25] Genuer R., Poggi J.M., and Tuleau-Malot C. VSURF: an R package for variable selection using random forests. The R Journal, 7(2), pp.19-33, 2015.

[26] Guyon I., Weston J., Barnhill S., and Vapnik V. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1), pp.389-422, 2002.

[27] Wang X. and Hu X. Overview on feature selection in high-dimensional and small-sample-size classification. Journal of Computer Applications, 37(9), pp.2433-2438, 2017.

[28] Liaw A. and Wiener M. Classification and regression by randomForest. R News, 2(3), pp.18-22, 2002.

[29] Kraskov A., Stögbauer H., and Grassberger P. Estimating mutual information. Physical Review E, 69(6), p.066138, 2004.

[30] Chaudhuri A., De K., and Chatterjee D. A comparative study of kernels for the multi-class support vector machine. In 2008 Fourth International Conference on Natural Computation, 2, pp.3-7, 2008.

[31] Polat K. and Güneş S. A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems. Expert Systems with Applications, 36(2), pp.1587-1592, 2009.

[32] Ayar M. and Sabamoniri S. An ECG-based feature selection and heartbeat classification model using a hybrid heuristic algorithm. Informatics in Medicine Unlocked, 13, pp.167-175, 2018.