Int J Performability Eng, 2017, Vol. 13, Issue 4: 446-457. doi: 10.23940/ijpe.17.04.p12.446457


Cervical Cancer Diagnosis based on Random Forest

Guanglu Sun (a, b), Shaobo Li (a), Yanzhen Cao (a), and Fei Lang (b)

a. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
b. Research Center of Information Security & Intelligent Technology, Harbin University of Science and Technology, Harbin 150080, China


Cervical cancer, with an annually increasing incidence rate, is becoming the leading cause of death among women in China. However, studies have shown that early detection and accurate diagnosis of cervical cancer contribute to longer survival of cervical cancer patients. Machine learning methods are an effective and accurate substitute for manual diagnosis in the analysis of Pap smear cervical cell images. In the present study, a framework for cervical cancer diagnosis is presented based on a random forest (RF) classifier with ReliefF feature selection. Through preprocessing, segmentation, and feature extraction, 20 features were extracted. In the feature selection phase, the 20 features were ranked by weight using ReliefF. In the classification phase, the RF method was used as the classifier, and different dimensions of features were selected to train it. To examine the efficacy of the proposed method, the Herlev data set collected at Herlev University Hospital was used, in which 917 Pap smear images are categorized into two classes: normal and abnormal. Under 10-fold cross validation, the experimental results showed that the best classification performance was obtained with the top 13 features using the RF classifier, outperforming Naive Bayes, C4.5, and Logistic Regression. The accuracy was 94.44%, and the AUC value was 0.9804. The results also confirmed the effectiveness of cytoplasm features in the classification.
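The pipeline described in the abstract (Relief-style feature ranking, followed by an RF classifier evaluated with 10-fold cross validation on the top-ranked features) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a simplified binary Relief weighting (the full ReliefF averages over k nearest hits/misses per class), a randomly generated 20-feature data set as a stand-in for the 917-sample Herlev features, and scikit-learn's RandomForestClassifier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def relief_weights(X, y):
    """Simplified binary Relief: for each sample, find its nearest
    same-class neighbor (hit) and nearest other-class neighbor (miss),
    and reward features that differ more on misses than on hits."""
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(X - X[i]).sum(axis=1)  # L1 distance to all samples
        dist[i] = np.inf                      # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(same, np.inf, dist))
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n
    return w

# Synthetic stand-in for the 20 extracted cell features (hypothetical data):
# the first 5 features are shifted by class label, the rest are noise.
rng = np.random.default_rng(42)
n_samples, n_features = 300, 20
y = rng.integers(0, 2, n_samples)
X = rng.random((n_samples, n_features))
X[:, :5] += 0.8 * y[:, None]

# Rank features by Relief weight and keep the top 13, mirroring the
# best-performing dimension reported in the paper.
w = relief_weights(X, y)
top13 = np.argsort(w)[::-1][:13]

# Train an RF classifier on the selected features, 10-fold cross validation.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X[:, top13], y, cv=10, scoring="accuracy")
print(f"mean 10-fold accuracy: {scores.mean():.4f}")
```

On this synthetic data the informative features receive visibly larger Relief weights than the noise features, so the top-13 subset retains the signal and the cross-validated accuracy stays high; on the real Herlev features the ranking step plays the same role of discarding the least discriminative of the 20 features.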

Submitted on January 29, 2017; Revised on April 12, 2017; Accepted on June 23, 2017
References: 47