Cervical Cancer Diagnosis based on Random Forest

doi:10.23940/ijpe.17.04.p12.446457

Abstract

Cervical cancer, with an annually increasing incidence rate, is becoming the leading cause of death among women in China. However, studies have shown that early detection and accurate diagnosis of cervical cancer contribute to longer survival of cervical cancer patients. Machine learning methods are a good substitute for manual diagnosis in the analysis of Pap smear cervical cell images because they classify effectively and accurately. In the present study, a framework for cervical cancer diagnosis is presented based on a random forest (RF) classifier with ReliefF feature selection. After preprocessing and segmentation, 20 features were extracted from each image. In the feature selection phase, the 20 features were ranked according to weight using ReliefF. In the classification phase, the RF method was used as the classifier, and feature subsets of different sizes were used to train it. To examine the efficacy of the proposed method, the Herlev data set collected at Herlev University Hospital was used, in which 917 Pap smear images were categorized into two classes: normal and abnormal. Under 10-fold cross-validation, the best classification performance was obtained with the top 13 features and the RF classifier, which outperformed Naive Bayes, C4.5, and Logistic Regression. The accuracy was 94.44%, and the AUC value was 0.9804. The results also confirmed the effectiveness of cytoplasm features in the classification.
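The feature-ranking step described in the abstract can be illustrated with a minimal sketch. This is a simplified two-class ReliefF written in plain Python and run on synthetic toy data; it is not the authors' implementation and does not use the Herlev features. The names `relieff_weights` and `make_toy_data`, the neighbor count `k=5`, and the data itself are assumptions made for illustration. The RF training and cross-validation steps are not shown.

```python
import random


def relieff_weights(X, y, k=5):
    """Simplified two-class ReliefF (illustrative): one weight per feature."""
    n, d = len(X), len(X[0])
    # Normalize per-feature differences by the feature's range.
    lo = [min(row[f] for row in X) for f in range(d)]
    hi = [max(row[f] for row in X) for f in range(d)]
    span = [(hi[f] - lo[f]) or 1.0 for f in range(d)]

    def diff(a, b, f):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(a, b, f) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        # All other samples, nearest first.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(X[i], X[j]))
        hits = [j for j in others if y[j] == y[i]][:k]     # same class
        misses = [j for j in others if y[j] != y[i]][:k]   # other class
        if not hits or not misses:
            continue
        for f in range(d):
            # Penalize features that differ among same-class neighbors,
            # reward features that differ across classes.
            w[f] -= sum(diff(X[i], X[j], f) for j in hits) / (n * len(hits))
            w[f] += sum(diff(X[i], X[j], f) for j in misses) / (n * len(misses))
    return w


def make_toy_data(n_per_class=30, seed=0):
    """Synthetic data (assumption): feature 0 is informative, 1 and 2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(3.0 * label, 1.0),
                      rng.gauss(0.0, 1.0),
                      rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
weights = relieff_weights(X, y, k=5)
# Rank feature indices from highest to lowest weight.
ranking = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
print("feature ranking:", ranking)
```

In a pipeline like the one the abstract describes, the classifier would then be trained on the first few entries of `ranking`, with the cutoff chosen by comparing cross-validated performance across subset sizes.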

Submitted on January 29, 2017; Revised on April 12, 2017; Accepted on June 23, 2017
References: 47

Guanglu Sun, Shaobo Li, Yanzhen Cao, and Fei Lang. Cervical Cancer Diagnosis based on Random Forest [J]. Int J Performability Eng, 2017, 13(4): 446-457.
