Int J Performability Eng, 2020, Vol. 16, Issue 7: 1038-1045. doi: 10.23940/ijpe.20.07.p6.10381045


A Novel Submitochondrial Localization Predictor based on Gradient Boosting Algorithm and Dataset Balancing Treatment

Jinchao Zhao, Yinping Jin, Xi Lin, and Xiao Wang*   

  1. School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
  • Contact: * E-mail address: pandaxiaoxi@163.com

Abstract: Mitochondria are ubiquitous in eukaryotes. Abnormalities in their location lead to a wide range of human diseases, especially neurodegenerative diseases. Correctly identifying the submitochondrial location of proteins is therefore critical, and it contributes to understanding disease pathogenesis and to drug design. Although important results have been achieved in predicting location at the sub-subcellular level, many problems remain. A mitochondrion has four submitochondrial compartments, but many existing studies ignore the intermembrane space. In addition, the publicly available benchmark datasets are unbalanced, and few researchers have addressed this skew before classification, which biases predictions for some categories. To address these issues, we present a novel predictor, called CatBoost-SubMito, for protein submitochondrial location prediction. First, to capture the valuable information in a protein sequence, the pseudo-amino acid composition approach is used to build feature vectors. Next, the synthetic minority oversampling technique (SMOTE) is applied to reduce the effects of the unbalanced datasets. Finally, the feature vectors are fed into the CatBoost classifier. The predictor is tested on three benchmark datasets (SM424-18, SubMitoPred, and M4-585). Experimental results indicate that our predictor surpasses state-of-the-art predictors.

Key words: mitochondria, imbalance problem, CatBoost, mitochondrial intermembrane space
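
The dataset balancing step named in the abstract can be illustrated with a minimal sketch of synthetic minority oversampling, written in plain Python. This is not the authors' implementation. The function name `smote`, its parameters, and the toy two-dimensional feature vectors are assumptions made for illustration only; real pseudo-amino acid composition vectors would have many more dimensions, and the feature extraction and CatBoost classification steps are omitted.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples.

    Illustrative sketch only: `minority` is a list of equal-length feature
    vectors belonging to the under-represented class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy 2-D vectors standing in for protein feature vectors (made-up values).
minority_class = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.10], [0.05, 0.30]]
majority_count = 10  # hypothetical size of the largest class
new_points = smote(minority_class, majority_count - len(minority_class), k=3)
balanced_minority = minority_class + new_points
```

Because each synthetic point lies on a line segment between two existing minority samples, the oversampled class stays within the region already occupied by the real data. In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written version, and oversampling is usually applied to training folds only so that evaluation data remain untouched.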