%A Xiao Chen, Li Han, Meng Leng, and Xiao Pan
%T Similarity based on the Importance of Common Features in Random Forest
%0 Journal Article
%D 2019
%J Int J Performability Eng
%R 10.23940/ijpe.19.04.p12.11711180
%P 1171-1180
%V 15
%N 4
%U {https://www.ijpe-online.com/CN/abstract/article_4059.shtml}
%8 2019-04-20
%X Existing methods for calculating the similarity between samples in random forests consider only the case where different samples fall on the same leaf node of a decision tree. They neglect that leaf nodes lie at different positions in the tree and that samples may fall on different leaves, which reduces the accuracy of the similarity. In this paper, first, to account for the different positions of leaf nodes in the decision tree, the importance of the sample features to which the leaf nodes belong is used as one attribute to describe the similarity. Second, for the case where samples fall on different leaf nodes, the common features between samples are taken as another attribute to describe the similarity. On this basis, the measure SICF (similarity between samples based on the importance of common features) is proposed. Finally, SICF is applied to the K-nearest neighbor classification algorithm, and the validity and correctness of the similarity are verified by the OOB index. The experimental results show that, on UCI data sets, SICF achieves better classification results than two classical methods.
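
The abstract contrasts SICF with existing similarity measures that count only samples landing on the same leaf. As a point of reference, that same-leaf baseline can be sketched as follows. This is not the SICF measure, whose formula the abstract does not give. The function name `same_leaf_proximity` and the `leaf_ids` table are hypothetical and chosen for illustration; in practice such a table could come from a fitted forest (for example, scikit-learn's `RandomForestClassifier.apply` returns per-tree leaf indices).

```python
from itertools import combinations


def same_leaf_proximity(leaf_ids):
    """Same-leaf proximity: fraction of trees in which two samples share a leaf.

    leaf_ids[i][t] is the index of the leaf that sample i reaches in tree t.
    Returns a symmetric n x n list of lists with 1.0 on the diagonal.
    """
    n = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        # Count the trees where both samples end up in the same leaf.
        shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments for 3 samples in a 4-tree forest.
leaf_ids = [
    [2, 5, 1, 3],
    [2, 5, 4, 3],
    [7, 6, 4, 0],
]
prox = same_leaf_proximity(leaf_ids)
```

Under this baseline, two samples that never share a leaf get similarity 0 regardless of how close their leaves are in the tree, which is the limitation the abstract says SICF is designed to address.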