Int J Performability Eng, 2019, Vol. 15, Issue (1): 252-260. doi: 10.23940/ijpe.19.01.p25.252260
Incremental Integration Algorithm based on Incremental RLID3
Hongbin Wang, Lei Hu, Xiaodong Xie, Lianke Zhou*, and Huafeng Li
Contact: Lianke Zhou, E-mail: zhoulianke@hrbeu.edu.cn
About the authors:

Hongbin Wang received his PhD degree in computer application technology from Harbin Engineering University in 2010. He is currently an associate professor in the College of Computer Science and Technology of Harbin Engineering University. His research interests include information management, data integration, dataspace, semantic web, ontology, and information system design.

Lei Hu received his Master's degree in computer application technology from Wuhan University in 2004. He is currently a senior engineer at SERI. His research interests include information management, systems engineering, and system integration.

Xiaodong Xie has been studying for a Bachelor's degree in computer application technology at Harbin Engineering University since 2018. His research interests include deep learning, big data, AI, and machine learning.

Lianke Zhou received his PhD degree in computer architecture from Harbin Institute of Technology in 2011. He is currently a lecturer in the College of Computer Science and Technology of Harbin Engineering University. His research interests include data visualization, dataspace, distributed computing, and mobile computing.

Huafeng Li received his Master's degree in software engineering from Harbin Engineering University in 2016. His research interests include information management, data integration, and data classification.
Hongbin Wang, Lei Hu, Xiaodong Xie, Lianke Zhou, and Huafeng Li. Incremental Integration Algorithm based on Incremental RLID3 [J]. Int J Performability Eng, 2019, 15(1): 252-260.
Algorithm 1. updateRLID3Tree(root node of an RLID3 decision tree, incremental data set)

1. Input: root node of an RLID3 decision tree, incremental data set
2. Output: adjusted decision tree
3. declare a variable curNode to represent the node currently visited by the traversal
4. set curNode to the root node of the input RLID3 decision tree
5. add the incremental data set to the sample feature code set of curNode
6. if (every sample in the sample feature code set of curNode has the same classification) { // all samples belong to one class
7.   set curNode to a leaf node, i.e., set the attribute of curNode to null
8.   set the class attribute value of curNode to the shared classification of its sample feature code set
9. } else { // the samples do not all belong to the same class
10.  if (curNode is a leaf node) { // the attribute of curNode is null
11.    using the sample feature code set of curNode and formula (1), choose the attribute with the maximum decision tree optimization ratio and set it as the attribute of curNode
12.    if (the decision tree optimization ratio of the attribute of curNode is 0) {
13.      set curNode to a leaf node, i.e., set the attribute of curNode to null
14.      set the class attribute value of curNode to the majority class of its sample feature code set
15.    } else { // the maximum decision tree optimization ratio is not 0
16.      with the attribute of curNode as the root attribute, call the RLID3 algorithm makeRLID3Tree(sample feature code set of curNode, attribute of curNode) to build an RLID3 subtree }
17.  } else { // curNode is a branch node
18.    using the sample feature code set of curNode and formula (1), select the attribute with the maximum decision tree optimization ratio, denoted A
19.    if (the decision tree optimization ratio of A is 0) {
20.      set curNode to a leaf node
21.      set the class attribute value of curNode to the majority class of its sample feature code set
22.    } else { // the decision tree optimization ratio of A is not 0
23.      if (the attribute of curNode is the same as A) {
24.        for (sample : sample feature code set of curNode) {
25.          add the sample to childrenTrainData at the index of the sample's value of the partition attribute
26.        }
27.        for (j = 0; j < the number of values available to the attribute of curNode; j++) {
28.          take the child node corresponding to the j-th value as the root of the subtree,
29.          call the RLID3 algorithm:
30.          updateRLID3Tree(child node corresponding to the j-th value, childrenTrainData[j]) to traverse the subtree
31.        }
32.      } else { // the attribute of curNode is not the same as A
33.        take the child node corresponding to the j-th value as the root of the subtree,
34.        call the RLID3 algorithm:
35.        updateRLID3Tree(child node corresponding to the j-th value, childrenTrainData[j]) to traverse the subtree
36.      }
37.    }
38.  }
39. }
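To make the control flow above concrete, here is a minimal Python sketch (not from the paper). `Node`, `optimization_ratio` (standing in for formula (1)), and `make_rlid3_tree` are assumed names; the duplicated recursion of steps 28-35 is collapsed into a single loop, and only the incremental samples are pushed down to the children, since they already hold the earlier ones.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

@dataclass
class Node:
    samples: list = field(default_factory=list)   # "sample feature code set": (features, label) pairs
    attribute: object = None                      # splitting attribute; None marks a leaf
    label: object = None                          # class value when the node acts as a leaf
    children: dict = field(default_factory=dict)  # attribute value -> child Node

def majority_label(samples):
    # Most frequent class label in a sample set (steps 14 and 21).
    return Counter(label for _, label in samples).most_common(1)[0][0]

def update_rlid3_tree(node, increment, optimization_ratio, make_rlid3_tree):
    # Fold an incremental data set into an existing RLID3 (sub)tree.
    node.samples.extend(increment)                        # step 5
    labels = {label for _, label in node.samples}
    if len(labels) == 1:                                  # steps 6-8: node is pure
        node.attribute, node.label = None, labels.pop()
        return
    attributes = node.samples[0][0].keys()
    best = max(attributes, key=lambda a: optimization_ratio(node.samples, a))
    if optimization_ratio(node.samples, best) == 0:       # steps 12-14 and 19-21
        node.attribute, node.label = None, majority_label(node.samples)
        return
    if node.attribute is None:                            # steps 15-16: grow a former leaf
        node.attribute = best
        make_rlid3_tree(node)                             # assumed rebuild helper
        return
    buckets = defaultdict(list)                           # steps 24-26: partition the increment
    for features, label in increment:
        buckets[features[node.attribute]].append((features, label))
    for value, child in node.children.items():            # steps 27-31: recurse into subtrees
        update_rlid3_tree(child, buckets.get(value, []),
                          optimization_ratio, make_rlid3_tree)
```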
Algorithm 3-1. PAR_WT(data set, number of base classifiers)

Input: data set, the number of base classifiers T
Output: base classifiers and their weights
1. perform the following procedure T times in parallel { // T is the number of base classifiers
2.   randomly divide the data set into 10 parts
3.   take 9 of the parts as the training set of the i-th base classifier
4.   take the remaining part as the test set of the i-th base classifier
5.   use the RLID3 algorithm to train the i-th base classifier
6.   using formula (2), calculate the classification accuracy of the i-th base classifier
7.   using formula (3), calculate the weight of the i-th base classifier
8. }
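A minimal Python sketch of PAR_WT under stated assumptions: `train_rlid3`, `accuracy`, and `weight_of` are hypothetical stand-ins for the RLID3 trainer and formulas (2) and (3), and a thread pool plays the role of the parallel loop.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def par_wt(data, T, train_rlid3, accuracy, weight_of):
    # Train T weighted base classifiers in parallel (sketch of PAR_WT).
    def build_one(i):
        samples = list(data)
        random.shuffle(samples)                    # step 2: random 10-part split
        k = max(1, len(samples) // 10)
        test, train = samples[:k], samples[k:]     # steps 3-4: 9 parts train, 1 part test
        clf = train_rlid3(train)                   # step 5: train the i-th base classifier
        acc = accuracy(clf, test)                  # step 6: formula (2)
        return clf, weight_of(acc)                 # step 7: formula (3)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(build_one, range(T)))
```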
Algorithm 3-2. ParallelAndWeightSecond(pre-classification sample data)

Input: pre-classification sample data, trained base classifiers and their weights
Output: classification result of the sample
1. initialize the weight recorder, setting aij = 0
2. for (classifier : classifier set) { // classifier denotes the i-th classifier
3.   j = classifier.classify(instance) // j is the class the classifier predicts for the sample
4.   aij += classifier.getWeight() // add the weight of the i-th classifier to aij
5. }
6. calculate the sum of each column of the weight recorder
7. take the class whose column sum is the maximum as the classification result of the pre-classified data
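The voting step amounts to an argmax over per-class weight sums. A small Python sketch under the same assumptions (classifier objects expose a `classify` method, and the `(classifier, weight)` pairs come from something like the PAR_WT sketch above):

```python
from collections import defaultdict

def classify_weighted(instance, weighted_classifiers):
    # Weighted-vote classification of one pre-classification sample.
    column_sums = defaultdict(float)               # one "column" per class label j
    for clf, w in weighted_classifiers:            # step 2: visit the i-th classifier
        j = clf.classify(instance)                 # step 3: its predicted class
        column_sums[j] += w                        # step 4: a_ij += weight of classifier i
    return max(column_sums, key=column_sums.get)   # steps 6-7: class with the largest sum
```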
Algorithm 2. Weight recorder

Input: data set, trained classifier set
Output: weight recorder
1. initialize the weight recorder, setting aij = 0
2. for (instance : data set) {
3.   for (classifier : classifier set) { // classifier denotes the i-th classifier
4.     j = classifier.classify(instance) // j is the class the classifier predicts for the sample
5.     aij += classifier.getWeight() // add the weight of the i-th classifier to aij
6.   }
7. }
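For the whole-data-set version, the recorder becomes a matrix indexed by classifier i and predicted class j. A sketch under the same assumed interface:

```python
from collections import defaultdict

def build_weight_recorder(data, weighted_classifiers):
    # Fill the weight recorder a_ij over a data set (sketch of Algorithm 2).
    a = defaultdict(float)                         # a[(i, j)], implicitly 0 (step 1)
    for instance in data:                          # step 2
        for i, (clf, w) in enumerate(weighted_classifiers):  # step 3
            j = clf.classify(instance)             # step 4: prediction of classifier i
            a[(i, j)] += w                         # step 5: add classifier i's weight
    return a
```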
Algorithm 4. RLID3IncreEnsemble(incremental data set, all base classifiers of the initial stage)

Input: incremental data set, all base classifiers of the initial stage
Output: adjusted base classifiers
1. perform the following procedure T times in parallel { // T is the number of base classifiers
2.   call Algorithm 1 to adjust the i-th base classifier with the incremental data set, obtaining the adjusted i-th base classifier
3.   use the test set of the initial-stage i-th base classifier as the test set of the adjusted i-th base classifier
4.   using formula (2), calculate the accuracy of the adjusted i-th base classifier
5.   using formula (3), calculate the weight of the adjusted i-th base classifier
6. }
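A sketch of the incremental ensemble adjustment, reusing the Algorithm 1 update as `update_tree` and the same assumed `accuracy`/`weight_of` stand-ins for formulas (2) and (3):

```python
from concurrent.futures import ThreadPoolExecutor

def rlid3_incre_ensemble(increment, ensemble, update_tree, accuracy, weight_of):
    # Adjust every base classifier with the incremental data set (sketch of Algorithm 4).
    # ensemble: list of (classifier, initial_test_set) pairs from the initial stage.
    def adjust(pair):
        clf, test = pair
        update_tree(clf, increment)                # step 2: Algorithm 1 on classifier i
        acc = accuracy(clf, test)                  # steps 3-4: reuse the initial test set
        return clf, weight_of(acc)                 # step 5: formula (3)
    with ThreadPoolExecutor() as pool:             # step 1: T adjustments in parallel
        return list(pool.map(adjust, ensemble))
```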
Table 2. Basic information of the data sets
Name | Instances | Attributes | Classifications | Missing Values |
---|---|---|---|---|
Letter | 20000 | 16 | 26 | No |
HIV | 6590 | 8 | 2 | No |
Nursery | 12960 | 8 | 5 | No |
Tic-Tac-Toe | 958 | 9 | 2 | No |
Connect-4 | 67557 | 42 | 3 | No |
Chess | 3196 | 36 | 2 | No |
Splice | 3190 | 61 | 3 | No |
Mushroom | 8124 | 22 | 2 | Yes |
Lymph | 148 | 18 | 4 | No |
Breast-w | 699 | 10 | 2 | Yes |