Applying an Improved Elephant Herding Optimization Algorithm with Spark-based Parallelization to Feature Selection for Intrusion Detection
- Hui Xu, Qianqian Cao, Heng Fu, and Hongwei Chen
As the scale of intrusion data grows, irrelevant or redundant features in high-dimensional intrusion detection data slow down intrusion detection algorithms, and their time and space costs increase with the feature dimension. Given the good classification performance of the Elephant Herding Optimization (EHO) algorithm in reducing feature redundancy, this paper introduces the EHO algorithm into feature selection for intrusion detection. However, the basic EHO algorithm tends to fall into local optima and lacks strong search ability, which severely limits its classification performance and dimensionality reduction ability. Therefore, this paper proposes an Improved Elephant Herding Optimization (IEHO) algorithm that searches the feature space for the optimal feature subset, so that the number of features is minimized while classification performance is maximized. As the intrusion data grows further, the large amount of redundant information it contains slows the improved algorithm as well, so the algorithm is parallelized to relieve the pressure of single-machine operation. This paper then proposes a Spark-based distributed parallel IEHO algorithm and discusses a feature selection method for intrusion detection based on it. Performing feature selection in a distributed environment improves the running efficiency of the IEHO algorithm, reducing its running time while preserving classification accuracy. For experimental validation, both UCI and KDD CUP99 datasets are used to evaluate the feature selection for intrusion detection.
Compared with the classical PSO, MFO, and EHO algorithms, feature selection with the binary IEHO algorithm improves by 4.16%, 1.42%, and 0.98%, respectively, and classification performance also improves significantly. Compared with the stand-alone version of the IEHO algorithm, the Spark-based parallel IEHO algorithm significantly improves classification efficiency for intrusion feature selection, and the acceleration ratio increases by two orders of magnitude.
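The objective of minimizing the feature number while maximizing classification performance can be illustrated with a small wrapper-style fitness function over binary feature masks. This is only a sketch under stated assumptions, not the paper's formulation: the weighted-sum form, the weight `alpha`, the leave-one-out 1-nearest-neighbor error, and the toy data are all hypothetical choices made for illustration.

```python
from typing import Callable, Sequence


def loo_1nn_error(rows: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  mask: Sequence[int]) -> float:
    """Leave-one-out error of a 1-nearest-neighbor classifier that only
    sees the features whose mask bit is 1 (assumed stand-in classifier)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    wrong = 0
    for i, row in enumerate(rows):
        best_dist, best_label = float("inf"), None
        for k, other in enumerate(rows):
            if k == i:
                continue  # leave the query sample out
            dist = sum((row[j] - other[j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, labels[k]
        wrong += best_label != labels[i]
    return wrong / len(rows)


def subset_fitness(mask: Sequence[int],
                   error_rate: Callable[[Sequence[int]], float],
                   alpha: float = 0.99) -> float:
    """Lower is better. Assumed weighted sum of classification error and
    the fraction of features kept; alpha is a hypothetical trade-off weight."""
    n_selected = sum(mask)
    if n_selected == 0:
        return float("inf")  # an empty subset cannot classify anything
    return alpha * error_rate(mask) + (1 - alpha) * n_selected / len(mask)


# Toy data: feature 0 separates the classes, feature 1 is misleading noise.
ROWS = [(0.0, 5.0), (0.1, 1.0), (1.0, 5.1), (1.1, 0.9)]
LABELS = [0, 0, 1, 1]


def toy_error(mask: Sequence[int]) -> float:
    return loo_1nn_error(ROWS, LABELS, mask)


if __name__ == "__main__":
    for candidate in ([1, 0], [0, 1], [1, 1]):
        print(candidate, subset_fitness(candidate, toy_error))
```

In a binary swarm optimizer, each individual would carry one such mask and be scored by a function of this kind. Because every mask is scored independently, the evaluations are a natural candidate for distribution across workers, which is the kind of work a Spark-based parallelization can spread out.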