Int J Performability Eng ›› 2020, Vol. 16 ›› Issue (11): 1721-1731.

### Unknown Protocol Data Frame Classification Algorithm based on Improved K-Means

Zhiguo Liu^a,b and Changqing Ren^a,b,*

^a Dalian Economic and Technological Development Zone, Dalian University, 116622, China;
^b Communication and Network Laboratory, Dalian University, Dalian, 116622, China
• Contact: *E-mail address: liuzhiguo_dldx@163.com

Abstract: To address the low efficiency and low accuracy of classifying multiple unknown protocol data frames in an unknown network environment, this paper proposes a k-means clustering algorithm based on information entropy and density. First, because protocol data frames take the form of bitstreams, the Euclidean distance between data frames is weighted using information entropy. Next, the density of each data frame is computed to identify the set of high-density data frames, and the cluster centers are selected from this set using the max-min distance criterion. Finally, the ratio of the intra-cluster distance to the inter-cluster distance is used to determine the number of unknown protocols. Simulation results show that the method clusters data frames of unknown bitstream protocols quickly and accurately.
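The first two steps of the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the formulas, so the weighting scheme (weights proportional to the per-bit Shannon entropy), the density definition (number of neighbours within a fixed `radius`), the "high-density" threshold (at or above the mean density), and all function names are assumptions made for illustration. The final step, choosing the number of clusters from the intra-cluster to inter-cluster distance ratio, is not shown.

```python
import math

def entropy_weights(frames):
    """Per-bit weights proportional to the Shannon entropy of each bit
    position across all frames (assumed weighting, not the paper's formula)."""
    n = len(frames)
    entropies = []
    for column in zip(*frames):          # one column per bit position
        p1 = sum(column) / n             # fraction of frames with bit = 1
        h = 0.0
        for p in (p1, 1.0 - p1):
            if p > 0:
                h -= p * math.log2(p)
        entropies.append(h)
    total = sum(entropies)
    if total == 0:                       # all frames identical: fall back to uniform
        return [1.0 / len(entropies)] * len(entropies)
    return [h / total for h in entropies]

def weighted_distance(a, b, weights):
    """Entropy-weighted Euclidean distance between two bit vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def pick_centers(frames, k, radius):
    """Select up to k center indices from the high-density frames using the
    max-min distance criterion. `radius` is a hypothetical parameter."""
    weights = entropy_weights(frames)
    n = len(frames)
    dist = [[weighted_distance(a, b, weights) for b in frames] for a in frames]
    # Density of a frame: number of other frames within `radius` (assumption).
    density = [sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
               for i in range(n)]
    mean_density = sum(density) / n
    dense = [i for i in range(n) if density[i] >= mean_density]
    # Start from the densest frame, then repeatedly add the dense frame whose
    # nearest already-chosen center is farthest away (max-min criterion).
    centers = [max(dense, key=lambda i: density[i])]
    while len(centers) < k and len(centers) < len(dense):
        candidates = [i for i in dense if i not in centers]
        centers.append(max(candidates,
                           key=lambda i: min(dist[i][c] for c in centers)))
    return centers

# Toy data: two groups of 8-bit frames that differ in their first four bits.
frames = [
    [0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 0, 0, 0],
]
centers = pick_centers(frames, k=2, radius=0.5)
print(centers)
```

In this toy example, bit positions that never change across frames receive zero weight, so they do not contribute to the distance. The selected indices would then serve as starting centers for an ordinary k-means loop that uses the same weighted distance.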