Similarity Entropy-Based Self-Adaptive String Outlier Detection Method

Volume 13, Number 4, July 2017 - Paper 10 - pp. 427-436
DOI: 10.23940/ijpe.17.04.p10.427436

Ou Ye and Zhanli Li

School of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an, 710054, Shaanxi, China

(Submitted on February 27, 2017; Revised on April 2, 2017; Accepted on May 17, 2017)


Although many outlier detection techniques have been developed, existing algorithms pay little attention to the impact of structural factors on the semantics of string data, and a threshold is difficult to set automatically when the distribution of the string data is unknown. As a result, the accuracy of string outlier detection is hard to guarantee. This paper presents a similarity entropy-based self-adaptive string outlier detection method to address these issues. First, semantic similarity is computed by matrix computation based on word matching, and structure similarity is computed by taking structural factors into account. On this basis, string data are mapped into similarity cells, and outliers are identified among these cells by using similarity distance. To reduce sensitivity to the threshold, a similarity entropy histogram is constructed to determine a dynamic threshold. Simulation experiments are conducted to demonstrate the feasibility and rationality of the method, and the results show that it reduces the threshold sensitivity problem and ensures accuracy.
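The abstract does not give the formulas for the similarity measures or for the similarity entropy histogram, so the following is only an illustrative sketch of the general idea of deriving a threshold from a histogram of similarity scores rather than fixing it by hand. It rests on several assumptions that are not taken from the paper: token-level Jaccard similarity stands in for the semantic and structure similarities, each string is scored by its mean similarity to the others, and Kapur-style maximum-entropy thresholding stands in for the paper's entropy-based threshold selection. All function names are hypothetical.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (assumed stand-in, not the paper's measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_similarity(strings):
    """Score each string by its average similarity to all other strings."""
    n = len(strings)
    return [
        sum(jaccard(s, t) for j, t in enumerate(strings) if j != i) / (n - 1)
        for i, s in enumerate(strings)
    ]


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def max_entropy_threshold(scores, bins=10):
    """Kapur-style cut: maximize the summed entropies of the two histogram sides.

    Scores are assumed to lie in [0, 1]. Ties keep the lowest cut.
    """
    hist = [0] * bins
    for s in scores:
        hist[min(int(s * bins), bins - 1)] += 1
    best_cut, best_h = 1, -1.0
    for cut in range(1, bins):
        low, high = sum(hist[:cut]), sum(hist[cut:])
        if low == 0 or high == 0:
            continue  # a cut must leave scores on both sides
        h = entropy([c / low for c in hist[:cut]]) + entropy(
            [c / high for c in hist[cut:]]
        )
        if h > best_h:
            best_cut, best_h = cut, h
    return best_cut / bins


def detect_outliers(strings, bins=10):
    """Flag strings whose mean similarity falls below the data-driven threshold."""
    scores = mean_similarity(strings)
    threshold = max_entropy_threshold(scores, bins)
    return [s for s, sc in zip(strings, scores) if sc < threshold]


if __name__ == "__main__":
    records = [
        "error disk full on node a",
        "error disk full on node b",
        "error disk full on node c",
        "error disk full on node d",
        "lorem ipsum dolor",
        "hello world",
    ]
    print(detect_outliers(records))
```

The threshold here is derived from the score histogram itself, which is the property the abstract emphasizes; the particular entropy criterion and similarity measure would need to be replaced by the paper's definitions for a faithful implementation.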



