Int J Performability Eng ›› 2019, Vol. 15 ›› Issue (7): 1947-1954.doi: 10.23940/ijpe.19.07.p22.19471954

Previous Articles     Next Articles

Efficiently Retrieving Differences Between Remote Sets using Counting Bloom Filter

Xiaomei Tiana,b,*, Huihuang Zhaoa,b, Yaqi Suna,b, and Xiaoman Lianga,b   

  1. a College of Computer Science and Technology, Hengyang Normal University, Hengyang, 421002, China
    b Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, 421002, China
  • Submitted on ;
  • Contact: * E-mail address: tianxm@hynu.edu.cn
  • About author:Xiaomei Tian received her Ph.D. in 2013 from Hunan University. She is currently a professor at Hengyang Normal University. Her main research interests include trusted systems and networks.Huihuang Zhao received his Ph.D. in 2010 from Xidian University. He is currently an associate professor at Hengyang Normal University. His main research interests include compressive sensing, machine learning, and image processing.Yaqi Sun received her M.Sc. degree in 2013 from Guilin University of Technology. Her main research interests include character recognition and image processing.Xiaoman Liang is a professor at Hengyang Normal University. His main research interests include wireless networks and machine learning.
  • Supported by:
    This work is supported by the National Science Foundation of China (No. 61503128, 61772182, 61772178), Science and Technology Plan Project of Hunan Province (No. 2016TP1020), Open Fund Project of Hunan Provincial Key Laboratory of Intelligent Information Processing and Application for Hengyang Normal University, Scientific Research Fund of Hunan Provincial Education Department (No. 16C0226), and Hunan Natural Science Foundation (No. 2017JJ4012).

Abstract: Retrieving differences between remote sets is widely used in set reconciliation and data deduplication. Set reconciliation and data deduplication between two nodes are widely used in various network applications. The basic idea of the difference retrieving problem is that each member of a node pair has an object set and seeks to find all differences between the two remote sets. There are many methods for retrieving difference sets, such as the standard Bloom filter (SBF), counting Bloom filter (CBF), and invertible Bloom filter (IBF). In these methods, based on the standard Bloom filter or its variants, each node represents its objects using a standard Bloom filter or other Bloom filter, which is then exchanged. A receiving node retrieves different objects between the two sets according to the received SBF, CBF, or IBF. We propose a new algorithm for retrieving differences that finds differences between remote sets using counting Bloom filters' deletion operation. The theoretical analyses and experimental results show that the differences can be retrieved efficiently. Only a very small number of differences are missing in the retrieving process, and this false negative rate can be decreased to 0% by adjusting the counting Bloom filter's parameters.

Key words: counting Bloom filter, deletion operation, retrieving differences, data deduplication, set reconciliation