Username   Password       Forgot your password?  Forgot your username? 


Collision Analysis and an Efficient Double Array Construction Method

Volume 14, Number 4, April 2018, pp. 647-655
DOI: 10.23940/ijpe.18.04.p7.647655

Lianyin Jiaa, Wenyan Chena, Jiaman Dinga, Xiaohui Yuanb, Binglin Shena, and Mengjuan Lic

aFaculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, China
bCollege of Engineering, University of North Texas, Denton, 76203, USA
cYunnan Normal University, Kunming, 650500, China

(Submitted on January 2, 2018; Revised on February 13, 2018; Accepted on March 26, 2018)


Trie is fundamental to many applications, such as natural language processing, string similarity search and join, but it suffers from a high space overhead. Double array (DA) provides a new way to reduce the space overhead but suffers from low construction efficiency. There is an urgent demand to promote the construction efficiency of DA while maintaining a low memory overhead. To address this problem, we reveal that the collisions generated during DA construction process mainly contribute to the low construction efficiency. Based on this analysis, a partition double array (PDA) is proposed in this paper. PDA can reduce the number of collisions as well as the cost of handling collisions in DA, so higher construction efficiency is guaranteed. Experiments on real dataset indicates that PDAs have a construction efficiency 15x higher than DA. We also obtain a bonus 2.7x higher retrieval efficiency compared with DA.


References: 17

    1. A. V. Aho and M. J. Corasick, "Efficient String Matching: An Aid to Bibliographic Search," in Comm. ACM, 2010.
    2. J. I. Aoe, "An Efficient Digital Search Algorithm by Using a Double-Array Structure," IEEE Transactions on Software Engineering, vol. 15,no. 9, pp. 1066-1077, 1989.
    3. J. I. Aoe, K. Morimoto, and T. Sato, An Efficient Implementation of Trie Structures: John Wiley & Sons, Inc., 1992.
    4. B. Ding, H. Wang, R. Jin, J. Han, and Z. Wang, "Optimizing Index for Taxonomy Keyword Search," in ACM SIGMOD International Conference on Management of Data, 2012, pp. 493-504.
    5. J. Feng, J. Wang, and G. Li, "Trie-Join: A Trie-Based Method for Efficient String Similarity Joins," The VLDB Journal, vol. 21,no. 4, pp. 437-461, 2012.
    6. E. Fredkin, "Trie Memory," Communications of the Acm, vol. 3,no. 9, pp. 490-499, 1960.
    7. M. Fuketa, H. Kitagawa, T. Ogawa, K. Morita, and J. I. Aoe, "Compression of Double Array Structures for Fixed Length Keywords," Information Processing & Management An International Journal, vol. 50,no. 5, pp. 796-806, 2014.
    8. K. Huang, G. Xie, Y. Li, and A. X. Liu, "Offset Addressing Approach to Memory-Efficient Ip Address Lookup," Proceedings - IEEE INFOCOM, vol. 42,no. 4, pp. 306-310, 2011.
    9. L. Y. Jia, J. Q. Xi, M. J. Li, Y. Liu, and D. C. Miao, "Eti: An Efficient Index for Set Similarity Queries," Frontiers Of Computer Science, vol. 6,no. 6, pp. 700-712, 2012.
    10. S. Kanda, M. Fuketa, K. Morita, and J. I. Aoe, "A Compression Method of Double-Array Structures Using Linear Functions," Knowledge & Information Systems, vol. 48,no. 1, pp. 55-80, 2016.
    11. Lai Yang, Lida Xu, and Zhongzhi Shi, "An Enhanced Dynamic Hash Trie Algorithm for Lexicon Search," Enterprise Information Systems, vol. 6,no. 4, pp. 419-432, 2012.
    12. J. Lee and H. Lim, "Multi-Stride Decision Trie for Ip Address Lookup," Ieie Transactions on Smart Processing & Computing, vol. 5,no. 5, pp. 331-336, 2016.
    13. G. Li, J. He, D. Deng, and J. Li, "Efficient Similarity Join and Search on Multi-Attribute Data," presented at the Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, 2015.
    14. S. Niu, Y. Liu, and X. Song, "Speeding up Double-Array Trie Construction for String Matching," Communications in Computer & Information Science, vol. 320,no. pp. 572-579, 2013.
    15. J. Y. Norimatsu, M. Yasuhara, T. Tanaka, and M. Yamamoto, "A Fast and Compact Language Model Implementation Using Double-Array Structures," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15,no. 4, pp. 1-27, 2016.
    16. S. L. Wang, H. P. Zhang, and B. Wang, "Research of Optimization on Double-Array Trie and Its Application," Journal of Chinese Information Processing, vol. 20,no. 5, pp. 24-30, 2006.
    17. S. Yata, M. Oono, K. Morita, M. Fuketa, T. Sumitomo, and J. I. Aoe, "A Compact Static Double-Array Keeping Character Codes," Information Processing & Management, vol. 43,no. 1, pp. 237-247, 2007.


      Please note : You will need Adobe Acrobat viewer to view the full articles.Get Free Adobe Reader

      This site uses encryption for transmitting your passwords.