Username   Password       Forgot your password?  Forgot your username? 

An XML Streaming Data Processing Method based on Forest Transducer

Volume 13, Number 5, September 2017 - Paper 7  - pp. 620-632
DOI: 10.23940/ijpe.17.05.p7.620632

Zhixue Hea,b,*, Husheng Liaoa

aCollege of Computer Science, Beijing University of Technology, Beijing, 100124, China
bCollege of Computer, North China Institute of Aerospace Engineering, Langfang, 065000, China

(Submitted on March 28, 2017; Revised on July 2, 2017; Accepted on August 10, 2017)


XML is the de facto standard for data representation and exchanging on web. The query processing technique of XML streaming data is a hotspot in current research. Focused on the characteristics of processing semi-structure XML streaming data such as the stream arriving continuously, requiring to be read sequentially and only once into memory, the querying must be processed on the fly, a method of processing XPath query based on forest transducer is proposed. Firstly, conversion rules of forest transducer are defined for XPath query. And then the transducer is driven by input streaming data nodes. Stack and abstract syntax tree are applied to implement match and state transformation in running procedure. The relationships between state functions and intermediate results are kept by the abstract syntax tree, and the query results are output in reducing process. Finally, the experimental results show that our approach is effective and efficient on this problem, and outperforms about 30 percent of the state-of-the-art algorithms especially for large processed data. At the same time, memory usage is nearly constant. This method resolves the balance between time and space complexity, and it is a useful reference for other methods.


References: 29

    1. M. Altinel and M. J. Franklin, “Efficient Filtering of XML Documents for Selective Dissemination of Information,” in Proceedings of the 26th International Conference on Very Large Data Bases, pp.53-64, Cairo, Egypt, September 2000
    2. C. Barton, P. Charles, D. Goyal, M. Raghavachari, M. Fontoura, and V. Josifovski, “Streaming XPath Processing with Forward and Backward Axes,” in Proceedings of the 19th International Conference on Data Engineering, pp.455–466, Bangalore, India, March 2003
    3. S. Boag, D. Chamberlin, M.F. Fernández, D. Florescu, J. Robie, and J. Siméon, “XQuery 1.0: An XML Query Language (Second Edition): W3C Recommendation,” (2015-09-07)., 2016
    4. D. Brownell and D. Megginson, “SAX: Simple API for XML,” (2004-04-27).,  2016
    5. N. Bruno, L. Gravano, N. Koudas, and D. Srivastava, “Navigation-vs. Index-based XML Multi-query Processing,” in Proceedings of the 19th International Conference on Data Engineering, pp.139−150, Bangalore, India, March 2003
    6. C. Chan, P. Felber, M. Garofalakis, and R. Rastogi, “Efficient Filtering of XML Document with XPath Expressions,” The VLDB Journal, vol. 11, no. 4,pp. 354-379, December 2002
    7. Y. Chen, S. B. Davidson, and Y. Zheng, “An Efficient XPath Query Processor for XML Streams,” in Proceedings of the 22nd International Conference on Data Engineering, pp.79-79, Atlanta, GA, USA, April 2006
    8. J. Clark and S. Derose, “XML Path Language (XPath) Version 1.0: W3C Recommendation,” (2015-09-07)., 2016
    9. Y. L. Diao, M. Altinel, M. J. Franklin, H. Zhang, and P. Fischer, “Path Sharing and Predicate Evaluation for High-performance XML Filtering,” ACM Transactions on Database Systems, vol. 28, no. 4, pp.467-516, December 2003
    10. J. Gao, D. Q. Yang, S. W. Tang, and T. J. Wang, “Tree Automata based Efficient XPath Evaluation over XML Data Stream”. Journal of Software, vol. 16, no. 2, pp.223-232, February 2005
    11. X. Q. Gong, W.N.Qian, Y. Yan, and A. Zhou, “Bloom Filter-based XML Packets Filtering for Millions of Path Queries,” in Proceedings of the 21st International Conference on Data Engineering, pp.890-901, Tokyo, Japan, April 2005
    12. G. Gou and R. Chirkova, “Efficient Algorithms for Evaluating XPath over Streams,” in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp.269-280, Beijing, China, June 2007
    13. A. Gupta and D. Suciu, “Stream Processing of XPath Queries With Predicates,” in Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp.419−430, San Diego, California, USA, June 2003
    14. S. Hakuta , S. Maneth , K. Nakano , and H. Iwasaki. “XQuery Streaming by Forest Transducer,” in Proceedings of the IEEE 30th International Conference on Data Engineering, pp. 952-963, Chicago, USA, March 2014.
    15. W.S. Han, H.F. Jiang, H. Howard, and Q. Li, “StreamTX: Extracting Tuples from Streaming XML Data,” in Proceedings of the VLDB Endowment, pp.289-300, Auckland, New Zealand, August 2008
    16. V. Josifovski, M. Fontoura, and A. Barta, “Querying XML Streams,” The VLDB Journal, vol. 14, no. 2, pp. 197-210, April 2005
    17. J. Kwon, P. Rao, B. Moon, and S. Lee, “FiST: Scalable XML Document Filtering by Sequencing Twig Patterns,” in Proceedings of the 31st International Conference on Very Large Data Bases, pp.217-228, Trondheim, Norway, August 2005
    18. M. Ley. DBLP XML Records.
    19. D. Olteanu, “SPEX: Streamed and Progressive Evaluation of XPath,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 7, pp.934-949, July 2007
    20. M. Onizuka, “Processing XPath Queries with Forward and Downward Axes over XML Streams,” In Proceedings of the 13th International Conference on Extending Database Technology, pp.27-38, Lausanne, Switzerland, March 2010
    21. F. Peng and S. C. Sudarshan, “XSQ: A Streaming XPath Engine,” ACM Transactions on Database Systems, vol. 30, no. 2, pp.577-623, June 2005
    22. T. Perst and H. Seidl, “Macro Forest Transducers,” Information Processing Letters, vol. 89, no. 3, pp.141-149, February 2004
    23. A. Schmidt, F. Waas, M. Kersten, M. J. Carey, I. Manolescu, and R. Busse, “XMark: A Benchmark for XML Data Management,” In Proceedings of the 28th international conference on Very Large Data Bases, pp.974-985, Hong Kong, China, August 2002
    24. M. Schmidt, S. Scherzinger, and C. Koch, “Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation,” in Proceedings of the 23rd International Conference on Data Engineering, pp.236-245, Istanbul, Turkey, April 2007
    25. A. Taylor, M. Marcus, and B. Santorini, “The Penn Treebank: An Overview,” In Treebanks, pp.5-22, Springer Netherlands, 2003
    26. X. D. Wu, , X. Q. Zhu, G. Q. Wu, and W. Ding, “Data Mining with Big Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp.97-107, January 2014
    27. X. Y. Wu, and D. Theodoratos, “A Survey on XML Streaming Evaluation Techniques,” The VLDB Journal, vol. 22, no. 2, pp.177-202, April 2013
    28. W. D. Yang, Q. M. Wang, and B. L. Shi, “Complex Twig Pattern Query Processing over XML Streams,” Journal of Software, vol. 18, no. 4, pp.893-904, April 2007
    29. W. D. Yang, and B. L. Shi, “A Survey of XML Stream Management,” Journal of Computer Research and Development, vol. 46, no. 10, pp.1721-1728, October 2009



      Click here to download the paper.

      Please note : You will need Adobe Acrobat viewer to view the full articles.Get Free Adobe Reader

      This site uses encryption for transmitting your passwords.