A Distributed Frequent Itemset Mining Algorithm for Uncertain Data

doi:10.23940/ijpe.19.10.p27.28052816

Abstract

With the rapid expansion of big data in all domains, improving the performance of frequent pattern mining on massive uncertain datasets has become a major research topic in recent years. Most conventional frequent pattern mining approaches use only one of expectation, probability, or weight as the measure of item support, and algorithms that consider both probability and weight fail to maintain execution efficiency at big data scale. Therefore, we propose a distributed frequent itemset mining algorithm for uncertain data: Dfimud. Firstly, Dfimud calculates the maximum probability weight value of each 1-item and prunes the items whose value is less than the given threshold. Secondly, to reduce the number of dataset scans, a distributed Dfimud-tree structure inspired by FP-Tree is designed to mine frequent patterns. Finally, experiments on publicly available UCI datasets demonstrate that Dfimud achieves better results than other related approaches across various metrics. In addition, the empirical study also shows that Dfimud has good scalability.
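
The abstract does not define the maximum probability weight value, so the following Python sketch is only one plausible reading of the first pruning step, not the paper's definition. It assumes that each transaction contributes an upper bound equal to its largest item probability times the largest item weight in the dataset, and that an item's value is the sum of these bounds over the transactions containing it. The function name `prune_one_items`, the parameter `min_ratio`, and the input layout are hypothetical.

```python
from collections import defaultdict


def prune_one_items(transactions, weights, min_ratio):
    """Keep 1-items whose (assumed) maximum probability weight value
    reaches the threshold.

    transactions: list of dicts mapping item -> existence probability.
    weights:      dict mapping item -> weight in (0, 1].
    min_ratio:    threshold as a fraction of the number of transactions.
    """
    # Assumption: the largest weight in the dataset bounds every itemset weight.
    max_weight = max(weights.values())

    bound = defaultdict(float)
    for tx in transactions:
        if not tx:
            continue
        # Upper bound on the weighted probability of any itemset in tx.
        tx_bound = max(tx.values()) * max_weight
        for item in tx:
            bound[item] += tx_bound

    threshold = min_ratio * len(transactions)
    return {item for item, value in bound.items() if value >= threshold}


# Hypothetical toy data, not taken from the paper.
transactions = [
    {"a": 0.9, "b": 0.2},
    {"a": 0.8, "c": 0.1},
    {"b": 0.3},
]
weights = {"a": 1.0, "b": 0.5, "c": 0.4}
kept = prune_one_items(transactions, weights, min_ratio=0.35)
```

Under these assumptions the bound never underestimates an item's weighted support, so pruning on it cannot discard a truly frequent item. That is the usual reason for applying such a filter before building a tree structure.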

Key words: data mining, uncertain data, frequent itemset, distributed framework

Jiaman Ding, Haibin Li, Yang Yang, Lianyin Jia, and Jinguo You. A Distributed Frequent Itemset Mining Algorithm for Uncertain Data [J]. Int J Performability Eng, 2019, 15(10): 2805-2816.
