Int J Performability Eng, 2019, Vol. 15, Issue (10): 2805-2816. doi: 10.23940/ijpe.19.10.p27.28052816

• Original Article

A Distributed Frequent Itemset Mining Algorithm for Uncertain Data

Jiaman Ding a,b,*, Haibin Li a,b, Yang Yang a,b, Lianyin Jia a,b, and Jinguo You a,b

  1. a Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, China
    b Artificial Intelligence Key Laboratory of Yunnan Province, Kunming, 650500, China
  • Contact: Ding Jiaman
  • About author:

    * Corresponding author. E-mail address: tjoman@126.com

  • Supported by:
    This work was supported by the National Natural Science Foundation of China (Nos. 51467007, 61562054, and 61462050).

Abstract:

With the rapid expansion of big data in all domains, improving the performance of frequent pattern mining on massive uncertain datasets has become a major research topic in recent years. Most conventional frequent pattern mining approaches use expectation, probability, or weight as the single factor of item support, and algorithms that consider both probability and weight struggle to maintain execution efficiency on big data. Therefore, we propose a distributed frequent itemset mining algorithm for uncertain data: Dfimud. Firstly, Dfimud calculates the maximum probability weight value of 1-items and prunes the items whose value is less than the given threshold. Secondly, to reduce the number of dataset scans, a distributed Dfimud-tree structure inspired by FP-Tree is designed to mine frequent patterns. Finally, experiments on publicly available UCI datasets demonstrate that Dfimud achieves better results than other related approaches across various metrics. In addition, the empirical study shows that Dfimud has good scalability.
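
The first step described in the abstract, scoring 1-items and pruning those below a threshold, can be illustrated with a small single-machine sketch. The abstract does not define the maximum probability weight value, so the score used here is an assumption chosen only for illustration: an item's weight times its largest existential probability times the number of transactions containing it, which is an upper bound on its expected weighted support. The function name `prune_one_items`, the data layout, and the sample values are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def prune_one_items(transactions, weights, min_support):
    """Keep the 1-items whose upper-bound score reaches min_support.

    transactions: list of dicts mapping item -> existential probability.
    weights: dict mapping item -> weight.
    The score is an ASSUMED stand-in for the paper's measure:
        weight(item) * max probability(item) * number of transactions with item.
    """
    max_prob = defaultdict(float)
    count = defaultdict(int)
    # One pass over the data collects both statistics per item.
    for transaction in transactions:
        for item, prob in transaction.items():
            max_prob[item] = max(max_prob[item], prob)
            count[item] += 1
    scores = {item: weights[item] * max_prob[item] * count[item] for item in count}
    # Items scoring below the threshold are pruned.
    return {item: score for item, score in scores.items() if score >= min_support}


# Hypothetical toy data: three uncertain transactions over items a, b, c.
transactions = [
    {"a": 0.9, "b": 0.4},
    {"a": 0.8, "c": 0.3},
    {"b": 0.5, "c": 0.2},
]
weights = {"a": 1.0, "b": 0.5, "c": 0.5}
kept = prune_one_items(transactions, weights, min_support=1.0)
print(sorted(kept))
```

Because the assumed score never underestimates the weight times the sum of an item's probabilities, pruning on it cannot discard a 1-item that would pass the same threshold on that sum. In a distributed setting, the per-item maximum and count could be computed per partition and merged, since both combine associatively.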

Key words: data mining, uncertain data, frequent itemset, distributed framework