Int J Performability Eng, 2022, Vol. 18, Issue (3): 188-200. doi: 10.23940/ijpe.22.03.p5.188200

TUMKFCM-ELM: An Unsupervised Multiple Kernelized Fuzzy C-Means Extreme Learning Machine Approach for Heterogeneous Datasets

Ankit R. Mune(a,*) and Sohel A. Bhura(b)

(a) BNCOE, Pusad, 445215, India
(b) JIT, Nagpur, 441111, India
(*) Contact e-mail: mune.ankit@gmail.com

Abstract: Heterogeneity is one of the critical aspects of big data, and it creates data-integration challenges for big data analysis. Heterogeneous data types must also be unified during preprocessing. The heterogeneity of benchmark data can be summarized by indicating the data type together with its sampling rate and storage policy. Kernelized Fuzzy C-Means (KFCM) clustering has recently gained favor in research: instead of the Euclidean distance used in traditional Fuzzy C-Means (FCM) clustering, it uses kernel-induced functions as the similarity measure. Like conventional FCM, however, KFCM performs inconsistently, because its initial cluster centers are also derived from randomized, user-defined membership values of the objects. This study presents a modified strategy that eliminates this random selection and improves the overall efficiency of the KFCM clustering approach. Specifically, it implements a Three-Phase Unsupervised Multiple Kernels Fuzzy C-Means Extreme Learning Machine (TUMKFCM-ELM) approach. In the first phase, the data are preprocessed. In the second phase, the Unsupervised Multiple Kernels Fuzzy C-Means clustering technique is applied to the preprocessed data to determine the centroids and the membership matrix; these centroids and membership values are updated until the stopping criterion is met, yielding the final clusters. In the third phase, an ELM is applied to obtain the optimal coefficients. For the simulations, multiple heterogeneous datasets were collected from numerous sources, and an exploratory data analysis and the resulting cluster distributions are presented. The proposed approach is compared with the earlier TUMK-ELM method using three validity metrics: accuracy, normalized mutual information (NMI), and purity.
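The second-phase clustering can be illustrated with a minimal sketch of multiple-kernel fuzzy c-means. It assumes the common construction in which the kernel is a convex combination of Gaussian kernels, so that K(x, x) = 1 and the kernel-induced squared distance is 2(1 - K(x, v)). Memberships and centroids are then updated alternately until the centroids stop moving. The kernel widths (`SIGMAS`), the fixed equal weights (`WEIGHTS`), the fuzzifier m = 2, and the deterministic farthest-point seeding (a stand-in for the random initial memberships the abstract criticizes) are illustrative assumptions, not the authors' TUMKFCM settings, which the abstract does not give.

```python
import math

# Hypothetical kernel bank: Gaussian kernels of different widths combined with
# fixed equal weights. Widths and weights are illustrative assumptions.
SIGMAS = (0.5, 1.0, 2.0)
WEIGHTS = (1 / 3, 1 / 3, 1 / 3)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def combined_kernel(a, b):
    """Weighted sum of Gaussians; K(x, x) = 1 because the weights sum to 1."""
    d = sq_dist(a, b)
    return sum(w * math.exp(-d / s ** 2) for w, s in zip(WEIGHTS, SIGMAS))


def init_centers(data, c):
    """Deterministic farthest-point seeding instead of random initial memberships."""
    centers = [list(data[0])]
    while len(centers) < c:
        far = max(data, key=lambda p: min(sq_dist(p, v) for v in centers))
        centers.append(list(far))
    return centers


def memberships(data, centers, m, eps=1e-12):
    """u_ik proportional to (1 - K(x_k, v_i)) ** (-1 / (m - 1)), normalized over clusters."""
    u = []
    for x in data:
        # eps guards against division by zero when a point coincides with a center.
        inv = [max(1.0 - combined_kernel(x, v), eps) ** (-1.0 / (m - 1)) for v in centers]
        total = sum(inv)
        u.append([val / total for val in inv])
    return u


def update_centers(data, centers, u, m):
    """Fixed-point step obtained by setting the objective's gradient to zero."""
    new_centers = []
    for i, v in enumerate(centers):
        num, den = [0.0] * len(v), 0.0
        for k, x in enumerate(data):
            d = sq_dist(x, v)
            g = sum(w * math.exp(-d / s ** 2) / s ** 2 for w, s in zip(WEIGHTS, SIGMAS))
            coef = (u[k][i] ** m) * g
            den += coef
            num = [n + coef * xj for n, xj in zip(num, x)]
        new_centers.append([n / den for n in num])
    return new_centers


def mk_fcm(data, c, m=2.0, tol=1e-8, max_iter=100):
    centers = init_centers(data, c)
    for _ in range(max_iter):
        u = memberships(data, centers, m)
        new_centers = update_centers(data, centers, u, m)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop criterion: centroids have (almost) stopped moving
            break
    u = memberships(data, centers, m)  # final memberships for the final centroids
    labels = [max(range(c), key=lambda i: row[i]) for row in u]
    return centers, u, labels


data = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
centers, u, labels = mk_fcm(data, c=2)
print(labels)
```

The centroid update is the fixed point obtained by setting the gradient of the kernel objective to zero; with a single Gaussian kernel, the 1/σ² factor cancels and it reduces to the usual KFCM rule v = Σ u^m K x / Σ u^m K.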
This work visualizes the clustering results and shows comparable performance in terms of effectiveness. It also reports results in terms of time cost.
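The third phase of TUMKFCM-ELM applies an Extreme Learning Machine. The abstract does not say how the ELM consumes the clustering output, so the sketch below assumes, hypothetically, that one-hot cluster labels from the second phase serve as its targets. It shows only the generic ELM recipe: input weights and biases are drawn at random and never trained, and the output weights are solved in closed form by ridge-regularized least squares, β = (HᵀH + λI)⁻¹HᵀT. The sigmoid activation, hidden-layer size, regularization λ (`reg`), and the hand-written solver (used only to keep the example dependency-free) are assumptions.

```python
import math
import random


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting (A square)."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


class ELM:
    def __init__(self, n_hidden=20, reg=1e-3, seed=0):
        self.n_hidden, self.reg, self.rng = n_hidden, reg, random.Random(seed)

    def _hidden(self, x):
        # Sigmoid hidden layer; assumes inputs were already scaled in preprocessing.
        return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
                for ws, b in zip(self.w, self.b)]

    def fit(self, X, T):
        d, L = len(X[0]), self.n_hidden
        # Random, untrained input weights and biases: the defining trait of an ELM.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(L)]
        H = [self._hidden(x) for x in X]
        # Output weights: beta = (H^T H + reg * I)^-1 H^T T.
        hth = [[sum(h[i] * h[j] for h in H) + (self.reg if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        htt = [[sum(h[i] * t[j] for h, t in zip(H, T)) for j in range(len(T[0]))]
               for i in range(L)]
        self.beta = solve(hth, htt)
        return self

    def predict(self, X):
        out = []
        for x in X:
            h = self._hidden(x)
            out.append([sum(h[i] * self.beta[i][j] for i in range(len(h)))
                        for j in range(len(self.beta[0]))])
        return out


X = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
labels = [0, 0, 0, 1, 1, 1]  # hypothetical cluster labels from the clustering phase
T = [[1.0 if j == lab else 0.0 for j in range(2)] for lab in labels]
elm = ELM(n_hidden=20, reg=1e-3, seed=0).fit(X, T)
pred_labels = [max(range(2), key=lambda j: row[j]) for row in elm.predict(X)]
print(pred_labels)
```

Solving for β in one linear-algebra step, rather than by iterative back-propagation, is what makes ELM training fast; the small ridge term keeps HᵀH invertible when the hidden layer is wider than the number of samples.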

Key words: heterogeneous data, unsupervised clustering, multiple kernel functions, kernelized fuzzy C-means, ELM, NMI, purity
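The three validity metrics named in the abstract (accuracy, NMI, and purity) can be computed from true class labels and predicted cluster labels as sketched below. This is a minimal sketch under stated assumptions: NMI is normalized here by the arithmetic mean of the two entropies (other normalizations, such as the geometric mean, are also common, and the abstract does not say which one the authors used), and accuracy is taken as the best one-to-one mapping between clusters and classes, found by brute force, which is only practical for a small number of clusters.

```python
import math
from collections import Counter
from itertools import permutations


def purity(y_true, y_pred):
    """Fraction of points belonging to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(y_true, y_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(cnt.values()) for cnt in clusters.values()) / len(y_true)


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(y_true, y_pred):
    """Mutual information divided by the arithmetic mean of the two entropies."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    ct, cp = Counter(y_true), Counter(y_pred)
    mi = sum((nij / n) * math.log(n * nij / (ct[t] * cp[p]))
             for (t, p), nij in joint.items())
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    return mi / denom if denom > 0 else 1.0


def accuracy(y_true, y_pred):
    """Best accuracy over one-to-one cluster-to-class mappings (brute force)."""
    classes, clusters = sorted(set(y_true)), sorted(set(y_pred))
    pairs = Counter(zip(y_pred, y_true))
    best = 0
    if len(clusters) <= len(classes):
        for perm in permutations(classes, len(clusters)):
            best = max(best, sum(pairs[(c, t)] for c, t in zip(clusters, perm)))
    else:
        for perm in permutations(clusters, len(classes)):
            best = max(best, sum(pairs[(c, t)] for c, t in zip(perm, classes)))
    return best / len(y_true)


y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]  # one point placed in the wrong cluster
print(purity(y_true, y_pred), nmi(y_true, y_pred), accuracy(y_true, y_pred))
```

Because cluster IDs are arbitrary, accuracy needs the cluster-to-class matching step, whereas purity and NMI are invariant to relabeling the clusters.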