TUMKFCM-ELM: An Unsupervised Multiple Kernelized Fuzzy C-Means Extreme Learning Machine Approach for Heterogeneous Datasets

doi:10.23940/ijpe.22.03.p5.188200

Abstract

Heterogeneity is a critical aspect of big data that creates data integration challenges for big data analysis. Heterogeneous data types must also be unified through preprocessing. The heterogeneity of benchmark data can be summarized by its data types together with its sampling rate and storage policy. Kernelized Fuzzy C-Means clustering has recently gained favor among researchers; it uses kernel-induced functions as the similarity measure in place of the Euclidean distance used in traditional Fuzzy C-Means clustering. Like conventional Fuzzy C-Means, however, its effectiveness is inconsistent, because the initial cluster centers are again derived from randomized, user-defined membership values of the objects. This study presents a modified strategy that eliminates the random selection in the Kernelized Fuzzy C-Means clustering approach and improves its overall efficiency. The work aims to implement a Three-Phase Unsupervised Multiple Kernels Fuzzy C-Means Extreme Learning Machine (TUMKFCM-ELM) approach. The first phase is data preprocessing. The second phase applies the Unsupervised Multiple Kernels Fuzzy C-Means clustering technique to the preprocessed data to determine the centroids and membership matrix; these centroids and membership values are updated until the stop criterion is met, yielding the final clusters. In the third phase, ELM is applied to obtain the optimal coefficient. Multiple heterogeneous datasets were collected from numerous sources for the simulation, and exploratory data analysis and cluster distributions are shown for them. We compare the proposed approach with the earlier TUMK-ELM methodology using three validity metrics: accuracy, NMI, and purity. The work visualizes the clustering results, reports comparable performance in terms of effectiveness, and also provides results in terms of time cost.
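
To make the second phase concrete, the following is a minimal sketch of a single-kernel Kernelized Fuzzy C-Means loop with a Gaussian kernel, in which centroids and memberships are updated alternately until a stop criterion is met. It is not the authors' multiple-kernel method: the function names `gaussian_kernel` and `kfcm`, the parameters `sigma`, `m`, `tol`, and `init_centers`, and the fallback to randomly sampled initial centers are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(X, V, sigma):
    """Pairwise K(x_k, v_i) = exp(-||x_k - v_i||^2 / sigma^2), shape (n, c)."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)


def kfcm(X, n_clusters, m=2.0, sigma=1.0, tol=1e-5, max_iter=100,
         init_centers=None, seed=0):
    """Single-kernel fuzzy c-means sketch (illustrative, hypothetical API).

    Returns the centroids V (c, d) and the membership matrix U (n, c).
    """
    X = np.asarray(X, dtype=float)
    if init_centers is None:
        # Assumption: fall back to randomly chosen samples as initial centers.
        rng = np.random.default_rng(seed)
        V = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    else:
        V = np.array(init_centers, dtype=float)

    U = None
    for _ in range(max_iter):
        K = gaussian_kernel(X, V, sigma)
        # Kernel-induced distance 1 - K replaces the Euclidean distance.
        dist = np.maximum(1.0 - K, 1e-12)
        inv = dist ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1

        # Centroid update: samples weighted by u^m * K.
        W = (U_new ** m) * K
        V = (W.T @ X) / W.sum(axis=0)[:, None]

        # Stop criterion: largest membership change falls below tol.
        converged = U is not None and np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U
```

Passing explicit `init_centers` removes the dependence on random initialization in this sketch; how the paper itself replaces the random selection is not stated in the abstract, so that choice is left to the caller here.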

Key words: heterogeneous data, unsupervised clustering, multiple kernel functions, kernelized fuzzy C-means, ELM, NMI, Purity

Ankit R. Mune and Sohel A. Bhura. TUMKFCM-ELM: An Unsupervised Multiple Kernelized Fuzzy C-Means Extreme Learning Machine Approach for Heterogeneous Datasets [J]. Int J Performability Eng, 2022, 18(3): 188-200.

