Int J Performability Eng, 2023, Vol. 19, Issue (10): 663-675. doi: 10.23940/ijpe.23.10.p4.663675

Customer Churn Analysis using Spark and Hadoop

Priyanshu Verma, Ishan Sharma, Sonia Deshmukh, and Rohit Vashisht*   

  1. KIET Group of Institutions, Dr. A.P.J. Abdul Kalam Technical University, Lucknow, India
  • Contact: * E-mail address: rohit.vashisht@kiet.edu

Abstract: Predicting customer churn is one of the telecommunication industry's biggest challenges: why do customers stop using a product, site, service, or subscription? Machine learning with Spark and Hadoop has considerably improved the ability to predict customer behaviour. Widely used predictive models, such as logistic regression, are applied in the prediction process, and their predictions are scored with the Binary Classification Evaluator and the Multiclass Classification Evaluator. Boosting and ensemble approaches are applied to the training dataset to examine their impact on model effectiveness. Additionally, K-fold cross-validation on the training data is used to tune the hyperparameters and fit the models. Finally, results on the test data are assessed using the AUC-ROC curve and the confusion matrix. In this research, the Spark and Hadoop frameworks are adapted to predict customer churn. The data is pre-processed, feature analysis is performed, and the selected features are combined into a single feature vector using Spark's Vector Assembler. This study aims to analyse customer behaviour using a customer dataset.

Key words: Hadoop and Spark, Machine Learning, Logistic regression, Random Forest, Vector Assembler, Binary Classification Evaluator
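
The two test-set measures named in the abstract, the confusion matrix and the area under the ROC curve, can be illustrated with a minimal pure-Python sketch. This is not the authors' Spark code: the function names `confusion_matrix` and `auc_roc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration only.

```python
from typing import List, Tuple


def confusion_matrix(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 means 'churned'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def auc_roc(y_true: List[int], scores: List[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example gets the
    higher score; tied pairs count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the paper's dataset).
labels = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.65, 0.7, 0.8, 0.1, 0.4, 0.3]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
preds = [1 if s >= 0.5 else 0 for s in scores]

print("TP, FP, FN, TN:", confusion_matrix(labels, preds))
print("AUC-ROC:", auc_roc(labels, scores))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a toy illustration. In a Spark pipeline the same quantity is typically obtained from `BinaryClassificationEvaluator` in `pyspark.ml.evaluation` with `metricName="areaUnderROC"`.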