A Comparison of Customer Churn Vector Embedding Models with Deep Learning

Dinne Ratj
Tjeng Wawan Cenggoro
Namira Mufida Adien
Ni Putu Putri Ardhia Paramita
Nabila Putri Sina
Gregorius Natanael Elwirehardja
Bens Pardamean


Deep learning has been used for churn prediction in the telecommunications industry, and some companies have obtained good results with sophisticated deep learning techniques. However, further studies are still required to evaluate other deep learning mechanisms, as only SoftMax Loss has been used so far. In this paper, we continue our previous research on applying deep learning to customer churn prediction in the telecommunications industry by comparing customer churn vector embedding models trained with several methods: SoftMax Loss, Large Margin Cosine Loss, Semi-Supervised Learning, and a combination of Large Margin Cosine Loss and Semi-Supervised Learning. Large Margin Cosine Loss has been proven in face recognition, where it increases the discrimination between embedding vectors of different classes. Semi-supervised learning aims to understand how combining labeled and unlabeled data may change learning behavior, and to design algorithms that benefit from this combination. This approach successfully encouraged feature discrimination in customer behavior and improved the overall accuracy of the model. In this study, Large Margin Cosine Loss achieved an F1 score of 83.74% in the comparison with the other methods. Examination of the cluster similarity and the t-SNE plot further demonstrated that the produced vectors are discriminative for churn prediction.
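To make the loss named in the abstract concrete, the following is a minimal NumPy sketch of Large Margin Cosine Loss as it is commonly formulated for face recognition (CosFace). It is not the authors' implementation: the function name `lmcl_loss`, the `scale` and `margin` defaults, and the toy inputs are assumptions chosen only for illustration.

```python
import numpy as np


def lmcl_loss(embeddings, class_weights, labels, scale=30.0, margin=0.35):
    """Large Margin Cosine Loss (CosFace-style) averaged over a batch.

    embeddings:    (N, D) array of embedding vectors
    class_weights: (C, D) array, one weight vector per class
    labels:        (N,) integer class indices in [0, C)
    scale, margin: hypothetical defaults, not values from the paper
    """
    # L2-normalize both sides so the dot product is a cosine similarity.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cosine = x @ w.T  # shape (N, C)

    # Subtract the margin from the target-class cosine only, then rescale.
    one_hot = np.eye(w.shape[0])[labels]
    logits = scale * (cosine - margin * one_hot)

    # Numerically stable log-softmax followed by cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(log_probs * one_hot).sum(axis=1).mean())


# Toy usage with random data: 8 customers, 4-dimensional embeddings,
# 2 classes (churn / non-churn). All values here are synthetic.
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(8, 4))
toy_weights = rng.normal(size=(2, 4))
toy_labels = rng.integers(0, 2, size=8)
print(lmcl_loss(toy_embeddings, toy_weights, toy_labels))
```

With `margin=0` the function reduces to a softmax cross-entropy over scaled cosine similarities. A positive margin lowers the target-class logit, so the model must separate the classes by a larger angular gap to reach the same loss, which is the mechanism behind the increased discrimination between embedding vectors.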

Article Details

How to Cite
D. Ratj, “A Comparison of Customer Churn Vector Embedding Models with Deep Learning,” AJSE, vol. 23, no. 1, pp. 1–10, Apr. 2024.

