An Efficient Credit Card Fraud Detection Model

Document Type : Original Article

Authors

1 MSA University 6 October, Egypt

2 Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menoufia, Egypt

Abstract

Online transactions are the most popular mode of payment over the internet. Financial institutions such as banks provide various online services for customers, such as e-commerce and e-cash. Credit cards are one of the most conventional methods of online payment. In recent times, criminals have used illegal means to carry out fraudulent credit card transactions over e-services. Due to the growing volume of electronic payments, the monetary strain of credit card fraud is turning into a substantial challenge for financial institutions and service providers, forcing them to continuously improve their fraud detection systems. Therefore, there is a serious need to develop efficient credit card fraud detection for mortgage companies, financial institutions, credit card enterprises, and banking systems. One of the most common problems in building a credit card fraud detection model is an imbalanced data set. A data set is imbalanced when the examples of one class significantly outnumber the examples of the other; classification then becomes very difficult because the result may be biased toward the dominant class. In this paper, we apply two techniques, up-sampling and down-sampling, to address this problem for credit card data. The original imbalanced dataset and its resampled versions are then used to develop an efficient credit card fraud detection model. In the proposed model, different machine learning algorithms are used: K-Nearest Neighbor (KNN), Logistic Regression (REG), Linear Discriminant Analysis (LDA), Classification And Regression Tree (CART), and Naïve Bayes (NB). The results show that LDA gives the best accuracy on the original data (99.9543%), CART gives the highest accuracy with up-sampling (99.9797%), and LDA gives the highest accuracy with down-sampling (94.9238%).
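The class imbalance problem and the two resampling ideas named in the abstract can be illustrated with a minimal sketch. It assumes the simplest variants, random duplication of minority rows for up-sampling and random removal of majority rows for down-sampling; the abstract does not state which variants the authors used. The function names and the toy data (990 legitimate rows, 10 fraudulent rows) are hypothetical and serve only as illustration.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def up_sample(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class
    has as many rows as the largest class (assumed variant)."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def down_sample(rows, labels, seed=0):
    """Randomly keep only as many rows of each class as the smallest
    class contains (assumed variant)."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def majority_baseline_accuracy(labels):
    """Accuracy of a trivial model that always predicts the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical toy data: label 0 = legitimate, label 1 = fraud.
toy_rows = [[i] for i in range(1000)]
toy_labels = [0] * 990 + [1] * 10

up_rows, up_labels = up_sample(toy_rows, toy_labels)
down_rows, down_labels = down_sample(toy_rows, toy_labels)

print("original:", Counter(toy_labels), majority_baseline_accuracy(toy_labels))
print("up-sampled:", Counter(up_labels), majority_baseline_accuracy(up_labels))
print("down-sampled:", Counter(down_labels), majority_baseline_accuracy(down_labels))
```

On the toy data, a model that always predicts "legitimate" already reaches high accuracy before resampling, which is why accuracy figures on imbalanced data should be read with care. After either resampling step the classes are equal in size, so the same trivial model drops to chance level. In practice, resampling is normally applied to the training split only, so that evaluation data keeps its original class distribution.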

Volume 28, ICEEM2019-Special Issue
ICEEM2019-Special Issue: 1st International Conference on Electronic Eng., Faculty of Electronic Eng., Menouf, Egypt, 7-8 Dec. 2019
Pages 332-336