
Thirunavukkarasu.M et al, International Journal of Computer Science and Mobile Computing, Vol.10 Issue.4, April 2021, pg. 71-79
© 2021, IJCSMC All Rights Reserved

Available Online at www.ijcsmc.com

International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320-088X
IMPACT FACTOR: 7.056
IJCSMC, Vol. 10, Issue. 4, April 2021, pg. 71-79

CREDIT CARD FRAUD DETECTION USING MACHINE LEARNING

Mr. Thirunavukkarasu.M1; Achutha Nimisha2; Adusumilli Jyothsna3
1 Assistant Professor, Dept. of CSE, SCSVMV (Deemed to be University), Kanchipuram, TamilNadu, India ([email protected])
2 Student, Dept. of CSE, SCSVMV (Deemed to be University), Kanchipuram, TamilNadu, India ([email protected])
3 Student, Dept. of CSE, SCSVMV (Deemed to be University), Kanchipuram, TamilNadu, India ([email protected])
DOI: 10.47760/ijcsmc.2021.v10i04.011

Abstract- This project focuses on credit card fraud detection in real-world scenarios. Credit card fraud is increasing drastically compared to earlier times. Criminals use fake identities and various technologies to trap users and take their money, so it is essential to find a solution to these types of fraud. In this proposed project we designed a model to detect fraudulent activity in credit card transactions. The system provides most of the important features required to detect illegal and illicit transactions. As technology changes constantly, it is becoming difficult to track the behavior and pattern of criminal transactions. With the rise of machine learning, artificial intelligence and other relevant fields of information technology, it becomes feasible to automate this process and to save some of the intensive labor that is put into detecting credit card fraud.
Initially, we collect a data set of users' credit card usage and divide it into training and testing sets, which are classified using the random forest algorithm and decision trees. With this algorithm we can analyze both the larger data set and the current data provided by the user. We then improve the accuracy of the results by processing some of the provided attributes, and examine the detected fraud through graphical data visualization. The performance of the techniques is gauged by accuracy, sensitivity, specificity and precision. The results indicate that the best accuracy for Random Forest is 98.6%.

Keywords: Random forest algorithm, Criminal transactions, Credit card

1. INTRODUCTION
Credit card usage has increased drastically across the world; people now believe in going cashless and depend heavily on online transactions. The credit card has made digital transactions easier and more accessible. Criminal credit card transactions cause huge financial losses every year. Fraud is as old as mankind itself and can take an unlimited variety of forms. The PwC global economic crime survey of 2017 suggests that approximately 48% of organizations experienced economic crime. There is therefore a clear need to solve the problem of credit card fraud detection. Moreover, the growth of new technologies provides additional ways in which criminals may commit fraud. The use of credit cards is predominant in modern society, and credit card fraud has kept increasing in recent years. Fraud causes huge financial losses, affecting not only merchants and banks but also the individuals who use credit cards. Fraud may also affect the reputation and image of a merchant, causing non-financial losses.
For example, if a cardholder is a victim of fraud with a certain company, he may no longer trust their business and may choose a competitor. Fraud detection is the process of monitoring the transaction behavior of a cardholder to detect whether an incoming transaction is authentic and authorized; otherwise it is flagged as illicit. In the proposed system, we apply the random forest algorithm to classify the credit card dataset. Random Forest is an algorithm for classification and regression. It is a collection of decision tree classifiers. The random forest has an advantage over a single decision tree in that it corrects the tendency of decision trees to overfit their training set. To train each individual tree, a subset of the training set is sampled randomly and a decision tree is built on it; each node then splits on a feature selected from a random subset of the complete feature set. Even for large data sets with many features and data instances, training is extremely fast in the random forest because each tree is trained independently of the others. The Random Forest algorithm has been found to provide a good estimate of the generalization error and to be resistant to overfitting.

1.1 ADVANTAGES
- Rather than searching for the most important feature overall when splitting a node, Random Forest searches for the best feature among a random subset of features, which generally results in a better model.
- The target category of each transaction is binary: fraud, i.e. the positive case (value 1), and non-fraud, i.e. the negative case (value 0).

Various techniques for detecting fraudulent activity in credit card transactions have been implemented, and researchers have developed models based on artificial intelligence, data mining, fuzzy logic and machine learning.
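
The two sources of randomness described above (a random sample of the training set per tree, and a random subset of features per split) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one-split trees (decision stumps) so that "per tree" and "per node" coincide, and the names `fit_stump`, `fit_forest`, `predict` and the toy transactions (amount, hour of day) are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Pick the (feature, threshold) split with the lowest weighted impurity."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: always predict the majority class
        label = majority(y)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        feats = rng.sample(range(d), max_features)   # random subset of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all trees: 1 = fraud, 0 = non-fraud."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Hypothetical toy transactions: [amount, hour of day]; label 1 = fraud.
X = [[20, 14], [35, 10], [50, 16], [15, 12], [60, 18], [40, 11],
     [900, 2], [750, 3], [820, 1], [990, 4]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict(forest, [950, 0]), predict(forest, [10, 20]))
```

Because each tree is fitted on its own sample and feature subset, the loop body in `fit_forest` has no dependency between iterations, which is the property that allows trees to be trained independently.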
Credit card fraud detection is a difficult but popular problem to solve. In our proposed system we built credit card fraud detection using machine learning. With the advancement of machine learning techniques, machine learning has been recognized as a successful measure for fraud detection. A great deal of data is transferred during online transaction processes, resulting in a binary outcome: genuine or fraudulent. Online businesses are able to identify fraudulent transactions accurately because they receive chargebacks on them. Within the sample fraudulent datasets, features are constructed. These are data points such as the age and value of the customer account, as well as the origin of the credit card. There are many features and each contributes, to a varying extent, towards the fraud probability. Note that the degree to which each feature contributes to the fraud score is not determined by a fraud analyst, but is generated by the artificial intelligence of the machine, which is driven by the training set. So, in regard to card fraud, if the use of cards to commit fraud is proven to be high, the fraud weighting of a transaction that uses a credit card will be equally high. However, if this were to diminish, the contribution level would diminish in parallel. Simply put, these models self-learn without explicit programming, unlike manual review. Credit card fraud detection using machine learning is done by deploying classification and regression algorithms. We use a supervised learning algorithm, Random Forest, to classify card transactions as fraudulent or legitimate, whether they are made online or offline. Random forest is an advanced version of the decision tree. In many settings the random forest offers better efficiency and accuracy than other machine learning algorithms. Random forest aims to reduce the correlation between trees by choosing only a subsample of the feature space at each split.
Essentially, it aims to de-correlate the trees and to prune them by setting a stopping criterion for node splits.

1.2 SCOPE OF THE PROPOSED WORK
In this proposed project we designed a protocol or model to detect fraudulent activity in credit card transactions. The system is capable of providing most of the essential features required to distinguish fraudulent from legitimate transactions. As technology changes, it becomes difficult to track the behavior and pattern of fraudulent transactions. With the rise of machine learning, artificial intelligence and other relevant fields of information technology, it becomes feasible to automate this process and to save some of the intensive labour that is put into detecting credit card fraud.

2. SOFTWARE AND HARDWARE REQUIREMENT
2.1 Hardware
- OS: Windows 7, 8 and 10 (32 and 64 bit)
- RAM: 4GB
2.2 Software
- Python
- Anaconda

3. SYSTEM ARCHITECTURE
Fig. 1: System Architecture

4. LITERATURE SURVEY
Fraudulent Detection in Credit Card System Using SVM & Decision Tree (Vijayshree B. Nipane, Poonam S. Kalinge, Dipali Vidhate, Kunal War, Bhagyashree P. Deshpande): With growing advancement in the electronic commerce field, fraud is spreading all over the world, causing major financial losses. In the current scenario, a major cause of financial loss is credit card fraud; it affects not only merchants but also individual clients. Decision trees, genetic algorithms, meta-learning strategies, neural networks and HMMs are the existing methods used to detect credit card fraud. In the system considered for fraud detection, the artificial intelligence concepts of Support Vector Machine (SVM) and decision tree are used to solve the problem.
Thus, by implementing this hybrid approach, financial losses can be reduced to a greater extent.

Machine Learning Based Approach to Financial Fraud Detection Process in Mobile Payment System (Dahee Choi and Kyungho Lee): Mobile payment fraud is the unauthorized use of a mobile transaction, through identity theft or credit card theft, to fraudulently obtain money. Mobile payment fraud is a fast-growing issue owing to the emergence of smartphones and online transaction services. In the real world, a highly accurate process for mobile payment fraud detection is needed, since financial fraud causes financial loss. Their approach therefore proposes an overall process for detecting mobile payment fraud based on machine learning, using supervised and unsupervised methods to detect fraud and process large amounts of financial data. Moreover, their approach performs a sampling process and a feature selection process in order to handle large volumes of transaction data quickly and to achieve high accuracy in mobile payment fraud detection. The F-measure and the ROC curve are used to validate their proposed model.

5. PURPOSE OF THE PROJECT
We propose a machine learning model to detect fraudulent credit card activity in online financial transactions.
Analyzing fake transactions manually is impracticable because of the vast amount of data and its complexity. However, given sufficiently informative features, machine learning could make it possible. This hypothesis is explored in the project.
The objectives are to classify credit card transactions as fraudulent or legitimate using a supervised learning algorithm such as Random Forest, and to raise awareness of fraudulent transactions so that financial loss is avoided.

5.1 PACKAGES
The packages used for data exploration, pre-processing and the random forest algorithm are:
- NumPy: for simple arrays.
- Pandas: for reading the file.
- Scikit-learn: for pre-processing.
- Matplotlib or Seaborn: for plotting and for representing the confusion matrix in colour format.
- TensorFlow: for matrix format.

6. MODULES
- Data collection
- Data pre-processing
- Feature extraction
- Evaluation model

6.1 Data Collection:
The data used in this paper is a set of credit card transaction records. This step is concerned with selecting the subset of all available data that you will be working with. ML problems start with data, preferably lots of data (examples or observations) for which you already know the target answer. Data for which you already know the target answer is called labelled data.

Fig. 2: Importing python packages for data exploration, preprocessing and for using random

6.2 Data Pre-processing
Pre-processing consists of three important and common steps:
- Formatting: the process of putting the data into a form that is suitable to work with. The data files should be formatted according to need; the most recommended format is .csv files.
- Cleaning: Data cleaning is a very important step in data science, as it constitutes the major part of the work.
  It includes removing missing data, resolving inconsistent category naming, and so on. For most data scientists, data cleaning constitutes about 80% of the work.
- Sampling: the technique of analyzing subsets of a large dataset, which can provide a better result and help in understanding the behavior and pattern of the data in an integrated way.

6.3 Data Exploration
Fig. 3: Data exploration

6.3.1 Pre-processing with python commands
STEP 1:
Fig. 4: Pre-processing
STEP 2:
Fig. 5: Preprocessing Step 2
STEP 3: Acquired training and testing datasets from the large dataset
Fig. 6: Training and testing data
Fig. 7: Process of training and testing data extraction

6.4 Data visualization
Data visualisation is the method of representing data in a graphical and pictorial way; data scientists tell a story through the results they derive from analysing and visualising the data. One popular tool is Tableau, which has many features for exploring data and producing useful results.

6.5 Feature extraction
Feature extraction is the process of studying the behavior and pattern of the analyzed data and deriving the features for further testing and training. Finally, our models are trained using the classifier algorithm. We use the classify module of the Natural Language Toolkit library in Python. Part of the labelled dataset gathered is used for training; the rest of our labelled data is used to evaluate the models. Machine learning algorithms were used to classify the pre-processed data. The chosen classifier was Random Forest. These algorithms are very popular in text classification tasks.

6.6 Evaluation model
Model evaluation is an essential part of the model development process.
It helps to find the model that best represents our data and to estimate how well the selected model will work in the future. Evaluating model performance on the data used for training is not acceptable in data science, because it can easily produce overoptimistic and overfitted models. To avoid overfitting, evaluation methods such as hold-out and cross-validation are used to assess model performance. The result is presented in visual form, with the classified data represented as graphs. Accuracy is defined as the proportion of correct predictions on the test data. It is calculated by dividing the number of correct predictions by the total number of predictions.

7. ALGORITHM
7.1 Random Forest
Random forest is a supervised machine learning algorithm based on ensemble learning. Ensemble learning is a type of learning in which predictions are derived by combining (for example, by bagging) different models, or the same kind of model trained multiple times. The random forest algorithm works in this way and uses multiple models of the same type, i.e. multiple decision trees, resulting in a forest of trees, hence the name “Random Forest”. The random forest algorithm can be used for both regression and classification tasks.

7.1.1 Advantages of using random forest
- The random forest algorithm is not biased: it depends on multiple trees, each trained separately on a subset of the data, so overall bias is reduced.
- It is a very stable algorithm. Even if a new data point is introduced into the dataset, the overall algorithm is not much affected, since the new data may affect one tree but is very unlikely to affect all the trees.
- It works well when one has both categorical and numerical features.
- The random forest algorithm also works well when the data has missing values or has not been scaled properly.

Thus, using the random forest algorithm and decision trees, we have obtained the accuracy of fraud detection on the given dataset by studying its behavior.
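
The hold-out evaluation and the accuracy calculation described in Section 6.6, together with the sensitivity, specificity and precision named in the abstract, can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the code used in the paper: the helper names (`holdout_split`, `confusion_counts`, `metrics`), the 30% test fraction and the example labels are all hypothetical, and fraud is assumed to be encoded as 1 and non-fraud as 0.

```python
import random


def holdout_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle the indices and reserve a fraction of the data for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids, seq):
        return [seq[i] for i in ids]

    return (pick(train_idx, rows), pick(test_idx, rows),
            pick(train_idx, labels), pick(test_idx, labels))


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with fraud = 1 and non-fraud = 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Number of correct predictions divided by the total number of predictions."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and precision from the four counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # share of fraud cases that were caught
        "specificity": tn / (tn + fp),   # share of legitimate cases left alone
        "precision": tp / (tp + fp),     # share of fraud alerts that were right
    }


# Hypothetical test labels and predictions for ten held-out transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(confusion_counts(y_true, y_pred))
print(metrics(y_true, y_pred))

# Hold-out split of ten hypothetical rows: 7 for training, 3 for testing.
rows = [[i] for i in range(10)]
X_train, X_test, y_train, y_test = holdout_split(rows, y_true)
print(len(X_train), len(X_test))
```

Reporting sensitivity and precision alongside accuracy is useful when fraud cases are a small share of the data, because the accuracy figure is then dominated by the non-fraud class.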
A confusion matrix is a summary of prediction results: a table used to describe the performance of a classifier on a set of test data for which the true values are known. It provides a visualization of an algorithm's performance and allows easy identification of confusion between classes. Most performance measures are computed from it, since it gives insight not only into the errors being made by the classification model but also into the types of error being made. The results for the training data and the testing data are represented in confusion matrices, which contain:
- TP: True Positive, a fraudulent transaction that was correctly predicted as fraud.
- TN: True Negative, a legitimate transaction that was correctly predicted as non-fraud.
- FP: False Positive, a transaction that was predicted as fraud although it is actually legitimate.
- FN: False Negative, a transaction that was predicted as non-fraud although it is actually fraudulent.

Fig. 8: Confusion matrix for testing dataset
Fig. 9: Confusion matrix for testing dataset
Fig. 10: Accurate result extracted from the random forest classification and regression model using decision trees

8. CONCLUSION
Hence, we have obtained an accuracy of 0.9994802867383512 (approximately 99.95%) for credit card fraud detection using a random forest algorithm with new enhancements. In comparison to existing modules, the proposed module is applicable to larger datasets and provides more accurate results. The random forest algorithm will perform better with more training data, but speed during testing and application will still suffer.
The use of more pre-processing techniques would also help. Our future work will try to turn this into a software application and provide a solution for credit card fraud using new technologies such as Machine Learning, Artificial Intelligence and Deep Learning.