A couple of months ago I started a new adventure in my professional life: studying a new area after 11 years in software support and implementation. I have worked on BI projects and I use ETL in my work, but Data Science was always something I told myself was out of reach: “this is not for you, you are not a statistician”. Still, I was always fascinated by the idea of “predicting the future based on past behavior”.
So I started looking for training courses in this area. I took part in a bootcamp and some free online courses, but nothing set my heart on fire, until I came across Meigarom Lopes and his website “Seja um Data Scientist”. I was suspicious at first, but he convinced me to try his course “Data Science em Produção”. The course opened my mind, both through Meigarom’s “framework” and through the way he brings his experience as a data scientist to us beginners, along with the diverse applications of Data Science.
After finishing his course, and hearing several times “You need to practice and build your project portfolio”, I started participating in “Comunidade DS”, a community of data science students and professionals dedicated to building projects and sharing experiences with each other.
And here we are: I picked a project the community had worked on in the past and tried, on my own, to apply everything I have learned so far.
The Blocker Fraud Company is a company specialized in detecting fraud in financial transactions made through mobile devices. The company has a service called “Blocker Fraud” in which it guarantees the blocking of fraudulent transactions.
The company’s business model is service-based, with monetization tied to the performance of the service provided: the customer pays a fixed fee for each successful detection of fraud in their transactions.
However, the Blocker Fraud Company is expanding in Brazil and to acquire customers more quickly, it has adopted a very aggressive strategy. The strategy works as follows:
- The company will receive 25% of the value of each transaction correctly detected as fraud.
- The company will receive 5% of the value of each transaction flagged as fraud that is actually legitimate.
- The company will return 100% of the value to the customer for each transaction classified as legitimate that is actually fraud.
With this aggressive strategy, the company takes on the risk of failing to detect fraud and is paid for accurate fraud detection.
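To make the three rules concrete, here is a minimal sketch of how the company's result could be computed for a batch of scored transactions. The `Transaction` class, the `company_result` function, and the sample amounts are hypothetical names and values I made up for illustration. I also assume that a legitimate transaction classified as legitimate earns and costs nothing, which the strategy does not state explicitly.

```python
from dataclasses import dataclass

# Rates taken from the strategy described above.
FEE_TRUE_FRAUD = 0.25       # flagged as fraud, really fraud
FEE_FALSE_ALARM = 0.05      # flagged as fraud, really legitimate
REFUND_MISSED_FRAUD = 1.00  # classified as legitimate, really fraud


@dataclass
class Transaction:
    """Hypothetical container for one scored transaction."""
    amount: float
    predicted_fraud: bool
    is_fraud: bool


def company_result(transactions):
    """Return (revenue, loss, profit) for a batch of scored transactions."""
    revenue = 0.0
    loss = 0.0
    for t in transactions:
        if t.predicted_fraud and t.is_fraud:
            revenue += FEE_TRUE_FRAUD * t.amount
        elif t.predicted_fraud and not t.is_fraud:
            revenue += FEE_FALSE_ALARM * t.amount
        elif not t.predicted_fraud and t.is_fraud:
            loss += REFUND_MISSED_FRAUD * t.amount
        # Assumption: legitimate transactions passed as legitimate
        # neither earn nor cost anything.
    return revenue, loss, revenue - loss


# Made-up amounts, one transaction per possible outcome.
batch = [
    Transaction(1000.0, True, True),    # caught fraud
    Transaction(200.0, True, False),    # false alarm
    Transaction(400.0, False, True),    # missed fraud
    Transaction(50.0, False, False),    # legitimate, passed through
]
revenue, loss, profit = company_result(batch)
print(revenue, loss, profit)
```

In this toy batch a single missed fraud wipes out the fees from the other transactions, which is why the refund rule is the risky part of the strategy.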
For the client, hiring the Blocker Fraud Company is an excellent deal. Although the success fee is very high (25%), the client cuts its costs on fraudulent transactions that are correctly detected, and any damage caused by an error in the anti-fraud service is covered by the Blocker Fraud Company itself.
For the company, besides winning many customers with this risky strategy of guaranteeing reimbursement when fraud goes undetected, revenue depends only on the precision and accuracy of the models built by its Data Scientists: the more accurate the “Blocker Fraud” model, the greater the company’s revenue. If the model has low accuracy, however, the company could suffer a huge loss.
You have been hired as a Data Science Consultant to create a high-precision, high-accuracy model for detecting fraud in transactions made through mobile devices.
At the end of your consultancy, you need to deliver to the CEO of Blocker Fraud Company a model in production that is accessed via API: customers will send their transactions to the API so that your model classifies them as fraudulent or legitimate.
In addition, you will need to submit a report on your model’s performance and results in terms of the profit and loss the company will have when using the model you produced. Your report should answer the following questions:
- What is the model’s precision and accuracy?
- How reliable is the model in classifying transactions as legitimate or fraudulent?
- What is the company’s expected billing if we classify 100% of transactions with the model?
- What is the company’s expected loss in case of model failure?
- What is the Blocker Fraud Company’s expected profit when using the model?
The dataset is available on the Kaggle platform, but all of the business context was taken from the “Seja um Data Scientist” page.
In this step I got to know the dataset. I looked at how big it is, how many columns and rows it has, whether there are null values, and some descriptive statistics of the data.
Two points caught my attention in this step: the size of the dataset (about 6 million rows) and the imbalance between fraudulent (0.13%) and legitimate (99.87%) transactions.
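The checks in this step can be sketched with pandas roughly as follows. The tiny DataFrame is a made-up stand-in for the real file, and the column names (`type`, `amount`, `isFraud`) are my assumption about how the data is labeled, so adjust them to the actual dataset.

```python
import pandas as pd

# Made-up stand-in for the real dataset; column names are assumed.
df = pd.DataFrame({
    "type": ["TRANSFER", "PAYMENT", "CASH_OUT", "PAYMENT"],
    "amount": [1000.0, 20.5, 300.0, None],
    "isFraud": [1, 0, 0, 0],
})

n_rows, n_cols = df.shape                 # size of the dataset
null_counts = df.isna().sum()             # missing values per column
summary = df.describe()                   # basic statistics for numeric columns
class_share = df["isFraud"].value_counts(normalize=True)  # fraction per class

print(n_rows, n_cols)
print(null_counts)
print(summary)
print(class_share)
```

On the real data, `class_share` is where the imbalance between fraudulent and legitimate transactions shows up.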
After the overview, I started creating hypotheses about the data. I did this step before analyzing all the data to avoid being influenced by what I saw. I began with a mind map of all the features in the dataset and tried to get insights from them.
I derived several features, but the ones that significantly improved the models’ performance were:
- diffOrig: difference between the origin account’s old balance and new balance after accounting for the transaction amount. In a regular transaction it should be zero.
- diffDest: difference between the destination account’s old balance and new balance after accounting for the transaction amount, in transfer transactions to a regular customer. Other transaction types do not update the destination balance. In a regular transaction it should be zero.
- qtdTransferOrigName: number of transactions sent from the account name.
- qtdTransferDestName: number of transactions sent to the account name.
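Here is a minimal pandas sketch of how these four features could be derived. It is not necessarily my exact project code: the input column names (`oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, `newbalanceDest`, `nameOrig`, `nameDest`, `type`, `amount`) are assumed, the sample rows are invented, the sign convention assumes money leaves the origin account, and setting `diffDest` to zero for non-transfer rows is a simplification.

```python
import pandas as pd

# Invented rows; input column names are assumptions about the dataset.
df = pd.DataFrame({
    "type": ["TRANSFER", "TRANSFER", "PAYMENT"],
    "amount": [100.0, 50.0, 30.0],
    "nameOrig": ["C1", "C1", "C2"],
    "oldbalanceOrg": [500.0, 400.0, 80.0],
    "newbalanceOrig": [400.0, 400.0, 50.0],
    "nameDest": ["C9", "C9", "M1"],
    "oldbalanceDest": [0.0, 100.0, 0.0],
    "newbalanceDest": [100.0, 150.0, 0.0],
})

# Origin side: old balance minus amount should equal the new balance.
df["diffOrig"] = df["oldbalanceOrg"] - df["amount"] - df["newbalanceOrig"]

# Destination side: only evaluated for transfers, zero elsewhere (assumption).
is_transfer = df["type"] == "TRANSFER"
dest_delta = df["oldbalanceDest"] + df["amount"] - df["newbalanceDest"]
df["diffDest"] = dest_delta.where(is_transfer, 0.0)

# How many transactions each account name sends and receives.
df["qtdTransferOrigName"] = df["nameOrig"].map(df["nameOrig"].value_counts())
df["qtdTransferDestName"] = df["nameDest"].map(df["nameDest"].value_counts())

print(df[["diffOrig", "diffDest", "qtdTransferOrigName", "qtdTransferDestName"]])
```

The second row is the interesting one: the origin balance did not move even though 50 was sent, so `diffOrig` is non-zero there.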
This is the most challenging step for me. I’m still working hard to get better at it because I realize its importance, but it breaks my brain sometimes.
In this step I look for insights, values, and information about the business and the phenomena. I applied a logarithm to make the behavior of frauds visible, because of the unbalanced proportion of fraudulent and legitimate transactions.
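A small sketch of why the logarithm helps, under the assumption that it is applied to counts per class (it could just as well be applied to an axis of a plot or to the amounts). The counts below are illustrative only, picked to match a 0.13% fraud share.

```python
import math

# Illustrative counts only, chosen to give a 0.13% fraud share.
counts = {"legitimate": 99_870, "fraud": 130}

# On a linear scale the fraud bar is hundreds of times smaller.
linear_ratio = counts["legitimate"] / counts["fraud"]

# On a log10 scale both classes end up within the same order of magnitude.
log_counts = {label: math.log10(n) for label, n in counts.items()}
log_ratio = log_counts["legitimate"] / log_counts["fraud"]

print(f"linear ratio: {linear_ratio:.1f}")
print(f"log10 counts: {log_counts}")
print(f"log ratio: {log_ratio:.2f}")
```

With the classes this far apart, a plot on a linear axis would show the frauds as a flat line, while a log axis keeps both visible.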
What caught my attention was the difference between the old balance and the new balance relative to the transaction amount.