Reducing False Positives for AML: Part III – Model Governance
08/12/2019 by Chris St. Jeor Financial Crimes
Imagine you are a data scientist at a large national bank. Your manager has informed you that the AML and Compliance department requested help with their anti-money laundering solution.
While they have purchased a top-of-the-line AML solution, their investigators are overwhelmed because the solution is generating thousands of alerts each month and only a small handful ever actually require further investigation. You tell them that you are an analytics wizard and that you would be thrilled to build them a predictive model to help determine which alerts the investigators can confidently ignore and which require further investigation. In your enthusiasm, you realize that you have built over 100 models. As you look over your models, your fingers tremble and your heart begins to race; decision paralysis takes over. Blood, sweat, and tears were poured into each one of these models. How could you possibly choose just one?
The first step to choosing the right model is to familiarize yourself with the life-changing possibility of reducing false positives in your AML solution.
So let’s quickly recap what you’ve learned in this three-part series. In the first post, you learned about the problems false positives create in the alert generation process of current AML solutions. Currently, 95 percent of alerts are ultimately proven to be unproductive, costing banks and other financial service organizations billions of dollars each year as they chase ghosts through their data. In the second post, you discovered how organizations can use SAS Visual Data Mining and Machine Learning (VDMML) to enrich the data supplied by their AML solution and build supervised learning models to predict which alerts can be ignored and which should be considered for investigation. You specifically reviewed the pros and cons of three types of supervised learning models: logistic regression, decision trees, and gradient boosting.
Flashback over. Back to the task at hand. Now that you have built your 100 models, let’s consider how to evaluate their performance. For most modeling approaches, you rarely know ahead of time all the variables you want to use, or even the type of model you want to use. To accurately assess the performance of your modeling solution and choose a champion model from your list of candidates, you need to decide on a standardized approach that allows you to compare apples to apples.
There are a host of statistics (R², AIC, BIC, and so on) for comparing supervised learning models and their predictive performance. As discussed in the previous post, I recommend using the misclassification rate to determine your champion model. The misclassification rate tells you how often your predictions are incorrect: the lower the number, the better the model is at making predictions. It is straightforward to interpret and is available for every binary prediction model. Bear in mind, though, that the misclassification rate works well only when you have a fairly even split in the target you are trying to predict.
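To make the statistic concrete, here is a minimal Python sketch (purely illustrative; it is not part of the SAS VDMML workflow, and the alert labels are made up) showing that the misclassification rate is just the share of predictions that disagree with the actual outcomes:

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that do not match the actual labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

# 1 = productive alert, 0 = unproductive alert (hypothetical labels)
actual    = [0, 0, 1, 0, 1, 0, 0, 1]
predicted = [0, 0, 1, 0, 0, 0, 1, 1]
print(misclassification_rate(actual, predicted))  # 2 of 8 wrong -> 0.25
```

The same number comes out of any binary model, which is what makes it a convenient yardstick across very different candidate models.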
If you have a heavily disproportionate split in your target (as in our example of predicting productive AML alerts, where only about five percent are productive), you will want to oversample your data before you build your model. Otherwise, your model will learn that it can do a very good job by simply classifying every alert as not productive. It’s hard to beat being right 95 percent of the time, so the model will take the easy path and make every prediction the same. Oversampling your data allows your model to actually find the hidden relationships in it.
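A minimal sketch of the oversampling idea, in plain numpy (this is a simple random-oversampling stand-in for illustration; the class proportions mirror the 95/5 split above, and the feature matrix is a made-up placeholder):

```python
import numpy as np

rng = np.random.default_rng(42)

def oversample(X, y, minority_label=1):
    """Duplicate minority-class rows (sampling with replacement)
    until both classes are equally represented."""
    X, y = np.asarray(X), np.asarray(y)
    minority = np.where(y == minority_label)[0]
    majority = np.where(y != minority_label)[0]
    # Draw enough extra minority rows to match the majority count
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]

# 95% unproductive (0), 5% productive (1), as in the AML example
y = np.array([0] * 95 + [1] * 5)
X = np.arange(100).reshape(-1, 1)  # stand-in features
X_bal, y_bal = oversample(X, y)
print(y_bal.mean())  # -> 0.5: the classes are now balanced
```

With a balanced training set, "predict not-productive for everything" no longer looks like a winning strategy, so the model is forced to learn real patterns.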
Remember, the goal is to build a model that can make accurate predictions based on the specific parameters of each individual alert.
Once you have selected a model fit statistic, you can begin to compare the predictive power of your many models. While you can manually calculate and compare the performance of each model, SAS VDMML provides a point-and-click interface that compares all of your candidate models simultaneously and suggests which to use as your champion model.
When comparing model performance, it is important to use hold-out data. Hold-out data is critical to the modeling process because it ensures you aren’t simply choosing a model that was overfitted to the specific data it was built on; you want a model that will also provide accurate predictions for new incoming data.
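The hold-out idea can be sketched with scikit-learn (an illustrative stand-in for the partitioning step SAS VDMML performs for you; the alert data here is synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for alert data: two features, binary target
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Reserve 30% as hold-out data the model never sees during training
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = LogisticRegression().fit(X_train, y_train)

# Score on the hold-out set, not the training set, so the number
# reflects performance on alerts the model has never seen
hold_miss = float(np.mean(model.predict(X_hold) != y_hold))
print(f"hold-out misclassification rate: {hold_miss:.3f}")
```

A model that looks great on its own training data but stumbles on the hold-out set is overfitted, and the hold-out score is what exposes that.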
SAS VDMML can perform model comparisons on hold-out data and select the champion model based on the preferred model performance statistic. Once you have a champion model you can export SAS code for the selected model to score new incoming data. The model comparison output can be seen in the following figure.
After scoring each of the models on new hold-out data not previously used, the gradient boosting model still performs best, while the logistic regression model (bar chart on the right) performs slightly worse. But keep in mind our discussion about the importance of balancing interpretability versus predictability. While the gradient boosting model is the clear winner from a predictability standpoint, it is far less interpretable. Because interpretability is such an important factor in building trust in an analytics solution, and because you need to be able to explain your predictions to regulators, you probably want to start with the logistic regression model. It still has a respectable misclassification rate, and each of its predictions has a very clear interpretation.
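For intuition, here is a hedged sketch of that comparison outside SAS, using scikit-learn stand-ins for the three candidate model types from the previous post, all scored on the same hold-out data (the data is synthetic, so the exact scores are not those in the figure):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4))
# A nonlinear target so the candidate models genuinely differ
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(int)

X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=1)

candidates = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=1),
    "gradient boosting": GradientBoostingClassifier(random_state=1),
}

# Score every candidate on the same hold-out data
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = float(np.mean(model.predict(X_hold) != y_hold))

champion = min(scores, key=scores.get)
for name, miss in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} misclassification = {miss:.3f}")
print("champion:", champion)
```

The lowest misclassification rate names the statistical champion, but as noted above, that number alone shouldn’t settle the choice when regulators will ask you to explain each prediction.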
If you haven’t noticed by this point, I have not posted a single line of SAS code in any of the three posts in this series. That was by design: one of the advantages of using SAS VDMML is that you can build the entire solution without writing any code. Once you have worked through the modeling process and identified your champion model, SAS will automatically generate the code for you.
You can either register your champion model within the solution and begin scoring new alerts right away, or export the generated code directly to your solution and put your model into production within minutes. The main point is that all your code is managed and generated for you in one place, dramatically saving time and resources. An example of the code export capabilities can be seen in the following figure.
Banks and other financial institutions have a tremendous responsibility to monitor daily financial transactions. While there are several excellent AML solutions on the market, the incredibly high rate of false-positive alerts generated each day is a growing problem. False alerts cause investigator burnout and ultimately cost financial organizations billions of dollars each year. Organizations can use the data generated by these solutions and supplement the monitoring process with predictive models. These models bring additional logic to their solutions and help identify which alerts should in fact be investigated.
If this series has not satisfied your appetite for how financial organizations can be smarter with their AML monitoring, read the additional content we have on the subject. Or, if you have specific questions, reach out to us! We’d love to hear from you.