AML Transaction Monitoring and False Positives
11/23/2020 by Chris St. Jeor
Those familiar with the financial sector know the responsibility placed on financial institutions: adhere to strict anti-money laundering regulations or face large fines.
Every financial institution is its own living and breathing organism working to overcome a unique set of challenges. However, all financial organisations share one monster of a problem: keeping up with the number of alerts generated through their anti-money laundering (AML) solutions. Specifically, they must figure out how to deal with the high rate of false positives those solutions generate.
AML transaction monitoring primarily consists of complicated business logic that tracks demographics and financial transactions in an attempt to identify potential money laundering practices. If any combination of financial transactions falls within the complicated business logic, then the transaction, or group of transactions, is flagged as an alert and must be considered for further investigation.
The problem? False positives – the bane of many a data scientist's existence. These dastardly little bugaboos occur when something is flagged as one thing when, in fact, it turns out to be another. False positives run rampant in the financial sector.
False positives are a significant problem in financial services. The sheer volume of daily transactions leads to thousands of routine financial transactions being flagged each month that ultimately result in unproductive alerts. Billions of dollars are spent each year determining which alerts require investigation – and the problem is only getting worse. Research by McKinsey & Company shows that resources dedicated to AML compliance have increased up to tenfold at major banks over the last five years.
While rule-based monitoring is the current industry best practice, it does not need to remain the only approach to AML monitoring. Financial institutions can use the data created through the alert generation process to build predictive models that identify which alerts can be ignored and which should be investigated.
This article will outline how financial organisations can use predictive analytics, machine learning, artificial intelligence (whatever you like to call it) to sift through the thousands of alerts generated and to identify – with a high level of confidence – which alerts need to be investigated.
The first step in any analytics problem is to determine the question you are trying to answer. Here, you are trying to predict whether an alert generated by your AML solution will be productive – an alert that warrants an investigation. While the problem may seem complicated on the surface, with several contributing factors, the question itself is quite simple. Will this alert be productive? Yes or no?
This type of analysis is called binary supervised learning – having two discrete possible outcomes. For these types of binary problems, you can use a variety of modeling approaches, each ranging in predictability and interpretability. The goal for this type of analysis is to identify known variables or features that you can use to predict an unknown outcome, answering, “Will this alert be productive?”
Other times you may be dealing with continuous or ordinal supervised learning. An example of continuous supervised learning would be predicting the price of a house, where the prediction can be any numeric value. An example of ordinal supervised learning could be predicting customer satisfaction reviews, each prediction taking a discrete value from a scale of 0 to 5.
Once you have determined the type of question you are answering, the next step is to decide what you want to get out of your model. While you have a host of binary predictor models available, it is not enough to make a prediction – what you will do with the prediction is critical in determining the type of model to use.
When selecting the type of model to use, you need to start with a fundamental question: What is more important? Predictability (the accuracy with which you can make your predictions) or interpretability (the ability to explain why you made the prediction you did)?
In the world of analytics, there is often much discussion about the differences in predictive power between different types of models. Far too many projects die on the vine because the creators forget to account for the differences in interpretability across models. While it would be ideal to choose a model that has high predictability and interpretability, you, unfortunately, cannot always have your cake and eat it too. So you must decide what is more important for the end use of your model – the accuracy of the prediction or the interpretability?
A simple way to view this relationship between model predictability and interpretability is on the analytic spectrum chart in figure 1. While it would be great to have a model that fits in the top right quadrant of the chart, in reality, models usually fall somewhere in the top left or bottom right quadrants. Models like decision trees and logistic regression sit in the top left, while gradient boosting and neural network models lie in the bottom right of the spectrum.
To help create some context around this idea, let’s play a little game I like to call story time.
Let’s say you are heading to The Esplanade for the weekend and you want to lay some money down on a big match. You want to bet on your favorite team, but let’s be honest, this is your money on the line, and you want to make sure you are betting on the correct outcome. All you care about is whether your team is going to win. It doesn’t matter who scores first or last. All that matters is that your money comes back to you with interest. In this situation, you would want to use a highly predictive model, like a gradient boosting model, and ignore the interpretability of the model entirely.
For the next scenario, let’s pretend you are the manager of the data science team at a large hospital. You are trying to build a model to identify patients who are at risk of developing high blood pressure, which will help you better care for your patients. Just telling a patient they have a 75% chance of developing high blood pressure is not enough. You need to be able to tell them why they are at risk and what they can do to avoid it. For this type of problem, you want a model that can be easily interpreted and used to create an actionable plan to help your patients avoid the problem altogether. Logistic regression would be a great candidate for this scenario.
In summary, when selecting the model to use, you first need to decide what is most important: the ability to understand the prediction or the accuracy of the prediction? Answering this question upfront helps set the project up for success. Always remember, a model is only as good as the end user’s ability to turn it into action.
Because financial services are highly regulated, the interpretability of the model in your exercise is essential. AML business logic, known as scenarios, has been well thought out and has been the standard for a long time. So, when regulators come in to audit your solution, you need to be able to explain why you determined not to investigate an alert generated by a specific scenario. Having a clear and transparent model will be critical when demonstrating to regulators the value of incorporating predictive models into your AML solution. The ability to clearly explain why your model made the prediction it did will help build trust, support, and buy-in.
Now that you have framed the question and identified the general type of models you will want to use, you are ready to explore your data and begin the modeling process (cue the drum roll for SAS Viya).
Zencos is helping financial organisations decrease false positives by elevating their current AML process with advanced analytics and machine learning models. While a variety of software is available for building predictive models, SAS Viya Visual Data Mining and Machine Learning (VDMML) is the analytics platform used throughout the following demonstration.
SAS VDMML is a preferred application in financial services because it’s a web-based analytics platform that offers an end-to-end solution for complicated business problems. It allows users to explore data and build models without having to write a single line of code, and work can easily be shared and collaborated on across business units. SAS VDMML brings unparalleled levels of efficiency to the business process.
To Investigate an AML Alert or Not to Investigate? That is the Binary Question.
The problem you are trying to solve is to predict whether or not an alert generated by your AML structuring scenario results in a productive alert. It’s a yes-or-no question. As mentioned earlier, this analysis is called binary supervised learning.
To demonstrate how predictive models can reduce false positives in your AML solution, we will use mock data that replicates what a typical AML structuring scenario would generate. The models demonstrated below use common structuring variables as well as other entity-level data found in a typical AML solution.
Before you begin modeling, you first need to determine the method by which you will compare the various models you build. There are a host of statistics (R², AIC, BIC, and so on) for comparing supervised learning models and their predictive performance.
For simplicity’s sake, we will use the misclassification rate, which essentially tells you how often your predictions are incorrect. The lower the number, the better the model is at making predictions.
This is a performance statistic available across all binary predictor models and is very straightforward. Bear in mind that the misclassification rate works well when you have an even split in the target that you are trying to predict.
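The article computes this statistic inside SAS VDMML, but the metric itself is simple enough to sketch in a few lines of Python (the function name here is our own):

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that do not match the actual outcomes."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be the same length")
    wrong = sum(actual != predicted for actual, predicted in zip(y_true, y_pred))
    return wrong / len(y_true)

# 1 = productive alert, 0 = false positive; one of four predictions is wrong
print(misclassification_rate([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.25
```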
If you have a heavily disproportionate split in your target (in our example of predicting productive AML alerts, only about five percent are productive), you will want to oversample your data before you build your model. Otherwise, your model will learn that it does a really good job by simply classifying every alert as not productive. It’s hard to do better than being right 95 percent of the time, so your model will take the easy path and make all its predictions one thing. By oversampling your data, you allow your model to actually find the hidden relationships in your data.
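As a rough illustration of the idea (the helper function and its defaults are invented for this sketch), random oversampling simply duplicates minority-class rows until the classes are balanced:

```python
import numpy as np

def random_oversample(X, y, random_state=0):
    """Duplicate minority-class rows until every class matches the largest."""
    rng = np.random.default_rng(random_state)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    rows = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # resample the smaller class with replacement to close the gap
        extra = rng.choice(idx, size=n_max - count, replace=True)
        rows.append(np.concatenate([idx, extra]))
    keep = np.concatenate(rows)
    return X[keep], y[keep]

# roughly 5% productive alerts, as in the article's example
y = np.array([0] * 95 + [1] * 5)
X = np.arange(100).reshape(-1, 1)
X_bal, y_bal = random_oversample(X, y)
```

After balancing, a model can no longer score 95 percent accuracy by predicting “not productive” for everything.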
While there are a wide variety of models you can use to predict productive alerts, we will focus on three specific models: logistic regression, decision tree, and gradient boosting. If you remember from our previous discussion, for regulated practices like AML monitoring, it is essential to use models that are highly interpretable.
Logistic regression and decision tree models are very interpretable and will help build trust in your new predictive solutions. Once you have demonstrated the power of predictive models and built confidence in your solution, you may want to transition to more predictive models like gradient boosting. While gradient boosting models do not have the same level of interpretability, they typically offer much more accurate predictions and can significantly reduce your false-positive alerts.
Logistic regression is one of the most widely used predictor/classification models. While the underlying math can seem a little complicated, in its purest form, logistic regression attempts to find the relationship that each of the predictor variables has with the variable you are trying to predict. It then uses those relationships to calculate the overall probability of an event happening for a given observation.
The feature that makes logistic regression so attractive compared to other models is its interpretability. Each variable is assigned a coefficient representing the change in the log odds of the outcome for a one-unit change in that variable; exponentiating the coefficient gives an odds ratio. So, say for example you are predicting whether your team is going to win their next match. If conversion percentage is one of your predictor variables, the model could tell you that a one percentage point increase in conversion percentage multiplies your team’s odds of winning by 1.3. This allows the end user not only to understand the final win-or-loss prediction but also to understand the specific effect each predictor variable has on the target variable and the final prediction.
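The article builds its model in SAS VDMML’s drag-and-drop interface; to make the odds-ratio interpretation concrete, here is a rough scikit-learn sketch on synthetic alert data (the feature names, coefficients, and data-generating rule are all invented for the example):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 1000
txn_count = rng.poisson(5, n)      # transactions per alert (synthetic)
amount_k = rng.normal(9, 2, n)     # total amount in $1,000s (synthetic)

# synthetic ground truth: more transactions -> more likely productive
log_odds = 0.6 * (txn_count - 5) - 2.0
y = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

X = np.column_stack([txn_count, amount_k])
model = LogisticRegression(max_iter=1000).fit(X, y)

# exp(coefficient) is the odds ratio for a one-unit change in that variable
odds_ratios = np.exp(model.coef_[0])
print(dict(zip(["txn_count", "amount_k"], odds_ratios.round(2))))
```

An odds ratio above 1 for `txn_count` tells the investigator that each additional transaction multiplies the odds of a productive alert by that factor.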
SAS VDMML allows you to build logistic regression models within the drag-and-drop interface. Using the scenario data previously discussed, we can quickly build a logistic regression model to predict whether an alert was productive. The logistic regression model and its misclassification rate of 11% can be viewed in the figure below.
You can easily see which variables are significant in your model as well as the cumulative lift and residual plots for your model. Using the output above, you can quickly assess the significance of each variable in the model. Coupling this output with your industry expertise can help expedite the exploration of other potential variables with previously unknown correlations.
The ability to quickly adjust and rerun models without having to manage a single line of code dramatically decreases the amount of time spent on the modeling processes.
Decision trees are one of the most straightforward statistical models you can use for binary predictions or classifications. A decision tree takes a host of predictor variables (both continuous and categorical) and identifies the splits of those variables that create the “purest” possible groups of the data. This is done through an iterative process until the specified conditions of the model are met. With each split, the observations are forced down one of two branches, and the goal of each split is to achieve the greatest possible separation of the target. Following the path of the final bins created through the splits of the predictor variables provides valuable business logic and insight into why a prediction was classified the way it was.
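To show how a fitted tree reads as plain business logic, here is a small scikit-learn sketch on synthetic alert data (the feature names and the labeling rule are invented for the example; the article itself stays inside SAS VDMML):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500
txn_count = rng.integers(1, 12, n)      # transactions per alert (synthetic)
prior_alerts = rng.integers(0, 25, n)   # previously generated alerts (synthetic)

# invented labeling rule: many transactions plus many prior alerts => productive
y = ((txn_count > 5) & (prior_alerts > 15)).astype(int)
X = np.column_stack([txn_count, prior_alerts])

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# the printed rules read like if/then business logic an investigator can follow
print(export_text(tree, feature_names=["txn_count", "prior_alerts"]))
```

Because the rule is clean, the tree recovers it exactly with two splits, and the printed paths spell out which alert groups are productive.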
Using the sample data, you can quickly create a decision tree in SAS VDMML. Once you specify which variable to use as the target and select the predictor variables you want to use, SAS VDMML automatically creates the model for you. The output can be seen in the following figure with a misclassification rate of 11%:
Following the tree path, you can see groups of alerts that are classified as either productive or not productive. If you hover over a node, a pop-up window appears, as shown in figure 3. Interpreting the window in the figure, alerts with more than five transactions and more than 15 previously generated alerts form a group of observations that is 100% productive. Coupling this type of logic generated through a decision tree with the rule-based scenario can add a great deal of insight into which alerts are most likely productive.
An additional step could be to use this insight to create a risk rating for which alerts should be investigated.
Gradient boosting models are some of the most powerful predictive models available. While they are widely used, most people view them as a black-box predictor. So, before we discuss the gradient boosting model used to predict productive alerts, let’s take a minute to understand what is going on under the hood.
A straightforward way to think of gradient boosting models is depicted in one of my kids’ favorite movies: Disney’s Ralph Breaks the Internet (spoiler alert – after causing a nearly catastrophic collapse of the internet, Ralph dramatically returns to save the day). For those who haven’t seen the movie: Ralph is a huge, strong guy who, by himself, is fairly powerful. In the film, a weaker and less intelligent version of himself gets cloned about a million times. While each clone isn’t much to worry about, when all the weaker and less intelligent clones converge together, they become a massive unstoppable rage monster. Viewer discretion is advised.
Think of a gradient boosting model as a collection of weaker decision trees put on steroids. Gradient boosting models are an ensemble of miniature, weaker decision trees, each built on a different weighted sample of your data. The algorithm iterates through the data set, giving higher weights to observations that are difficult to predict and lower weights to observations that are easy to predict. The final model is essentially an ensemble of all the “weak” prediction models, which creates an unstoppable rage monster that can predict both easy and difficult observations with incredible accuracy. As a result, gradient boosting is one of the most powerful predictive models used today.
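To see the “army of weak trees” idea in action, here is a small scikit-learn comparison on synthetic data (everything about the data is invented for the sketch):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4))
y = (X.sum(axis=1) > 0).astype(int)   # synthetic target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# one weak tree versus an ensemble of 200 equally weak trees
weak = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)
boost = GradientBoostingClassifier(max_depth=2, n_estimators=200,
                                   random_state=0).fit(X_tr, y_tr)

print("single weak tree misclassification:", round(1 - weak.score(X_te, y_te), 3))
print("boosted ensemble misclassification:", round(1 - boost.score(X_te, y_te), 3))
```

No single depth-2 tree can capture the boundary well, but the ensemble of many such trees cuts the misclassification rate substantially.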
By creating several “weak” decision trees on weighted samples of the same data used for the other two models, the first attempt at our gradient boosting model has a misclassification rate of 3% (8 percentage points lower than the other models).
One of the tradeoffs of gradient boosting is that while you can get a much lower misclassification rate, you do not have the nice interpretability of logistic regression or a single decision tree. SAS VDMML, however, provides variable importance measures showing which variables had the most impact on the model. Gradient boosting models lie in the bottom right quadrant of the analytic spectrum discussed earlier, offering high predictability but little interpretability behind the prediction.
Now, let’s say you are nearing the end of your modeling process. Let’s say instead of three models, you have 100 models to choose from. How do you choose which one to use?
Obviously, you could look through each model’s output and compare each and every fit statistic to determine which model you deem worthy, but that sounds terrible. There has to be a better way. Right?
There is! You can dynamically choose your champion model from a host of candidate models!
Once you have completed your model building process – and you have selected your model fit statistic – you can begin to compare the predictive power of your models. While you can manually calculate and compare the performance of each model, SAS VDMML provides a point-and-click interface that can compare all of your candidate models simultaneously and suggest which to use as your champion model.
When comparing model performance, it is important to use hold-out data. Hold-out data is critical to the modeling process because it ensures you aren’t simply choosing a model that was overfitted to the specific data it was built on; you want a model that will also provide accurate predictions for new incoming data.
SAS VDMML can perform model comparisons on hold-out data and select the champion model based on your preferred performance statistic. Once you have a champion model, you can export SAS code for the selected model to score new incoming data. The model comparison output can be seen in the following figure.
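SAS VDMML handles this comparison point-and-click; outside that tool, the same champion-selection loop might look something like this scikit-learn sketch (the candidate models, data, and settings are illustrative, not the article’s actual models):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(1500, 5))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)   # synthetic target

# hold-out data: never shown to the models during training
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
# misclassification rate on the hold-out set for each candidate
misclass = {name: 1 - model.fit(X_tr, y_tr).score(X_ho, y_ho)
            for name, model in candidates.items()}
champion = min(misclass, key=misclass.get)
print(champion, round(misclass[champion], 3))
```

The champion is simply the candidate with the lowest hold-out misclassification rate, though, as discussed above, interpretability may override that ranking in a regulated setting.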
After scoring each of the models on new hold-out data not previously used, the gradient boosting model still performs best, while the logistic regression model (bar chart on the right) performs slightly worse. But keep in mind our discussion about the importance of balancing interpretability versus predictability.
While the gradient boosting model is the clear winner from a predictability standpoint, gradient boosting is far less interpretable. Because the interpretability is such an important factor when building trust in an analytics solution and you need to be able to explain your predictions to regulators, you probably want to start with the logistic regression model. It still has a respectable misclassification rate and has a very clear interpretation behind each prediction.
If you haven’t noticed to this point, I have yet to post a single line of code. This was done by design. That’s because one of the advantages of using SAS VDMML is that you can build the entire solution without writing any code. Once you have worked through the modeling process and identified the model you wish to use as your champion model, SAS will automatically generate the code for you.
You can either register your champion model within the solution and begin scoring new alerts right away or you can export the generated code directly to your solution and put your model into production within minutes. The main point is that all your code is managed and generated for you in one place, dramatically saving time and resources. An example of the code export capabilities can be seen in the following figure.
Banks and other financial institutions have a tremendous responsibility to monitor daily financial transactions. While there are several excellent AML solutions on the market, the incredibly high rate of false-positive alerts generated each day is a growing problem. Unproductive alerts cause investigator burnout and cost organisations billions of dollars each year. Organisations can use the data generated by these solutions and supplement the monitoring process with predictive models. These models bring additional logic to their solutions and help identify which alerts should in fact be investigated.
If you have stayed with us ‘til the end, and you would like to learn more information about how financial organisations can be smarter with their AML monitoring, then you are cordially invited to look up our additional material on our blog. Or, if you have specific questions, reach out to us! We’d love to hear from you.