
# Reducing False Positives for AML: Part II – Building a Model in SAS Viya

08/12/2019 by Chris St. Jeor Financial Crimes


False positives. The bane of the very existence of many data scientists. False positives are the misclassification or identification of something as one thing when it, in fact, turns out to be another. These dirty little bugaboos pop up across all industries, from banking to marketing to healthcare, each having their own costs and consequences.

These false positives run rampant in the financial sector. Banks and other financial institutions that participate in anti-money laundering practices spend billions of dollars each year chasing down false positives. However, predictive analytics can help your company buck the trend and change the status quo.

False positives are a significant problem in financial services (specifically with anti-money laundering monitoring). However, I’ve proposed a new way to approach this problem – predictive analytics. In a previous post, you learned that the first steps to solving the false-positive conundrum in AML monitoring are to identify the issue at hand, then figure out how to frame the question you want answered.

Then you must determine what type of predictive model you want to use to answer your question: a highly predictive model that makes very accurate predictions, or a highly interpretable model that provides a clear understanding of why you made the prediction that you did. Once you’ve completed those steps, you’re ready to begin the modeling process.

Zencos is helping financial organizations decrease false positives by elevating their current AML process with advanced analytics and machine learning models. While a variety of software is available for building predictive models, SAS Viya Visual Data Mining and Machine Learning (VDMML) is the analytics platform used throughout the following demonstration.

SAS VDMML is a preferred application in financial services because it’s a web-based analytics platform that offers an end-to-end solution for complicated business problems. It allows users to explore data and build models without having to write a single line of code. Work can easily be shared and collaborated on across business units. SAS VDMML brings unparalleled levels of efficiency to the business process.

Additional benefits of this application include:

- Web-based GUI platform
- In-memory distributed process for speedy results
- Wide variety of predictive models
- Compare models quickly and export SAS score code to operationalize a champion model

The problem you are trying to solve is to predict whether an alert generated by your AML structuring scenario results in a productive alert or not. It’s a yes-or-no question.

This analysis is called *binary supervised learning*. For these types of binary problems (with two possible answers: yes or no), you can use a variety of modeling approaches, each offering a different balance of predictability and interpretability. The goal for this type of analysis is to identify known variables or features that you can use to predict an unknown outcome, thereby answering the question, “Will this alert be productive?”

In order to demonstrate how to use predictive models to reduce false positives in your AML solution, I created mock data that replicates what you would use from a common AML structuring scenario. The models demonstrated below use common structuring variables as well as other entity-level data found in a common AML Solution. The variables we are using are defined as follows:

- **Productive_Alert (Target)**: Binary indicator for whether an investigation was initiated.
- **Party_ID**: Party number to track historical alerts and transactions. We do not use this in the actual model but simply as a party key.
- **Number_of_Alerts_Generated**: Numeric variable that aggregates the total number of alerts generated for a given party key.
- **Time_Between_Transactions**: Numeric variable that aggregates the amount of time between transactions.
- **Past_CTR**: Binary indicator for whether the person has generated a past Currency Transaction Report.
- **Number_of_Transactions**: Numeric variable that aggregates the total number of cash transactions that were included with the alert.
- **Currency_Amount**: Numeric variable that aggregates the total amount transferred for the alert.
- **Cash_Intensive_Business**: Binary indicator for whether the initiator of the transaction is a cash-intensive entity.

Before you begin modeling you first need to determine the method by which you will compare the various models to one another. For simplicity’s sake, this post will use the misclassification rate, which essentially tells you how often your predictions are incorrect. The lower the number, the better the model is at making predictions. This is a performance statistic available across all binary predictor models and is very straightforward.
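To make the statistic concrete, here is a minimal Python sketch of how a misclassification rate is calculated. The labels are made up for illustration (1 = productive alert, 0 = not productive), and this is not the code SAS VDMML runs; it simply counts the share of predictions that disagree with what actually happened.

```python
def misclassification_rate(actual, predicted):
    """Share of predictions that disagree with the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical labels: 1 = productive alert, 0 = not productive.
actual    = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]

rate = misclassification_rate(actual, predicted)
print(f"Misclassification rate: {rate:.0%}")  # 2 of 10 wrong -> 20%
```

Note that the two mistakes here are different kinds of error: one is a false positive (predicted productive, was not) and one is a false negative. The misclassification rate treats them the same, which is part of why there is more to say about performance statistics.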

There is a lot more to discuss when it comes to misclassification rates and other performance statistics, but for now, let’s focus on building your predictive models.

While there are a wide variety of models you can use to predict productive alerts, we will focus on three specific models: logistic regression, decision tree, and gradient boosting. If you remember from our previous discussion, for regulated practices like AML monitoring, it is essential to use models that are highly interpretable. Logistic regression and decision tree models are very interpretable and will help build trust in your new predictive solutions. Once you have demonstrated the power of predictive models and built confidence in your solution, you may want to transition to more predictive models like gradient boosting. While gradient boosting models do not have the same level of interpretability, they offer much more accurate predictions and can significantly reduce your false-positive alerts.

Logistic regression is one of the most widely used predictor/classification models. While the underlying math can seem a little complicated, in its purest form, logistic regression attempts to find the relationship that each of the predictor variables has with the variable you are trying to predict. It then uses those relationships to calculate the overall probability of an event happening for a given observation.

The feature that makes logistic regression so attractive when compared to other models is the calculated relationship each predictor variable has with the target. Each variable is assigned a coefficient, which represents the change in the log odds of the event for a one-unit increase in that variable. So, say for example you are predicting whether the Cowboys are going to win on Sunday. If quarterback passer rating is one of your predictor variables and it has a coefficient of 1.34, this would mean that each additional point of passer rating raises the log odds of a win by 1.34, which multiplies the odds of a win by exp(1.34), or roughly 3.8, holding the other variables constant. This analysis allows the end-user not only to understand the end prediction of a win or loss but also to understand the specific effect each predictor variable has on the target variable and the final prediction.
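If log odds feel abstract, a few lines of Python can translate a coefficient into something easier to talk about. This is only a sketch: the 1.34 coefficient comes from the football example above, while the intercept of -2.0 and the input values are hypothetical numbers chosen purely for illustration.

```python
import math


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in the variable."""
    return math.exp(coefficient)


def predicted_probability(intercept, coefficients, values):
    """Logistic regression prediction: log odds first, then convert to a probability."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-log_odds))


print(round(odds_ratio(1.34), 2))  # roughly 3.82

# Hypothetical intercept; compare two observations one unit apart.
p_low = predicted_probability(-2.0, [1.34], [0.0])
p_high = predicted_probability(-2.0, [1.34], [1.0])
print(f"{p_low:.3f} -> {p_high:.3f}")
```

The odds are multiplied by the same factor for every one-unit increase, but the change in probability depends on where you start, which is why the coefficient is described in terms of odds rather than probability.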

SAS VDMML allows you to build logistic regression within the drag-and-drop interface. Using the scenario data previously discussed, we can quickly build a logistic regression model to predict if an alert was productive. The logistic regression model and the model’s misclassification rate of 11% can be viewed in figure 1 below.

You can easily see which variables are significant in your model as well as the cumulative lift and residual plots for your model. Using the output above, you can quickly assess the significance of each variable in the model. Coupling this output with your industry expertise can help expedite the exploration of other potential variables with previously unknown correlations.

The ability to quickly adjust and rerun models without having to manage a single line of code dramatically decreases the amount of time spent on the modeling processes.

Decision trees are among the most straightforward statistical models you can use for binary predictions or classifications. A decision tree takes a set of predictor variables (both continuous and categorical) and identifies the split points in those variables that create the “purest” groups of the data. This is done through an iterative process until the specified conditions of the model are met. With each split, every observation is sent down one of two branches. The goal of each split is to separate the values of the target variable as cleanly as possible. Following the path to the final bins created through the splits of the predictor variables provides valuable business logic and insight into why the prediction was classified the way it was.
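To show what searching for the “purest” split means, here is a small Python sketch that tries every possible cut point on one variable and keeps the one with the lowest impurity. It assumes Gini impurity as the purity measure, which is one common choice and not necessarily the criterion SAS VDMML applies, and the data values are invented for the example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted impurity) for the purest split 'value <= threshold'."""
    n = len(values)
    best = (None, gini(labels))  # start from the impurity of the unsplit data
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical alerts: number of transactions and whether the alert was productive.
number_of_transactions = [2, 3, 4, 5, 6, 7, 8, 9]
productive_alert       = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(number_of_transactions, productive_alert)
print(threshold, impurity)
```

A full tree repeats this search on each resulting group, across all predictor variables, until a stopping condition such as a maximum depth is reached.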

Using the sample data, you can quickly create a decision tree in SAS VDMML. Once you specify which variable to use as the target and select the predictor variables you want to use, SAS VDMML automatically creates the model for you. The output can be seen in the following figure with a misclassification rate of 11%:

Following the tree path, you can see groups of alerts that are classified as either productive or not productive. If you hover over a node, a pop-up window appears, as shown in figure 3 below. Interpreting the window in that figure, alerts that include more than five transactions and belong to parties that have generated more than 15 alerts form a group in which 100% of the alerts are productive. Coupling this type of logic generated through a decision tree with the rule-based scenario can add a great deal of insight into which alerts are most likely productive.

Alternatively, an additional step could be to use this insight to create a risk rating for which alerts should be investigated.
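As a rough sketch of that idea, the tree path described above can be written as a plain rule and used to tag alerts. The thresholds come from the decision tree output discussed in this post; the function names and the "high" and "standard" rating labels are hypothetical and would need to be defined with your compliance team.

```python
def in_productive_leaf(number_of_transactions, number_of_alerts_generated):
    """Tree path from the post: more than five transactions and more than 15 alerts."""
    return number_of_transactions > 5 and number_of_alerts_generated > 15


def risk_rating(number_of_transactions, number_of_alerts_generated):
    """Hypothetical two-level rating built on the tree rule."""
    if in_productive_leaf(number_of_transactions, number_of_alerts_generated):
        return "high"
    return "standard"


print(risk_rating(8, 20))  # meets both conditions
print(risk_rating(8, 3))   # enough transactions, too few prior alerts
```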

Gradient boosting models are some of the most powerful predictive models used today. While they are widely used, most people view them as a black-box predictor. So before we discuss the gradient boosting model used to predict productive alerts, let’s take a minute to understand what is going on under the hood.

A straightforward way to think of gradient boosting models is depicted in one of my kids’ favorite movies: Disney’s *Ralph Breaks the Internet* (spoiler alert – after causing a nearly catastrophic collapse of the internet that Bezos himself could not prevent, Ralph dramatically returns to save the day). For those who haven’t seen the movie: Ralph is a huge, strong guy who, by himself, is fairly powerful. In the film, a weaker and less intelligent version of himself gets cloned about a million times. While each clone isn’t much to worry about, when all the weaker and less intelligent clones converge together they become a massive unstoppable rage monster. It’s quite intense.

Think of a gradient boosting model as a collection of weaker decision trees put on steroids. Gradient boosting models are a collection of miniature, weaker decision trees that are built one after another, often each on a different sample of your total data. Each new tree concentrates on the errors left by the trees before it, so observations that are difficult to predict effectively receive more attention than observations that are easy to predict. The final model is essentially an ensemble of all the “weak” prediction models, which creates an unstoppable rage monster that can predict both easy and difficult observations with incredible accuracy. As a result, gradient boosting is one of the most powerful predictive models used today.
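For readers who want to peek under the hood, here is a bare-bones Python sketch of one common formulation of gradient boosting: each tiny tree (a one-split "stump") is fit to the errors left by the trees before it, which is how the hard-to-predict observations end up getting the most attention. It assumes squared-error loss, a single predictor, and made-up data, and it is not how SAS VDMML implements the model; the function names and settings are illustrative only.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a "stump") to the current residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=50, learning_rate=0.3):
    """Return final scores and the training squared error after 0..n_trees stumps."""
    base = sum(ys) / len(ys)          # start from the average outcome
    scores = [base] * len(xs)
    errors = [sum((y - s) ** 2 for y, s in zip(ys, scores))]
    for _ in range(n_trees):
        residuals = [y - s for y, s in zip(ys, scores)]  # what is still unexplained
        stump = fit_stump(xs, residuals)
        scores = [s + learning_rate * stump_value(stump, x)
                  for s, x in zip(scores, xs)]
        errors.append(sum((y - s) ** 2 for y, s in zip(ys, scores)))
    return scores, errors


# Hypothetical pattern that no single split can capture:
# only mid-range transaction counts are productive.
number_of_transactions = [1, 2, 3, 4, 5, 6]
productive_alert       = [0, 0, 1, 1, 0, 0]

scores, errors = boost(number_of_transactions, productive_alert)
print(f"error with 1 stump: {errors[1]:.3f}, with {len(errors) - 1} stumps: {errors[-1]:.3f}")
```

No single stump can separate this data, yet the training error keeps shrinking as stumps are added. That is the "million weak clones" effect in miniature.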

By creating several “weak” decision trees on weighted samples of the same data used for the other two models, the first attempt for our gradient boosting model has a misclassification rate of 3% (8 percentage points lower than the other models), as can be seen in the following figure.

One of the tradeoffs of gradient boosting is that while you can get a much lower misclassification rate, you do not have the nice interpretability of logistic regression or a single decision tree model. SAS VDMML, however, provides variable importance showing which variables had the most impact on the actual model. Gradient boosting models would lie in the bottom right quadrant of the analytic spectrum discussed earlier, offering high predictability but little interpretability behind the prediction.

Now, let’s say you are nearing the end of your modeling process. Let’s say instead of three models, you have 100 models to choose from. How do you choose which one to use?

Obviously, you could look through each model’s output and compare each and every fit statistic to determine which model you deem worthy, but that sounds terrible. There has to be a better way. Right?

There is! You can dynamically choose your champion model from a host of candidate models!
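In its simplest form, choosing a champion just means picking the candidate with the best value of your comparison statistic. The Python sketch below uses the three misclassification rates reported in this post; the dictionary layout is an assumption made for illustration, not a description of how SAS VDMML performs model comparison.

```python
# Misclassification rates reported earlier in this post.
misclassification_rates = {
    "logistic regression": 0.11,
    "decision tree": 0.11,
    "gradient boosting": 0.03,
}

# The champion is the candidate with the lowest misclassification rate.
champion = min(misclassification_rates, key=misclassification_rates.get)
print(f"Champion model: {champion} ({misclassification_rates[champion]:.0%})")
```

The same one-line comparison works whether you have three candidates or 100, although in a regulated setting you would also weigh interpretability before promoting a model.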
