The Madness Begins: Can You Beat Our Machine Learning Model?
03/12/2018 | by Chris St. Jeor | Modernization - Analytics
Do you smell that? *sniff* No, that’s not the smell of fresh spring air after a long winter. That’s not the smell of the blossoms finally emerging from tree branches. What you smell is the sweat and tears of millions of people who are trying to perfect their NCAA Men’s Basketball Tournament brackets before it is too late.
They need to hurry because Kaggle is about to run one of the best contests of the year: the March Madness competition for the NCAA Men’s Basketball Tournament. It is an epic showdown, pitting nerd against jock, ardent fan against casual observer, know-it-all against know-nothing-at-all. Whatever their status, participants unite around one goal: to predict the perfect bracket, one that accounts for crazy upsets while respecting traditional powerhouses.
While some people rely on personal knowledge and opinion, others take a more analytical approach. The tournament provides a great opportunity to experiment with different machine learning models for predicting winners. The problem we face when selecting tournament winners is a binary one: is team X going to win, yes or no?
Companies face binary problems every day: Will this customer make a purchase? Will this subscriber churn? Will this transaction prove fraudulent? There is a wide array of modeling approaches we can use to answer these kinds of questions, approaches that not only quantify the relationships between variables but also provide powerful insights and actionable analytics.
Here at Zencos, we like to rely on analytics, so as we approached this year’s Kaggle competition, we began with a baseline model. Logistic regression is a powerful model that provides decent predictions but is also very interpretable, allowing us to identify the underlying relationship that our variables have on the probability that a given team wins.
Once you understand the relationship each variable has with the target (did the team win or not?), you can start exploring various models and assessing the overall accuracy of their predictions. For the Madness problem, I used the misclassification rate to compare the quality of each model.
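The original model was built in SAS Enterprise Miner, but the same idea can be sketched in Python with scikit-learn. Here is a minimal illustration on synthetic data: the feature names (seed difference, scoring-margin difference) are hypothetical stand-ins for real tournament stats, not the actual features used. It shows the two things the text describes: the interpretable coefficients of a logistic regression and the misclassification rate used to compare models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical matchup features: seed difference and season scoring-margin
# difference between the two teams (stand-ins for real tournament stats).
n = 1000
X = rng.normal(size=(n, 2))

# Simulate outcomes where both features nudge the win probability.
p = 1 / (1 + np.exp(-(1.2 * X[:, 0] + 0.8 * X[:, 1])))
y = (rng.random(n) < p).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Coefficients are the interpretable part: each is the change in the
# log-odds of a win per one-unit change in that feature.
print("coefficients:", model.coef_[0])

# Misclassification rate = 1 - accuracy, the metric used to compare models.
misclassification = 1 - model.score(X_test, y_test)
print(f"misclassification rate: {misclassification:.3f}")
```

Because the coefficients act on the log-odds scale, a positive coefficient means the feature pushes the win probability up, which is exactly the kind of relationship the baseline model is meant to expose.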
The final model we submitted to Kaggle’s competition was a three-tier ensemble model. Ensemble models take the predictions from two or more models and synthesize the results into one prediction. Ours combined a logistic regression, a gradient boosting model, and a neural network.
Whether a tenderfoot or seasoned veteran in machine learning, March Madness is an excellent opportunity to test new hypotheses and challenge old theories. As you agonize over your picks, remember that you are not alone. Relax. Take a deep breath. Madness is in the air.
You already know that you are going to win your work tournament. But can your bracket beat the brackets of Abe Lincoln, a carefully crafted ensemble model, and a major sports enthusiast (me)?
Here’s your chance to prove it. Submit it to the Zencos Tournament Challenge.
Editor’s Note: Link removed after the tournament ended.
Those familiar with the tournament know it is one of the most anticipated sporting events of the year. The tournament’s 64-team, win-or-go-home bracket design is nearly irresistible to even the most casual sports fan. In 2017, an estimated 40 million people filled out 70 million brackets and wagered $10.4 billion on tournament games. Part of the allure is the sheer number of possible tournament outcomes.
With 64 teams in the tournament (not including play-in games), there are 63 games, each with two possible winners. That means 2^63, or 9,223,372,036,854,775,808 (roughly 9.2 sextillion), possible outcomes.
Predicting the perfect bracket is virtually impossible (even for you). If the average person lived to be 90 years old and could fill out a unique bracket every second, it would take 3.2 billion lifetimes to create every unique bracket combination.
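The arithmetic behind those two numbers is easy to check. A quick sketch (assuming a 90-year lifetime and one unique bracket per second, as the text does):

```python
# 63 games are played in a 64-team single-elimination bracket,
# and each game has two possible winners.
outcomes = 2 ** 63
print(outcomes)  # 9223372036854775808, i.e. roughly 9.2 sextillion

# One unique bracket per second over a 90-year life:
seconds_per_lifetime = 90 * 365.25 * 24 * 60 * 60
lifetimes = outcomes / seconds_per_lifetime
print(f"{lifetimes:.2e} lifetimes")  # roughly 3.2 billion
```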
Just like the countless choices when building a bracket, businesses are faced with limitless choices each day. However, businesses don’t have 3.2 billion lifetimes to make decisions. Today’s economy is changing rapidly, and personal intuition is an increasingly fickle friend. Just ask Kodak about sitting on the digital camera or Blockbuster passing on the opportunity to buy Netflix.
Now that big data is old news and enterprise machine learning solutions are becoming the norm, businesses that continue to rely solely on human intuition will be left behind. While instinct and intuition remain assets, predictive models increase both the speed and the quality of today’s business decisions.
For the tournament challenge, the problem we face is predicting a binary outcome: is NC State going to beat UNC? (Sigh… probably not.) Predicting tournament games is an excellent case study because, like many decisions we make, the outcome is binary.
Machine learning models provide many methods to predict binary outcomes. The same models you use to predict whether a team will win can be applied to predict whether a customer will make a purchase.
Whether you are trying to predict a binary, interval, or continuous target, the first question to consider is whether you are looking for interpretability or predictability in a model.
Some models, like logistic regression or decision trees, provide great interpretability but might not reach the necessary level of accuracy. In other words, if you need to know how much the price of a house increases per square foot added, regressions and decision trees do a great job.
However, if you are purely interested in getting the most accurate prediction possible, and you aren’t worried about the underlying reasons for the prediction, then neural networks and gradient boosting might be the best fit.
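This interpretability-versus-accuracy trade-off is easy to see on data with a nonlinear decision boundary. The sketch below uses a synthetic XOR-style problem (not tournament data) where a linear, interpretable model has nothing useful to say, but gradient boosting captures the interaction:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic data whose label depends on an interaction between the two
# features (XOR-like), which no single linear boundary can separate.
X = rng.normal(size=(2000, 2))
y = ((X[:, 0] * X[:, 1]) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

linear = LogisticRegression().fit(X_tr, y_tr)
boosted = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)

print(f"logistic regression accuracy: {linear.score(X_te, y_te):.3f}")
print(f"gradient boosting accuracy:   {boosted.score(X_te, y_te):.3f}")
```

On this data, the logistic regression hovers near coin-flip accuracy while the boosted trees do far better; the price is that the boosted model’s hundreds of trees offer no simple coefficient to interpret.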
To demonstrate and measure the abilities of machine learning models, we are going to have a showdown over the March Madness tournament, pitting human against machine. For some extra fun, we’ll also throw dumb luck into the equation!
Let’s get to know our four contestants in this year’s Zencos Madness Challenge.
You may have realized by now that I am an avid sports fan. Each year I painstakingly agonize over every pick. I look at season stats, I listen to my favorite sports pundits, and I even do a rain dance or two to make my picks. I am proud to say that my bracket has never finished below 80% nationally (though that likely has more to do with luck than anything else).
For this competition, we are only interested in identifying the team with the highest probability of winning. We don’t need to know the actual impact each additional assist has on the probability of a victory. Therefore, we used SAS Enterprise Miner to build a three-tier ensemble model.
Ensemble models take the predictions from multiple models, then synthesize the results into a single prediction. While these models offer very little interpretability, they can produce very accurate predictions. For our bracket, we ensembled three common machine learning models: a logistic regression, a neural network, and a gradient boosting model.
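Our ensemble was built in SAS Enterprise Miner, but the structure can be sketched in Python with scikit-learn’s `VotingClassifier`, which averages the predicted win probabilities of its member models. The data and hyperparameters below are illustrative, not the ones we actually used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for game-level features and win/loss labels.
X, y = make_classification(n_samples=1500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Soft voting averages the predicted win probabilities of the three
# member models: logistic regression, gradient boosting, neural network.
ensemble = VotingClassifier(
    estimators=[
        ("logit", LogisticRegression(max_iter=1000)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("nnet", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                               random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)

# Misclassification rate, the same metric used to compare single models.
print(f"ensemble misclassification: {1 - ensemble.score(X_te, y_te):.3f}")
```

Averaging probabilities lets the models cover for one another: where the linear model misses an interaction, the boosted trees or the network can pull the combined prediction back in the right direction.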
This bracket is simple. For each matchup, a penny will be flipped (yes, we are high rollers). If the result is heads, the higher seed advances; if the result is tails, the lower seed moves on. And we are using Abe Lincoln, so you know the flips are honest.
I may not know what method you are going to use to make your picks, but there is one thing I am sure of: it’s going to be on like Donkey Kong.
The rules of the game are simple: each bracket will be entered into the Zencos ESPN tournament challenge. Points are scored for each successful pick, with the number of points per correct choice increasing through each round. The bracket with the most points at the end of the tournament wins.
If there is one thing we know about the March Madness tournament, it’s that we know nothing about the tournament. So, who will win? Your guess is as good as mine (er, probably better—you are a genius, after all).
With that said, Zencos would like to cordially invite you to participate in the first-ever Zencos Tournament Challenge. To participate simply create an ESPN account and submit your bracket to the Zencos Madness Tournament Challenge. May the best human, machine, or coin win!
Now read Lessons Learned from the Madness for the results.