Lessons Learned from Predicting The Madness

04/13/2018

Now that it’s April, can we take a minute to talk about the NCAA Men’s Basketball Tournament? It lived up to its March Madness nickname this year. With roughly 9.2 quintillion possible outcomes across the 63 games in a bracket, it’s nearly impossible to predict exactly what will happen throughout the madness. Did a little fact like that keep us from trying? Of course not! Along with millions of other people, we submitted our brackets to the tournament gods (ESPN) for judgment.
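For the curious, the size of that number comes from simple arithmetic. The sketch below assumes a 64-team single-elimination field (ignoring the play-in games), where every game eliminates exactly one team and has two possible winners. The variable names are ours, chosen for illustration.

```python
# Assumption: a 64-team single-elimination bracket, play-in games ignored.
TEAMS = 64

# Every game eliminates one team, so crowning a champion takes TEAMS - 1 games.
games = TEAMS - 1

# Each game has two possible winners, so the number of distinct brackets is 2^games.
outcomes = 2 ** games

print(f"{games} games, {outcomes:,} possible brackets")  # about 9.2 x 10^18
```

That count treats every game as a toss-up, which is exactly why nobody fills out a bracket by enumeration.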

Those of you who are smart, cultured, and up for an even more daunting challenge submitted your brackets to the 2018 Zencos Tournament Challenge. There, we pitted man against machine (er, data science model).

You went up against three brackets:

One created by Abe Lincoln (the flip of a coin),
One from yours truly (my previous years’ brackets had never finished below the 80th percentile),
One created by advanced analytics (with the help of SAS Enterprise Miner)

So how did we all do?

Tournament Winner

We had several contestants who made a valiant run for the top spot. With the tumultuous and unprecedented upsets, it seems only fitting that the winner of the first annual Zencos Tournament Challenge named their bracket “NaïveHope”.

However, I hope it’s not naïve of me to think I’m going to come back and win next year.

A Flip of a Coin?

This competition was not Honest Abe’s finest hour. Each pick for this bracket was determined by the flip of a coin. Heads, the higher seed advanced. Tails, the lower seed advanced. Unfortunately, the penny we used was no lucky penny. It chose Oklahoma to win it all. Alas, Oklahoma was the first team out of the tournament. This bracket finished in the 4th percentile nationally. Better luck next year, Linc.
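If you want to recreate Abe’s method at home, here is a minimal sketch for a single 16-team region. It assumes the standard regional seeding order and a fair coin. The function name `coin_flip_bracket` and the `REGION_SEEDS` list are hypothetical, made up for this illustration, and are not the script we actually used.

```python
import random

# Assumption: standard first-round pairing order for one 16-team region (1 vs 16, 8 vs 9, ...).
REGION_SEEDS = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]


def coin_flip_bracket(seeds, rng):
    """Pick every game by coin flip and return the list of winners for each round."""
    field = list(seeds)
    rounds = []
    while len(field) > 1:
        winners = []
        # Adjacent entries in the field play each other.
        for i in range(0, len(field), 2):
            higher_seed = min(field[i], field[i + 1])  # a lower number is a higher seed
            lower_seed = max(field[i], field[i + 1])
            heads = rng.random() < 0.5
            winners.append(higher_seed if heads else lower_seed)
        rounds.append(winners)
        field = winners
    return rounds


rounds = coin_flip_bracket(REGION_SEEDS, random.Random())
print("Region champion seed:", rounds[-1][0])
```

Because every game is a 50/50 call, a 16-seed is as likely to win the region as a 1-seed, which goes a long way toward explaining Abe’s final ranking.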

Humble Pie Anyone?

My mom always said I was above average. My bracket this year proved her wrong. While in past years I have never had a bracket finish below the 80th percentile, this year’s finished dead in the middle, at the 50th percentile.

Yes. My bracket that involved hours of research, sweat, and tears was beaten by brackets selected based on which mascot would win in a fight, which color is best, and even which coach has better hair. On the flip side, I did manage to beat Abe Lincoln. My mom might be right after all.

Swing and a Miss

Even though the one-seed eliminated in the first round was the very team we predicted to win it all (thanks a lot, Virginia), our ensemble model’s bracket was doing well as we entered the Final Four. Of the 17 million-plus brackets filled out on ESPN, The Machine (as we lovingly named it) ranked in the 98th percentile. Just as I leaned back to pop open a bottle of champagne, the Final Four games were played and not even Sister Jean was able to pull out the miracle we needed. By the end of the tournament, The Machine had slid to a sad (but almost respectable?) 70th percentile nationally.

The Art of Data Science

Had our model been one we were using in a real-world business situation, we would have stopped to update it as the results began to change. Even the best models cannot prepare for the changes that come with new information. Leaving a model untouched would be like Netflix basing all of your movie recommendations on that time your niece watched “Barbie: Island of Unicorns”, refusing to consider all the super-cool action movies you’ve watched since.

Lessons Learned

The art of data science comes down to maximizing the information you have while adjusting models to account for the new information that is acquired along the way. It allows us to take the “madness” in the world and make sense of it. However, in the case of our machine learning model, all we can do is use the information we have now to look back and reflect on what we would have done differently.

In retrospect, our model made two glaring mistakes: overweighting the value of the difference between seeds and underweighting the value of three-point shooting. The unprecedented run Loyola made into the Final Four and the historic three-point shooting of Villanova were simply too much for our little Machine to overcome.

This madness experiment highlights one of the weaknesses of predictive models. In the real world, you encounter massive outliers and events that have never taken place. While machine learning does an excellent job picking up on historic patterns, it cannot predict a future that it has never seen before. Until this year, we had never seen a 16-seed knock out a 1-seed in the first round of the NCAA Men’s Basketball Tournament.

The madness came. The madness saw. The madness conquered us all (except you, NaïveHope). However, you’d be wrong to think that I’ve been left defeated. I plan to take this new information and spend the next 11 months preparing The Machine 2.0. As always, the Tournament was full of fun and excitement. Enjoy the April wind-down. My model and I will see you in March of 2019.

March Madness is a trademark of the NCAA.
