Building a Better Bracket: Beating the Odds with Machine Learning

Like most other fans of college basketball, I spent an unhealthy amount of time dedicated to the sport the week after Selection Sunday (March 11th). Starting with spending hours filling out brackets, researching rosters, injuries, and FiveThirtyEight’s statistical predictions to fine-tune my perfect bracket, through watching around 30 games over the course of four days. I made it a full six hours into the tournament before my whole bracket busted. The three-punch combo of Buffalo (13) over Arizona (4), Loyola Chicago (11) beating Miami (6), and, most amazingly, the UMBC Retrievers (16) crushing the overall one-seed and tournament favorite, UVA, spelled the end for my predictions. After these three upsets, everyone’s brackets were shattered. The ESPN leaderboards looked like a post-war battlefield. No one was safe.

The UMBC good boys became the only 16th seed to beat a 1st seed in NCAA tournament history

The odds against picking a perfect bracket are astronomical. The probability ranges from 1 in 9.2 quintillion to 1 in 128 billion. Warren Buffet offers $1 million a year for life for Berkshire Hathaway employees who correctly pick a bracket. Needless to say, no one has been able to cash in on the prize. Picking a perfect bracket is nearly impossible, and is (in)famous for being one of the most unlikely statistical probabilities in gambling.

The Yin and Yang of March Madness

To make the chances of making a perfect bracket somewhat feasible, a competition has been set up to see who can beat the odds with machine learning. Hosted by Kaggle, an online competition platform for modeling and analytics that was purchased by Google’s parent company, Alphabet, the competition has people making models to predict which teams will win each game based on prior data. A model that is correct and predicted it with 99% confidence will score better than one with a 95% confidence and so on. The prize is $100,000, split among the teams that made the top 3 brackets. Teams are provided with the results of every men’s and women’s game in the tournament since 1985, the year that the tournament first started with 64 teams. They are also provided with every play since 2009 in the tournament. Despite all this data, it is still very hard to predict, with the best bracket in this competition, which has been hosted for five years, predicting 39 games correct. Many unquantifiable factors, such as hot streaks and team chemistry, play a large factor in the difficulty in choosing, so it looks like we’re still years off from having our computers picking the perfect bracket.