Author Archives: cgorman

Building a Better Bracket: Beating the Odds with Machine Learning

Like most other fans of college basketball, I spent an unhealthy amount of time on the sport in the week after Selection Sunday (March 11th). It started with hours of filling out brackets and researching rosters, injuries, and FiveThirtyEight’s statistical predictions to fine-tune my perfect bracket, and ended with watching around 30 games over the course of four days. I made it a full six hours into the tournament before my whole bracket busted. The three-punch combo of Buffalo (13) over Arizona (4), Loyola Chicago (11) beating Miami (6), and, most amazingly, the UMBC Retrievers (16) crushing the overall one-seed and tournament favorite, UVA, spelled the end for my predictions. After these three upsets, everyone’s brackets were shattered. The ESPN leaderboards looked like a post-war battlefield. No one was safe.

The UMBC good boys became the first 16 seed to beat a 1 seed in NCAA men’s tournament history

The odds against picking a perfect bracket are astronomical. Estimates of the probability range from 1 in 9.2 quintillion, if every game is treated as a coin flip, to 1 in 128 billion under more generous assumptions about the picker’s knowledge. Warren Buffett offers $1 million a year for life to any Berkshire Hathaway employee who correctly picks a bracket. Needless to say, no one has been able to cash in on the prize. Picking a perfect bracket is nearly impossible, and it is (in)famous as one of the longest shots in gambling.
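For the curious, here is a rough sketch of where numbers like these come from. It assumes a 64-team bracket (63 games), that every game is independent, and that the picker has the same chance of being right in each one. Those are simplifying assumptions of mine, not a description of how either published estimate was actually derived.

```python
from fractions import Fraction

GAMES = 63  # a 64-team single-elimination bracket has 63 games


def perfect_bracket_probability(per_game_accuracy, games=GAMES):
    """Chance of picking every game correctly, assuming independent games
    and the same per-game accuracy throughout (a simplification)."""
    return per_game_accuracy ** games


# Pure coin flips: each pick is right half the time.
coin_flip = perfect_bracket_probability(Fraction(1, 2))
print(f"Coin flips: 1 in {coin_flip.denominator:,}")

# Work backwards: what constant per-game accuracy would give 1 in 128 billion?
implied_accuracy = (1 / 128e9) ** (1 / GAMES)
print(f"Per-game accuracy implied by 1 in 128 billion: {implied_accuracy:.1%}")
```

The coin-flip case is simply 2 multiplied by itself 63 times, which is where the 9.2 quintillion comes from. Under the same simplified model, the friendlier estimate corresponds to getting roughly two out of every three games right.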

The Yin and Yang of March Madness

To make the chances of a perfect bracket somewhat more feasible, a competition has been set up to see who can beat the odds with machine learning. Hosted by Kaggle, an online competition platform for modeling and analytics that was purchased by Google’s parent company, Alphabet, the competition has people building models to predict which team will win each game based on prior data. Entries are scored on confidence as well as correctness: a model that predicts the right winner with 99% confidence scores better than one that predicts it with 95% confidence, and so on. The prize is $100,000, split among the teams that made the top 3 brackets. Teams are provided with the results of every men’s and women’s tournament game since 1985, the year the tournament first expanded to 64 teams. They are also provided with every play in the tournament since 2009. Despite all this data, the tournament is still very hard to predict: the best bracket in the competition’s five-year history predicted 39 games correctly. Many unquantifiable factors, such as hot streaks and team chemistry, play a large role in that difficulty, so it looks like we’re still years off from having our computers pick the perfect bracket.
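To see how confidence can feed into a score, here is a minimal sketch. I am assuming a log loss metric, which is a common way to grade probability predictions. Treat the function as an illustration rather than the competition’s actual scoring code.

```python
import math


def log_loss(predictions, outcomes, eps=1e-15):
    """Average log loss; lower is better.

    predictions: predicted probability that the first team wins each game
    outcomes:    1 if the first team actually won, 0 otherwise
    """
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0 and 1
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(predictions)


confident_and_right = log_loss([0.99], [1])
cautious_and_right = log_loss([0.95], [1])
confident_and_wrong = log_loss([0.99], [0])

print(f"99% and right: {confident_and_right:.4f}")
print(f"95% and right: {cautious_and_right:.4f}")
print(f"99% and wrong: {confident_and_wrong:.4f}")
```

The flip side of rewarding confidence is that a confident miss is punished far more than a cautious one, so under a metric like this a model can’t just claim 99% on every game.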

Scout’s Honor: Will the Rise of Sabermetrics and Data Replace the Role of Baseball Scouts?

As popularized by the book-turned-movie Moneyball, a large portion of baseball now relies on sabermetrics, a newer, more analytical approach to statistics for gauging players and teams. Where RBIs and batting average were once relied on as the be-all and end-all indicators, newer statistics have emerged as more accurate measures of ability, since the game has changed so much in the 100+ years since its inception. Alongside these new statistics and measurements, software has been developed over the past 30 years to calculate and simulate different scenarios. If you are curious, Out of the Park Baseball (OOTP) is a yearly video game series in which you manage a baseball team, simulating games and poring over stats just like loads of front offices do. Beyond the software, there are systems in most ballparks that measure every pitch (called PITCHF/x) and even track players on the field, collecting all the data possible for later use.
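As a small illustration of why batting average alone can mislead, the sketch below compares two made-up hitters using on-base percentage, slugging percentage, and their sum (OPS). These are just one well-known family of stats used alongside batting average. The two stat lines are invented for the example and do not describe real players.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int = 0
    sacrifice_flies: int = 0

    @property
    def hits(self):
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def batting_average(self):
        return self.hits / self.at_bats

    @property
    def on_base_percentage(self):
        # Times on base divided by the plate appearances that count for OBP.
        reached = self.hits + self.walks + self.hit_by_pitch
        chances = (self.at_bats + self.walks + self.hit_by_pitch
                   + self.sacrifice_flies)
        return reached / chances

    @property
    def slugging(self):
        # Total bases per at-bat: extra-base hits count for more.
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.home_runs)
        return total_bases / self.at_bats

    @property
    def ops(self):
        return self.on_base_percentage + self.slugging


# Two hypothetical hitters with identical batting averages.
slap_hitter = BattingLine(at_bats=500, singles=150, doubles=0, triples=0,
                          home_runs=0, walks=20)
slugger = BattingLine(at_bats=500, singles=90, doubles=30, triples=5,
                      home_runs=25, walks=80)

for name, line in [("Slap hitter", slap_hitter), ("Slugger", slugger)]:
    print(f"{name}: AVG {line.batting_average:.3f}, "
          f"OBP {line.on_base_percentage:.3f}, "
          f"SLG {line.slugging:.3f}, OPS {line.ops:.3f}")
```

Both hitters look identical by batting average, but the second one walks more and hits for far more power, which is exactly the kind of difference the older numbers hide.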

An example of a spread chart from PITCHF/x

With all the tools now available to measure just about every fine detail about a player, some teams are cutting back on their scouting staffs. Scouts have been essential to baseball teams since the beginning of the sport, evaluating prospects and talent on other teams to see whether they are of any interest to the club. The Astros just cut eight scouts from their staff, so scouts around the league are becoming wary about the future. Some teams still rely heavily on scouts and find them indispensable; the Brewers, for example, use them for players in the lower minor leagues, where there are not enough stats to fully screen players. There is also some information that simply can’t be measured through data and film.

Scouts with their radar guns ready

At the same time, all this technology frees up scouts to watch the game for qualitative factors. Whenever a scout is recording the reading off a radar gun, or writing down the time for a sprint, they aren’t looking at the game. Sports reporters have the same problem: whenever you’re writing something down, you’re not paying attention to the play as it happens. With the advancements of PITCHF/x and the like, scouts don’t need to spend time on the busy work of recording numbers that devices can capture automatically. Freed from these tedious tasks, they can watch the players in detail as they play and interact. They can see how other players react to a play not directed at them, see their energy while playing, and get a sense of their general dispositions. Used properly, modern baseball technology might free up and help scouts rather than replace them entirely.