With the NCAA tournament approaching, office pools and pundit prognostication are starting to gain momentum. Over the last few years, however, the stakes have risen along with the amount of predictive data available. Betting has been taken to a whole new level, with data scientists using analytics to predict bids and companies sponsoring prediction competitions.

With millions filling out brackets, coin flipping is no longer an option. The odds of making correct predictions improve substantially with careful analysis of data collected throughout the season, as well as data from previous years, including player statistics, tournament seeds, geographical factors and social media.

Kaggle has once again rolled out its annual March Data Madness competition, which pits you against the millions of sports fans and office-pool bandwagoners who are hoping to win big by correctly predicting the outcome of the men’s NCAA basketball tournament. Presented by HP Software’s industry-leading Big Data group and the HP Haven Big Data platform, this competition will test how well predictions based on data stack up against a (jump) shot in the dark.

The competition doesn’t just hone one’s analytical skills; Kaggle is also offering a hefty $15,000 cash prize to the team with the closest prediction. The cash prize and the opportunity for glory are drawing an increasingly large field of competitors. So far the field stands at 114 teams, 139 individual players, and 649 entries.

“The response is healthy so far and I’d expect many more to jump in, now that there’s a prize on offer,” says Will Cukierski, a Kaggle data scientist. “The make-or-break on our expectations will happen after the 2014 madness starts. We’re really excited to see if people can beat the traditional rankings and experts and seed-based predictions.”

In stage one of this two-stage competition, participants will build and test their models against the previous four tournaments. In the second stage, participants will predict the outcome of the 2015 tournament. Participation in the first stage is not required to enter the second; the first stage exists to encourage model building and to provide a means of scoring predictions. The real competition is forecasting the 2015 results, for which you’ll predict a win probability for each possible matchup rather than just fill out a traditional bracket.
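To make the difference between a bracket and per-matchup probabilities concrete, here is a minimal sketch in Python. It rests on assumptions: the article does not say how Kaggle scores entries, so log loss is used here only as a common metric for probabilistic predictions, and the team IDs, probabilities, and the function names `all_matchups` and `log_loss` are hypothetical.

```python
import math
from itertools import combinations


def all_matchups(team_ids):
    """Return every unordered pair of teams as (lower_id, higher_id)."""
    return list(combinations(sorted(team_ids), 2))


def log_loss(predictions, outcomes, eps=1e-15):
    """Mean log loss over the games that were actually played.

    predictions: {(team_a, team_b): probability that team_a wins}
    outcomes:    {(team_a, team_b): 1 if team_a won, else 0}
    ASSUMPTION: log loss is an illustrative metric, not one the article names.
    """
    total = 0.0
    for matchup, won in outcomes.items():
        # Clip so a prediction of exactly 0 or 1 cannot produce log(0).
        p = min(max(predictions[matchup], eps), 1 - eps)
        total += won * math.log(p) + (1 - won) * math.log(1 - p)
    return -total / len(outcomes)


# Hypothetical four-team field with made-up IDs.
teams = [101, 102, 103, 104]
matchups = all_matchups(teams)

# A coin-flip baseline assigns 0.5 to every possible matchup.
baseline = {m: 0.5 for m in matchups}

# A hypothetical model that is more confident about some games.
model = dict(baseline)
model.update({(101, 102): 0.8, (103, 104): 0.3, (101, 104): 0.7})

# Only some of the possible matchups are ever played (made-up results).
played = {(101, 102): 1, (103, 104): 0, (101, 104): 1}

print(len(matchups), "possible matchups")
print("baseline log loss:", round(log_loss(baseline, played), 4))
print("model log loss:   ", round(log_loss(model, played), 4))
```

Under this kind of scoring, a confident wrong pick is penalized far more heavily than a cautious one, which is why a probability for every pair carries more information than a single filled-in bracket. With a 68-team field, the same `all_matchups` helper would produce 2,278 pairs.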

As well as sponsoring the event, HP is offering the use of its Haven technology to fuel competitors’ algorithms. According to the company’s blog post:

You will have access to key HP Haven technologies, including HP Vertica Distributed R to accelerate your machine learning by running your R models across multiple nodes to vastly reduce execution time and analyze much larger data sets.

The competition started on Monday 2 February 2015 UTC and ends on Saturday 14 March 2015 UTC (40 total days).

This isn’t the first example of big data being used for prediction. More than a decade ago, professors Jay Coleman of the University of North Florida in Jacksonville, Allen Lynch of Mercer University in Macon, Georgia, and Mike DuMond of Charles River Associates and Florida State University in Tallahassee created the Dance Card, a formula designed to predict which teams will receive at-large bids to the NCAA Tournament (aka the Big Dance). For the recently announced 2014 bids, the Dance Card formula correctly predicted 35 of the 36 at-large bids. The model is a combined 108 of 110 over the last three years.

Teams are also making extensive use of big data to improve performance. “Sports teams are using new analytical capabilities to improve their team personnel and on-court performance,” says Davis, vice president of Intel’s Data Center Group. “As an example, teams are using emerging technologies such as multi-view cameras that can measure the tendencies of players in very specific situations to improve performance.”

Full contest details can be found over at Kaggle.
