Big Data Books

If you want to dig a little deeper and learn more, read this list of the best Big Data books. Some are aimed at the novice while others will be handier for the more advanced user, but together the following books cover the full range. Think something is missing? Tell us in the comments!

1. Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger

Mayer-Schönberger is a professor at Oxford’s Internet Institute and was previously at the Kennedy School of Government. In a lively and well-structured fashion, he outlines the progress Big Data analytics has already made, where it is predicted to lead, and the dangers associated with the algorithmisation of the world. It’s a great introduction that tends to avoid being overly technical and, when it does get specific, illuminates the point with examples.

2. Big Data Now: 2012 Edition (ebook) by O’Reilly Media

Big Data Now is a free publication by O’Reilly. The book is aimed both at professionals and at interested lay readers who wish to get up to speed with the topic. It provides an overview of the relevant tools, their applications and a view to the future.

3. 101 “Insanely Great” Resources – Big Data by Ben Kerschberg

101 “Insanely Great” Resources – Big Data is an easy-to-use introduction to the world of Big Data, and in particular to 101 important resources for understanding the topic. This e-book (PDF) is filled with embedded links that take you directly to the Big Data section of a particular resource (e.g., The Wall Street Journal) or to more specific sites such as Data Science Central or journals.

4. Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work by Ethan McCallum

The book for anyone sitting on a mound of data in bad shape and wondering what to do with it. Despite what you might initially think, it’s not worthless! Based on 19 examples, Bad Data Handbook explains how you can transform the data into something that can be worked with.

5. Planning for Big Data: A CIO’s Handbook to the Changing Data Landscape (ebook) by Edd Dumbill

Specifically targets a senior audience within a company. This book explains Big Data with questions in mind such as what products can be created from the insights gained and how the new technology may make the organisation more efficient.

6. Doing Data Science: Straight Talk from the Frontline by Cathy O’Neil and Rachel Schutt

For anyone who wants to enter the field of Big Data themselves: the book follows Columbia University’s Intro to Data Science class. Experts from companies such as Google and eBay share algorithms, insights and more.

7. Big Data For Dummies by Judith Hurwitz et al.

Solid introduction. You know what to expect from the series.

8. Data Analysis with Open Source Tools by Philipp K. Janert

Useful for the applied user. More advanced programmers will learn to use the open-source software available and how to best apply it to data in a business setting.

9. Building Data Science Teams by DJ Patil

DJ Patil, potentially the originator of the term “Data Scientist”, explains what makes an effective team of data scientists and what individual talents and skills to look for.

10. The Culture of Big Data by Mike Barlow

Big Data requires an environment that supports a data-driven mentality. This book explains what it means to be data-driven and how to establish that culture among the people who stand to benefit from Big Data.

11. Big Data Glossary (Paperback) by Pete Warden

A rather technical glossary that homes in on the various features of analytics tools and rates them based on Warden’s personal programming experience with them.

12. Too Big to Ignore: The Business Case for Big Data by Phil Simon

Using real-world examples, Phil Simon tries to explain the economics behind Big Data. To reap valuable insight from Big Data analytics, he claims, one does not need to be a guru data scientist. Simply accepting that Big Data can give insight is in many cases enough to shift the culture towards embracing it.

13. The Signal and the Noise: Why Most Predictions Fail – but Some Don’t by Nate Silver

Nate Silver is a renowned statistician and writer. Career highlights include developing PECOTA, a system for forecasting baseball performance, and correctly predicting the winner of 49 out of 50 states in the 2008 American presidential election.
In The Signal and the Noise, Silver aims to demystify the process of finding the predictive ‘signals’ amongst the noisy data. He examines a dizzying range of fields (from hurricanes to baseball, from the poker table to the stock market, from Capitol Hill to the NBA) to find the common patterns amongst successful predictions. His biggest lesson? Start noticing the differences between confident predictions and accurate predictions.
An insightful and essential read for anyone interested in predictive analytics.

14. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel

Described as “the Freakonomics of Big Data”, Predictive Analytics… is the perfect primer to the most potent, booming unnatural resource of our time: data. Siegel offers insight into the near-endless applications of Big Data, including how companies know who will drop out of school, cancel a subscription, or get divorced before they are even aware of it themselves; how IBM’s Watson computer used predictive modeling to answer questions and beat the human champs on TV’s Jeopardy!; and how we know retirement decreases life expectancy, and vegetarians miss fewer flights.

15. The Human Face of Big Data by Rick Smolan and Jennifer Erwitt

Smolan & Erwitt are the co-founders of Against All Odds Productions, a company which brings together prominent journalists and photographers to capture and explore some of today’s pressing issues. One of their latest projects is The Human Face of Big Data, which aimed to investigate the impacts of Big Data from a human, personal perspective. Featuring 10 essays from noted writers and stunning infographics from Nigel Holmes, this book offers a whole new approach to looking at the influence and possibilities of Big Data.

16. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking by Foster Provost and Tom Fawcett

Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business walks the reader through the fundamental principles of data analysis. Crucially, it discusses not only the practices of effective data mining, but also how to convert findings into real business value. Drawing on real-world examples from big business, this book is essential for any company wondering if a data scientist would actually be worth their first pay cheque.

17. The Black Swan: Second Edition: The Impact of the Highly Improbable by Nassim Nicholas Taleb

In the Europe of 1696, it was a widely accepted scientific fact that all swans were white. In Australia, in 1697, Cygnus atratus, the black swan, was discovered.
Both history and modern life are rife with such improbable events, whether positive or negative. Yet humans, programmed to crave narrative and order, often overlook them. Taleb takes a playful look at how black swans explain almost everything about our world, and encourages us to embrace the outliers and anomalies.

18. Competing on Analytics: The New Science of Winning by Thomas H. Davenport and Jeanne G. Harris

Businesses now have more information at their fingertips than ever before, but information alone is not enough. In Competing on Analytics, Davenport and Harris aim to demonstrate how statistical analysis and predictive modeling can give you that competitive edge.

19. Big Data Marketing: Engage Your Customers More Effectively and Drive Value by Lisa Arthur

If you’re stuck with an archaic marketing strategy and drowning in messy data, let Big Data Marketing drag you out of the mire. Using real-world examples, jargon-free language, downloadable resources, and a healthy dose of humor, Big Data Marketing will transform ‘Big Data’ from your least favourite term into your most valuable asset.

20. Journeys to Data Mining: Experiences from 15 Renowned Researchers by Mohamed Medhat Gaber

For Journeys to Data Mining, Mohamed Medhat Gaber asked 15 successful and widely-respected names from the data mining field to write down their journeys through the world of data science. He asked them to consider ten key questions, such as: What are your motives for conducting research in the data mining field? How would you advise a young researcher to make an impact?
The format of their replies was left entirely in the hands of the data miners, leading to informal and insightful responses. An incredibly useful read for anyone interested in a career in computer science.

21. The Fourth Paradigm: Data-Intensive Scientific Discovery by Tony Hey, Stewart Tansley and Kristin Tolle

Published in 2009 and marketing itself as ‘the first broad look at the rapidly emerging field of data-intensive science’, The Fourth Paradigm was written just as industry experts were realising the power of massive datasets. Even after five years of rapid and unimaginable expansion, this book still offers insight into what data science offers across various disciplines.

22. Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson

In the tradition of Bruce A. Tate’s Seven Languages in Seven Weeks, Redmond and Wilson aim to teach you about seven open-source databases (Redis, Neo4J, CouchDB, MongoDB, HBase, Riak and Postgres) in seven weeks. Rather than just giving you a basic overview of the systems, Seven Databases aims to educate you about the fundamental concepts of the technology, and get you solving real-world problems with the databases. The most intensive and comprehensive introduction to large-scale data management there is.

23. Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis by Colleen McCue

In Data Mining and Predictive Analysis, McCue combines discussion of the possible applications of predictive analysis with real-world examples showing how data mining has identified crime trends, anticipated community hot-spots, and refined resource deployment decisions. Combining expert knowledge with a user-friendly tone, this book is suitable for newcomers to predictive analysis and experienced practitioners alike.

24. A Statistical Guide for the Ethically Perplexed by Lawrence Hubert and Howard Wainer

In disciplines such as law, medicine and psychology, which impinge upon human wellbeing, statistics have to be used in accordance with standards for ethical practice. A Statistical Guide for the Ethically Perplexed straddles the intersection between ethics and statistics, aiming to teach readers how to use and apply statistics in an ethical manner.
Real-world examples discussed in A Statistical Guide… include: breast cancer screening, risk and gambling, the Federal Rules of Evidence, “high-stakes” testing, regulatory issues in medicine, difficulties with observational studies, ethics in human experiments, health statistics, and much more.