The 3 Don’ts of Big Data

We recently discussed the pitfalls of big data with Gartner’s Research VP, creator of the 3 V’s and all-round big data legend Doug Laney, and formulated some don’ts of big data to steer your big data dreams in the right direction.

In almost any discussion about big data, the word “value” inevitably rears its ugly head. The media has recently been inundated with lofty statistics around this fabled big data value. Wikibon projects the big data market will be worth $28.5 billion in 2014, rising to $50.1 billion in 2015. AT Kearney values the big data hardware and software market alone at $114 billion by 2018. McKinsey estimates big data could save European government administrations $149 billion by improving operational efficiency. With such huge sums of money being forecast in big data’s near future, you can almost see the cartoonish dollar signs appearing in front of business leaders’ eyes.

But big data isn’t a magical solution; in fact, it isn’t a solution at all. Of course, it can and will continue to be a huge source of revenue and growth for companies, but it has its pitfalls as well as its promises. Poor data is estimated to cost US businesses $3.1 trillion a year. But don’t despair just yet; we’ve devised some key directives to steer your big data initiatives down the right path.

1. Don’t Put All of Your Eggs in One Basket

One of the main missteps when embarking upon a big data project is being too myopic. You need to take a broad approach to choosing the right technologies. As I’ve reiterated in my Understanding Big Data series, there really isn’t a “one-size-fits-all” big data architecture; it’s not a case of downloading one open-source package and being ready to go. More often than not, a combination of technologies catered to specific tasks is the best approach. In his Gartner paper “Information Innovation Key Initiative Overview”, Doug Laney also highlights that taking a broad approach to data sources is key. The paper states enterprises can succeed or fail based on how they respond to:

  • Big data
  • Social media
  • Cloud computing
  • Mobile
  • The Internet of Things
  • Machine data
  • Personal analytics
  • Open, syndicated and other external data sources

We asked Doug if there was one of these facets in particular that is chronically overlooked.

“It depends what industry you’re in, and what data sources you already have that can intersect with these others in unique and high-value ways. Social media is a still largely untapped constant stream of insights into what the general public are thinking and doing, perhaps with your products/services or with those of your competition. I wouldn’t dare get into a new market or introduce a new product today without a thorough examination of social media trends. Ideally, if you can connect your own customers and prospects to their specific social media accounts, the insights gleaned can be tenfold.”

2. Don’t Charge in Blindly

Perhaps the most damaging setback to big data initiatives is the lack of a clear and defined strategy. In a SAS survey of 256 industry leaders, 40% cited “the lack of a compelling business case” as a major obstacle to big data implementation (specifically, in this case, using Hadoop). In a recent LinkedIn poll, “Why Hadoop Projects Fail?”, a consistent pain point identified by respondents was “No financially compelling use case”. Hiring a data scientist to construct a big data solution and expecting immediate ROI is not a sufficient strategy. As Jamal Khawaja notes, “If your client is asking you for a Big Data solution, then you’re already in the wrong place”. Although Laney agreed that an efficient strategy is vital, he added that flexibility and experimentation were also key components of effective big data strategies.

“Ultimately you need to have a clear vision for what you’re doing with data that cannot be done (cost effectively) on current infrastructure/architecture. But that doesn’t mean you shouldn’t be experimenting too”.

Such flexibility is something he also highlights in his “Information Innovation Key Initiative Overview”. It states that, as well as a clear vision for what you’d like to achieve, a robust big data initiative must also be prepared for the changing big data landscape. The paper outlines a set of strategic planning assumptions which efficient big data strategies will bear in mind. Laney told Dataconomy these predictions are “based on a combination of surveys and speaking with 1,000+ clients annually (per analyst). These we publish each year, and periodically throughout. They are our general prognostications that our clients should use in planning IT or business initiatives.” The latest strategic planning assumptions are:

  • By 2016, 30% of businesses will have begun directly or indirectly making money from their information assets via bartering or selling them outright.
  • Through 2016, 25% of organizations using consumer data will face reputation damage due to inadequate understanding of information trust issues.
  • By 2016, excessive focus on truth over trust in big data will prompt leadership changes in 75% of projects.
  • Through 2017, 90% of the information assets from big data analytic efforts will be siloed and unleverageable across multiple business processes.
  • By 2017, 33% of major global companies will experience a crisis due to their inability to adequately value, govern and trust their enterprise information.

Knowing what you want to achieve and being able to adapt to future pain points are both key to big data strategy.

3. Don’t Place Truth Over Trust

The prognostication about “excessive focus on truth over trust” is particularly intriguing. Laney defined the “truth over trust” conundrum as follows: “Simply, data may be factual, but if it doesn’t have supporting metadata to convince people of its utility, then it’s really not much good at all. What is its provenance, its quality characteristics (e.g. completeness, integrity, etc.)?”

This is something Laney’s colleague at Gartner, Andrew White, also highlighted in a post earlier this year. “With significant growth in data, new theories (and assumptions) of causation (versus correlation) will emerge,” it states. “This growth will occur, perhaps, at such a prodigious rate that our testing won’t be able to keep up, and so our ability to improve our understanding (i.e. make better predictions) will fall.” Many companies have vast silos of data and want their big data initiatives to leverage all of it. Yet often, analysis of a smaller, high-quality, trustworthy dataset can be much more enriching.

In short, you’ve got to have a robust and creative strategy drawing on a range of trustworthy data sources. Summarised in one sentence, it almost seems obvious; and yet these pain points arise time and time again, and prevent companies from maximising their potential.

(Image credit: Flickr)


Doug Laney is a Research VP for Gartner Research. He is considered a pioneer in the field of data warehousing and originated the field of infonomics (short for “information economics”). He has led analytics and information-management-related projects on five continents and in most industries. Mr. Laney is also an experienced IT industry thought leader, having launched Meta Group’s Enterprise Analytics Strategies research and advisory service, established and co-led the Deloitte Analytics Institute, and guest-lectured at leading business schools on information asset management and valuation.


