Dark Data: What Is It And Why Does It Matter?

One of the latest buzz terms to emerge, fittingly enough, from the shadow of Big Data is Dark Data. Despite its ominous-sounding name (one can imagine a tween goth band adopting that moniker), it is perhaps more accurate, if less foreboding, to see Dark Data as Big Data’s blight side.

For the most part, Dark Data has been characterized as the mass of information an organization creates and uses once, only for it to then be buried within a vast and unorganized collection of other content assets. In pragmatic terms, it’s the estimated 80% of documents an organization produces that never get used again.

Now, before we get more into what Dark Data is (and the various degrees of “darkness” that it can have), we need to first look at why the term has come to be. Simply put, the very emergence of Dark Data as its own entity can be seen as a reminder of the endemic challenges of information management, namely not just how, but why we manage information. Some data, after all, can and should be left in the “dark.” For example, quality systems are used to store and manage information that serves as evidence of compliance, such as training records and audit logs. This kind of data is often needed just in case the system and processes are audited. From an IT perspective, this deliberate darkness, so to speak, becomes a function of safeguarding the data and preventing unauthorized access to it.

But when frequently accessed content becomes Dark Data unintentionally, it’s not just an inconvenience; it’s a serious problem. If a business proposal from a few years back can’t be found, and thus must be recreated instead of just revised and repurposed, unnecessary effort is spent. Likewise, when a customer calls support, the support engineer needs to be able to view the entire customer history, even if that data is scattered among multiple business solutions and document repositories.

From Fail to Grail: What the Experts Say

Bear in mind that the concept of Dark Data is being defined with every new article in which the term appears. So far, though, experts have framed its darkness, so to speak, in terms of unfulfilled potential: the value an organization loses when it fails to harness all of its information assets.

Gartner, for instance, sees darkness as a matter of under-utilization, with the optimistic implication that something can be done about it: “[Dark Data consists of] information assets organizations collect, process and store during regular business activities but fail to use for other purposes.” Likewise, Forbes sees this failure to use Dark Data for other purposes as an opportunity: “Dark data—produced during an increasingly complex manufacturing process—offers the potential to shape the factories of the future.”

Putting Dark Data into Context – and Putting it Back to Work – with Metadata

Where we began talking about Dark Data representing the practical challenges of information management as a philosophy and practice, we can now talk about it as an opportunity to take a more pragmatic and intelligent approach to managing and harnessing existing content. The engine driving this is metadata – literally the “data about data” – that can be used to identify, link, curate and cross-reference information in a way that unlocks its relevance and usefulness.

How? Rather than limiting the concept of metadata to basic file attributes (date created or modified, size, etc.), we can expand our view and use of metadata to create a more holistic business perspective. Metadata contains specific properties that not only relate to critical elements of the organization, but can also be proactively applied to drive processes and classify data intelligently – by project, customer, workflow, status and other factors. In this way, the value of an information asset can be seen in terms of the amount of metadata associated with it.

Metadata is like a set of omni-directional headlights, navigating very specifically – and relevantly – through Dark Data, while illuminating associations and relationships between items and users along the way. By nature, metadata can illuminate these relationships across one or more repositories or related applications, such as an ERP or CRM system, and can ensure consistency in the way information is used, stored and shared. It provides clear, concise insight into data origins and histories, which can be used to ensure, for instance, that workflows and business processes are properly followed and administered.
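
To make the idea a little more concrete, here is a minimal sketch of classifying and finding content by business metadata rather than by where it is stored. Everything in it is an assumption made for illustration: the Asset class, the find_assets helper, the property names (customer, project, workflow, status) and the sample items do not come from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One information asset; every field name here is illustrative."""
    name: str
    repository: str  # where the item happens to live
    metadata: dict = field(default_factory=dict)  # business properties


def find_assets(assets, **criteria):
    """Return the assets whose metadata matches every given property."""
    return [
        a for a in assets
        if all(a.metadata.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical content scattered across three systems.
assets = [
    Asset("proposal_2012.docx", "file share",
          {"customer": "Acme", "project": "Upgrade", "status": "approved"}),
    Asset("support_ticket_881", "CRM",
          {"customer": "Acme", "workflow": "support", "status": "open"}),
    Asset("invoice_4471.pdf", "ERP",
          {"customer": "Globex", "project": "Rollout", "status": "paid"}),
]

# One query by customer surfaces items regardless of where they are stored.
for asset in find_assets(assets, customer="Acme"):
    print(asset.repository, asset.name)
```

The point of the sketch is that the query never mentions a folder or a system. The old proposal on the file share and the open ticket in the CRM turn up together because they share a customer property.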

Digging deeper into this concept, metadata can also consist of information about the development and lifecycle of a document, including the users, processes and applications involved in its creation, revision, archival, retention and destruction, complete with granular details that drill down to the exact timestamp of changes and actions, such as reviews and approvals, as well as the access permissions involved in performing them.
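
As a rough illustration of this kind of lifecycle metadata, the sketch below records who did what to a document, when, and under which permission, then replays one document's history in order. The LifecycleEvent class, its field names, the history helper and the sample records are all hypothetical, chosen only to mirror the details described above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LifecycleEvent:
    """One recorded action on a document; field names are illustrative."""
    document: str
    action: str       # e.g. "created", "revised", "approved", "archived"
    user: str
    permission: str   # access right under which the action was performed
    timestamp: datetime


def history(events, document):
    """Return one document's events in chronological order."""
    return sorted(
        (e for e in events if e.document == document),
        key=lambda e: e.timestamp,
    )


def utc(*parts):
    """Build a timezone-aware UTC timestamp from year, month, day, ..."""
    return datetime(*parts, tzinfo=timezone.utc)


# Hypothetical records, deliberately stored out of order.
events = [
    LifecycleEvent("SOP-12", "approved", "dana", "approve", utc(2015, 3, 4, 16, 30)),
    LifecycleEvent("SOP-12", "created", "lee", "edit", utc(2015, 3, 1, 9, 0)),
    LifecycleEvent("SOP-07", "archived", "lee", "admin", utc(2015, 3, 2, 11, 0)),
    LifecycleEvent("SOP-12", "revised", "lee", "edit", utc(2015, 3, 3, 14, 15)),
]

for event in history(events, "SOP-12"):
    print(event.timestamp.isoformat(), event.action, event.user, event.permission)
```

A history like this is what lets an auditor confirm that a document was reviewed and approved by the right people, in the right order, before it was put to use.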

From Blight Side to Bright Side: Managing Metadata

As Dark Data is Big Data’s blight side, the idea of managing metadata as its own entity creates Dark Data’s bright side. IT administrators gain more flexibility to manage its structure. Think of metadata as micro-beacons added to documents to ensure that all enterprise information is searchable, available and exportable – regardless of file type, format or object type.

In this sense, companies can see that what’s important is not so much where metadata resides, or what attributes it contains, but what the information represents and how it should best be classified and managed to maximize its usefulness and value. By managing metadata separately, organizations gain a more holistic view of enterprise content, where even metadata associated with information for which no file exists, like an audit or a deviation, becomes its own asset for the insight it can provide into processes and procedures.

So where we started talking about Dark Data as underutilized information, we see now that metadata comprises the pixels, if you will, to illuminate the connection and alignment of information assets – in all their forms – to create a dynamic 360-degree view of the information. This panoramic approach to information management not only brings Dark Data into the light, it enables dispersed information to be revealed and harnessed in a more relevant way.

(image source: UCL Mathematical and Physical Science)