The White Rabbit, Cheshire Cat And Your Analytics Wonderland: How Data Curation Reminds Me Of A Familiar Childhood Journey

Analysis is easy. It’s rational and straightforward. Understand the relatively linear set of steps for populating an analytic model and you’re set for success. Finding insights is the hard part of the data process. It requires the non-linear skill of interpretation. Interpretation is inexact and requires judgement; it is generally unpredictable in advance and occasionally extremely frustrating.

Going down the rabbit hole

Successfully finding insights requires having a white rabbit to guide you down the rabbit hole of discovery. You must leave the familiar ground of key performance indicators that you trust and drill downward into the world of raw data. Faced with a new world of wonders, Alice could only interpret them through her own limited frame of reference. In some organizations, that frame of reference is provided by expert data stewards. Data stewards, where they exist, are asked to manually consolidate and document this knowledge as best practices or rules for exploration. But divorced from your process of exploration, these rules are often ignored. And even when the rules are followed, they may be only one perspective among many possibilities and not enough to deliver true insight. Like Alice, you may find yourself left to your own devices on the path to insight and may not be able to predict when you’ll hit your Wonderland, but when you do it will be magical, created out of your own curiosity and the wonder of data.

This is one of the most beautiful things about analytics, but it is also the most dangerous. For curiosity has a strong link to imagination. And the enticement, but also the downfall, of imagination is that it can set us on paths divorced from reality. There are rules that matter in discovering insights: governance policies, privacy rules, and details about the accuracy of your data. All of these count in finding a correct conclusion. Insights are only as useful as they are accurate.

Knowing your data like the Cheshire Cat

Today, data governance rules are packed away in process diagrams, documents, wikis and technical metadata repositories, divorced from your analytic process and hard to incorporate manually into your workflow. What inevitably results is a well-known scenario played out across boardrooms day after day. Data can lead to inaccurate conclusions, and data visualizations based on bad data can be especially problematic. A beautiful, full-color visualization is revealed, only to be picked apart by a barrage of questions from the business leaders: Where did you get this data? Who validated its accuracy? Why doesn’t what you’re showing me align with what I know about our business from my years of experience? Does your interpretation follow best practices? What are our best practices for visualizing this type of information?

In reality, your exploration of Wonderland requires a guide, but no one has ever walked the exact same path as you have through your data. It’s more likely you need a guiding spirit, someone who will appear when most necessary to keep you sane, but also allow you to experience the essential secret of Wonderland, maybe through your own Mad Tea Party. You need the Cheshire Cat.

In every analytics organization, there are experienced analysts ready to take on the role of the Cheshire Cat in your analytics Wonderland. They have explored the inner corners of every table, file and cube in your data repository. They’ve struggled with standardizing the semantics of different parts of the business. You might not be able to see them, but they have been there before and have privileged knowledge of the inner workings of Wonderland.

Enter Data Curation

Like the ephemeral Cheshire Cat, this knowledge needs to appear when and where you need it. Data Curation is how you can let the knowledge of these analysts materialize and dematerialize within the flow of your analysis. By revealing the breadcrumbs left explicitly or implicitly by analysts who have come before you, systems that support the curation of data establish a simple, frictionless way of sharing data knowledge throughout an organization.

Data Curation includes techniques introduced initially through social curation applications. Things like:

Lists
Popularity rankings
Relevance feeds
Annotations
Comments
Up-votes/Down-votes
Articles

Some of these require analysts to populate information manually to enable sharing, but others, like relevance feeds and popularity rankings, leverage machine learning and AI technologies to observe, parse, and automate the communication of data usage patterns to end users.
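To make the idea of an automated popularity ranking concrete, here is a minimal sketch in Python. It assumes the catalog can read a log of past SQL queries and simply counts how many queries touch each table. The sample log, the `rank_tables` name and the regular expression are all hypothetical illustrations; the regular expression is a crude stand-in for a real SQL parser, and a production catalog would likely weigh many more signals.

```python
import re
from collections import Counter

# Hypothetical query log. A real catalog would read this from the
# database's query history rather than from a hard-coded list.
QUERY_LOG = [
    "SELECT * FROM sales.orders WHERE region = 'EMEA'",
    "SELECT o.id FROM sales.orders o JOIN sales.customers c ON o.customer_id = c.id",
    "SELECT name FROM sales.customers",
    "SELECT * FROM sales.orders",
    "SELECT sku FROM inventory.stock",
]

# Crude pattern (an assumption, not a SQL parser): the identifier that
# follows FROM or JOIN is treated as a table name.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def rank_tables(queries):
    """Return (table, query_count) pairs, most frequently queried first."""
    counts = Counter()
    for query in queries:
        # Count each table once per query so one long query cannot dominate.
        tables = {name.lower() for name in TABLE_PATTERN.findall(query)}
        counts.update(tables)
    # Sort by descending count, then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for table, hits in rank_tables(QUERY_LOG):
    print(f"{table}: used in {hits} queries")
```

Nobody has to fill anything in for this ranking to exist: it falls out of the ordinary work analysts are already doing, which is what makes it a useful head start.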

Increasingly, data catalogs and their integrated data recommendation engines automate a good deal of the work of curating data. They provide just enough of an automated head start on sharing best practices to make a real difference in the distribution of data knowledge throughout an organization. And what these automated techniques start is a virtuous circle of engagement by both expert data users and casual users, who feel empowered to add to the knowledge base of information about your data.

Getting Data Governance right

To be truly effective, that data knowledge needs to be consolidated into a single source of reference rather than scattered across multiple, sometimes conflicting sources of truth, a virtual jabberwocky that can be unintelligible to the uninitiated. Traditionally, creating a data catalog that could serve as this source required heavy-handed documentation performed by expert technical writers or IT data modelers, not business users. Written by people skilled at documenting, these catalogs were clear sources of guidance, but they lacked the wisdom of applied actions and the speed and insights that can come from the free-form exploration of data.

Today’s data catalogs take a page from the world of social media to empower data curation with a lighter touch. Analysts who are data experts in action can share, annotate, and collaborate around data seamlessly, in the flow of exploration. Queries, common syntax and shared definitions are captured through social interaction. Think Pinterest for your data.
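As a rough illustration of what capturing knowledge through social interaction could look like underneath, the following Python sketch models a catalog entry that collects annotations and surfaces the best-voted ones first. The `Annotation` and `CatalogEntry` classes, their fields and the sample notes are hypothetical and do not reflect the data model of any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """A note an analyst leaves on a table (hypothetical structure)."""
    author: str
    text: str
    up_votes: int = 0
    down_votes: int = 0

    @property
    def score(self) -> int:
        # Net votes: a simple assumed measure of how much peers trust the note.
        return self.up_votes - self.down_votes


@dataclass
class CatalogEntry:
    """Everything the catalog knows about one table."""
    table: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, author: str, text: str) -> Annotation:
        note = Annotation(author, text)
        self.annotations.append(note)
        return note

    def top_notes(self, limit: int = 3) -> List[str]:
        # Highest-scoring notes first; ties keep the order they were added.
        ranked = sorted(self.annotations, key=lambda n: n.score, reverse=True)
        return [note.text for note in ranked[:limit]]


entry = CatalogEntry("sales.orders")
entry.annotate("ana", "Exclude test orders before reporting revenue.")
units = entry.annotate("raj", "Totals are stored in cents, not dollars.")
units.up_votes = 5

# The note colleagues found most useful surfaces first.
print(entry.top_notes(limit=1))
```

The design choice worth noting is that the note travels with the table rather than living in a separate wiki, so it can be shown at the moment someone opens that table.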

This approach to governing your data allows for a best-of-both-worlds scenario: IT gets the assurance and control it needs to ensure accuracy and compliance, while business users remain as unencumbered and free as Alice to explore new worlds of data analysis. Most important, this not only allows business users to play with data, it also provides the organization as a whole with the insights needed to drive business growth.

Some of the world’s most advanced data-driven organizations are embracing Data Curation as a technique to share data knowledge seamlessly across a new population of self-service data users, and take the data literacy of their organizations to a new level.

Whether through data visualization, analysis or data prep, a new world of self-service with governed access unfolds, and the data tea party is no longer a mad endeavor. The sanity of those who have come before is shared with every new Alice, and yet the Wonderland of your data is preserved.


Image: Jannes Pockele