I met with a client last week who was trying to figure out how to tackle one of the most common and complex problems that organizations both small and large face – data management. This oil and gas company had data coming out of its ears – production data, seismic data, HSE data, down-hole sensor data, well log data, financial data… you get the picture. Each department was its own silo; a vast, sprawling archipelago of data islands completely disconnected from one another. Moving data from one department to another was painful and clunky, much like throwing a bunch of stuff on a boat and hoping whoever receives it finds something useful in the reams and reams of info.
I know I just used two or three different analogies to explain a problem; my apologies. Data management isn’t really analogous to anything, given its size, scope, complexity, and the different systems and departments involved. Data management is like data management, and that’s pretty much it. When you ask most industry technology strategists what the big fuss with data management is about, they invariably respond with something to the effect of “companies must leverage data in order to be better informed for decision support, which will lead to higher profitability.” That’s why they are strategists and not CEOs; they can’t speak English coherently.
The problem isn’t what we want to do with the data – everyone knows that data, properly leveraged, can help make better business decisions. The issue is how to tie the disparate pieces of data together such that they can be used to construct insights. How do you get personnel records and field work orders tied together such that you can automatically track employees and their certifications to work on specific equipment, or how many times they have made mistakes on certified equipment? How do you get the systems to share data? Where will it be shared? How can it be manipulated? Once it is manipulated, is there any way to update the source data with insights? That’s the problem – there’s a language barrier. Systems can’t talk to each other.
There are some companies – smaller ones, mostly – that have the discipline and self-control to institute top-down master data management (MDM) programs that distill the data into its relevant quanta and then construct a standard format for that data to be shared with the rest of the company, usually a combination of a standard like PPDM and a bunch of SOA-enabled data adapters that sit on an enterprise service bus. Others use tools like ETL (for tight coupling) or mediated schemas (for loose coupling) to overcome this problem, but then you end up stuck with rigid one-to-one database relationships, orphaned data, and a lack of agility.
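To make the tight-coupling complaint concrete, here is a minimal sketch of what point-to-point ETL tends to look like in practice. Everything in it is invented for illustration – the field names and system names don't correspond to any real SAP, Oracle, or historian schema – but the shape of the problem is the point: every source/target pair needs its own hand-written transform.

```python
# A minimal sketch of point-to-point ETL (tight coupling).
# All field and system names below are invented for illustration.

def erp_to_finance_db(record: dict) -> dict:
    # This transform must know BOTH the ERP export format and the finance
    # database layout -- change either side and the mapping breaks.
    return {
        "WELL_ID": record["equipment_no"],
        "WORK_ORDER": record["order_no"],
        "COMPLETED_ON": record["completion_date"],
    }

def erp_to_historian(record: dict) -> dict:
    # A second target system means a second bespoke mapping of the same data.
    return {
        "tag": f"well/{record['equipment_no']}/workorder",
        "value": record["order_no"],
        "timestamp": record["completion_date"],
    }

# With N systems, full point-to-point coverage needs on the order of
# N * (N - 1) of these one-off mappings, and every schema change ripples
# through all of them. That rigidity is what the mediated-schema approach
# described later tries to avoid.
```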
Still others try to do it the “looks quick and easy but really isn’t” way with Informatica, Tibco, or InfoSphere. Sure, you get the enterprise bus and a nice data catalogue and promises of universal connectivity… but your corporate data governance, internal data models and processes, and organizational fluidity go to poop. Oh, and setting it up sucks.
At this aforementioned oil and gas client, each of the various departments had its own way of managing data. Some had data warehouses into which they threw the bulk of their stuff. Others had multiple data warehouses for multiple systems, or multiple data warehouses for the same system but for different end users. Other departments lived in Excel or Access. And anyone with even a hint of SQL experience would write queries. Many times, it seemed, just for the hell of it.
Directives for data management from executive leadership were unclear and not aligned to the vagaries of an oil patch employing a thousand people broken up into scores of different departments. After about fourteen seconds of reflection, it occurred to me that overcoming organizational inertia, political realities, departmental lines, and resistance to change in order to create and implement a top-down master data model for the enterprise would be, in a word, impossible. No one was going to put together a data model that everyone would agree upon; understanding the relevance of data specific to arcane technical disciplines was too complex an undertaking. Understanding the systems, applications, current transmittal and ingestion technology, and the various reference architectures used for data management would take years. Even if we could aggregate, document, and organize this data in a master data warehouse, by the time we were done it would have already changed. Simply put, a curated data repository at the enterprise level was functionally impossible.
So we pitched a modified master data management strategy. Here are the components:
- Bottom-Up Data Models: Who knows the data best? More than likely it is going to be the analysts who are creating, transforming, or ingesting the information on a regular basis. They will know more about the characteristics, value, velocity, accuracy, and nuances associated with the data than any enterprise architect brought in to define a data model. There’s some data that needs to be shared with the rest of the company, and then there’s data that is noise to everyone but the department involved. A solid data management strategy is to allow the decomposition of data to occur at the lowest reasonable level, while ensuring that it conforms to an adopted industry standard like PPDM or whatever else strikes your fancy. Let the boots on the ground decide what ontology or data integration methodology works best for the data and what data needs to be shared; as the architects, we just want to make sure that it conforms to a formatting standard and a top-level ontological vocabulary or schema. The business should be responsible for managing the local data model/schema, updating it for new data types, and incorporating new definitions for specific data requests from other lines of business. By creating a standard structural model for data in transit, we can describe a near-universal mechanism for data translation that requires only source knowledge rather than both source and target knowledge. The goal is a universal mediated schema (for lack of a better term) that allows any-to-one data integration: as long as ingested data conforms to the mediated schema, there is no need to build point-to-point connectors between disparate data sources (a rough sketch of this follows the list below).
- Enterprise Service Bus: This is where workflow and business processes are instantiated. It also serves as a master communication hub for and between the various nodes connected to it. An ESB can carry event-driven messages that can kick off real-time updates to both data and applications. Although not strictly required, it extends the capabilities of connectors and applications immensely.
- SOA-enabled Data Connectors: In this scenario, connectors are universal and have only two functions: ingesting and sending. When sending data, an adapter transforms it to adhere to a hierarchically defined data model – a master data model that everyone has to map to, which gets more complex and detailed as you go down the food chain. The transformation here is blind – the target system’s data format is irrelevant. When ingesting data that conforms to the above-mentioned data model, an adapter transforms it to suit the needs of the application that requested it. For instance, if we are sending data from SAP to Oracle, the SAP adapter would format the data according to the rules of the data hierarchy. The receiving Oracle adapter would ingest the data and transform it to fit the needs of the Oracle database based upon the hierarchy and the information in the data model.
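Here is a rough sketch, purely for illustration, of how these three pieces could hang together. The canonical record fields, the connector classes, and the toy in-memory “bus” below are all invented; they are not a reference to PPDM or to any vendor’s API. The point to take away is that each connector knows only its own system’s format plus the mediated schema – never anything about the system on the other end.

```python
from dataclasses import dataclass, field
from typing import Callable

# --- Mediated schema: the one format every connector maps to and from. ---
# Fields are invented for illustration; a real model would follow an agreed
# top-level standard (PPDM or similar), with detail filled in bottom-up.
@dataclass
class CanonicalRecord:
    entity_type: str                      # e.g. "work_order", "well_test"
    entity_id: str
    attributes: dict = field(default_factory=dict)
    source_system: str = ""

# --- A toy "bus": subscribers register a handler per entity type. ---
class MessageBus:
    def __init__(self):
        self._subscribers: dict[str, list[Callable[[CanonicalRecord], None]]] = {}

    def subscribe(self, entity_type: str, handler: Callable[[CanonicalRecord], None]):
        self._subscribers.setdefault(entity_type, []).append(handler)

    def publish(self, record: CanonicalRecord):
        # Event-driven: publishing a record kicks off every interested consumer.
        for handler in self._subscribers.get(record.entity_type, []):
            handler(record)

# --- Sending connector: knows ONLY its source format + the canonical schema. ---
class ErpSendingConnector:
    def __init__(self, bus: MessageBus):
        self.bus = bus

    def send(self, erp_row: dict):
        record = CanonicalRecord(
            entity_type="work_order",
            entity_id=erp_row["order_no"],          # invented source field names
            attributes={
                "equipment": erp_row["equipment_no"],
                "completed_on": erp_row["completion_date"],
            },
            source_system="ERP",
        )
        self.bus.publish(record)    # blind send: no idea who is listening

# --- Ingesting connector: knows ONLY the canonical schema + its target format. ---
class WarehouseIngestingConnector:
    def __init__(self, bus: MessageBus):
        bus.subscribe("work_order", self.ingest)
        self.rows: list[dict] = []

    def ingest(self, record: CanonicalRecord):
        self.rows.append({
            "WORK_ORDER_ID": record.entity_id,
            "WELL_ID": record.attributes.get("equipment"),
            "COMPLETED_ON": record.attributes.get("completed_on"),
            "SOURCE": record.source_system,
        })

# Usage: any-to-one integration -- adding a new system means writing one
# connector against the canonical schema, not one mapping per counterpart.
bus = MessageBus()
warehouse = WarehouseIngestingConnector(bus)
ErpSendingConnector(bus).send(
    {"order_no": "WO-1042", "equipment_no": "PUMP-07", "completion_date": "2014-05-01"}
)
print(warehouse.rows)
```

The design choice worth noticing is the asymmetry: the sending side is blind and never sees a target format, while the ingesting side owns all the target-specific shaping. That is what lets the number of connectors grow linearly with the number of systems instead of quadratically.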
It’s important to note that for the near- to mid-term, most companies need master data management. It is the only way to ensure data consistency and it simplifies business analytics. There are many ways to skin this cat (pardon the metaphor), and each comes with varying degrees of effort, complexity, and agility. Some may find that a pre-packaged solution from Informatica is good enough; others may determine that all data needs to be cleansed and curated from an executive level. A path forward is predicated upon the structure, complexity, and volume of your current data sets, and success is not some imagined goal post but rather the slow and orderly integration of data and applications. Ultimately, what you want is for your company to leverage that data in order to be better informed for decision support, which will lead to higher profitability. Right?
Riiiight.
Jamal is a regular commentator on the Big Data industry. He is an executive and entrepreneur with over 15 years of experience driving strategy for Fortune 500 companies. In addition to technology strategy, his concentrations include digital oil fields, the geo-mechanics of multilateral drilling, well-site operations and completions, integrated workflows, reservoir stimulation, and extraction techniques. He has held leadership positions in Technology, Sales and Marketing, R&D, and M&A in some of the largest corporations in the world. He is currently a senior manager at Wipro where he focuses on emerging technologies.
(Image Credit: Colleen Galvin)