"Hadoop Practitioners Alike Should Rejoice In The Rise Of Spark..."- Interview With Altiscale's Mike Maciag

Mike Maciag is the COO of Altiscale. Prior to Altiscale, he served as the president and CEO for DevOps leader Electric Cloud, where he grew the revenue from zero to tens of millions while building a worldwide presence and signing hundreds of blue-chip customers. Mike holds an MBA from Northwestern University’s Kellogg School of Management, and a BS from Santa Clara.

You’ve spent your career on the business side of technology companies and have successfully scaled many IT companies. What attracted you to the Hadoop market?

There is no doubt that we are moving toward a data-driven economy where enterprises of all sizes need to tap into their data resources in order to provide relevant products and services to their consumers. As a key component to any data-driven enterprise, Hadoop has evolved from a simple data repository into an engine that fuels smart, data-driven decision-making.

Hadoop began as an Apache project but is now also understood as an ecosystem of rapidly evolving open-source software projects. Because there has been so much interest in Hadoop and subsequently so many developments in this ecosystem, it has become difficult for companies to keep up. Why wrangle complex Hadoop clusters when you can have experts handle this data work on your behalf?

What kinds of companies are most likely to benefit from a cloud-based approach to Hadoop and related Big Data technologies (versus on-premises)?

Hadoop in the cloud, and other Big Data technologies provided as a service, have emerged as popular alternatives for enterprises that don’t want to manage the minutiae of running these technologies at scale and in production. Mid-sized companies with large data sets and limited IT budgets certainly benefit from a managed cloud service that removes these operational burdens, helping Big Data projects flow seamlessly and ultimately resulting in cost savings. However, within our own customer base, we also work with many large companies that find tremendous value in outsourcing these operational burdens, freeing up data scientists from the “janitorial” work associated with running Big Data so that they can focus on what they do best: exploring data to discover innovative ways to drive business value. These customers span all verticals, although we’ve seen particularly strong interest coming from financial services, healthcare and manufacturing.

Do you see the rise of Spark as a threat to Hadoop?

Hadoop practitioners alike should rejoice in the rise of Spark. Spark is actually a replacement for MapReduce, the data processing engine that runs on top of Hadoop. Hadoop also includes the YARN resource manager and the HDFS storage system, and in Hadoop deployments Spark typically runs on both. Spark is a critical technology to the future of the Hadoop ecosystem as it enables the near real-time processing and analysis of Big Data, a vital technical requirement for today’s fast-paced enterprise.

How do you predict the role of the data scientist will evolve in the next five years?

Data scientists are bogged down with running the operations of Big Data instead of exploring the data. And with good reason. There is already a debilitating skills gap in most enterprises, with research firm Gartner predicting that through 2018, 70 percent of Hadoop deployments will not meet cost savings and revenue generation objectives due to skills and integration challenges. Data scientists are already in high demand and their skills are often put to poor use (like wrangling Hadoop clusters). As technological advances and service organizations that remove these operational burdens gain traction, data scientists will shed their janitorial duties and shift their focus to finding new ways to leverage data.

What is the biggest misconception of Hadoop in the market today?

Many first-time Hadoop adopters are justly concerned about compatibility of their Big Data deployments. While the industry still has some kinks to iron out regarding compatibility, initiatives like the Open Data Platform initiative (ODPi, under the auspices of the Linux Foundation) are moving practitioners toward greater standardization of Hadoop.

Where does Hadoop fit into the world of Big Data and the Internet of Things?

It may be the buzzword du jour but the Internet of Things aptly describes a growing challenge for IT departments: new methods of data collection have created data stores that were previously incomprehensible to anyone besides the largest of internet companies. Today, that has changed as organizations of all sizes are collecting inordinate amounts of data. Hadoop, with its roots within leading Silicon Valley internet companies, is a natural platform to handle this scale of data. IT leaders may decide to get rid of certain types of data but as a rule of thumb, I recommend that organizations hold on to all their data assets. As business priorities grow or shift, that data may prove to be valuable.

How can an organization make data stored in Hadoop “business-user friendly”?

Hadoop vendors are increasingly finding success selling self-service analytic solutions to non-IT business departments in the enterprise (like marketing). Business users want to be able to extract insights from their data by integrating with their business intelligence tool of choice (like Tableau or Microsoft Excel) and start creating business value quickly without getting stuck in data preparation purgatory. We’ll increasingly see Hadoop solutions sold to these business users so that they can get up and running quickly and extract business insights from their data.
