Mainframe users are getting a major upgrade thanks to an open source tool from Syncsort that links IBM z Systems with the big-name big data processor Apache Spark.

In the days of cloud computing, few people even remember the mainframe. Likely, no one in your neighborhood knows what a mainframe is, and fewer still have ever owned one. Even so, mainframe computers are far from irrelevant.

Companies chained to their mainframes are not simply old fogies afraid of new technology. Their computers provide powerful, secure and stable services that people quite simply cannot do without. Mainframes are integral to many major government programs, including Food Stamps and Social Security, for starters. The platform develops slowly, carefully integrating the old with the new. Furthermore, mainframes are capable of performing thousands of transactions every second, managing terabytes of data and handling high-bandwidth communication. For this reason, airlines, banks and other large companies rely heavily on the technology dinosaur. In fact, if you’ve ever taken money out of an ATM, even you have used a mainframe.

Mainframes may be comparatively old, but they are a goldmine of data. This is why Syncsort, which specializes in integrating mainframes with modern technologies, released an open source connector for the incredibly popular data processing engine Apache Spark.

Data Practices: Then and Now

Mainframe users have been putting their data to use for years. In the past, this was done using something called ETL: Extract, Transform and Load. Data is pulled out of the source system, reshaped into the format the target expects, and then written into a separate store for analysis. This method functioned relatively well when companies had manageable amounts of data to move; of course, with the breadth of social media and the widespread web transactions that have become so commonplace, data has become far too big to reasonably move using ETL. Companies need to make quick decisions based on current data, not waste time and money only to find that their data has become old and irrelevant. Greg Willhoit of Rocket Software may have summed up the general feeling about ETL perfectly when he spoke at SHARE in Pittsburgh: “ETL sucks.”
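
For readers who have not met ETL before, here is a minimal sketch of the three steps in Python. It is an illustration only, not how any particular mainframe shop or vendor does it: the sample export, the table name and the function names are all made up for this example, and an in-memory SQLite database stands in for the analytics store.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system (invented sample data).
RAW_EXPORT = """account_id,amount_cents,currency
A-100,12500,usd
A-101,990,usd
A-100,-2500,usd
"""


def extract(text):
    """Extract: read the raw rows out of the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and units into what the target expects."""
    return [
        (row["account_id"], int(row["amount_cents"]) / 100, row["currency"].upper())
        for row in rows
    ]


def load(records, conn):
    """Load: write the cleaned records into the separate analytics store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(account_id TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order: extract, then transform, then load."""
    load(transform(extract(text)), conn)


# An in-memory SQLite database stands in for the target system.
conn = sqlite3.connect(":memory:")
run_etl(RAW_EXPORT, conn)

# Only after the copy is complete can the data be queried.
balances = dict(
    conn.execute(
        "SELECT account_id, SUM(amount) FROM transactions GROUP BY account_id"
    )
)
```

Every record has to be copied and reshaped before anyone can ask a question of it, which is why the approach strains as data volumes grow.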

A New Spark for the Aging Mainframe

Developed at UC Berkeley, Spark has already succeeded in making a name for itself as a front runner among open source data processing engines. Up to 100 times faster than Hadoop MapReduce, it holds the world record in large-scale on-disk sorting and is revered for its consistent growth and development. It is exactly what big data users want to get their hands on. In fact, IBM invested $300 million in Spark earlier this year, committing 3,500 developers and researchers, and backing the new Spark Technology Center.

“We are excited that Syncsort has made this valuable contribution to the Apache Spark community,” says Apache Spark’s creator, Matei Zaharia.

A sense of gratitude from users and developers alike permeates the news of the new tool. Being able to hook up mainframes to a powerhouse like Apache Spark means companies stuck in the semi-dark ages of the past can finally emerge.

As banks and other vital companies are the ones utilizing mainframes, the effects of the new Syncsort tool will likely reach the average consumer in big ways. With the ability to read and analyze data more quickly and cost-efficiently, the results could be tangible and far-reaching. With advertisers already predicting what kind of ads a user wants to see, it’s time for government agencies and major organizations to be able to better predict important outcomes.

The creation of Syncsort’s tool opens a multitude of possibilities for old methods and technologies to be merged with those on the cutting edge. Whether mainframe users were stymied by technical realities or an out-of-date mindset, today they are getting a second chance at life. Any ceiling or limitation that existed before is slowly being lifted, and these agencies may soon be off to the races as they find new ways to use their data, improve their markets and developments, and, hopefully, create even more possibilities.

(image source: Joe Martin)

