Spark

Machine Learning using Spark and R
R is ubiquitous in the machine learning community. Its ecosystem of more than 8,000 packages makes it the Swiss Army knife of modeling applications. Similarly, Apache Spark has rapidly become the big data platform of choice for data scientists. Its ability to perform calculations relatively quickly (due to features like in-memory

“Spark has the potential to be as transformational in the computing landscape as the emergence of Linux…” – Interview with Levyx’s Reza Sadri
Reza Sadri is the CEO of Levyx, the creators of high-performance processing technology for big-data applications. Prior to Levyx, Sadri was the CTO for software at sTec, which was acquired by Western Digital Corporation in 2013. What is the potential of Spark? How far is the market from realizing

Hadoop and Spark: A Match Made in (Big Data) Heaven
If you listen in on what people are talking about at Big Data conferences, chances are you’ll hear a lot of buzz around Hadoop and Spark. People often think of Hadoop and Apache Spark as key tools for tackling a wide range of big data challenges, but they assume that

Improving the Accuracy of Big Data Analysis
For many people, it stands to reason that the more data you analyze, the more accurate your results will be. That’s why the idea behind big data analytics is so appealing. After all, a business can spend its time gathering lots of information, analyze it, and come up with some

Better Allies than Enemies: Why Spark Won’t Kill Hadoop
Fans and supporters of Hadoop have no reason to fear; Hadoop isn’t going away anytime soon. There’s been a great deal of consternation about the future of Hadoop, most of it stemming from the growing popularity of Apache Spark. Some big data experts have even gone so far as to

Dating Website eHarmony Gets an IT Overhaul
eHarmony is set to strengthen its technological base using Hadoop, Spark, Docker and possibly OpenStack. Company CTO Thod Nguyen says eHarmony is trying to evolve into a company that’s able to innovate on the IT front, as well as the dimensions-of-compatibility front. In conversation with Gigaom, Nguyen expressed that “A big

Azul Systems and DataStax Partner on High-Performance Java Platform for Cassandra
Azul Systems, the award-winning leader in Java runtime solutions, and DataStax, the company that delivers Apache Cassandra™ to the enterprise, announced a partnership to allow DataStax Enterprise (DSE) customers to leverage the enhanced performance of Azul Zing. Zing is now a certified Java Virtual Machine (JVM) for DataStax Enterprise (DSE),

Alpine Data Labs Continue Mission to Bring Advanced Analytics to the Masses with Alpine Chorus 5.0
Alpine Data Labs have unveiled Alpine Chorus 5.0, an enterprise-class advanced analytics platform which aims to unify data access, management and monitoring on one platform. One of the major advancements between Chorus 4.0 and Chorus 5.0 is that Alpine now offers data management across all the major Hadoop distributions. Alpine’s

Pivotal and EMC Supporting Tachyon As Next In-Memory Revolution
Pivotal, the San Francisco-based software and services provider, has announced its partnership with the AMPLab at UC Berkeley to support the in-memory Tachyon project. They aim to bolster the data lake technology with an architecture that builds upon disk-based storage with memory-centric processing frameworks. Being led by Haoyuan Li, a

Hortonworks Deploys Apache Kafka in Preview Mode to Simulate Real-Time Event Stream
Late last week, leading enterprise Hadoop provider Hortonworks announced the availability of Apache Kafka as a technical preview on its Hortonworks Data Platform product. Kafka originated as a real-time messaging system developed at LinkedIn, and was incubated as an Apache project in 2011. “Apache Kafka is a fast, scalable, durable,