Apache Spark

Free eBook: A Practical Introduction to Apache Spark
If you are a developer or data scientist interested in big data, Spark is the tool for you. Apache Spark’s ability to speed up analytic applications by orders of magnitude, its versatility, and its ease of use are quickly winning over the market. With Spark’s appeal to developers, end-users, and integrators to solve

Machine Learning using Spark and R
R is ubiquitous in the machine learning community. Its ecosystem of more than 8,000 packages makes it the Swiss Army knife of modeling applications. Similarly, Apache Spark has rapidly become the big data platform of choice for data scientists. Its ability to perform calculations relatively quickly (due to features like in-memory

Putting Mainframe Data to Use: The New Open Source Tool That Teaches Old Tech New Tricks
Mainframe users are getting a major upgrade thanks to an open source tool from Syncsort that links IBM z Systems with the big-name big data processor Apache Spark. In the days of cloud computing, few people even remember the mainframe. Likely, no one in your neighborhood knows what they are, and

Better Allies than Enemies: Why Spark Won’t Kill Hadoop
Fans and supporters of Hadoop have no reason to fear; Hadoop isn’t going away anytime soon. There’s been a great deal of consternation about the future of Hadoop, most of it stemming from the growing popularity of Apache Spark. Some big data experts have even gone so far as to

How Flink Became an Apache Top-Level Project
A multi-coloured squirrel may not seem like the most obvious choice of logo for a data processing technology; then again, the team behind Apache Flink have hardly done things by the book. What started out as a university research project evolved into a fully-fledged company, complete with artfully-decapitalised name (data

Machine Learning and Hadoop: How One of the Most Widely Used Big Data Technologies Has Evolved
It’s safe to say that, at present, machine learning is big news. In the past week, we’ve seen Tumblr getting in on the game, Google making further machine learning acquisitions, and Nervana announcing £3.3 million in funding for its machine learning initiatives. When you think of machine learning,

Lightning Fast and Enterprise-Class: Datastax Enterprise 4.5
Datastax, the leading enterprise Cassandra provider, recently unveiled Datastax Enterprise 4.5. DSE 4.5 focuses on making applications easier than ever to develop and deploy and on improving performance, supported by integration with Apache Spark and a partnership with Databricks. Dataconomy recently spoke to Robin Schumacher, Datastax’s VP of

Databricks Raises $33 Million and Introduces Cloud Platform for Processing Big Data
Databricks, a startup that builds software around the popular open-source project Apache Spark, announced on Monday at this year’s Spark Summit in San Francisco that it has raised $33 million in Series B funding. The announcement also included the launch of a new cloud computing service on Amazon Web

MapR Obtains Three Awards
MapR, a leading provider of Apache™ Hadoop®, has received a number of awards honouring important companies in the cloud and big data industries. Below are the three awards MapR has received recently. AlwaysOn OnDemand 100 Top Private Companies: AlwaysOn evaluated companies in the entrepreneurial ecosystem to select

Berlin Buzzwords is Back: Our Pick of the Events
Berlin Buzzwords, ‘Germany’s most exciting conference on storing, processing and searching large amounts of digital data’, is back for a fifth year. The conference will take place on May 25-28, at Kulturbrauerei Berlin. It will feature a range of presentations on large scale computing projects, ranging from beginner-friendly talks to