Datastax, the leading enterprise Cassandra provider, recently unveiled Datastax Enterprise 4.5. DSE 4.5 focuses on making development and deployment easier than ever and on improving performance, supported by integration with Apache Spark and a partnership with Databricks. Dataconomy recently spoke to Robin Schumacher, Datastax's VP of Products, about the latest developments and his response to Cassandra naysayers.
Tell us more about the integrations within Datastax Enterprise 4.5.
To set the context for you a little bit, the next set of releases that you'll see from Datastax are really performance-focused. We've concentrated a lot over the past couple of years on building up Datastax Enterprise into an enterprise-class NoSQL database platform. We've added a lot of things to make that happen; now that that's been accomplished, we're really focusing on performance. So you'll be seeing performance enhancements in open-source Cassandra and in Datastax Enterprise, including this release. Back in February, when we announced Datastax Enterprise 4, we brought out the first version of our in-memory database for transactional workloads. In this release, what we're focused on is analytics performance.
One of the things we're using to make that happen is integration with Apache Spark. Spark is really about being able to run analytics across a shared-nothing architecture, and that can happen within Hadoop on HDFS. But the good news is that it also enables us to run the same type of analytics on Cassandra, so we have a formal partnership with Databricks, which is the company behind Apache Spark. We will be delivering a near-real-time analytics capability that uses Spark, and this allows us to run analytics on Cassandra data in either an in-memory or a disk-based style. Really, the added benefit to customers is much faster response times for analytic queries on Cassandra than they've been able to get in the past.
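Spark's real engine is far more involved, but the shared-nothing idea it relies on can be sketched in a few lines: each node aggregates only the rows it owns, and a coordinator merges the small partial results. The following pure-Python sketch is illustrative only; the row layout `(partition_key, region, reading)`, the CRC32-based placement, the node count and all function names are assumptions, and none of this is the Spark or DSE API.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (partition_key, region, reading).
Row = Tuple[str, str, float]
Partial = Dict[str, Tuple[float, int]]  # region -> (sum, count)


def partition(rows: Iterable[Row], num_nodes: int) -> List[List[Row]]:
    """Place each row on exactly one node by hashing its partition key."""
    nodes: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        node = zlib.crc32(row[0].encode("utf-8")) % num_nodes
        nodes[node].append(row)
    return nodes


def local_aggregate(rows: List[Row]) -> Partial:
    """Runs on one node and touches only that node's rows (nothing shared)."""
    partial: Partial = {}
    for _, region, reading in rows:
        total, count = partial.get(region, (0.0, 0))
        partial[region] = (total + reading, count + 1)
    return partial


def merge(partials: Iterable[Partial]) -> Dict[str, float]:
    """Coordinator step: combine per-node partial sums into global averages."""
    totals: Partial = {}
    for partial in partials:
        for region, (total, count) in partial.items():
            t, c = totals.get(region, (0.0, 0))
            totals[region] = (t + total, c + count)
    return {region: t / c for region, (t, c) in totals.items()}


def average_by_region(rows: Iterable[Row], num_nodes: int = 3) -> Dict[str, float]:
    """Average reading per region, computed node-locally and then merged."""
    return merge(local_aggregate(node) for node in partition(rows, num_nodes))
```

Only `(sum, count)` pairs travel to the coordinator, never raw rows, which is what lets this style of analytics scale out with the number of nodes.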
The other thing readers might be interested in knowing is that because Spark has an in-memory component, it can be married to our in-memory transactional option for Cassandra, so for the use cases it applies to, you can now have a fully in-memory solution inside Datastax Enterprise for both transactional and analytic workloads. Keeping data in memory makes read operations and analytic operations very, very fast.
What is exclusive to your enterprise solution?
One of the main things DSE offers over and above open source is enterprise manageability, through things like our automated management services and OpsCenter, which make things push-button easy. The second is enabling developers: giving them the drivers, the development tools and the utilities they need to create their applications as fast as possible. And lastly, there's satisfying our key use cases, such as fraud detection, the Internet of Things, messaging and recommendation engines. These are the things I'm really focused on in the commercial product.
Can you tell us more about analytics capabilities on Datastax Enterprise 4.5?
With 4.5, we're actually bringing out two new analytics options for our customers. The first is the Spark integration and the second is integration with external Hadoop deployments and data warehouses. What we want to do here is better enable our customers to link the hot operational data they have in Datastax Enterprise and Cassandra with the historic information they keep in Hadoop. 4.5 makes this very easy: we can connect both platforms, query data on both at the same time and return the results to the customer, who can either keep them on our platform or ship them off to their external Hadoop platform.
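The interview doesn't describe the interfaces DSE 4.5 uses to talk to an external Hadoop cluster, so the following is only a conceptual sketch of the idea of answering one query from both hot operational data and historic data. The `Event` fields, the two in-memory lists standing in for Cassandra and Hadoop, and `query_customer` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Event:
    customer_id: str
    day: int        # hypothetical: days since an arbitrary epoch
    amount: float


def _matching(events: Iterable[Event], customer_id: str, since_day: int) -> Set[Event]:
    """Filter one store for a customer's events on or after since_day."""
    return {e for e in events if e.customer_id == customer_id and e.day >= since_day}


def query_customer(customer_id: str, hot: Iterable[Event], cold: Iterable[Event],
                   since_day: int = 0) -> List[Event]:
    """Answer one query from both stores: recent (hot) and historic (cold) events.

    Events present in both stores (e.g. already copied to the archive) are
    de-duplicated because frozen dataclasses hash and compare by value.
    """
    combined = _matching(hot, customer_id, since_day) | _matching(cold, customer_id, since_day)
    return sorted(combined, key=lambda e: (e.day, e.amount))
```

The caller sees a single, ordered answer and never needs to know which platform each event came from, which is the point of querying both at once.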
When we spoke to Jonathan [Ellis, Datastax co-founder], he said Cassandra 2.1 might be production-ready by the end of June. Is there a set release date yet?
I believe the new target is for around the middle of July, where 2.1 is concerned.
When the initial announcement was made about the Datastax and Databricks partnership, the COO of ScaleOut Software said that 'Spark doesn't handle real time state changes to individual data items in memory. It can only stream data and change the whole data set,' and that 'Cassandra has similar limitations because you can't update data on Cassandra, all you can do is delete it and create a new copy.' What do you make of these comments?
The latter has to do with how Cassandra writes data, and Cassandra is probably the most efficient platform for writing data that you're going to find, precisely because of how it writes data. What he's describing doesn't really affect what customers experience because, again, the data is written very, very quickly; the work is done behind the scenes, asynchronously, in a very fast and efficient manner. It's one of the reasons why Cassandra is used in so many Internet of Things applications and other write-intensive environments. Our customers don't complain about that at all.
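The behaviour being debated here is Cassandra's log-structured write path: an update is stored as a new timestamped version rather than an in-place modification, reads reconcile versions so the newest timestamp wins, and compaction later merges SSTables and discards superseded versions. The toy class below is a simplified single-node model of that idea, not Cassandra's code; the class name, the integer timestamps and the immediate tombstone purge are simplifying assumptions (real Cassandra keeps tombstones for `gc_grace_seconds` before purging them).

```python
from typing import Dict, List, Optional, Tuple

# A cell is (write_timestamp, value); value None is a deletion marker (tombstone).
Cell = Tuple[int, Optional[str]]


class AppendOnlyStore:
    """Toy single-node model: updates are new versions, never in-place edits."""

    def __init__(self) -> None:
        self.memtable: Dict[str, Cell] = {}
        self.sstables: List[Dict[str, Cell]] = []  # never modified once flushed

    def write(self, key: str, value: Optional[str], ts: int) -> None:
        current = self.memtable.get(key)
        if current is None or ts >= current[0]:
            self.memtable[key] = (ts, value)

    def delete(self, key: str, ts: int) -> None:
        self.write(key, None, ts)  # a delete is just another write

    def flush(self) -> None:
        """Freeze the memtable into a new immutable 'SSTable'."""
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}

    def read(self, key: str) -> Optional[str]:
        best: Optional[Cell] = None
        # Reconcile every stored version of the key: the newest timestamp wins.
        for table in self.sstables + [self.memtable]:
            cell = table.get(key)
            if cell is not None and (best is None or cell[0] >= best[0]):
                best = cell
        return None if best is None else best[1]

    def compact(self) -> None:
        """Merge all SSTables, keeping only the newest version of each key."""
        self.flush()
        merged: Dict[str, Cell] = {}
        for table in self.sstables:
            for key, cell in table.items():
                if key not in merged or cell[0] >= merged[key][0]:
                    merged[key] = cell
        # Simplification: purge tombstones immediately.
        live = {k: c for k, c in merged.items() if c[1] is not None}
        self.sstables = [live] if live else []
```

Writes only ever append or overwrite in memory, so they stay cheap; the cost of reconciling old versions is deferred to reads and background compaction.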
As for the former, I think the only thing that really matters is the end result customers experience. Looking at the differences we're seeing between the batch analytics we've offered in the past and this new Spark integration, we've seen, even in some of the smallest cases, a 50% increase in performance, with some queries getting up to a 30x speedup. As an example, we've got some Hive queries that used to take five minutes to run and now take one second to complete. I think that's all the customer really cares about in the end.
We're very proud of how Cassandra writes data; it's completely durable, your data is completely safe, and it's written faster than on literally any other relational or sequential engine.
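Durability in Cassandra comes from the commit log: a write is appended to an on-disk log as well as applied to the in-memory memtable, and the log is replayed after a crash. The sketch below models that ordering with a JSON-lines file and a per-write `fsync`, which roughly corresponds to Cassandra's `batch` commit-log sync mode; the default `periodic` mode syncs on a timer instead. The `DurableWriter` class, the file format and `demo_replay` are hypothetical illustrations, not Cassandra internals.

```python
import json
import os
import tempfile
from typing import Dict


class DurableWriter:
    """Append each write to a commit log and fsync it before updating memory."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.memtable: Dict[str, str] = {}
        self._replay()

    def _replay(self) -> None:
        """Rebuild in-memory state from the log, e.g. after a crash or restart."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-append
                self.memtable[record["key"]] = record["value"]

    def write(self, key: str, value: str) -> None:
        record = json.dumps({"key": key, "value": value})
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()             # push Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to persist it to disk
        # Only after the log write is durable does the in-memory table change.
        self.memtable[key] = value


def demo_replay() -> Dict[str, str]:
    """Write through one writer, then 'restart' and rebuild state from the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commitlog.jsonl")
        writer = DurableWriter(path)
        writer.write("a", "1")
        writer.write("a", "2")
        writer.write("b", "3")
        restarted = DurableWriter(path)  # simulates a node restart
        return restarted.memtable
```

Because the log is append-only, each write is a sequential disk operation, which is a large part of why this design can be both durable and fast.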
A quick company update: we're up to about 300 employees, we're still growing like crazy, and we just passed the 500-customer mark.
Datastax offers the leading enterprise solution built around Apache Cassandra, an Apache Software Foundation open-source project. Apache Cassandra is a NoSQL database technology that features scalability, always-on availability and fault tolerance. The Datastax Enterprise solution offers Cassandra with added security, search, analytics and management features.
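Cassandra's scalability and fault tolerance come from spreading rows around a token ring and storing several replicas of each one. The sketch below shows replica placement in the spirit of Cassandra's `SimpleStrategy`: the first node whose token is at or past the key's token owns the row, and the next nodes clockwise hold the remaining copies. The node names, token values and the MD5-based `token_for` are illustrative assumptions; Cassandra's default partitioner is Murmur3, and production clusters typically use `NetworkTopologyStrategy` with virtual nodes.

```python
import bisect
import hashlib
from typing import Dict, List, Set


def token_for(key: str) -> int:
    """Stand-in token function: first 8 bytes of MD5 (Cassandra defaults to Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, node_tokens: Dict[str, int]) -> None:
        # Sort nodes by their position on the ring.
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def replicas_for_token(self, tok: int, rf: int) -> List[str]:
        """Owner is the first node with token >= tok; copies go to the next rf-1 nodes."""
        if not 1 <= rf <= len(self.entries):
            raise ValueError("replication factor must be between 1 and the node count")
        start = bisect.bisect_left(self.tokens, tok) % len(self.entries)  # wrap around
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(rf)]

    def replicas(self, key: str, rf: int) -> List[str]:
        return self.replicas_for_token(token_for(key), rf)


def readable(ring: Ring, key: str, rf: int, down: Set[str]) -> bool:
    """At consistency level ONE, a row stays readable while any replica is up."""
    return any(node not in down for node in ring.replicas(key, rf))
```

With three nodes and a replication factor of 3, any two nodes can fail and every row is still readable, which is the always-on property the description refers to.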
(Image credit: Silicon Angle)