While the field of data science is not tied directly to Big Data, advances in one tend to produce advances in the other. Big Data increases our ability to harvest and process data, while data science allows us to dig into it for insights.
The sheer volumes involved with Big Data can sometimes be staggering. So if you want to get value from the time and money you put into a data analysis project, a structured and strategic approach is very important. The phenomenon of Big Data is giving us ever-growing volume and variety
These days, every business is exploring ways to use data and new technologies to gain a competitive edge. While there’s no questioning the value to be found in big data analytics, organizations have had a low success rate to date when it comes to rolling out data initiatives. A recent
#Unit festival is a new tech event with a message, promoting the power of unicorns in the technology community. 40+ top tech speakers delivering insights from a range of backgrounds. Music performances, interactive installations and thought-provoking entertainment. A dynamic networking experience with the opportunity to make an abundance of new
Everyone likes to save money, and when you’re running a huge business like Zalando, saving money by making your systems more efficient can mean saving millions of euros. That’s why I’m excited to tell you that the Zalando Data Intelligence team recently found a way to save us piles of money.
There is much discussion in the industry about analytic engines and, specifically, which engines best meet various analytic needs. This post looks at two popular engines, Hive and Presto, and assesses the best uses for each. How Hive Works: Hive translates SQL queries into multiple stages of MapReduce and it
With passengers once again left frustrated by rail delays and disturbance over the Easter period, UK rail operators will no doubt face further scrutiny over the quality of service they are providing commuters. With services frequently disrupted by signalling problems, broken-down trains and congestion, the industry, the Government and, of
In a particular subset of the data science world, “similarity distance measures” has become something of a buzz term. Like all buzz terms, it has invested parties, namely math and data mining practitioners, squabbling over what the precise definition should be. As a result, the term, involved concepts and their
Twitter has pulled the plug on its data feed ecosystem by revoking all partnered third parties’ access to its syndicated feed of every tweet generated on the microblogging site, collectively called the ‘Firehose.’ The move comes as a follow-up to its acquisition of the social media API aggregation
Our hyper-networked society is producing data at an ever-increasing rate. According to IBM research, more than 2.5 quintillion bytes of data are created each day, and more than 90 percent of the world’s stored data was created in the last two years alone. Analyst firm IDC estimates that the
Heatmaps are becoming a popular visualization technique outside data science as well. They can be used, for example, to produce an overlay on a webpage that lets website administrators easily see which parts of the site attract the most interest (application examples can be found at https://heatmap.me).