While the field of data science is not tied directly to Big Data, advances in one tend to produce advances in the other. Big Data increases our ability to harvest and process data, while data science allows us to dig into it for insights.
R is ubiquitous in the machine learning community. Its ecosystem of more than 8,000 packages makes it the Swiss Army knife of modeling applications. Similarly, Apache Spark has rapidly become the big data platform of choice for data scientists. Its ability to perform calculations relatively quickly (due to features like in-memory
Data processing today is done in the form of pipelines that include steps such as aggregation, sanitization, and filtering, and finally generate insights by applying statistical models. Amazon Kinesis is a platform for building pipelines for streaming data at the scale of terabytes per hour. Parts of the Kinesis platform are
Welcome to Part 2 of How to use Elasticsearch for Natural Language Processing and Text Mining. It’s been some time since Part 1, so you might want to brush up on the basics before getting started. This time we’ll focus on one very important type of query for Text Mining.
The R language is often perceived as a language for statisticians and data scientists. A long time ago, this was mostly true. However, over the years the flexibility R provides through packages has turned it into a more general-purpose language. R was open sourced in 1995, and since
For people in the know, machine learning is old hat. Even so, it is set to become the data buzzword of the year, for a rather mundane reason: when things get complex, people expect technology to ‘automagically’ solve the problem. Whether it’s automated financial product consultation or shopping in the supermarket of
If the popular media are to be believed, artificial intelligence (AI) is coming to steal your job and threaten life as we know it. If we do not prepare now, we may face a future where AI runs free and dominates humans in society. The AI revolution is indeed underway. To
The late data visionary Hans Rosling mesmerised the world with his work, contributing to a more informed society. Rosling used global health data to paint a stunning picture of how our world is a better place now than it was in the past, bringing hope through data. Now more than
The digital age is characterised increasingly by the collective. The information generated by tapping into the minds of many is driving decisions in both the public and private sector; research is becoming social. On the back of this, a new science has emerged – known as opinion mining – which
The rise of the data scientist continues, and social media is filled with success stories. But what about those who fail? There are no cover articles about the many data scientists who don’t live up to the hype and don’t meet the needs of their stakeholders.
Everyone has heard the old adage “garbage in, garbage out.” It is a simple way of saying that machine learning is only as good as the data, algorithms, and human experience that go into it. But even the best results can be thought of as garbage if no one