Data Science 101

Big Data 101: Intro To Probabilistic Data Structures
When analyzing big data, we often need to answer basic questions about it, such as the number of items in the dataset, the number of unique items, and how often each item occurs. Hash tables or hash sets are usually employed for this purpose. But when the dataset becomes so enormous that
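
Hash tables answer the three questions above (how many items, how many unique items, how often each occurs) exactly, but their memory grows with the number of distinct items. For the frequency question, one widely used probabilistic alternative is the Count-Min Sketch. Below is a minimal Python sketch of one. The class name, the default width and depth, and the SHA-256-based row hashing are our own assumptions and are not taken from the article. The table has a fixed size. Hash collisions can cause estimates to overcount, but they never undercount.

```python
import hashlib


class CountMinSketch:
    """Approximate per-item frequency counts in a fixed-size table.

    Estimates may overcount (hash collisions) but never undercount.
    """

    def __init__(self, width=1000, depth=5):
        self.width = width  # counters per row
        self.depth = depth  # number of rows, one hash function each
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, from hashing "row:item". Prefixing the row
        # number gives each row a different hash function (a simple,
        # assumed choice, not an optimized one).
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(self.table[row][col] for row, col in self._columns(item))


cms = CountMinSketch(width=200, depth=4)
for word in ["apple", "banana", "apple", "cherry", "apple"]:
    cms.add(word)
print(cms.estimate("apple"))  # never less than the true count of 3
```

Memory stays at width × depth counters no matter how many distinct items arrive. A wider table reduces collision error, and more rows reduce the chance that an item collides in every row.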

Programming with R – How to Get a Frequency Table of a Categorical Variable as a Data Frame
Categorical data is data whose values come from a predefined set. Recording a person's age as “Child”, “Adult” or “Senior” instead of as a number is one example of treating age as a categorical variable. However, before using categorical data, one must know about various
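
In R itself, the usual idiom is as.data.frame(table(x)). It returns one row per level, with the counts in a Freq column. The code examples on this page use Python, so what follows is a hypothetical standard-library analogue, not the article's R code. The frequency_table name and its list-of-tuples output are our own assumptions. It counts a categorical variable against its predefined levels. Levels that never occur are kept with a count of 0, which mirrors how R's table() reports unused factor levels.

```python
from collections import Counter


def frequency_table(values, levels):
    """Return (level, count) rows in the order of the predefined levels.

    Levels that never occur are kept with a count of 0, similar to how
    R's table() reports unused factor levels.
    """
    counts = Counter(values)
    unknown = set(counts) - set(levels)
    if unknown:
        # Categorical data should only contain predefined values.
        raise ValueError(f"values outside the predefined levels: {sorted(unknown)}")
    return [(level, counts[level]) for level in levels]


ages = ["Adult", "Child", "Adult", "Senior", "Adult"]
for level, freq in frequency_table(ages, ["Child", "Adult", "Senior"]):
    print(level, freq)
# Child 1
# Adult 3
# Senior 1
```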

Data Science vs. Data Analytics – Why Does It Matter?
Data Science, Data Analytics, Data Everywhere: jargon can be downright intimidating and seemingly impenetrable to the uninformed. While complicated vernacular is an unfortunate side effect of the similarly complicated world of machines, those involved in computers, data and a whole host of other tech-intensive sectors don’t do themselves any favors with

If you care about Big Data, you care about Stream Processing
As data volumes grow across organizations, with terabytes and petabytes flowing into systems every day, running ad hoc queries across the entire dataset to generate important metrics and intelligence is no longer feasible. Once the volume of data crosses a threshold, even simple questions such as what is
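
The central idea of stream processing is to update metrics incrementally as each event arrives, instead of rescanning the whole dataset. Below is a minimal Python sketch of that idea in the form of a tumbling-window counter. It assumes events arrive as (timestamp in seconds, key) pairs. Each event lands in exactly one fixed, non-overlapping window, and only that window's count changes. The class and method names are hypothetical and are not part of any stream processing framework's API.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Count events per key in fixed-size, non-overlapping time windows."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # (window_start, key) -> number of events seen so far
        self.counts = defaultdict(int)

    def window_start(self, timestamp):
        # Align the timestamp down to the start of its window.
        return (timestamp // self.window_seconds) * self.window_seconds

    def process(self, timestamp, key):
        # Constant-time update per event: earlier data is never rescanned.
        self.counts[(self.window_start(timestamp), key)] += 1

    def result(self, window_start, key):
        # .get() avoids inserting empty entries into the defaultdict.
        return self.counts.get((window_start, key), 0)


events = [(0, "login"), (12, "login"), (59, "click"), (61, "login")]
counter = TumblingWindowCounter(window_seconds=60)
for ts, key in events:
    counter.process(ts, key)
print(counter.result(0, "login"))   # 2
print(counter.result(60, "login"))  # 1
```

A real engine would also evict windows once they close and would decide how to treat late or out-of-order events. This sketch does neither, so its state grows with the number of windows seen.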

The Problem With (Statistical) False Friends
I recently stumbled across a research paper, “Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US,” which caught my attention because derivative uses of data are an ongoing research interest of mine. A variety of deep learning techniques were used to draw conclusions about relationships

Infographic: A Beginner’s Guide to Machine Learning Algorithms
We hear the term “machine learning” a lot these days (usually in the context of predictive analysis and artificial intelligence), but machine learning has actually been a field of its own for several decades. Only recently have we been able to really take advantage of machine learning on a broad

25 Big Data Terms Everyone Should Know
If you are new to the field, Big Data can be intimidating! With the basic concepts under your belt, let’s focus on some key terms to impress your date, your boss, your family, or whoever. On November 25th-26th 2019, we are bringing together a global community of data-driven pioneers to talk

Stream Processing Myths Debunked
This post originally appeared on the data Artisans blog. Six Common Streaming Misconceptions. Needless to say, we here at data Artisans spend a lot of time thinking about stream processing. Even cooler: we spend a lot of time helping others think about stream processing and how to apply streaming to data

Get the facts straight: The 10 Most Common Statistical Blunders
Competent analysis is not only about understanding statistics, but also about applying the correct statistical approach or method. In this brief article, I will showcase some common statistical blunders and how to avoid them. To make this information simple and digestible, I have divided these errors into

How to avoid the 7 most common mistakes of Big Data analysis
One of the coolest things about being a data scientist is being industry-agnostic. You could dive into gigabytes or even petabytes of data from any industry and derive meaningful interpretations that may catch even the industry insiders by surprise. When the global financial crisis hit the American market in 2008, few