
Netflix’s Big Data Architecture
Netflix started with home delivery of DVDs in the late 90s, and moved to video-on-demand Internet streaming in 2007. The company has since moved away from simply distributing content to creating it. House of Cards is the first major TV show to bypass the more traditional distribution channel of TV

Big Data Compounds Privacy Problem
A common concern with big data is privacy: where do we draw the boundary between public information that can be mined and private information that cannot? This becomes increasingly acute as people broadcast more information about themselves. Existing data protection laws generally allow individuals to request that the

Netflix Creating TV Shows with Big Data
In 2012, Americans watched more legally delivered video content via the Internet than on physical formats such as DVDs. This shift in format also allows online content providers to gather large amounts of data on viewer habits. Recommendation engine: Netflix is the de facto iTunes for online video content. It collects

Facebook Rumors – A Visualization of Data Propagation
Last week, we featured Twitter’s engagement and credibility problems and how, if untreated, they may spell the platform’s demise. This week we review the dynamics of rumor-sharing. Social media has democratized the act of disseminating information to a wide audience and in real time. While this dilutes the monopoly held by

Big Data: The Be-All, End-All? Part 2 of 2
Last week, in conjunction with Big Data Week, we featured the best articles that speak to the power of big data. While Big Data allows us to extract insights like never before, it is not a magic bullet for every situation. This week we feature the best articles that highlight

Facial Recognition: Big Data Going Too Far?
Facebook’s DeepFace technology is able to recognize faces 97.25% of the time, against 97.53% for a human. Such developments raise questions about the power of such tools and the implications for the future. Selected use cases of facial recognition technology include:
* NameTag: Allows the user of Google Glass to

Speech Recognition: What a Difference Statistics Makes
Speech recognition was long considered difficult to program, but it now comes standard on most modern smartphones. What’s the secret? Statistical modeling. While any word can begin a sentence, the words that follow do not appear in a random order. Machine learning can be employed to enable the software to

Internet of Things: Open Standards or Proprietary?
The Internet of Things is not about what an individual device can do. It’s about what happens when multiple devices start interacting with each other. This also means that the Internet of Things will have to be built on open standards. The Internet works because it adheres to the following

Python Displacing R in Data Science
As data science becomes more mainstream, it appears that Python is muscling out R as the programming language of choice. The following factors are at work: 1. R is not really a language: R is considered by some to be an “interactive environment for doing statistics”. This makes it harder for

The Death of Twitter
Twitter started as a social version of text messaging. The limit on a text message is 160 characters, which Twitter truncated to 140 to account for the sender’s name. Early reviews considered the platform to be “addictive and at the same time annoying”; a less charitable review went as far as