Dataconomy
  • News
  • AI
  • Big Data
  • Machine Learning
  • Trends
    • Blockchain
    • Cybersecurity
    • FinTech
    • Gaming
    • Internet of Things
    • Startups
    • Whitepapers
  • Industry
    • Energy & Environment
    • Finance
    • Healthcare
    • Industrial Goods & Services
    • Marketing & Sales
    • Retail & Consumer
    • Technology & IT
    • Transportation & Logistics
  • Events
  • About
    • About Us
    • Contact
    • Imprint
    • Legal & Privacy
    • Newsletter
    • Partner With Us
    • Writers wanted
Subscribe
No Result
View All Result
Dataconomy
  • News
  • AI
  • Big Data
  • Machine Learning
  • Trends
    • Blockchain
    • Cybersecurity
    • FinTech
    • Gaming
    • Internet of Things
    • Startups
    • Whitepapers
  • Industry
    • Energy & Environment
    • Finance
    • Healthcare
    • Industrial Goods & Services
    • Marketing & Sales
    • Retail & Consumer
    • Technology & IT
    • Transportation & Logistics
  • Events
  • About
    • About Us
    • Contact
    • Imprint
    • Legal & Privacy
    • Newsletter
    • Partner With Us
    • Writers wanted
Subscribe
No Result
View All Result
Dataconomy
No Result
View All Result

Netflix’s Big Data Architecture

by Dan Gray
May 15, 2014
in Uncategorized
Home Uncategorized
Share on FacebookShare on TwitterShare on LinkedInShare on WhatsAppShare on e-mail

Netflix started with home delivery of DVDs in the late 90s, and moved to video-on-demand Internet streaming in 2007. The company has since moved away from simply distributing content to creating it.

House of Cards is the first major TV show to bypass the more traditional distribution channel of TV networks and cable operators, and premier directly to viewers online. In 2013, the show won three Primetime Emmy Awards and was nominated for six other categories. Netflix initially committed $100 million to produce the first two seasons, and has since announced that the show will be renewed for a third season.

What is Netflix’s secret in producing the hit show? Big data.

Data collection

At the end of 2013, Netflix had 33 million US subscribers. From these subscribers, Netflix gathered every event (every time the user rewinds, fast forwards or pauses), every rating, every search. This data is supplement with geo-location data, device information, time of day and week, as well as social media data. A further layer of enrichment comes from third-party providers such as Nielsen.

The data is used to improve Netflix’s personalization algorithm, which recommends other shows to watch based on what the user has seen. Netflix discovered how viewers who enjoyed the original House of Cards miniseries, produced by the BBC in the 1990s, also enjoyed movies starring Kevin Spacey or directed by David Fincher. While creating original content may be risky, it appears that Netflix managed to hedge its bets, and leverage its powerful distribution channel, in creating House of Cards.

Data architecture

Netflix started with a more traditional MySQL database for data warehousing, storing more than 10 years of customer data and billions of ratings. However, the growth of data collected by Netflix started to increase exponentially as the service started to shift towards Internet streaming. Netflix ran into scalability problems when the growth of the customer base is supplemented by data collected from existing users watching more and more shows.

The immediate impact of the massive growth in data collected is the increase in data storage costs. As storing the data in the cloud was about 100 times cheaper than in traditional data warehouses, it made sense for Netflix to start migrating its data to the cloud. While this solves the problem of cost and scalability, the geographical diversity of the data could pose difficulties for real-time queries.

Enter Cassandra. Cassandra is an open source NoSQL database, created by Facebook and now under the open source Apache licence. It scales wonderfully across multiple data centers all over the world. It is optimised for random write/reads, which makes real-time queries possible.

Engineers at Netflix then created Aegisthus to create a ‘bulk data pipeline’ out of Cassandra, as Cassandra is more ideal for individual queries. The output from Aegisthus is stored on Amazon S3 once a day, which is what Netflix is now using as its main data warehouse. The data stored in S3 is analyzed with Hadoop. Netflix uses S3 as the backend instead of HDFS for its extremely high durability and versioning capabilities, even though doing so adds a little latency.

Netflix’s use case of big data and cloud infrastructure highlights how different tools are used for different tasks. More importantly, it emphasizes the power of big data in extracting insights like never before but still being cost effective.

 

This week, we feature the best articles on limitations of Big Data. In particular, Tim Harford makes an excellent case on why correlation does not equal causation.

Image credit: Flickr

 


Interested in more content like this? Sign up to our newsletter, and you wont miss a thing!

[mc4wp_form]

Related Posts

Meet Microsoft Loop, the Notion competitor you waiting for

Meet Microsoft Loop, the Notion competitor you waiting for

November 16, 2023
Beware customers of Optus: Compensation in talks

Beware customers of Optus: Compensation in talks

November 8, 2023
Breaking down the Okta Data Breach: What happened?

Breaking down the Okta Data Breach: What happened?

October 23, 2023
Data sourcing is still a major stumbling block for AI

Data sourcing is still a major stumbling block for AI

August 18, 2022
How AI and Data Analytics Will Impact The Era of COVID-19

How AI and Data Analytics Will Impact The Era of COVID-19

February 17, 2022
The Medical Field is Changing Because of Artificial Intelligence

The Medical Field is Changing Because of Artificial Intelligence

August 19, 2021

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

LATEST ARTICLES

Executives on social media: The value of social leadership

AWS re:Invent 2023: Amazon raises the bar

Spotify Wrapped 2023 is here: How to check it out

Binance puts Cristiano Ronaldo in legal crosshairs

Magnific AI: Another cool upscaler enters the scene

Everything you need to know about Horizon data breach settlement

Dataconomy

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.

  • About
  • Imprint
  • Contact
  • Legal & Privacy
  • Partnership
  • Writers wanted

Follow Us

  • News
  • AI
  • Big Data
  • Machine Learning
  • Trends
    • Blockchain
    • Cybersecurity
    • FinTech
    • Gaming
    • Internet of Things
    • Startups
    • Whitepapers
  • Industry
    • Energy & Environment
    • Finance
    • Healthcare
    • Industrial Goods & Services
    • Marketing & Sales
    • Retail & Consumer
    • Technology & IT
    • Transportation & Logistics
  • Events
  • About
    • About Us
    • Contact
    • Imprint
    • Legal & Privacy
    • Newsletter
    • Partner With Us
    • Writers wanted
No Result
View All Result
Subscribe

This website uses cookies. By continuing to use this website you are giving consent to cookies being used. Visit our Privacy Policy.