Netflix’s growth over the last seven years has led the provider of on-demand Internet streaming media to build a ‘robust and scalable architecture’ to manage and process user viewing data.
Following an announcement earlier this week that they’re open sourcing their data-analysis-on-Hadoop tools, Netflix have now revealed an imminent overhaul of the fundamentals of their viewing data architecture, needed ‘in order to scale to the next order of magnitude’.
Netflix divides the use cases into three generic categories that affect user experience, aiming for the minimum viable set of use cases rather than building an all-encompassing solution (sketched in code after the list):
- What titles has the user watched?
- Where did the user stop watching a given title?
- What else is being watched on that user account right now?
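As a rough illustration, these three read paths could be exposed through an interface like the following Python sketch. The `ViewingDataStore` class, its method names and its in-memory structures are hypothetical stand-ins for illustration only, not Netflix’s actual service.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class ViewingDataStore:
    """Hypothetical stand-in for a viewing-data service; names and storage
    are illustrative assumptions, not Netflix's implementation."""
    watched: Dict[str, Set[str]] = field(default_factory=dict)          # account -> titles watched
    bookmarks: Dict[str, Dict[str, int]] = field(default_factory=dict)  # account -> {title -> seconds}
    now_playing: Dict[str, Set[str]] = field(default_factory=dict)      # account -> titles streaming now

    def titles_watched(self, account: str) -> Set[str]:
        """Use case 1: what titles has the user watched?"""
        return self.watched.get(account, set())

    def bookmark(self, account: str, title: str) -> int:
        """Use case 2: where did the user stop watching a given title?"""
        return self.bookmarks.get(account, {}).get(title, 0)

    def currently_watching(self, account: str) -> Set[str]:
        """Use case 3: what else is being watched on the account right now?"""
        return self.now_playing.get(account, set())
```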
However, the stateful tier that emerged to tap the ‘benefits of memory speed for our highest volume read/write use cases’ proved more complex than existing mature open source technologies.
Reaching the next order of magnitude by ‘re-architecting a critical system to scale’ would, at best, take time for development, testing and ‘migrating off of the previous architecture’, Netflix points out.
The redesign of the architecture would follow certain guidelines (illustrated in the sketch after the list):
- The design would favour availability rather than strong consistency in the face of failures.
- Microservices – Components that were combined in the stateful architecture should be separated out into services (components as services).
- Components are defined according to their primary purpose – either collection, processing, or data providing.
- Delegate responsibility for state management to the persistence tiers, keeping the application tiers stateless.
- Decouple communication between components by using signals sent through an event queue.
- Polyglot persistence – Use multiple persistence technologies to leverage the strengths of each solution.
- Achieve flexibility and performance at the cost of increased complexity.
- Use Cassandra for very high volume, low latency writes. A tailored data model and tuned configuration enables low latency for medium volume reads.
- Use Redis for very high volume, low latency reads.
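The following Python sketch shows how these guidelines could fit together, using plain in-memory structures as stand-ins for the real systems: a stateless collector signals events through a queue, a processor drains the queue, and results land in a write-optimised store (Cassandra in Netflix’s design) and a low-latency read cache (Redis). None of the names here come from Netflix’s codebase; this is an illustration of the pattern, not the implementation.

```python
import queue
from dataclasses import dataclass

@dataclass
class ViewingEvent:
    account: str
    title: str
    position_seconds: int

# The queue decouples the collection tier from the processing tier.
event_queue: "queue.Queue[ViewingEvent]" = queue.Queue()

cassandra_stand_in = {}  # stands in for the durable, write-optimised store (Cassandra)
redis_stand_in = {}      # stands in for the low-latency read cache (Redis)

def collect(event: ViewingEvent) -> None:
    """Collection tier: stateless, simply signals the event onto the queue."""
    event_queue.put(event)

def process_pending() -> None:
    """Processing tier: drains the queue and updates both persistence stores."""
    while not event_queue.empty():
        event = event_queue.get()
        cassandra_stand_in[(event.account, event.title)] = event.position_seconds
        redis_stand_in[event.account] = event.title  # latest title, served to reads

def latest_title(account: str) -> str:
    """Data-providing tier: answers reads from the low-latency cache."""
    return redis_stand_in.get(account, "")

collect(ViewingEvent("account-1", "House of Cards S01E01", 1320))
process_pending()
print(latest_title("account-1"))  # -> "House of Cards S01E01"
```

Keeping the application tiers stateless and pushing state into the persistence stores is what allows each tier in this pattern to be scaled or replaced independently.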
Read the full blog post here.
(Image credit: Matteo Bittnati, via Flickr)