LinkedIn has rolled out its very own web-scale real-time analytics engine, dubbed Pinot. Designed and built at LinkedIn, Pinot will allow sifting through immense troves of data in real-time across a wide variety of products.
Earlier, LinkedIn used generic storage systems such as Oracle (RDBMS) and Voldemort (key-value storage) for its analytics products, writes Praveen Neppalli Naga, Engineering Manager at LinkedIn.
However, the push to develop Pinot came from several realisations: LinkedIn needed infrastructure that could support high read QPS at low latency, its existing systems lacked OLAP specialization, its data was growing exponentially in “breadth and depth”, and needs were widening across the company, all of which prompted the setting up of a centralised system.
“LinkedIn is a data driven company and product usage data is critical to the product management teams and they make decisions based on those insights. They require a powerful tool to analyze the data and Pinot is the infrastructure that supports it,” writes Mr. Naga in the blog announcing the development.
To cope with the large volume of data, the team turned to Apache Helix to parallelize query processing, and to Kafka and Hadoop to support the real-time needs of its products.
Two years in the making, Pinot is now the “de-facto analytics platform for LinkedIn.”
LinkedIn is setting up a developer community around Pinot to “help take it to the next level and are evaluating turning it into an open source project,” writes Naga.
(Image Credit: Coletivo Mambembe)