Dataconomy

Criteo’s Prediction on Hadoop: How and Why it Came About

by Guillaume Turri
February 27, 2015
in Data Science

At Criteo we display online advertisements and sell clicks to our clients. So for each of the 2 billion banners we show every day, we have to predict whether it is likely to be clicked. That is why we use machine learning, and why we feed the algorithms of our well-oiled engine with big data.

But it has not always been like this. In fact, the shape of our prediction engine, and of its underlying architecture, had to evolve as our business grew.

At first, when our dataset fit in a SQL table, there was no need to invoke the name of Hadoop. At that time we implemented regression tree algorithms in C#, and each training job ran as a single process on a single server. And we were happy.

One issue with regression trees is that their size can grow exponentially as you add dimensions. Before our algorithm reached those limits, we improved it. First we used Bayesian networks for a while. Then we implemented generalized linear models. This change improved our performance a lot. And we were proud.
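To put rough numbers on that growth, here is a small Python sketch. It is a worst-case illustration under simplifying assumptions, not a description of the trees Criteo actually trained: every feature is assumed to be binary, and the tree is assumed to split on every feature along every path. The function names are made up for this example.

```python
def max_leaves(n_binary_features):
    """Worst-case leaf count of a tree that splits once on each binary feature."""
    return 2 ** n_binary_features


def linear_model_weights(n_binary_features):
    """Weight count of a linear model: one weight per feature plus a bias term."""
    return n_binary_features + 1


# Compare how the two model sizes grow as dimensions are added.
for d in (5, 10, 20, 30):
    print(d, max_leaves(d), linear_model_weights(d))
```

Real trees are pruned and rarely reach this bound, but the comparison shows why a model whose size grows linearly with the number of dimensions is easier to scale.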

But then, as the needs of the business increased, we had to add another server. And another one. And… and we were worried that our architecture would reach its limits in the near future.


Migrating our existing solution to our Hadoop cluster seemed a healthy way to go. It was far from trivial, but it was definitely where we wanted to be. To achieve this, a lot of questions had to be answered. How do we distribute a single-threaded gradient descent across several mappers and reducers? How do we run an existing C# codebase on a Linux Hadoop cluster? How do we keep the reliability of an architecture developed and tuned over the years when we make such a big-bang change?

Our Prediction and Scalability teams worked hand in hand to answer those concerns. Our data scientists showed us how we could distribute our learnings. Our technology watch turned up tools that would fit our stack. Reliability was handled as always, thanks to our engineering culture.
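The article does not detail the scheme that was actually used, so the following Python sketch only illustrates one common way to spread a gradient descent over mappers and reducers. The assumptions are all mine: a logistic regression model, full-batch gradient descent, and the hypothetical names `map_gradient`, `reduce_gradients` and `train`. Each "mapper" computes the gradient of the log-loss on its own shard of the data, and the "reducer" adds the partial gradients together before a single weight update.

```python
import math
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def map_gradient(shard, weights):
    """Mapper: summed log-loss gradient over one shard, plus its example count."""
    grad = [0.0] * len(weights)
    for features, clicked in shard:
        p = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for j, x in enumerate(features):
            grad[j] += (p - clicked) * x
    return grad, len(shard)


def reduce_gradients(a, b):
    """Reducer: add two partial (gradient, count) results."""
    (grad_a, n_a), (grad_b, n_b) = a, b
    return [x + y for x, y in zip(grad_a, grad_b)], n_a + n_b


def train(shards, n_features, steps=200, lr=0.5):
    weights = [0.0] * n_features
    for _ in range(steps):
        # On a cluster, each shard would be handled by a separate mapper.
        partials = [map_gradient(shard, weights) for shard in shards]
        grad, n = reduce(reduce_gradients, partials)
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Toy data: each example is ([bias, feature], clicked).
shards = [
    [([1.0, 0.0], 0), ([1.0, 1.0], 1)],
    [([1.0, 0.0], 0), ([1.0, 1.0], 1), ([1.0, 1.0], 0)],
]
weights = train(shards, n_features=2)
print(weights)
```

Because the gradient of a sum is the sum of the gradients, the reduced result matches what a single process would compute on the whole dataset. Here the shards are processed one after another in a single process; on a cluster, only the partial gradients would need to travel between machines.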

This was one of our major projects last year. It took time and effort. But the outcome met expectations: we were able to increase the size of our training set while nearly doubling the number of trained algorithms. However, I don't have time to say much more about it: there are still a lot of improvements and new technologies I want to test!

If you are curious about what we do and want to join us, have a look at our tech blog and drop us a line at r&drecruitment@criteo.com!

-By Guillaume Turri, Software Developer, R&D, Criteo

(Image credit: Diana Robinson, via Flickr)

Tags: Apache Hadoop, c, Criteo, Data Migration, Hadoop, Weekly Newsletter
