Dataconomy

Criteo’s Prediction on Hadoop: How and Why it Came About

by Guillaume Turri
February 24, 2015

At Criteo, we display online advertisements and sell clicks to our clients. For each of the 2 billion banners we serve daily, we have to predict whether it is likely to be clicked. That is why we use machine learning, and why we feed the algorithms of our well-oiled engine with big data.

But it has not always been like this. In fact, the shape of our prediction engine, and its underlying architecture, had to evolve as our business grew.

At first, when our dataset fit in a SQL table, there was no need to invoke the name of Hadoop. At that time, we implemented regression tree algorithms in C#. Each learning job ran as a single process on a single server. And we were happy.

An issue with regression trees is that their size can explode exponentially as you add dimensions. Before our algorithm reached those limits, we improved it. First we used Bayesian networks for a while. Then we implemented generalized linear models. This change increased our performance a lot. And we were proud.
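The exponential growth mentioned above can be made concrete with a little arithmetic. The sketch below assumes the worst case, where a tree ends up separating every combination of feature values; the function name and the numbers are illustrative only and do not come from Criteo's system.

```python
def worst_case_leaves(values_per_feature: int, n_features: int) -> int:
    """Upper bound on leaves if the tree isolates every combination
    of feature values (hypothetical worst case, for illustration)."""
    return values_per_feature ** n_features


# Each extra dimension multiplies the worst-case tree size.
for n_features in range(1, 7):
    print(n_features, worst_case_leaves(10, n_features))
```

With ten possible values per feature, going from three to six features raises the bound from a thousand leaves to a million.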

But then, as the needs of the business increased, we had to add another server. And another one. And… we were worried that our architecture would reach its limits in the near future.

Migrating our existing solution to our Hadoop cluster seemed a healthy way to go. It was far from trivial, but it was definitely where we wanted to be. To achieve this, a lot of questions had to be answered. How do we distribute a single-threaded gradient descent across several mappers and reducers? How do we run an existing C# codebase on a Hadoop Linux cluster? How do we keep the reliability of an architecture developed and tuned over the years when we make such a big-bang change?

Our Prediction and Scalability teams worked hand in hand to address those concerns. Our data scientists showed us how we could distribute our learnings. Our technology watch turned up tools that would fit our technology. Reliability was handled as always, thanks to our engineering culture.
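The article does not describe how the learnings were actually distributed, so what follows is only a minimal sketch of one common approach, not Criteo's method. It relies on the fact that the gradient of a loss summed over rows is the sum of the per-row gradients: each mapper can compute a partial gradient on its own shard, and a reducer adds the partials together. The model (logistic regression), the function names, and the toy data are all assumptions made for illustration, and the "mappers" here run sequentially in one process.

```python
import math
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def map_partial_gradient(weights, shard):
    """Mapper: log-loss gradient summed over one shard of (features, label) rows."""
    grad = [0.0] * len(weights)
    for features, label in shard:
        prediction = sigmoid(sum(w * x for w, x in zip(weights, features)))
        error = prediction - label
        for i, x in enumerate(features):
            grad[i] += error * x
    return grad, len(shard)


def reduce_gradients(a, b):
    """Reducer: partial gradients and row counts simply add up."""
    (grad_a, n_a), (grad_b, n_b) = a, b
    return [x + y for x, y in zip(grad_a, grad_b)], n_a + n_b


def distributed_step(weights, shards, learning_rate=0.5):
    """One gradient descent step built from per-shard partial gradients."""
    # On a real cluster each shard would be handled by a separate mapper.
    partials = [map_partial_gradient(weights, shard) for shard in shards]
    total_grad, n_rows = reduce(reduce_gradients, partials)
    return [w - learning_rate * g / n_rows for w, g in zip(weights, total_grad)]


# Toy data: first feature is a constant bias term, second is the signal.
rows = [([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 2.0], 1), ([1.0, 3.0], 1)]
shards = [rows[:2], rows[2:]]

weights = [0.0, 0.0]
for _ in range(200):
    weights = distributed_step(weights, shards)
```

Splitting the rows into shards does not change the result of a step, up to floating-point rounding: `distributed_step(w, shards)` and `distributed_step(w, [rows])` give the same weights. The price is one full map and reduce pass per iteration.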

This was one of our major projects last year. It took time and effort. But the outcome met our expectations: we were able to increase the size of our training set while nearly doubling the number of trained algorithms. However, I won’t have time to talk much more about it: there are still a lot of improvements and new technologies I want to test!

If you are curious about what we do and want to join us, have a look at our tech blog and drop us a line at r&[email protected]!

-By Guillaume Turri, Software Developer, R&D, Criteo


(Image credit: Diana Robinson, via Flickr)
