Dataconomy

Infographic: The 4 Types of Data Science Problems Companies Face

by Florian Douetteau
February 9, 2017
in Articles, Resources

There’s a part of data science that you rarely hear about: the deployment and production of data flows.  Everybody talks about how to build models, but little time is spent discussing the difficulties of actually using those models. Yet these production issues are the reason many companies fail to see value come from their data science efforts and investments.

The data science process is extensively covered by resources all over the web and known by everyone. A data scientist connects to data, splits it or merges it, cleans it, builds features, trains a model, deploys it to assess performance, and iterates until they’re happy with it. That’s not the end of the story though. Next, you need to try the model on real data and enter the production environment.

These two environments are inherently different because the production environment is continuously running – and potentially impacting existing internal or external systems. Data is constantly coming in, being processed and computed into KPIs, and going through models that are retrained frequently. These systems, more often than not, are written in different languages than the data science environment.

To better understand the challenges companies face when taking data science from prototype to production, Dataiku recently asked thousands of companies around the world how they do it. The results show that companies using data science face distinct challenges that fall into four profiles, which Dataiku has named as follows: Small Data Teams, Packagers, Industrialisation Maniacs, and The Big Data Lab.

[Infographic: production survey]

Small Data Teams (23%)

Small Data Teams focus on building small projects fast: standard machine learning packages with a single server and technical environment for all analytics projects.

> 3/4 do either marketing or reporting.

> 61% report having custom machine learning as part of their business model.

> 83% use either SQL or enterprise analytics databases.

These teams, as their name indicates, use mostly small data and have a single design/production environment. They deploy small continuous iterations and have little to no rollback strategy. They often don’t retrain models and use simple batch production deployment, with few packages. Business teams are fairly involved throughout the data project design and deployment.

Average level of difficulty in deployment: 6.4

Packagers (27%)

Packagers focus on building a framework (the software development approach): independent teams that build their own framework for a comprehensive understanding of the project.

> 48% have set up advanced reporting.

> 52% of respondents mix storage technologies.

> 63% use SQL and open source.

These teams have a software development approach to data science and have often built their framework from scratch. They develop ad-hoc packaging and practice informal A/B testing. They use Git intensively to keep a global view of their projects and dependencies, and they are particularly interested in IT environment consistency. They tend to have a multi-language environment and are often disconnected from business teams.

Average level of difficulty in deployment: 6.4

Industrialisation Maniacs (18%)

Industrialisation Maniacs focus on versioning and auditing: IT-driven teams that think in terms of frequent deployment and constant logging to track all changes and dependencies.

> 61% have logistics, security, or industry-specific use cases.

> 30% have deployed advanced reporting (vs 50% of all respondents).

> 72% use NoSQL and cloud.

These data teams are mostly IT-led and don’t have a distinct production environment. They have complex automated processes in place for deployment and maintenance. They log all data access and modification and have a philosophy of keeping track of everything. In these setups, business teams are notably not involved in the data science process or its monitoring.

Average level of difficulty in deployment: 6.9
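
The "log all data access and modification" habit described for this profile can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something taken from the survey: the `audited` decorator, the `audit` logger name, and the in-memory `_scores` store are all hypothetical stand-ins for a real data layer.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
audit_log = logging.getLogger("audit")  # hypothetical logger name


def audited(action):
    """Decorator that records every call to a data access or modification function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            audit_log.info("%s start func=%s args=%r kwargs=%r",
                           action, func.__name__, args, kwargs)
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Failed calls are logged too, then re-raised unchanged.
                audit_log.exception("%s failed func=%s", action, func.__name__)
                raise
            audit_log.info("%s done func=%s", action, func.__name__)
            return result
        return wrapper
    return decorator


# Hypothetical in-memory "table" standing in for a real data store.
_scores = {"customer_1": 0.42}


@audited("read")
def read_score(customer_id):
    return _scores[customer_id]


@audited("write")
def write_score(customer_id, value):
    _scores[customer_id] = value
```

Calling `write_score("customer_2", 0.9)` and then `read_score("customer_2")` leaves a start and a done entry in the log for each call. A real setup would more likely ship these records to a central store than to the console.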

The Big Data Lab (30%)

The Big Data Lab focuses on governance and project management: mature teams with a global deployment strategy, rollback processes, and a preoccupation with governance principles and integration within the company.

> 66% of companies have multiple use cases in place.

> 50% do advanced social media analytics (vs 22% of global respondents).

> 53% use Hadoop, and two thirds of them use only Hadoop.

These teams are very mature, with more complex use cases and technologies. They use advanced techniques such as PMML and multivariate testing (or at least formal A/B testing), and have automated procedures to backtest and robust strategies to audit IT environment consistency. In these larger, more organized teams, business users are extremely involved before and after the deployment of the data product.

Average level of difficulty in deployment: 5.6
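
To make the difference between informal and formal A/B testing concrete, here is a minimal sketch of one common formal check, a two-proportion z-test, using only the Python standard library. The survey does not say which test these teams run, so the choice of test, the function name, and the conversion counts are all assumptions for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: model A converts 200 of 2000 users, model B 250 of 2000.
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the gap between the two models is unlikely to be noise. An informal A/B test, by contrast, typically compares the two raw rates without such a check.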

Overall, the main reported barrier to production for all groups (50% of respondents) is data quality and pipeline development issues. In terms of the overall difficulty of data science production, the average reported difficulty of deploying a data project into production is 6.18 out of 10, and 50% of respondents state that, on a scale of 1 to 10, the level of difficulty involved in getting a data product into production is between 6 and 10.
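
As a rough cross-check of the 6.18 figure, the per-profile averages can be combined using the profile shares quoted in the sections above. Treating the overall average as a share-weighted mean is an assumption: the article does not state how the overall figure was computed, and the published shares sum to 98% rather than 100%, so a small gap is to be expected.

```python
# Profile shares (percent of respondents) and average reported deployment
# difficulty, as quoted in the profile sections of this article.
profiles = {
    "Small Data Teams": (23, 6.4),
    "Packagers": (27, 6.4),
    "Industrialisation Maniacs": (18, 6.9),
    "The Big Data Lab": (30, 5.6),
}

total_share = sum(share for share, _ in profiles.values())
# Share-weighted mean, normalized by the shares actually listed (an assumption).
weighted_mean = sum(share * diff for share, diff in profiles.values()) / total_share

print(total_share)              # 98
print(round(weighted_mean, 2))  # 6.25
```

The result lands close to the reported overall average, which is consistent with the four profiles covering nearly all respondents.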

Considering the results, here are a few principles that companies should keep in mind when building production-ready data science products:

  1. Getting started is tough. Working with small data on SQL databases does not mean it’s going to be easier to deploy into production.
  2. Multi-language environments are not harder to maintain in production, as long as you have an IT environment consistency process. So mix’n’match!
  3. Real-time scoring and online machine learning are likely to make your production pipeline more complex. Think about whether the improvement to your project is worth the hassle.
  4. Working with business users, both while designing your machine learning project and afterwards when monitoring it day to day, will increase your efficiency. Collaborate!

 
