Dataconomy

AI labs invest in RL environments for autonomous agents

OpenAI, Anthropic, and Meta are investing in simulated workspaces that teach AI agents to complete complex, multi-step tasks with feedback-driven training.

by Aytun Çelebi
September 17, 2025
in Industry, Artificial Intelligence

Silicon Valley investors and major AI labs are making significant investments in reinforcement learning (RL) environments, which are simulated workspaces designed to train AI agents to use software autonomously.

While AI agents like OpenAI’s ChatGPT Agent have shown promise, they still struggle with complex, multi-step tasks. This new wave of investment is focused on creating sophisticated training grounds to overcome these limitations, moving beyond the static, labeled datasets that powered the last generation of AI.

How AI reinforcement learning environments work

RL environments are virtual training grounds where an AI agent can practice using software in a controlled setting. The agent receives feedback through a system of rewards and penalties, much like a game. For example, an agent tasked with buying socks on Amazon in a simulated Chrome browser would receive a positive reward for successfully completing the purchase. It would receive a penalty for errors like choosing the wrong item or failing to navigate a menu.
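The reward-and-penalty loop described above can be sketched in a few lines of Python. This is a toy illustration, not how any lab's environment is actually built: the `SockShopEnv` class, the action names, and the reward values are all hypothetical, and a real browser environment would expose clicks, keystrokes, and page contents rather than four fixed actions.

```python
from dataclasses import dataclass, field

# Hypothetical action names for a toy "buy socks" task.
ACTIONS = ("search_socks", "search_shoes", "add_to_cart", "checkout")


@dataclass
class SockShopEnv:
    """Toy stand-in for a simulated shopping site (all names are illustrative)."""
    max_steps: int = 6
    page: str = "home"
    cart: list = field(default_factory=list)
    steps: int = 0

    def reset(self) -> str:
        """Start a fresh episode and return the first observation."""
        self.page, self.cart, self.steps = "home", [], 0
        return self.page

    def step(self, action: str):
        """Apply one action and return (observation, reward, done)."""
        self.steps += 1
        reward, done = 0.0, False
        if action == "search_socks":
            self.page = "results:socks"
        elif action == "search_shoes":
            self.page = "results:shoes"
        elif action == "add_to_cart" and self.page.startswith("results:"):
            self.cart.append(self.page.split(":")[1])
        elif action == "checkout" and self.cart:
            # Positive reward only if the cart holds exactly the right item.
            reward = 1.0 if self.cart == ["socks"] else -1.0
            done = True
        else:
            reward = -0.1  # small penalty for an action that is invalid here
        if self.steps >= self.max_steps and not done:
            reward -= 1.0  # extra penalty for running out of steps
            done = True
        return self.page, reward, done


def run_episode(env: SockShopEnv, actions) -> float:
    """Replay a fixed list of actions and return the total reward."""
    env.reset()
    total = 0.0
    for action in actions:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


good = run_episode(SockShopEnv(), ["search_socks", "add_to_cart", "checkout"])
wrong_item = run_episode(SockShopEnv(), ["search_shoes", "add_to_cart", "checkout"])
print(good, wrong_item)
```

In training, the fixed action list would be replaced by a model choosing each action from the observation, and the total reward would drive the update to that model.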

These dynamic environments are far more complex to build than static datasets. They must account for a wide range of unpredictable agent actions and provide precise feedback to guide improvement. The concept builds on earlier AI research, such as the “RL Gyms” OpenAI built in 2016 (released as the OpenAI Gym toolkit) and the simulated Go board on which DeepMind’s AlphaGo was trained. The difference is that today’s environments are used to train general-purpose transformer models on open-ended tasks like web navigation and document editing, rather than specialized systems in closed settings such as a single game.

A new ecosystem of startups is emerging to meet demand

Major AI labs like OpenAI, Anthropic, and Meta are building their own RL environments, but the complexity and scale of the task have created a demand for third-party specialists. This has fueled the growth of a new ecosystem of startups and prompted established data companies to pivot.

  • Mechanize Work, a new startup, is focusing on creating a small number of high-fidelity environments for tasks like AI coding. The company is reportedly working with Anthropic and is offering salaries up to $500,000 to attract top engineering talent.
  • Prime Intellect is targeting smaller developers with an open-source hub that it calls a “Hugging Face for RL environments.” The platform provides access to pre-built simulations and sells the computational resources needed to run them.
  • Surge, a data-labeling company that reported $1.2 billion in revenue last year, has created a new internal organization dedicated to building RL environments to meet rising demand from its clients.
  • Mercor is developing domain-specific environments for fields like coding, healthcare, and law, where agents can be trained on simulated software for tasks like reviewing patient records or legal contracts.
  • Scale AI, a former leader in data labeling, is also adapting by developing RL environments as it seeks to remain competitive after losing key contracts with Google and OpenAI.

Challenges and the path forward

Despite the heavy investment, including a reported plan from Anthropic to spend more than $1 billion on RL environments, significant challenges remain. Ross Taylor, a former AI research lead at Meta, pointed to the problem of “reward hacking,” where agents find loopholes to gain rewards without actually completing the intended task. OpenAI’s Sherwin Wu has said he is “short” on RL environment startups, noting that the space is crowded and that the labs’ needs evolve too quickly for outside vendors to serve them easily.
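A small Python sketch can make reward hacking concrete. It is an illustrative assumption, not an example given by Taylor or any lab: the `Episode` record, both reward functions, and the "jump straight to the confirmation page" loophole are hypothetical. The point is that a reward which checks only a surface signal can be satisfied without the task being done.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical record of where the agent ended up and what it bought."""
    final_page: str
    orders: list = field(default_factory=list)  # items actually purchased


def naive_reward(ep: Episode) -> float:
    # Loophole: only checks which page the agent ended on.
    return 1.0 if ep.final_page == "order_confirmed" else 0.0


def verified_reward(ep: Episode, target: str = "socks") -> float:
    # Checks the outcome itself: exactly one order, for the target item.
    return 1.0 if ep.orders == [target] else 0.0


honest = Episode(final_page="order_confirmed", orders=["socks"])
# The hack: reaching the confirmation page without buying anything.
hacked = Episode(final_page="order_confirmed", orders=[])

for name, ep in (("honest", honest), ("hacked", hacked)):
    print(name, naive_reward(ep), verified_reward(ep))
```

Both episodes score the same under the naive reward, while only the honest one scores under the verified reward. Tightening the check closes this particular loophole, but each new environment can introduce loopholes of its own.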

There is also a debate within the AI community about the most effective training methods.

Andrej Karpathy, an investor in Prime Intellect, shared a nuanced view on X.

“I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically.”

This perspective highlights the enthusiasm for using simulated environments while also acknowledging that the best way to extract intelligence from them is still an open question.

Nonetheless, these environments are seen as a critical component in developing the next generation of more capable and autonomous AI agents, building on the reinforcement learning techniques credited with recent breakthroughs like OpenAI’s o1 and Anthropic’s Claude Opus 4.


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
