Dataconomy

AI is learning to drive like a human—by watching you panic

Proxy Value Propagation (PVP) is a new method that turns AI training into something far more human

by Kerem Gülen
February 6, 2025
in Research

Self-driving cars are supposed to be the future. AI is supposed to take the wheel, navigate flawlessly, and eliminate human error. Yet here we are, still gripping our steering wheels while AI stumbles through simulations, making mistakes that range from hilariously bad to downright dangerous.

Why? Because AI agents typically learn through trial and error, the digital equivalent of throwing darts in the dark until one finally hits the bullseye. That’s fine when the stakes are low, as in playing chess or optimizing ads. But in real-world applications, where a mistake can mean plowing into a pedestrian, this approach falls apart.

According to a study by Zhenghao Peng, Wenjie Mo, Chenda Duan, and Bolei Zhou of the University of California, Los Angeles (UCLA), along with Quanyi Li of the University of Edinburgh, AI training can be dramatically improved using Proxy Value Propagation (PVP). Their paper, titled Learning from Active Human Involvement through Proxy Value Propagation, challenges traditional reinforcement learning by showing that active human intervention leads to faster, safer, and more efficient AI training.


Traditional reinforcement learning (RL), the standard way AI learns to make decisions, is painfully slow. It can take millions of attempts before an AI figures out what works. Worse, it assumes a reward function can capture human intent, when in reality poorly designed rewards often lead to bizarre, unintended behaviors. Think of an AI trained to win a race that figures out it can just drive in circles at the start line to rack up “distance traveled” points without ever finishing the course.
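To make that thought experiment concrete, here is a toy comparison of two hypothetical reward functions, a naive “distance traveled” reward and a “progress toward the goal” reward, scored on two hand-made paths. The function names, coordinates, and paths are illustrative assumptions, not taken from the study; the point is only that a reward which ignores direction can pay more for circling than for finishing.

```python
import math

Point = tuple[float, float]

def distance_traveled_reward(path: list[Point]) -> float:
    """Hypothetical naive reward: total length driven, regardless of direction."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def progress_reward(path: list[Point], goal: Point) -> float:
    """Hypothetical alternative: how much closer the car ended up to the goal."""
    return math.dist(path[0], goal) - math.dist(path[-1], goal)

start, goal = (0.0, 0.0), (100.0, 0.0)

# Circling: drive around a 1x1 square at the start line 50 times, ending back at the start.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
circling = square * 50 + [start]

# Finishing: drive straight from the start to the goal in 1-unit steps.
finishing = [(float(x), 0.0) for x in range(101)]

for name, path in (("circling", circling), ("finishing", finishing)):
    print(name, distance_traveled_reward(path), progress_reward(path, goal))
# circling 200.0 0.0
# finishing 100.0 100.0
```

Under the distance reward, circling scores twice as much as actually finishing; under the progress reward it scores nothing. Real reward design is much harder than this, which is part of why the study turns to human feedback instead.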

Clearly, AI needs a better teacher. And that teacher? You.

Let humans intervene in real time

Proxy Value Propagation (PVP) is a new method that turns AI training into something far more human. Instead of letting AI blunder through its mistakes for months, PVP lets humans step in, intervene, and show AI what to do in real time.

  • Imagine an AI learning to drive in a simulator such as Grand Theft Auto V (GTA V).
  • The AI makes a terrible decision, like running a red light straight into traffic.
  • Instead of watching the chaos unfold, a human takes control at that moment and corrects the AI’s action.
  • The system then labels the human’s decision as a “good” move and the AI’s overridden action as a “bad” move.
  • Using a technique called value propagation, the AI spreads this correction to related situations, learning to avoid bad decisions without needing millions of attempts.
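The steps above can be sketched with a tiny lookup table of action values. This is a minimal illustration, not the authors’ implementation: it assumes a proxy value of +1 for the human’s action and -1 for the agent’s overridden action, plus a reward-free temporal-difference (TD) backup for ordinary transitions, so the only learning signal is the human’s correction. The state names, action set, and hyperparameters (GAMMA, LR, GOOD, BAD) are all hypothetical. The study trains neural networks, which is what lets a correction carry over to similar situations; a table like this one only shows value flowing backward along a trajectory.

```python
from collections import defaultdict

ACTIONS = ("brake", "accelerate", "keep_lane")  # hypothetical action set
GAMMA = 0.9            # discount factor (assumed)
LR = 0.5               # step size for the tabular updates (assumed)
GOOD, BAD = 1.0, -1.0  # proxy values for human vs. overridden agent actions (assumed)

Q = defaultdict(float)  # Q[(state, action)] -> estimated value, 0.0 by default

def proxy_value_update(state, human_action, agent_action):
    """Label the human's correction GOOD and the agent's rejected action BAD."""
    Q[(state, human_action)] += LR * (GOOD - Q[(state, human_action)])
    Q[(state, agent_action)] += LR * (BAD - Q[(state, agent_action)])

def td_update(state, action, next_state):
    """Reward-free TD backup: the target is only the discounted best value of next_state."""
    target = GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += LR * (target - Q[(state, action)])

# One takeover, replayed a few times: the agent chose to accelerate at a red light,
# and the human braked instead.
for _ in range(10):
    proxy_value_update("red_light", human_action="brake", agent_action="accelerate")

# An ordinary, non-intervened transition that leads into the red light.
td_update("cruising", "keep_lane", "red_light")

print(Q[("red_light", "brake")])       # pushed toward GOOD
print(Q[("red_light", "accelerate")])  # pushed toward BAD
print(Q[("cruising", "keep_lane")])    # positive: value propagated from the corrected state
```

The earlier state never saw a human, yet it picks up value because it leads to a state where the human showed a good action. Repeating such backups over many transitions is what lets a small number of interventions shape behavior across a whole drive.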

The result is surprising. AI learns much faster, with fewer mistakes, and—most importantly—it actually aligns with human expectations instead of blindly chasing reward points.


AI struggles with strategy: Study shows LLMs reveal too much in social deduction games


The numbers don’t lie: PVP works

The team behind PVP put it to the test in GTA V, CARLA (a driving simulator), and MiniGrid (a virtual maze navigation task). The results were stunning:

  • AI trained with PVP learned 10 times faster than traditional methods.
  • It required only 1,200 human interventions—compared to the 300,000 attempts AI typically needs in RL.
  • The success rate of PVP-trained AI in reaching destinations safely was 85%, compared to just 20-50% for previous methods.
  • AI made 75% fewer critical mistakes when trained with PVP versus traditional reinforcement learning.

In other words, AI actually started driving like a human—not just a robot programmed to maximize abstract rewards.

A win for AI—and for humans

PVP isn’t just better for the AI; it also makes life easier for the people training it. Earlier human-in-the-loop methods demand constant oversight, hours of feedback, and a great deal of patience. PVP cut the human effort needed for training by 50%. Testers rated PVP-trained AI 4.8 out of 5 for accuracy, compared with just 3.0 for older methods. Because PVP-trained AI didn’t need constant correction, it also caused human trainers significantly less stress. For a technology that’s supposed to make our lives easier, that’s a huge step forward.

From GTA to the streets

PVP has already performed well in virtual driving tests. The real question is whether it can work in real-world applications.

The potential is massive. Instead of relying solely on pre-programmed rules, AI could learn directly from human intervention, making training safer and faster. AI-powered robots in warehouses, hospitals, or even homes could be trained in real time instead of through trial and error. Doctors could intervene during AI-assisted surgeries or diagnoses, directly teaching the system what’s right or wrong.

Sometimes, the goal is just to make AI human enough—to act in ways we expect, to align with our values, and to avoid mistakes that put us at risk.


Featured image credit: Kerem Gülen/Midjourney


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.

  • About
  • Imprint
  • Contact
  • Legal & Privacy

Follow Us

  • News
    • Artificial Intelligence
    • Cybersecurity
    • DeFi & Blockchain
    • Finance
    • Gaming
    • Startups
    • Tech
  • Industry
  • Research
  • Resources
    • Articles
    • Guides
    • Case Studies
    • Whitepapers
    • AI Models Leaderboard
  • AI tools
  • Newsletter
  • + More
    • Glossary
    • Conversations
    • Events
    • About
      • Who we are
      • Contact
      • Imprint
      • Legal & Privacy
      • Partner With Us
No Result
View All Result
Subscribe

This website uses cookies to improve your experience. You can choose to accept or reject them. Visit our Privacy Policy.