Dataconomy

UFO2 turns your desktop into an agent playground

by Kerem Gülen
April 22, 2025
in Research

What if automating a desktop wasn’t about scripting click patterns, but about giving your operating system an intelligent team of agents? That’s the core idea behind UFO2, Microsoft’s newest open-source system that pushes beyond current Computer-Using Agents (CUAs) and reinvents automation as a first-class OS abstraction. It turns your desktop into an intelligent control panel where language-driven tasks are executed natively, reliably, and with minimal disruption to your workflow.

Traditional desktop automation tools such as RPA systems have always struggled with robustness: a minor change in a UI can break an entire script. CUAs tried to address this with large language models and screenshot analysis, but they remained limited by shallow system integration and clunky user experiences. UFO2 flips this model by building from the OS upward. It introduces a multi-agent architecture in which a central HostAgent coordinates specialized AppAgents for different applications. Each agent speaks the native language of its app via APIs and UI metadata, not just pixels.
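To make the HostAgent and AppAgent split concrete, here is a minimal sketch of the coordination pattern. It is an illustration only: the class bodies, the `register` and `execute` methods, and the hard-coded plan are assumptions made for this example and are not taken from the UFO2 codebase, where an LLM would produce the plan.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AppAgent:
    """Hypothetical per-application agent (not UFO2's real class)."""
    app_name: str
    handler: Callable[[str], str]  # stands in for app-specific API/UI logic

    def run(self, subtask: str) -> str:
        return self.handler(subtask)


@dataclass
class HostAgent:
    """Hypothetical coordinator that routes subtasks to registered agents."""
    agents: Dict[str, AppAgent] = field(default_factory=dict)

    def register(self, agent: AppAgent) -> None:
        self.agents[agent.app_name] = agent

    def execute(self, plan: List[Tuple[str, str]]) -> List[str]:
        # plan holds (app_name, subtask) pairs; here it is written by hand,
        # whereas a real system would derive it from the user's request.
        results = []
        for app_name, subtask in plan:
            agent = self.agents.get(app_name)
            if agent is None:
                results.append(f"no agent for {app_name}")
                continue
            results.append(agent.run(subtask))
        return results


host = HostAgent()
host.register(AppAgent("excel", lambda t: f"excel did: {t}"))
host.register(AppAgent("outlook", lambda t: f"outlook did: {t}"))
results = host.execute([
    ("excel", "export sheet as CSV"),
    ("outlook", "attach CSV and send"),
])
```

The point of the split is that the coordinator only needs to know which application a subtask belongs to, while everything application-specific stays inside the agent for that app.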

A comparison of (a) existing CUAs and (b) desktop AgentOS UFO2 (Image)

One of UFO2’s key technical innovations is its hybrid action model. Instead of just clicking buttons like a human, each AppAgent can call real APIs when available. This means tasks like exporting a spreadsheet or formatting text are reduced from multi-step GUI dances to a single, atomic function call. The system also speculates ahead: a single LLM call plans multiple steps, and each step is validated against live Windows UI data before it runs. This speculative multi-action execution cuts latency, while the per-step validation is what protects correctness.
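The two ideas in this paragraph can be sketched together in a few lines. The example below is a simplified illustration under stated assumptions: `Action`, `execute_batch`, the simulated UI state, and the `save_as_csv` API name are all hypothetical, and the batch of actions is written by hand where UFO2 would obtain it from a single LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Action:
    control: str               # UI control the step targets
    api: Optional[str] = None  # equivalent native API, if one exists


def execute_batch(
    actions: List[Action],
    visible_controls: Callable[[], Set[str]],
    apis: Dict[str, Callable[[], None]],
    click: Callable[[str], None],
) -> int:
    """Run a pre-planned batch; return how many steps completed."""
    done = 0
    for action in actions:
        if action.api is not None and action.api in apis:
            apis[action.api]()       # hybrid action: prefer the API call
        elif action.control in visible_controls():
            click(action.control)    # validated against the live UI state
        else:
            break                    # plan is stale: stop and let caller replan
        done += 1
    return done


# Simulated application: clicking "File" reveals the "Export" control.
ui: Set[str] = {"File"}
log: List[str] = []


def click(name: str) -> None:
    log.append(f"click:{name}")
    if name == "File":
        ui.add("Export")


apis = {"save_as_csv": lambda: log.append("api:save_as_csv")}
plan = [
    Action("File"),
    Action("Export"),
    Action("Save", api="save_as_csv"),
    Action("Missing"),  # never appears, so the batch stops here
    Action("File"),
]
done = execute_batch(plan, lambda: set(ui), apis, click)
```

In this sketch the first three steps run from one planning pass, and the fourth is rejected because its control is not present, which is the moment a real agent would go back to the model for a fresh plan.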

Isolation without interruption

CUAs typically hijack your desktop, locking the mouse and keyboard during execution. UFO2’s Picture-in-Picture (PiP) mode solves this with a virtual desktop window that runs automation tasks in parallel. The agent works in a sandboxed environment while you continue working in the main session. The PiP desktop uses native Windows RDP loopback to maintain session integrity.

An overview of the architecture of UFO2 (Image)

UFO2 integrates help documentation and execution logs into a retrieval-augmented memory, enriching its prompts with procedural knowledge. Over time, this creates a self-improving agent that gets better at new tasks without retraining. Each AppAgent pulls from documentation, patch notes, and prior runs to make smarter decisions. It is an automation system with memory, not just response generation.
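A toy version of this retrieval step might look like the following. It is a sketch, not UFO2's method: plain word overlap stands in for whatever retriever the system actually uses, and the `retrieve` and `build_prompt` helpers and the sample memory entries are invented for illustration.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, memory: List[str], k: int = 2) -> List[str]:
    """Return up to k memory entries sharing at least one word with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(doc)), i, doc) for i, doc in enumerate(memory)]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc for _, _, doc in hits[:k]]


def build_prompt(task: str, memory: List[str]) -> str:
    """Prepend retrieved notes to the task before it goes to the model."""
    context = retrieve(task, memory)
    lines = ["Relevant notes:"] + [f"- {c}" for c in context] + [f"Task: {task}"]
    return "\n".join(lines)


# Hypothetical memory mixing help documentation and an execution log entry.
memory = [
    "Help doc: to export a chart in Excel, use File > Export > Change File Type.",
    "Log: renaming a sheet succeeded via double-click on the sheet tab.",
    "Help doc: in Word, Insert > Table creates a table.",
]
prompt = build_prompt("export chart from Excel", memory)
```

Because new log entries can simply be appended to `memory`, the prompts get richer over time without any change to the underlying model, which is the sense in which such a system improves without retraining.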

In head-to-head benchmarks against OpenAI’s Operator and other leading CUAs, UFO2 consistently comes out ahead. On the OSWorld-W benchmark, it reaches a 32.7% success rate using the o1 model, more than double Operator’s 14.3%. Its speculative planning reduces action steps by up to 50%, and hybrid control detection (combining UIA APIs and vision parsing) recovers over 25% of previously failed interactions. Simply put, UFO2 isn’t just smarter; it is better at the system level.
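One plausible way to combine two sources of detected controls is to keep everything the accessibility API reports and add only those vision detections that do not overlap an existing control. The sketch below assumes that approach; the `Box` type, the `merge_controls` helper, the 0.5 overlap threshold, and the sample coordinates are illustrative choices, not details confirmed by the article.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def area(b: Box) -> float:
    return max(0.0, b.right - b.left) * max(0.0, b.bottom - b.top)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.right, b.right) - max(a.left, b.left))
    ih = max(0.0, min(a.bottom, b.bottom) - max(a.top, b.top))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def merge_controls(uia: List[Box], vision: List[Box],
                   threshold: float = 0.5) -> List[Box]:
    """Keep all UIA controls; add vision boxes that match no UIA control."""
    merged = list(uia)
    for v in vision:
        if all(iou(v, u) < threshold for u in uia):
            merged.append(v)
    return merged


uia_controls = [Box(0, 0, 100, 40)]           # reported by the accessibility API
vision_controls = [
    Box(2, 1, 101, 41),                       # same button seen by the vision model
    Box(200, 0, 260, 40),                     # custom control the API missed
]
merged = merge_controls(uia_controls, vision_controls)
```

Here the duplicate detection is dropped and the custom control is kept, so the agent ends up with two actionable targets instead of one.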

Everything is an agent now

Extensibility is baked in. UFO2 allows third-party tools, including other CUAs like Operator, to be wrapped as AppAgents. This means you can integrate specialized copilots or proprietary automation backends into the UFO2 ecosystem without retraining or rewriting code. It also supports a client-server architecture for enterprise deployment, keeping orchestration centralized and user devices light.
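Wrapping an outside tool so it can be treated like any other agent is essentially an adapter pattern. The following sketch shows the general shape; the `AppAgentLike` protocol, `ExternalToolAgent`, `dispatch`, and the stand-in `third_party_cua` function are hypothetical names for this example and do not reflect UFO2's actual interfaces.

```python
from typing import Callable, Protocol


class AppAgentLike(Protocol):
    """Hypothetical minimal interface an orchestrator expects from an agent."""
    name: str

    def run(self, subtask: str) -> str: ...


class ExternalToolAgent:
    """Adapter exposing any callable tool through the agent interface."""

    def __init__(self, name: str, tool: Callable[[str], str]) -> None:
        self.name = name
        self._tool = tool

    def run(self, subtask: str) -> str:
        # The wrapped tool is called as-is; nothing in it is rewritten.
        return self._tool(subtask)


def third_party_cua(instruction: str) -> str:
    """Stand-in for an external automation backend."""
    return f"[external] completed: {instruction}"


def dispatch(agent: AppAgentLike, subtask: str) -> str:
    """What an orchestrator might do with any agent, native or wrapped."""
    return f"{agent.name}: {agent.run(subtask)}"


wrapped = ExternalToolAgent("operator", third_party_cua)
result = dispatch(wrapped, "book a meeting room")
```

The orchestrator sees only the small interface, so the external backend needs no changes to take part in a larger workflow.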

The paper outlines future goals, including cross-platform compatibility with macOS and Linux via analogous accessibility APIs, faster response via smaller LLMs, and improved reasoning from dedicated GUI-interaction datasets. But even in its current state, UFO2 represents a new baseline for desktop automation. It is open-source, already outperforming commercial systems, and brings a new level of modularity, reliability, and intelligence to human-computer interaction.

For anyone building the next generation of intelligent agents, or just tired of brittle scripts, UFO2 is available on GitHub along with its documentation.


Tags: AI, Microsoft
