A 2025 study from the University of Oxford has revealed a security vulnerability in AI agents, which are expected to be widely used within two years. Unlike chatbots, these agents can take direct actions on a user’s computer, such as opening tabs or filling out forms. The research shows how attackers can embed invisible commands in images to take control of these agents.
How the image-based attack works
Researchers demonstrated that by making subtle changes to the pixels in an image—such as a desktop wallpaper, an online ad, or a social media post—they could embed malicious commands. While these alterations are invisible to the human eye, an AI agent can interpret them as instructions.
The study used a “Taylor Swift” wallpaper as an example. A single manipulated image could command a running AI agent to retweet the image on social media and then send the user’s passwords to an attacker. The attack only affects users who have an AI agent active on their computer.
Why are wallpapers an effective attack vector?
AI agents work by repeatedly taking screenshots of the user’s desktop to understand what is on the screen and identify elements to interact with. Because a desktop wallpaper is always present in these screenshots, it serves as a persistent delivery method for a malicious command. The researchers found that these hidden commands are also resistant to common image changes like resizing and compression.
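To make that loop concrete, here is a minimal Python sketch of a perceive-act cycle of the kind described, not the researchers' actual code; the vision-model and automation helpers are placeholder assumptions. The point it illustrates is that every iteration captures the entire desktop, so a tampered wallpaper reaches the model on every cycle.

```python
# Minimal sketch of an agent's perceive-act loop (helper names are hypothetical).
# Every iteration screenshots the WHOLE desktop, so anything persistently on
# screen, such as the wallpaper, is re-fed to the model each cycle.
import time
from PIL import ImageGrab  # captures the full desktop on Windows/macOS


def query_vision_model(screenshot):
    """Placeholder for the agent's vision-language model call (assumption)."""
    return {"action": "noop", "target": None}


def execute(decision):
    """Placeholder for mouse/keyboard automation (assumption)."""
    pass


def agent_loop(interval_s: float = 1.0):
    while True:
        screenshot = ImageGrab.grab()              # full desktop, wallpaper included
        decision = query_vision_model(screenshot)  # model reads all pixels,
                                                   # including any hidden pattern
        execute(decision)
        time.sleep(interval_s)
```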
Open-source AI models are especially vulnerable because attackers can study their code to learn how they process visual information. This allows them to design pixel patterns that the model will reliably interpret as a command.
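The sketch below shows, in broad strokes, how such a white-box attack could be set up: a generic projected-gradient-style loop in PyTorch that nudges pixels until the model's output matches the attacker's target instruction, while clamping the change so it stays imperceptible. The model, loss function, and parameter values are assumptions for illustration, not details from the study.

```python
# Sketch of a white-box perturbation search against an open-weight
# vision-language model (generic gradient-based loop; model and loss
# are placeholders, not the paper's method).
import torch


def craft_perturbation(model, image, target_text_loss_fn,
                       epsilon=4 / 255, steps=200, lr=1e-2):
    """Find a small pixel change that steers `model` toward the attacker's text.

    `target_text_loss_fn(model, image)` is an assumed callable returning how far
    the model's output is from the target instruction for the given image.
    `image` is a float tensor with values in [0, 1].
    """
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        loss = target_text_loss_fn(model, (image + delta).clamp(0, 1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Keep the perturbation small enough to be invisible to a human viewer.
        delta.data.clamp_(-epsilon, epsilon)
    return (image + delta).clamp(0, 1).detach()
```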
The vulnerability allows attackers to string together multiple commands. An initial malicious image can instruct the agent to navigate to a website, which could host a second malicious image. This second image can then trigger another action, creating a sequence that allows for more complex attacks.
What can be done?
The researchers hope their findings will push developers to build security measures before AI agents become widespread. Potential defenses include retraining models to ignore these manipulated images, or adding a security layer that stops agents from following instructions hidden in on-screen content.
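As a rough illustration of the second idea, a guard layer might check each action the agent proposes against the user's actual request and block sensitive operations outright. The function and category names below are illustrative assumptions, not a design from the paper.

```python
# Sketch of a policy layer that vets actions derived from on-screen content
# before execution (names are illustrative assumptions).
SENSITIVE_ACTIONS = {"send_credentials", "post_to_social_media", "install_software"}


def should_execute(proposed_action: str, user_request: str) -> bool:
    """Allow an action only if it plausibly belongs to what the user asked for."""
    if proposed_action in SENSITIVE_ACTIONS:
        return False  # always require explicit user confirmation for these
    return proposed_action.lower() in user_request.lower()
```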
People are rushing to deploy the technology before its security is fully understood.
Yarin Gal, an Oxford professor and co-author of the study, expressed concern that the rapid deployment of agent technology is outpacing security research. The authors stated that even companies with closed-source models are not immune, as the attack exploits fundamental model behaviors that cannot be defended against simply by keeping the code private.