Security researchers at Radware have demonstrated how they tricked OpenAI’s ChatGPT into extracting sensitive data from a user’s Gmail inbox using a vulnerability they call “Shadow Leak.”
The attack, which was revealed this week, used a technique called prompt injection to manipulate an AI agent named Deep Research that had been granted access to the user’s emails. The entire attack took place on OpenAI’s cloud infrastructure, bypassing traditional cybersecurity defenses. OpenAI patched the vulnerability after Radware reported it in June.
How the Shadow Leak attack works
The experiment targeted AI agents, which are designed to perform tasks autonomously on a user’s behalf, such as accessing personal accounts like email. In this case, the Deep Research agent, which is embedded in ChatGPT, was given permission to interact with a user’s Gmail account.
The researchers crafted an email containing malicious instructions hidden as invisible white text on a white background. This email was then sent to the target’s Gmail inbox. The hidden commands remained dormant until the user activated the Deep Research agent for a routine task. When the agent scanned the inbox, it encountered the prompt injection and followed the attacker’s instructions instead of the user’s. The agent then proceeded to search the inbox for sensitive information, such as HR-related emails and personal details, and sent that data to the researchers without the user’s knowledge.
The researchers described the process of developing the attack as “a rollercoaster of failed attempts, frustrating roadblocks, and, finally, a breakthrough.”
A cloud-based attack that bypasses traditional security
A key aspect of the Shadow Leak attack is that it operates entirely on OpenAI’s cloud infrastructure, not on the user’s local device. This makes it undetectable by conventional cybersecurity tools like antivirus software, which monitor a user’s computer or phone for malicious activity. By leveraging the AI’s own infrastructure, the attack can proceed without leaving any trace on the user’s end.
Potential for a wider range of attacks
Radware’s proof-of-concept also identified potential risks for other services that integrate with the Deep Research agent. The researchers stated that the same prompt injection technique could be used to target connections to Outlook, GitHub, Google Drive, and Dropbox.
“The same technique can be applied to these additional connectors to exfiltrate highly sensitive business data such as contracts, meeting notes or customer records.”
Prompt injection is a known vulnerability that has been used in various real-world attacks, from manipulating academic peer reviews to taking control of smart home devices. OpenAI has since patched the specific flaw that enabled the Shadow Leak attack, but the research highlights the ongoing security challenges posed by the increasing autonomy of AI agents.