Security researcher Johann Rehberger has exposed a serious vulnerability in ChatGPT that could permit attackers to record incorrect data alongside pernicious instructions in a user’s settings for long-term memory. After reporting the flaw to OpenAI, Rehberger noticed that the company initially dismissed it as a safety matter rather than a security concern. After Rehberger showed a proof-of-concept (PoC) exploit that used the vulnerability to permanently exfiltrate all user input, engineers at OpenAI became aware and released a partial fix earlier this month.
Exploiting long-term memory
According to Arstechnica, Rehberger found that you can alter ChatGPT’s long-term memory using indirect prompt injection. This method permits attackers to embed false memories or directions into untrusted material such as uploaded emails, blog entries, or documents.
Rehberger’s PoC demonstrated that tricking ChatGPT into opening a malicious web link allowed the attacker full control over capturing and dispatching all subsequent user input and ChatGPT responses to a server they controlled. Rehberger demonstrated how the exploit might cause ChatGPT to keep false information, including believing a user was 102 years old and lived in the Matrix, affecting all future discussions.
OpenAI’s reply and continuing risks
OpenAI initially responded to Rehberger’s report by closing it, classifying the vulnerability as a safety matter rather than a security problem. After sharing the PoC, the company released a patch to prevent the exploit from functioning as an exfiltration vector. Even so, Rehberger pointed out that the fundamental issue of prompt injections remains unsolved. While the explicit strategy for data theft was confronted, manipulative actors could still influence the memory instrument to incorporate fabricated data into a user’s long-term memory settings.
Rehberger noted in the video demonstration, “What’s particularly intriguing is that this exploit persists in memory. The prompt injection successfully integrated memory into ChatGPT’s long-term storage, and even when beginning a new chat, it doesn’t stop exfiltrating data.
Thanks to the API rolled out last year by OpenAI, this specific attack method is not feasible through the ChatGPT web interface.
How to protect yourself from ChatGPT (or LLM) memory exploits?
Those using LLM who want to keep their exchanges with ChatGPT secure are encouraged to look out for updates to the memory system during their sessions. End users must repeatedly check and attend to archived memories for suspicious content. Users have guidance from OpenAI on managing these memory settings, and they can additionally decide to turn off the memory function to eliminate these possible risks.
Due to ChatGPT’s memory capabilities, users can help protect their data from possible exploits by keeping their guard up and taking measures beforehand.