OpenAI released a report Tuesday detailing its disruption of more than 40 malicious networks that violated its usage policies, a result of the company's practice of monitoring ChatGPT conversations with automated tools and human reviewers to prevent misuse.
While users of AI chatbots such as ChatGPT can enable privacy settings that keep their conversations from being used to train future AI models, that does not stop all monitoring. AI firms like OpenAI still scrutinize chats to protect users and preserve platform integrity, deploying automated systems alongside human reviewers. The stated objective is to prevent ChatGPT from being misused for harmful activities such as creating malware or building tools for mass surveillance and other security threats. The privacy settings, by contrast, simply keep personal or work-related data out of the pool used to train subsequent AI versions.
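OpenAI has not published the internals of its review tooling, but its public Moderation API gives a rough sense of what an automated first pass might look like before a person gets involved. The sketch below is purely illustrative: the `needs_human_review` helper, the score threshold, and the hand-off to a review queue are assumptions, not OpenAI's actual pipeline.

```python
# Illustrative only: an automated first pass that flags messages for human
# review, using OpenAI's public Moderation API as a stand-in. The helper name,
# threshold, and review queue are hypothetical; OpenAI has not described its
# internal tooling.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def needs_human_review(message: str, threshold: float = 0.5) -> bool:
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=message,
    ).results[0]
    # Flag if the model trips any category outright, or if any per-category
    # score crosses the (arbitrary, illustrative) threshold.
    scores = result.category_scores.model_dump().values()
    return result.flagged or any(score >= threshold for score in scores)

if needs_human_review("Help me package this keylogger as a fake browser update"):
    print("Queued for a human reviewer")  # a person, not the model, makes the final call
```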
The company's report detailed the disruption and reporting of more than 40 networks found to be in violation of its usage policies. The actors attempting to misuse ChatGPT ranged widely; OpenAI described the abuse as spanning "authoritarian regimes to control populations or coerce other states, as well as abuses like scams, malicious cyber activity, and covert influence operations." Its findings indicate that these threat actors are primarily using artificial intelligence to accelerate and refine existing tactics. In OpenAI's assessment, they are using AI to improve "old playbooks to move faster," not gaining fundamentally new capabilities from ChatGPT itself.
Beyond countering malicious state and criminal actors, OpenAI also monitors conversations for indications of self-harm so it can assist users who may be in distress. The company has described individual user safety as a key priority, an emphasis that sharpened after reports of a teen's death by suicide that had been linked to interactions with ChatGPT. As part of the same focus, OpenAI introduced parental controls for ChatGPT in recent weeks.
OpenAI's report does not explain in detail how potential abuse gets flagged. The company acknowledged a gray area in which certain activity can read as either benign or malicious depending on context, citing "prompts and generations that could, depending on their context, indicate either innocuous activities or abuse, such as translating texts, modifying code, or creating a website." To navigate that ambiguity, OpenAI said it employs a "nuanced and informed approach that focuses on patterns of threat actor behavior rather than isolated model interactions," an approach meant to identify genuine threats without disrupting legitimate usage for the broader user base.
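As a rough sketch of what judging "patterns of threat actor behavior rather than isolated model interactions" could mean in practice, the toy example below aggregates signals across an account's activity before anything is treated as suspicious. The signal names and the threshold are invented for illustration; OpenAI has not disclosed its actual heuristics.

```python
# Toy illustration of pattern-based review: no single ambiguous request
# (translating text, editing code, building a website) is treated as abuse,
# but a sustained combination of distinct signals on one account would be
# escalated. Signal names and thresholds are invented; this is not OpenAI's method.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class AccountActivity:
    account_id: str
    signals: Counter = field(default_factory=Counter)

    def record(self, signal: str) -> None:
        self.signals[signal] += 1

def warrants_investigation(activity: AccountActivity, min_distinct_signals: int = 3) -> bool:
    # Act on the pattern across interactions, not on any isolated prompt.
    return len(activity.signals) >= min_distinct_signals

acct = AccountActivity("acct_0001")
for signal in ("phishing_template", "credential_harvest_page", "malware_debugging"):
    acct.record(signal)

print(warrants_investigation(acct))  # True: several distinct signals on one account
```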
Reporting from Gizmodo highlighted specific high-level threats that OpenAI identified and disrupted. One case involved an organized crime network, believed to be operating out of Cambodia, that attempted to use ChatGPT to streamline its illicit operations. Another was a Russian political influence operation that used the chatbot to generate prompts intended for third-party video-generation AI models. OpenAI also shut down accounts associated with the Chinese government that were seeking help designing systems to monitor conversations on social media platforms.
Reuters reported further details on the disrupted activity. According to its reporting, OpenAI banned a set of Chinese-language accounts that sought help with phishing and malware campaigns, including automations that could be executed via DeepSeek, another AI tool. Accounts linked to Russian criminal groups attempting to develop malware with ChatGPT's assistance were also shut down, and Korean-speaking users found to be using the platform to facilitate phishing campaigns were banned from the service.
The October report from OpenAI concentrated exclusively on malicious activities and covert operations; it did not include data or details about conversations involving self-harm. The company has, however, made separate announcements on that front. In a recent post on the social media platform X, OpenAI said it had updated a model called GPT-5 Instant to "better recognize and support people in moments of distress." Sensitive portions of conversations are now routed to this specialized model, which is designed to provide more helpful responses, and ChatGPT will continue to tell users which model is active when asked.
We’re updating GPT-5 Instant to better recognize and support people in moments of distress.
Sensitive parts of conversations will now route to GPT-5 Instant to quickly provide even more helpful responses. ChatGPT will continue to tell users what model is active when asked.…
— OpenAI (@OpenAI) October 3, 2025
The update is part of a broader effort to improve user safety. In late August, OpenAI announced that ChatGPT had been trained not to respond directly to prompts that mention intentions of self-harm. Instead of fulfilling such requests, the model is programmed to respond with empathy and direct users toward professional help, providing information for suicide prevention services and crisis hotlines. When the system detects a potential risk of physical harm to other people, a different protocol applies: those conversations are routed to specialized systems that may involve human review and can be escalated to law enforcement agencies if deemed necessary.
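OpenAI's public Moderation API exposes categories such as self-harm intent and violence, which makes it possible to sketch what this kind of routing could look like. The branching below, the wording of the response, and the escalation queue are assumptions for illustration; OpenAI has not published its actual protocol.

```python
# Illustrative routing only. The moderation categories are part of the public
# API, but the branching, response text, and escalation queue are assumptions,
# not OpenAI's published protocol.
from openai import OpenAI

client = OpenAI()
CRISIS_RESOURCES = "988 Suicide & Crisis Lifeline (call or text 988 in the US)"

def route_message(message: str) -> str:
    categories = client.moderations.create(
        model="omni-moderation-latest",
        input=message,
    ).results[0].categories

    if categories.self_harm or categories.self_harm_intent:
        # Do not fulfill the request; answer with support and point to help.
        return f"You're not alone. Please consider reaching out to the {CRISIS_RESOURCES}."
    if categories.violence:
        # Hand off to a queue where humans, not the model, decide on escalation.
        return "escalated_for_human_review"
    return "handled_normally"

print(route_message("I don't want to be here anymore."))
```

In the protocol OpenAI has described, the decision to involve law enforcement rests with human reviewers rather than the model itself.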