Google has shared early results from its research on CodeMender, a new AI-powered agent designed to improve code security automatically. The agent can both reactively patch new vulnerabilities and proactively rewrite existing code to eliminate entire classes of security flaws.
As AI-powered tools become more effective at discovering software vulnerabilities, the number of identified issues is expected to exceed human developers' capacity to fix them. CodeMender is designed to address this problem by automating the creation and application of high-quality security patches. Over the past six months, the team has already used CodeMender to submit 72 security fixes to open-source projects, including some with codebases as large as 4.5 million lines of code.
How CodeMender works
CodeMender is an autonomous agent that uses Gemini Deep Think models to debug and fix complex vulnerabilities. It is equipped with a set of tools that allow it to reason about code before making changes and to automatically validate those changes to ensure they are correct and do not cause regressions.
The validation process is a critical component, designed to prevent costly mistakes. CodeMender surfaces a patch for human review only after confirming that it fixes the root cause of the issue, is functionally correct, causes no regressions, and follows the project's style guidelines.
The system uses several techniques to achieve this:
- Advanced program analysis: CodeMender uses tools for static analysis, dynamic analysis, differential testing, and fuzzing to scrutinize code patterns and data flow. This allows it to better identify the root causes of security flaws.
- Multi-agent systems: The system employs specialized agents for specific tasks. For example, a critique tool based on a large language model (LLM) highlights the differences between the original and modified code to verify that a proposed change does not introduce new problems, allowing the main agent to self-correct when needed.
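Differential testing, one of the analysis techniques listed above, compares the behavior of two versions of the same code on the same inputs. The C sketch below illustrates the idea under stated assumptions; it is not CodeMender's implementation, and `sum_prefix_orig`, `sum_prefix_patched`, and `differential_check` are hypothetical names. The check flags any in-bounds input on which the original and patched routines disagree (a regression) and any out-of-bounds input the patch still accepts (an incomplete fix).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical original routine: sums the first `count` bytes of `buf`.
 * It trusts `count`, so a count larger than `len` reads out of bounds. */
static long sum_prefix_orig(const uint8_t *buf, size_t len, size_t count) {
    (void)len; /* the bug: len is never consulted */
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += buf[i];
    return total;
}

/* Hypothetical patched routine: rejects counts that exceed the buffer. */
static long sum_prefix_patched(const uint8_t *buf, size_t len, size_t count) {
    if (count > len)
        return -1; /* error: request exceeds buffer */
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += buf[i];
    return total;
}

/* Runs both versions over every count in [0, max_count] and returns the
 * number of discrepancies: in-bounds mismatches (regressions) plus
 * out-of-bounds counts the patch fails to reject (unfixed bug). */
static int differential_check(const uint8_t *buf, size_t len, size_t max_count) {
    assert(buf != NULL);
    int discrepancies = 0;
    for (size_t count = 0; count <= max_count; count++) {
        long patched = sum_prefix_patched(buf, len, count);
        if (count <= len) {
            /* Only call the original where it is well defined. */
            long orig = sum_prefix_orig(buf, len, count);
            if (orig != patched) {
                printf("regression at count=%zu: %ld vs %ld\n", count, orig, patched);
                discrepancies++;
            }
        } else if (patched != -1) {
            printf("unfixed overflow at count=%zu\n", count);
            discrepancies++;
        }
    }
    return discrepancies;
}
```

The original routine is never called on out-of-bounds inputs, because doing so would be undefined behavior. A real pipeline would draw inputs from a fuzzer or a test corpus; the exhaustive loop only works here because the input space is tiny.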
Fixing active vulnerabilities
To patch a vulnerability effectively, CodeMender uses tools like a debugger and a source code browser to pinpoint the root cause before devising a solution.
In one example, a crash report indicated a heap buffer overflow, but the agent’s analysis determined that the actual problem was incorrect management of a stack of XML elements during parsing. Although the final patch changed only a few lines of code, identifying the true root cause required complex reasoning. In another case, the agent created a non-trivial patch for a complex object lifetime issue by modifying the project's custom system for generating C code.
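The article does not name the affected parser or show the patch, so the following C sketch is only a hypothetical illustration of the kind of invariant involved: a parser that tracks open XML elements on a fixed-size stack must guard against overflow on push and against underflow or mismatched closing tags on pop. `ElementStack`, `stack_push`, `stack_pop`, `MAX_DEPTH`, and `MAX_NAME` are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 4  /* hypothetical nesting limit */
#define MAX_NAME 16  /* hypothetical longest element name, including NUL */

/* Stack of currently open XML elements, innermost on top. */
typedef struct {
    char names[MAX_DEPTH][MAX_NAME];
    size_t depth;
} ElementStack;

static void stack_init(ElementStack *s) {
    assert(s != NULL);
    s->depth = 0;
}

/* Record an opening tag. Refuses, rather than writing past `names`,
 * when the nesting limit is reached or the name is too long. */
static bool stack_push(ElementStack *s, const char *name) {
    size_t len = strlen(name);
    if (s->depth >= MAX_DEPTH || len >= MAX_NAME)
        return false;
    memcpy(s->names[s->depth], name, len + 1);
    s->depth++;
    return true;
}

/* Close the innermost element, but only if the closing tag matches it.
 * Without the depth check, popping an empty stack would wrap `depth`
 * around to SIZE_MAX, and later indexing with names[depth] or
 * names[depth - 1] would access memory outside the array. */
static bool stack_pop(ElementStack *s, const char *name) {
    if (s->depth == 0 || strcmp(s->names[s->depth - 1], name) != 0)
        return false;
    s->depth--;
    return true;
}
```

When such a guard is missing, the resulting memory corruption may only be detected later, at an unrelated read or write, which is one way a crash report can point away from the real root cause.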
Proactively securing existing code
CodeMender is also designed to proactively rewrite code to use more secure data structures and APIs. For instance, the agent was deployed to apply `-fbounds-safety` annotations to parts of the widely used libwebp image compression library. When the code is compiled with Clang's `-fbounds-safety` extension, these annotations instruct the compiler to insert bounds checks, so an out-of-bounds access traps at run time instead of silently corrupting memory. This can prevent buffer overflow vulnerabilities from being exploited.
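Conceptually, such an annotation records in the source how many elements a pointer may address, so the compiler can compare each index against that count. The following C sketch assumes Clang's `-fbounds-safety` extension and its `<ptrcheck.h>` header when they are available, and otherwise compiles without the inserted checks; `Row`, `checked_read`, and the `COUNTED_BY` wrapper are hypothetical names, not code from libwebp or CodeMender.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With Clang's bounds-safety support, <ptrcheck.h> supplies __counted_by.
 * Elsewhere the hypothetical COUNTED_BY wrapper expands to nothing, so the
 * annotation only documents intent and no checks are inserted. */
#if defined(__has_include)
#  if __has_include(<ptrcheck.h>)
#    include <ptrcheck.h>
#    define COUNTED_BY(N) __counted_by(N)
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(N)
#endif

/* A hypothetical image row: the annotation ties `data` to `len`, so a
 * compiler using -fbounds-safety can check every data[i] access. */
typedef struct {
    size_t len;
    const uint8_t *COUNTED_BY(len) data;
} Row;

/* Hand-written equivalent of the inserted check. The compiler-generated
 * check traps on an out-of-bounds index; this sketch returns -1 instead
 * so the behavior can be observed. */
static int checked_read(const Row *row, size_t i) {
    assert(row != NULL);
    if (i >= row->len)
        return -1;
    return row->data[i];
}
```

With `-fbounds-safety` enabled, the compiler-inserted check would trap on an out-of-range `row->data[i]` even without the explicit `if`; the explicit comparison is kept so the sketch behaves the same on compilers that lack the extension.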
A previously discovered heap buffer overflow in libwebp (CVE-2023-4863) was used as part of a zero-click iOS exploit. With the annotations added by CodeMender, that vulnerability and many others like it would have been rendered unexploitable. The agent can automatically correct new compilation errors or test failures that arise from its own annotations, and it uses an LLM-based tool to verify that its changes have not altered the code’s intended functionality.
Current status and future plans
All patches currently generated by CodeMender are reviewed by human researchers before being submitted to open-source projects. Google is gradually increasing the number of patches it submits to ensure high quality and to systematically address feedback from the open-source community.
The team plans to reach out to maintainers of critical open-source projects with CodeMender-generated patches. By iterating on the feedback from this process, the team aims to eventually release CodeMender as a tool that all software developers can use to help keep their codebases secure.