- A Python vulnerability discovered in the tarfile module impacts hundreds of thousands of open source projects, posing supply chain security concerns, according to Trellix.
- CVE-2007-4559, the Python vulnerability, was discovered in 2007 and is still present in the tarfile module.
- The Trellix Advanced Research Center uncovered the path traversal vulnerability while investigating another vulnerability.
- The vulnerability CVE-2007-4559 affects around 350,000 open-source projects and an unknown number of closed-source projects, raising worries about software supply chain attacks.
According to Trellix, a Python vulnerability in the tarfile module found over 15 years ago still affects hundreds of thousands of open source projects today, raising supply chain security concerns. The flaw, CVE-2007-4559, was identified in 2007 and is still present in the module.
Possible effects of CVE-2007-4559 Python vulnerability
While investigating another vulnerability, the Trellix Advanced Research Center discovered the path traversal vulnerability. CVE-2007-4559 affects around 350,000 open-source projects and an unknown number of closed-source projects, raising concerns about software supply chain attacks. According to NCC Group, attacks on global supply chain businesses surged by 51% between July and December 2021.
“When we talk about supply chain threats, we typically refer to cyber-attacks like the SolarWinds incident, however building on top of weak code foundations can have an equally severe impact,” said Christiaan Beek, Trellix’s Head of Adversarial & Vulnerability Research.
Beyond machine learning, automation apps, and Docker containerization, frameworks built by AWS, Google, Intel, Facebook, and Netflix use Python’s vulnerable tarfile module. The tarfile module is part of Python’s standard library, so it is available by default in any Python-based project.
“This vulnerability’s pervasiveness is furthered by industry tutorials and online materials propagating its incorrect usage. It’s critical for developers to be educated on all layers of the technology stack to properly prevent the reintroduction of past attack surfaces.”
What is CVE-2007-4559 Python vulnerability?
CVE-2007-4559 lets an attacker overwrite arbitrary files, which can lead to the execution of arbitrary code. Although CVE-2007-4559’s CVSS score of 5.1 indicates that it is a medium severity vulnerability, Trellix claims that the attack is quite simple and that the flaw can be exploited with as few as six lines of code.
The Python tarfile module allows developers to read and write tar archives, a UNIX-based format for packaging files together, either uncompressed or compressed (using gzip, bzip2, etc.), for backup or distribution.
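For readers unfamiliar with the module, the following is a minimal sketch of ordinary, benign tarfile usage: it packs a few files into a gzip-compressed archive held in memory and lists the member names back. The helper names build_archive and list_members and the sample file names are illustrative assumptions, not part of the module.

```python
import io
import tarfile


def build_archive(files):
    """Pack a {name: bytes} mapping into an in-memory gzip-compressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)  # metadata record for one member
            info.size = len(data)              # size must match the payload
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_members(archive_bytes):
    """Return the member names stored in an archive, in archive order."""
    # "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as archive:
        return archive.getnames()


# Hypothetical sample content, used only for illustration.
archive_bytes = build_archive({"notes/readme.txt": b"hello", "data.bin": b"\x00\x01"})
print(list_members(archive_bytes))
```

Note that each member name is simply a string stored in the archive, which is why the names need to be checked before extraction.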
Because of a few “un-sanitized” lines of code in tarfile, the 2007 path traversal vulnerability still exists. The TarFile.extract() and TarFile.extractall() methods are written without any safeguards to sanitize or inspect the member paths they extract from tar archives.
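So when these extract functions are called, the file name stored in each TarInfo object is joined to the destination directory as-is, and a name containing “..” sequences or an absolute path causes directory traversal. In other words, the module writes files wherever the archive says to, without performing the appropriate safety check.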
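The effect can be illustrated without extracting anything. The sketch below builds an in-memory archive whose single member has a “..”-laden name, then computes where a naive join of destination directory and member name would point. The helper names, the destination /srv/uploads, and the member name are hypothetical, and the join only approximates what an unchecked extraction would do.

```python
import io
import os
import tarfile


def make_archive_with_member(name, data=b"demo"):
    """Build an uncompressed in-memory tar archive holding one member."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=name)  # the name is stored exactly as given
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


def resolved_targets(archive_file, destination):
    """Map each member name to the path a naive join would write to."""
    destination = os.path.abspath(destination)
    targets = {}
    with tarfile.open(fileobj=archive_file, mode="r") as archive:
        for member in archive.getmembers():
            joined = os.path.join(destination, member.name)
            targets[member.name] = os.path.normpath(joined)
    return targets


# Hypothetical destination and member name, for illustration only.
destination = os.path.join(os.sep, "srv", "uploads")
targets = resolved_targets(make_archive_with_member("../../etc/demo.conf"), destination)
for name, target in targets.items():
    print(f"{name!r} would be written to {target}")
```

The resolved path lands outside the destination directory, which is the situation an extraction routine has to detect and reject.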
“This vulnerability is incredibly easy to exploit, requiring little to no knowledge about complicated security topics. Due to this fact and the prevalence of the vulnerability in the wild, Python’s tarfile module has become a massive supply chain issue threatening infrastructure around the world,” said Kasimir Schulz, Trellix Threat Labs vulnerability researcher.
Charles McFarland, vulnerability researcher in Trellix’s Advanced Threat Research team, noted, “Not only has this vulnerability been known for over a decade, the official Python docs explicitly warn to ‘Never extract archives from untrusted sources without prior inspection’ due to the directory traversal issue.”
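One way such a “prior inspection” could look is sketched below: every member is checked before anything is written, and the whole archive is rejected if any member would resolve outside the destination directory. The names is_within_directory and safe_extract are hypothetical helpers, not part of the tarfile module, and rejecting every symbolic or hard link is a deliberately conservative assumption rather than a requirement.

```python
import io
import os
import tarfile
import tempfile


def is_within_directory(directory, target):
    """Return True if target resolves to a path inside directory."""
    directory = os.path.realpath(directory)
    target = os.path.realpath(target)
    return os.path.commonpath([directory, target]) == directory


def safe_extract(archive, destination):
    """Inspect every member first; extract only if all of them are acceptable."""
    for member in archive.getmembers():
        target = os.path.join(destination, member.name)
        if not is_within_directory(destination, target):
            raise ValueError(f"blocked path traversal attempt: {member.name!r}")
        # Assumption: links are refused outright, since they can redirect later writes.
        if member.issym() or member.islnk():
            raise ValueError(f"blocked link member: {member.name!r}")
    archive.extractall(destination)


# Usage with a harmless in-memory archive and a temporary directory.
payload = io.BytesIO()
with tarfile.open(fileobj=payload, mode="w") as writer:
    info = tarfile.TarInfo(name="docs/hello.txt")
    info.size = 5
    writer.addfile(info, io.BytesIO(b"hello"))
payload.seek(0)

with tempfile.TemporaryDirectory() as destination:
    with tarfile.open(fileobj=payload, mode="r") as archive:
        safe_extract(archive, destination)
    print(os.listdir(os.path.join(destination, "docs")))
```

Python 3.12 and later also accept a filter argument on the extraction methods, which may be a better fit where it is available.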
Vulnerable projects that are on GitHub
There are 588,840 distinct projects/repositories on GitHub that have ‘import tarfile’ in their Python code. However, 61% of these repositories did not sanitize tarfile members before extracting them, bringing the total number of susceptible repositories to roughly 350,000.
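A first-pass check of this kind can be approximated with Python’s ast module. The sketch below is a naive heuristic and not Trellix’s tool or method: it reports whether a source file imports tarfile and on which lines it calls a method named extract or extractall. It cannot tell whether the members were sanitized beforehand, so any hit is only a candidate for manual review. The function name find_tar_extract_calls and the sample snippet are assumptions made for illustration.

```python
import ast


def find_tar_extract_calls(source):
    """Return (imports_tarfile, line numbers of .extract()/.extractall() calls)."""
    tree = ast.parse(source)
    imports_tarfile = False
    call_lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Matches "import tarfile" and "import tarfile as t".
            imports_tarfile |= any(alias.name == "tarfile" for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Matches "from tarfile import ...".
            imports_tarfile |= node.module == "tarfile"
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # Heuristic: any method call named extract/extractall is flagged.
            if node.func.attr in {"extract", "extractall"}:
                call_lines.append(node.lineno)
    return imports_tarfile, sorted(call_lines)


# Hypothetical snippet standing in for a file from a scanned repository.
sample = '''import tarfile

with tarfile.open("bundle.tar") as archive:
    archive.extractall("out")
'''
print(find_tar_extract_calls(sample))
```

Because the match is on the method name alone, unrelated objects with an extract method (zipfile, for example) would also be flagged when a file imports both modules.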
Trellix further noted that because machine learning technologies such as GitHub Copilot are trained on insecure GitHub projects, they “are learning to do things insecurely. Not from any fault of the tool but from the fact that it learned from everyone else.”
Trellix’s analysis of the project domains affected by CVE-2007-4559 showed the following:
It should be emphasized that Trellix’s vulnerability research is confined to GitHub. As a result, the 15-year-old vulnerability is likely to harm other projects as well. Hundreds of suppliers can contribute applications, independent code, software, libraries, and other dependencies in the software supply chain.
When vulnerable dependencies, such as the tarfile module, are integrated with third-party providers, service providers, contractors, resellers, and so on, the attack surface of everyone in the chain is expanded while the security fabric of even those with appropriate security hygiene practices is weakened.
Douglas McKee, principal engineer and director of vulnerability research for Trellix Threat Labs, asks “While we can’t provide as detailed an analysis [of closed-source projects] as we can with open-source projects, it is fair to expect the trend to be similar. What if 61% of all projects — open- and closed-source — could be exploited due to this vulnerability?”
McKee added, “To do our part, Trellix is releasing a script which can be used to scan one or multiple code repositories looking for the presence and likelihood of exploitation for CVE-2007-4559. Additionally, we are working on automating submissions of pull requests to open-source projects which can be confirmed to be exploitable.”
Trellix supports automated bulk repository forking, cloning, code analysis, patching, commits, and pull requests. The company’s patches for 11,005 repositories are ready for pull requests. Trellix is working on fixes for further projects.
“The number of vulnerable repositories we found begs the question, which other N-day vulnerabilities are lurking around in OSS, undetected or ignored for years? If this tarfile vulnerability is any indicator, we are woefully behind and need to increase our efforts to ensure OSS [open source software] is secure,” McFarland added.
Refer to Trellix’s GitHub docs to see if your project/repository is susceptible to CVE-2007-4559 Python vulnerability.