Researchers with the open-source AI platform Hugging Face have discovered that the carbon footprint of generative AI tools is substantially worse than previously estimated, particularly for those converting text prompts into video, due to non-linear energy scaling.
In a newly published paper, the researchers detailed how the energy demands of text-to-video generators grow quadratically rather than in direct proportion to the content’s length. The study found that when the duration of a generated video is doubled, its associated energy consumption quadruples. The paper gives a specific example: producing a six-second video clip with AI requires four times as much energy as generating a three-second clip. “These findings highlight both the structural inefficiency of current video diffusion pipelines and the urgent need for efficiency-oriented design,” the researchers concluded in their paper.
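The scaling rule described above can be sketched in a few lines of Python. This is only an illustration of the doubling-quadruples relationship reported in the article, not the researchers' own model: the function name `relative_energy`, the three-second baseline, and the assumption that energy follows an exact square law at every duration are all chosen here for the example.

```python
def relative_energy(duration_s: float, baseline_s: float = 3.0) -> float:
    """Energy of a clip relative to a baseline clip.

    Assumption (illustrative only): energy grows with the square of the
    clip duration, so doubling the duration quadruples the energy.
    """
    if duration_s < 0 or baseline_s <= 0:
        raise ValueError("duration must be non-negative and baseline positive")
    return (duration_s / baseline_s) ** 2


if __name__ == "__main__":
    # Compare a few clip lengths against the three-second baseline.
    for seconds in (3, 6, 12):
        print(f"{seconds:>2} s clip: {relative_energy(seconds):.0f}x the baseline energy")
```

Under this assumption, a six-second clip comes out at four times the three-second baseline, matching the paper's example, while a twelve-second clip would come out at sixteen times. Under linear scaling, those figures would be two and four.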
This research emerges amid warnings from experts that generative AI technologies are being deployed without a complete understanding of their environmental consequences. A recent analysis by MIT Technology Review supports this concern, stating that “the common understanding of AI’s energy consumption is full of holes.” The gap becomes clear when comparing different types of generative tools. While creating a single 1,024 by 1,024 pixel image with an AI generator consumes energy equivalent to warming something in a microwave for five seconds, the requirements for video are hundreds of times greater.
The Hugging Face study found that producing just a five-second video clip demands an amount of energy comparable to running a standard microwave for over an hour. This disparity underscores how energy-intensive video generation is. Because of the non-linear scaling, power consumption grows faster than clip length as videos become longer. According to the paper, this trajectory implies “rapidly increasing hardware and environmental costs” for users and developers of these technologies.
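For a rough sense of scale, the two microwave comparisons quoted in this article can be put side by side. The sketch below treats "over an hour" as exactly one hour, so the result is a lower bound. The constant and function names are made up for the illustration, and the comparison assumes both figures refer to the same kind of microwave.

```python
# Microwave-equivalent running times quoted in the article, in seconds.
IMAGE_MICROWAVE_SECONDS = 5          # one 1,024 x 1,024 pixel image
VIDEO_MICROWAVE_SECONDS = 60 * 60    # one five-second video; "over an hour", so a lower bound


def video_to_image_ratio(video_s: float = VIDEO_MICROWAVE_SECONDS,
                         image_s: float = IMAGE_MICROWAVE_SECONDS) -> float:
    """How many times more energy the video takes than the image (lower bound)."""
    if image_s <= 0:
        raise ValueError("image time must be positive")
    return video_s / image_s


if __name__ == "__main__":
    ratio = video_to_image_ratio()
    print(f"A five-second video takes at least {ratio:.0f} times the energy of one image")
```

On these numbers, a single five-second clip costs at least as much energy as roughly 720 generated images.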
There are potential methods to mitigate these high energy demands. The researchers suggest several strategies, including the implementation of intelligent caching systems and the practice of reusing existing AI-generated content to avoid redundant processing. Another proposed technique is “pruning,” which involves methodically identifying and removing inefficient examples from the large datasets used to train AI models. This process could help streamline the models and reduce their operational energy footprint during generation tasks.
However, it remains uncertain whether these efficiency measures will be sufficient to make a meaningful impact on the overall electricity consumption of current AI systems. The scale of the issue is already substantial. According to data from one recent study, AI-related activities now represent 20 percent of the total power demand from all global datacenters. In response to growing AI demand, major technology companies are investing tens of billions of dollars into new infrastructure buildouts, a process that has led some to abandon previously stated climate objectives.
Google’s 2024 environmental impact report revealed the company is significantly behind its plan to achieve net-zero carbon emissions by 2030. The report disclosed a 13 percent increase in carbon emissions year-over-year, which it attributed in large part to its expansion of generative AI services. Earlier this year, Google released its Veo 3 AI video generator. The company later announced that users had created over 40 million videos with the tool within its first seven weeks of availability. The specific environmental toll of Veo 3 has not been disclosed.