Dataconomy

OpenAI used YouTube videos to train AI, report claims

by Kerem Gülen
April 8, 2024
in Artificial Intelligence, News

The ambiguous legal territory of AI development, where YouTube videos transform into machine learning fuel.

A recent report by The New York Times describes the strategies companies have adopted to navigate the ambiguous territory of AI copyright law. The report opens with OpenAI, which, in its search for sufficient training data, is said to have built the Whisper audio transcription model. Whisper was then used to transcribe more than a million hours of YouTube videos to help train GPT-4, the company's latest and most sophisticated large language model.

And… YouTube demands answers on Sora’s training data.


Did OpenAI really use YouTube videos to train Sora?

According to The New York Times, OpenAI was aware of the potential legal challenges but justified the action as fair use. Greg Brockman, the president of OpenAI, played a key role in the acquisition of video content for this purpose, as noted by the Times.

The article adds that by 2021 the company had exhausted its supply of useful data and, with other avenues spent, began to consider transcribing YouTube videos, podcasts, and audiobooks. By that point, its models had already been trained on data such as computer code from GitHub, databases of chess strategies, and educational materials from Quizlet.

Matt Bryant, a spokesperson for Google, told The Verge by email that the company has “seen unconfirmed reports” of OpenAI’s activity. He noted that both Google’s robots.txt files and its Terms of Service prohibit the unauthorized scraping or downloading of YouTube content, echoing the company’s terms of use. Bryant added that Google takes technical and legal measures to prevent such unauthorized use when it has a clear legal or technical basis to do so. Similarly, Neal Mohan, the CEO of YouTube, voiced concerns this week about the alleged use of YouTube data to train Sora, OpenAI’s video generation model.


According to sources cited by the Times, Google also transcribed YouTube videos. Bryant said that the company has trained its models on some YouTube content, in accordance with its agreements with YouTube creators.

The Times reported that Google’s legal department advised its privacy team to reword the company’s policy so that it could make broader use of consumer data, including data from services like Google Docs. According to the report, the updated policy was deliberately released on July 1 to take advantage of the distraction of the Independence Day holiday weekend.

Meta likewise struggled to find adequate training data. The Times obtained recordings in which Meta's AI team discussed the unauthorized use of copyrighted material in an effort to keep pace with OpenAI.

Google, OpenAI, and other AI developers are all contending with a shrinking supply of quality training data for their models, which tend to improve as they are trained on more data.

OpenAI’s journey has been marked by breakthroughs, but also by legal and ethical gray areas. The YouTube transcription controversy underscores how complex copyright becomes when training advanced AI models. As tools like Sora enter Hollywood, the company faces even tougher scrutiny. Can Altman navigate these hurdles, or has he already been replaced?


Featured image credit: Andrew Neel/Unsplash

Tags: Featured, OpenAI, YouTube


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
