Dataconomy
Amazon has a secret way to scrape Microsoft’s GitHub and feed its AI model

Pushing "artificial" boundaries

by Eray Eliaçık
June 14, 2024
in Artificial Intelligence

Amazon needs vast quantities of high-quality data to build powerful AI models. Recognizing GitHub as a treasure trove of valuable coding metadata, Amazon has devised a strategy to speed up data collection despite the platform's limits.

According to an internal memo obtained by Business Insider, Amazon's Artificial General Intelligence (AGI) Group outlined its need for "quantitative and qualitative metadata from GitHub" to advance its AI training efforts. However, GitHub's rate limits, which allow only 5,000 requests per hour per account, posed a significant obstacle. With over 150 million public repositories on GitHub, collecting that data within the limit from a single account would have taken years.
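To see why the rate limit was such an obstacle, here is a back-of-the-envelope calculation in Python using the two figures quoted above. It assumes, purely for illustration, that gathering metadata costs one API request per repository; real collection would likely need several requests per repository, which would only stretch the timelines further. The four-week target and the function names are hypothetical.

```python
import math

# Figures quoted in the article. Assumption: one request per repository.
RATE_LIMIT_PER_HOUR = 5_000      # requests per hour, per account
PUBLIC_REPOS = 150_000_000       # "over 150 million public repositories"


def hours_needed(total_requests: int, accounts: int = 1,
                 limit_per_hour: int = RATE_LIMIT_PER_HOUR) -> float:
    """Wall-clock hours to issue total_requests when each account is capped at limit_per_hour."""
    return total_requests / (limit_per_hour * accounts)


def accounts_needed(total_requests: int, target_hours: float,
                    limit_per_hour: int = RATE_LIMIT_PER_HOUR) -> int:
    """Smallest whole number of accounts that finishes within target_hours."""
    return math.ceil(total_requests / (limit_per_hour * target_hours))


if __name__ == "__main__":
    single = hours_needed(PUBLIC_REPOS)
    print(f"One account: {single:,.0f} hours (~{single / 24 / 365:.1f} years)")

    four_weeks = 4 * 7 * 24  # hypothetical target: 672 hours
    print(f"Accounts for four weeks: {accounts_needed(PUBLIC_REPOS, four_weeks)}")
```

Under these assumptions a single account needs 30,000 hours, about 3.4 years, while roughly 45 accounts working in parallel could cover the same ground in four weeks. That is consistent with the article's description of compressing a multi-year job into weeks by spreading requests across many accounts.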

Amazon’s workaround

In response, Amazon proposed a workaround: encouraging its employees to create multiple GitHub accounts and share their access credentials. By leveraging a network of accounts simultaneously, Amazon aims to condense what would have been a multi-year endeavor into a matter of weeks. While Amazon’s actions may not strictly constitute theft in a legal sense, they do raise ethical concerns about data privacy, permission, and the appropriate use of platform resources.

The memo provides detailed instructions on how employees should create and manage these accounts to ensure compliance with legal and security guidelines. This includes using Amazon work emails, specific types of GitHub tokens, and setting appropriate permissions for data access.

Amazon says the approach was approved by its legal and security teams, which suggests the company is trying to stay within legal boundaries by following its internal guidelines. Internal approval does not settle the question, however: the practice could still be challenged, especially if GitHub or affected users see it as a violation of the platform's rules.

The ethical implications are significant. By soliciting employees to share personal GitHub accounts, Amazon is potentially accessing data without explicit consent from GitHub or the repository owners.

Why does Amazon do this?

Data from Microsoft's GitHub is critical to Amazon's efforts to advance its artificial intelligence (AI) capabilities. AI models, like those used for understanding human language or making predictions, require large amounts of diverse data to learn effectively. As a hub for millions of open-source software projects, GitHub offers a vast body of code and related information that can be used to train these models.

Access to GitHub’s data isn’t just about lines of code. It includes valuable details like how projects evolve over time, who contributes, and how developers collaborate. This metadata is essential for AI models to learn patterns, improve their accuracy, and develop better ways to solve problems.

In the competitive world of tech giants, having comprehensive datasets can give companies like Amazon a significant edge. By leveraging GitHub data, Amazon aims to innovate faster, catch up with rivals, and create smarter technologies that can enhance everything from online shopping recommendations to cloud services.

For Amazon, AI isn't just a buzzword; it's integral to improving customer experiences, optimizing operations, and driving innovation across its business. By training AI models with GitHub data, Amazon can develop more intelligent systems capable of handling complex tasks and improving efficiency.

However, using data from platforms like GitHub raises ethical questions. Companies must navigate issues of user privacy, data ownership, and compliance with platform rules. Amazon’s approach, while approved internally, underscores the ongoing debate about how tech companies should responsibly use and protect digital information.

Featured image credit: Eray Eliaçık/Bing

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.