Bloomberg research: RAG LLMs may be less safe than you think

Researchers discovered that safe models paired with safe documents can still generate harmful responses under RAG.

by Kerem Gülen
April 28, 2025
in Research

Retrieval-Augmented Generation, or RAG, has been hailed as a way to make large language models more reliable by grounding their answers in real documents. The logic sounds airtight: give a model curated knowledge to pull from instead of relying solely on its own parameters, and you reduce hallucinations, misinformation, and risky outputs. But a new study suggests that the opposite might be happening. Even the safest models, paired with safe documents, became noticeably more dangerous when using RAG.
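
For readers who have not worked with the pattern, here is a minimal sketch of how a RAG prompt is typically assembled. The toy keyword retriever and the prompt wording are illustrative assumptions, not the setup used in the study.

```python
from typing import List

def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    # Toy retriever: rank documents by keyword overlap with the query.
    # Production systems typically use dense embeddings and a vector index.
    words = set(query.lower().split())
    return sorted(corpus, key=lambda d: -len(words & set(d.lower().split())))[:k]

def build_rag_prompt(query: str, docs: List[str]) -> str:
    # Retrieved documents are placed into the model's context window next to
    # the user question; the model is asked to ground its answer in them.
    context = "\n\n".join(f"Document {i + 1}:\n{d}" for i, d in enumerate(docs))
    return (
        "Answer the question using only the documents below.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

# Usage: prompt = build_rag_prompt(question, retrieve(question, corpus))
# The prompt then goes to the LLM. The study's finding is that this grounding
# step does not, by itself, make the resulting answers safer.
```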

Researchers from Bloomberg AI, the University of Maryland, and Johns Hopkins conducted one of the first large-scale analyses of RAG system safety. Their findings upend common assumptions among AI developers and users about how retrieval affects model behavior. Across eleven popular LLMs, RAG often introduced new vulnerabilities, producing unsafe responses that did not appear in the non-RAG setting.

Retrieval did not protect the models

In a test of over 5,000 harmful prompts, eight out of eleven models showed a higher rate of unsafe answers when RAG was activated. Safe behavior in the non-RAG setting did not predict safe behavior in RAG. The study provided a concrete example: Llama-3-8B, a model that only produced unsafe outputs 0.3 percent of the time in a standard setting, saw that figure jump to 9.2 percent when RAG was used.
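
A comparison of this kind can be sketched in a few lines. The answer, is_unsafe, and retrieve callables below are hypothetical placeholders for the model under test, a safety judge, and a document retriever; this is not the study's actual evaluation harness.

```python
from typing import Callable, List, Optional

def unsafe_rate(
    prompts: List[str],
    answer: Callable[[str, Optional[List[str]]], str],
    is_unsafe: Callable[[str], bool],
    retrieve: Optional[Callable[[str], List[str]]] = None,
) -> float:
    """Fraction of harmful prompts that yield an unsafe answer,
    with RAG context (retrieve given) or without it (retrieve=None)."""
    hits = 0
    for prompt in prompts:
        docs = retrieve(prompt) if retrieve else None
        if is_unsafe(answer(prompt, docs)):
            hits += 1
    return hits / len(prompts)

# Usage sketch (all names are placeholders):
#   base_rate = unsafe_rate(harmful_prompts, answer, is_unsafe)            # non-RAG
#   rag_rate  = unsafe_rate(harmful_prompts, answer, is_unsafe, retrieve)  # RAG
# The comparison reported above has this shape: 0.3% versus 9.2% for Llama-3-8B.
```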

Not only did the overall percentage of unsafe responses climb, but models also expanded their vulnerabilities across new risk categories. Previously contained weaknesses in areas like unauthorized practice of law or malware guidance spread into broader categories including adult content, misinformation, and political campaigning. RAG, instead of narrowing risk, broadened it.

Three reasons why RAG can backfire

The researchers traced this unexpected danger to three interlocking factors:

  • LLM Safety Baseline: Models that were less safe to begin with suffered the greatest deterioration in RAG settings.
  • Document Safety: Even when retrieved documents were classified as safe, models still generated harmful content.
  • RAG Task Performance: The way a model handled combining external documents with internal knowledge deeply influenced outcomes.

What emerged is that simply pairing a safe model with safe documents is no guarantee of safe responses. The mechanisms that make RAG appealing, such as context synthesis and document-guided answering, also open new pathways for misuse and misinterpretation.

Two main behaviors stood out when researchers analyzed unsafe outputs stemming from safe documents. First, models often repurposed harmless information into dangerous advice. For instance, a Wikipedia entry about how police use GPS trackers became, in the hands of a model, a tutorial for criminals on evading capture.

Second, even when instructed to rely solely on documents, models sometimes mixed in internal knowledge. This blending of memory and retrieval undermined the safeguards RAG was supposed to provide. Even when external documents were neutral or benign, internal unsafe knowledge surfaced in ways that fine-tuning had previously suppressed in the non-RAG setting.

Adding more retrieved documents only worsened the problem. Experiments showed that increasing the number of context documents made LLMs more likely to answer unsafe questions, not less. A single safe document was often enough to start changing a model’s risk profile.
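
The document-count sweep described above might look roughly like the following. Every function, prompt, and document here is a toy stand-in, not the paper's code; the point is only the shape of the experiment.

```python
from typing import List, Optional

# Toy stand-ins for the retriever, the model call, and the safety judge.
def retrieve_k(query: str, k: int) -> List[str]:
    return [f"document {i} related to: {query}" for i in range(k)]

def answer(query: str, docs: Optional[List[str]]) -> str:
    return "model response"          # placeholder for the LLM call

def is_unsafe(response: str) -> bool:
    return False                     # placeholder for the safety judge

harmful_prompts = ["harmful prompt A", "harmful prompt B"]

# Sweep the number of retrieved documents. The finding reported above is that
# the unsafe-response rate tends to rise, not fall, as k grows.
for k in (0, 1, 2, 4, 8):
    unsafe = sum(
        is_unsafe(answer(p, retrieve_k(p, k) if k else None))
        for p in harmful_prompts
    )
    print(f"k={k}: unsafe rate {unsafe / len(harmful_prompts):.1%}")
```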

Not all models handled RAG equally. Claude 3.5 Sonnet, for example, remained remarkably resilient, showing very low unsafe response rates even under RAG pressure. Gemma 7B appeared safe at first glance, but deeper analysis revealed that it often simply refused to answer: poor extraction and summarization skills were masking vulnerabilities rather than eliminating them.

In general, models that performed better at genuine RAG tasks like summarization and extraction were paradoxically more vulnerable. The same ability to synthesize from documents made it easier for them to repurpose harmless facts into unsafe content when the topic was sensitive.

The safety cracks widened further when researchers tested existing red-teaming methods designed to jailbreak LLMs. Techniques like GCG and AutoDAN, which work well for standard models, largely failed to transfer their success when targeting RAG setups.

One of the biggest challenges was that adversarial prompts optimized against a non-RAG model lost effectiveness once documents were injected into the context. Even re-optimizing adversarial prompts specifically for RAG improved results only slightly: because the set of retrieved documents changes from query to query, traditional jailbreak strategies struggle to succeed consistently.

This gap shows that AI security tools and evaluations built for base models are not enough. Dedicated RAG-specific red-teaming will be needed if developers want to deploy retrieval-enhanced systems safely at scale.

Retrieval is not a safety blanket

As companies increasingly move toward RAG architectures for large language model applications, the findings of this study land as a stark warning. Retrieval does help reduce hallucinations and improve factuality, but it does not automatically translate into safer outputs. Worse, it introduces new layers of risk that traditional safety interventions were not designed to handle.

The takeaway is clear: LLM developers cannot assume that bolting on retrieval will make models safer. Fine-tuning must be explicitly adapted for RAG workflows. Red-teaming must account for context dynamism. Monitoring must treat the retrieval layer itself as a potential attack vector, not just a passive input.

Without RAG-specific defenses, the very techniques designed to ground language models in truth could instead create new vulnerabilities. If the industry does not address these gaps quickly, the next generation of LLM deployments might inherit deeper risks disguised under the comforting label of retrieval.
