Dataconomy

Study: LLMs favor sentence structure over meaning

Researchers found that models prioritize grammatical "shapes" over actual semantic meaning.

by Emre Çıtak
December 5, 2025
in Research

Researchers from MIT, Northeastern University, and Meta recently released a paper indicating that large language models (LLMs) may prioritize sentence structure over semantic meaning when responding to prompts, potentially explaining the success of certain prompt injection attacks.

The findings, detailed in a paper co-authored by Chantal Shaib and Vinith M. Suriyakumar, reveal a vulnerability in how LLMs process instructions. This structural overreliance can allow bad actors to bypass safety conditioning by embedding harmful requests within benign grammatical patterns.

The team will present these findings at NeurIPS later this month. They employed a controlled experiment using a synthetic dataset where each subject area had a unique grammatical template. For example, geography questions followed one structural pattern, while creative works questions followed another.

They trained Allen AI’s OLMo models on this data and observed “spurious correlations” in which the models treated syntax as a proxy for the domain. When semantic meaning conflicted with syntactic patterns, the models’ memorization of specific grammatical “shapes” overrode semantic parsing, so they answered from structural cues rather than from what the prompt actually said. For instance, when prompted with “Quickly sit Paris clouded?” (a phrase that mimics the structure of “Where is Paris located?” but swaps in words that make no sense together), models still responded “France.”
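
To make the idea of a shared grammatical "shape" concrete, the minimal sketch below maps each word to a coarse part-of-speech tag and compares the resulting templates for the two prompts quoted above. The tiny hand-written lexicon and the names `TOY_LEXICON` and `template_of` are assumptions made for illustration; they are not the tagging scheme or code used in the paper.

```python
# Toy part-of-speech lexicon. Hypothetical and hand-written for this
# example only; a real analysis would use a proper POS tagger.
TOY_LEXICON = {
    "where": "ADV", "quickly": "ADV",
    "is": "VERB", "sit": "VERB",
    "located": "VERB", "clouded": "VERB",
    "paris": "PROPN",
}


def template_of(prompt: str) -> tuple:
    """Return the sequence of coarse POS tags for a prompt."""
    words = prompt.rstrip("?").lower().split()
    return tuple(TOY_LEXICON[word] for word in words)


original = "Where is Paris located?"
nonsense = "Quickly sit Paris clouded?"

# The two prompts differ in meaning but share one grammatical template.
same_shape = template_of(original) == template_of(nonsense)

print(template_of(original))  # ('ADV', 'VERB', 'PROPN', 'VERB')
print(same_shape)             # True
```

A model that has learned to associate this template with geography questions could, in principle, answer the second prompt as if it were the first, which is the behavior the researchers describe.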

The researchers also documented a security vulnerability, which they termed “syntax hacking.” By prepending prompts with grammatical patterns from benign training domains, they bypassed safety filters in OLMo-2-7B-Instruct. When the team added a chain-of-thought template to 1,000 harmful requests from the WildJailbreak dataset, refusal rates decreased from 40% to 2.5%.

Examples of jailbroken prompts included detailed instructions for organ smuggling and methods for drug trafficking between Colombia and the United States.

To measure pattern-matching rigidity, the team conducted linguistic stress tests on the models:

  • Accuracy on antonyms: OLMo-2-13B-Instruct achieved 93% accuracy on prompts where antonyms replaced original words, nearly matching its 94% accuracy with exact training phrases.
  • Cross-domain accuracy drop: When the same grammatical template was applied to a different subject area, accuracy fell by 37 to 54 percentage points across model sizes.
  • Disfluent prompts: Models consistently performed poorly on prompts containing syntactically correct nonsense, regardless of the domain.
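
A stress test of this kind comes down to scoring the same model separately under each prompt condition and comparing the results. The sketch below shows one way such a comparison might be tallied. The records are invented toy data, not results from the study, and the condition names and the `accuracy_by_condition` helper are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical evaluation records: (condition, model_answer, expected_answer).
# These rows are made up for the example and are NOT data from the paper.
records = [
    ("exact", "France", "France"),
    ("exact", "Italy", "Italy"),
    ("antonym", "France", "France"),
    ("antonym", "Spain", "Italy"),
    ("cross_domain", "France", "Positive"),
    ("cross_domain", "Negative", "Negative"),
]


def accuracy_by_condition(rows):
    """Return {condition: fraction of answers matching the expected one}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, got, want in rows:
        total[condition] += 1
        correct[condition] += int(got.strip().lower() == want.strip().lower())
    return {condition: correct[condition] / total[condition] for condition in total}


scores = accuracy_by_condition(records)

# Accuracy drop in percentage points when a template moves to another domain.
drop_points = (scores["exact"] - scores["cross_domain"]) * 100

print(scores)
print(drop_points)
```

Reporting the drop in percentage points, as the article does, keeps results comparable across models whose baseline accuracy differs.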

The researchers also applied a benchmarking method to verify these patterns in production models, extracting grammatical templates from the FlanV2 instruction-tuning dataset and testing model performance when those templates were applied to different subject areas.

Tests on OLMo-2-7B, GPT-4o, and GPT-4o-mini revealed similar performance declines in cross-domain scenarios:

  • GPT-4o-mini: On the Sentiment140 classification task, accuracy dropped from 100% to 44% when geography templates were applied to sentiment analysis questions.
  • GPT-4o: Accuracy fell from 69% to 36% under similar conditions.
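
For readers comparing the two figures, the short calculation below converts the reported before-and-after accuracies into both a percentage-point drop and a relative drop. The accuracy numbers are the ones quoted above; the `reported` dictionary and the `drop` helper are illustrative names, not part of the study.

```python
# Accuracies (in percent) as reported in the article: (in-domain, cross-domain).
reported = {
    "GPT-4o-mini": (100, 44),
    "GPT-4o": (69, 36),
}


def drop(before: float, after: float):
    """Return (percentage-point drop, relative drop in percent)."""
    points = before - after
    relative = round(points / before * 100, 1)
    return points, relative


drops = {model: drop(before, after) for model, (before, after) in reported.items()}

for model, (points, relative) in drops.items():
    print(f"{model}: {points} points, {relative}% relative")
# GPT-4o-mini: 56 points, 56.0% relative
# GPT-4o: 33 points, 47.8% relative
```

On these numbers GPT-4o-mini lost 56 percentage points and GPT-4o lost 33, though GPT-4o started from a lower baseline, so its relative loss is closer to GPT-4o-mini's than the raw point drop suggests.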

The findings carry several caveats. The researchers could not confirm whether closed-source models such as GPT-4o were trained on the FlanV2 dataset. Without access to training data, other explanations for cross-domain performance drops in these models remain possible. The benchmarking method also faces a potential circularity issue: the researchers defined “in-domain” templates as those where models answered correctly, then concluded that difficulty stemmed from syntax-domain correlations.

The study specifically focused on OLMo models ranging from 1 billion to 13 billion parameters and did not examine larger models or those trained with chain-of-thought outputs. Additionally, synthetic experiments intentionally created strong template-domain associations, while real-world training data likely involves more complex patterns where multiple subject areas share grammatical structures.

