Dataconomy

MIT’s PDDL-INSTRUCT improves Llama-3-8B plan validity

Framework integrates logical reasoning, chain-of-thought prompting, and external validation to improve multi-step plan correctness.

by Emre Çıtak
September 22, 2025
in Artificial Intelligence

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory developed PDDL-INSTRUCT, a framework using logical reasoning and external validation to improve how large language models generate multi-step plans, achieving up to 94% validity on specific benchmarks.

The framework targets a common failure of large language models (LLMs): they produce multi-step plans that sound plausible but are logically invalid. PDDL-INSTRUCT counters this by pairing explicit state and action semantics with ground-truth checking. Through “error education,” models are trained to explain why a plan fails, whether because of unsatisfied preconditions, incorrect effects, frame violations, or an unreached goal. A logical chain-of-thought (CoT) prompting method also guides the model through step-by-step inference, producing detailed state-action-state traces formatted as ⟨sᵢ, aᵢ₊₁, sᵢ₊₁⟩ based on formal semantics.
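To make the state-action-state idea concrete, here is a minimal Python sketch of how such a trace could be built under classical (STRIPS-style) semantics. It is an illustration only, not the researchers' code: the `Action` class, the `apply` and `trace` helpers, and the toy Blocksworld facts are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action; not taken from PDDL-INSTRUCT."""
    name: str
    preconditions: frozenset  # facts that must hold before the action
    add_effects: frozenset    # facts the action makes true
    del_effects: frozenset    # facts the action makes false


def apply(state, action):
    """Return the successor state, or raise if a precondition is unsatisfied."""
    missing = action.preconditions - state
    if missing:
        raise ValueError(
            f"{action.name}: unsatisfied preconditions {sorted(missing)}"
        )
    # Facts not named in the effects carry over unchanged (the frame condition).
    return frozenset((state - action.del_effects) | action.add_effects)


def trace(initial, plan):
    """Build the list of (s_i, a_{i+1}, s_{i+1}) triples for a plan."""
    triples, state = [], initial
    for action in plan:
        successor = apply(state, action)
        triples.append((state, action.name, successor))
        state = successor
    return triples


# Toy Blocksworld instance (assumed for illustration): block A sits on block B.
unstack_a_b = Action(
    name="unstack(A,B)",
    preconditions=frozenset({"on(A,B)", "clear(A)", "handempty"}),
    add_effects=frozenset({"holding(A)", "clear(B)"}),
    del_effects=frozenset({"on(A,B)", "clear(A)", "handempty"}),
)
s0 = frozenset({"on(A,B)", "ontable(B)", "clear(A)", "handempty"})
steps = trace(s0, [unstack_a_b])

for before, name, after in steps:
    print(sorted(before), "--", name, "->", sorted(after))
```

In this sketch, an unsatisfied precondition surfaces as an exception, while the untouched fact `ontable(B)` carries over from one state to the next, which is the frame behavior the article's failure categories refer to.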

To ensure correctness, each step of a generated plan is verified by the external VAL plan validator. The system can receive either binary feedback (valid/invalid) or detailed feedback specifying which precondition or effect failed. Research indicated detailed feedback yielded the strongest performance gains. PDDL-INSTRUCT also utilizes a two-stage optimization process. The first stage optimizes the model’s reasoning chains by penalizing state-transition errors. The second stage then optimizes the final accuracy of the end-task plan, creating a systematic training regimen.
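The difference between the two feedback modes can be sketched in a few lines of Python. This is a toy checker written for illustration, not VAL and not its output format: the `validate` function, its `detailed` flag, the message wording, and the example facts are all hypothetical.

```python
def validate(initial, goal, plan, detailed=False):
    """Toy plan checker (assumed for illustration; this is not VAL).

    plan is a list of (name, preconditions, add_effects, del_effects)
    tuples, where the last three items are sets of ground facts.
    """
    state = set(initial)
    for i, (name, pre, add, delete) in enumerate(plan, start=1):
        missing = pre - state
        if missing:
            if detailed:
                # Detailed mode names the failing step and the missing facts.
                return (f"invalid: step {i} {name} has unsatisfied "
                        f"preconditions {sorted(missing)}")
            return "invalid"  # binary mode only reports the verdict
        state = (state - delete) | add
    unmet = set(goal) - state
    if unmet:
        if detailed:
            return f"invalid: goal facts not reached {sorted(unmet)}"
        return "invalid"
    return "valid"


# Hypothetical one-block problem: pick up block A from the table.
pickup_a = (
    "pickup(A)",
    {"ontable(A)", "clear(A)", "handempty"},  # preconditions
    {"holding(A)"},                           # add effects
    {"ontable(A)", "clear(A)", "handempty"},  # delete effects
)
initial = {"ontable(A)", "clear(A)", "handempty"}
goal = {"holding(A)"}

print(validate(initial, goal, [pickup_a]))
print(validate(initial, goal, [pickup_a, pickup_a]))
print(validate(initial, goal, [pickup_a, pickup_a], detailed=True))
```

The binary verdict tells a model only that the second plan failed, whereas the detailed message points at the step and the facts that were missing, which is the kind of signal the article reports as more useful during training.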

The system was evaluated on the PlanBench benchmark, which includes the Blocksworld, Mystery Blocksworld, and Logistics planning domains. Mystery Blocksworld is particularly challenging because it obfuscates predicate names to prevent pattern-matching; prior models reported less than 5% validity on this task without tool support. With PDDL-INSTRUCT, a Llama-3-8B model achieved up to 94% valid plans on Blocksworld. On Mystery Blocksworld, the framework produced large gains, reported as up to 64 times better than baseline models. Substantial increases in valid plans were also recorded in the Logistics domain.

Across all domains, the framework demonstrated up to a 66% absolute improvement in generating valid plans compared to untuned baselines. Performance was further enhanced by using detailed validator feedback and longer feedback budgets during training. This neuro-symbolic approach grounds an LLM’s reasoning in formal semantics that are checked automatically. Its current scope is limited to classical Planning Domain Definition Language (PDDL) domains and requires VAL as an external oracle. The method shows utility for agent pipelines that can accommodate a verifier, while extensions for temporal, numeric, and cost-sensitive planning remain open challenges.



Tags: Llama-3-8B, MIT, PDDL-INSTRUCT
