MIT’s PDDL-INSTRUCT improves Llama-3-8B plan validity

Framework integrates logical reasoning, chain-of-thought prompting, and external validation to improve multi-step plan correctness.

by Emre Çıtak
September 22, 2025
in Artificial Intelligence

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory developed PDDL-INSTRUCT, a framework using logical reasoning and external validation to improve how large language models generate multi-step plans, achieving up to 94% validity on specific benchmarks.

The framework addresses a common failure mode of large language models (LLMs): the plans they generate often sound plausible but are logically invalid. PDDL-INSTRUCT counters this by integrating explicit state and action semantics with ground-truth checking. Through “error education,” models are trained to explain plan failures, including unsatisfied preconditions, incorrect effects, frame violations, or an unreached goal. A logical chain-of-thought (CoT) prompting method also guides the model to perform step-by-step inference, producing detailed state-action-state traces formatted as ⟨sᵢ, aᵢ₊₁, sᵢ₊₁⟩ based on formal semantics.
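To illustrate what checking one ⟨sᵢ, aᵢ₊₁, sᵢ₊₁⟩ trace step against formal action semantics involves, here is a minimal sketch in Python. It is not code from the paper; the class names, fact encoding, and the Blocksworld example are illustrative assumptions following standard STRIPS-style semantics.

```python
# Sketch (not from the paper): verifying one <s_i, a_{i+1}, s_{i+1}> step
# against STRIPS-style preconditions and effects. Names are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # facts that must hold in s_i
    add_effects: frozenset    # facts the action adds
    del_effects: frozenset    # facts the action removes

def check_transition(state_before: set, action: Action, state_after: set) -> list:
    """Return a list of violations; an empty list means the step is consistent."""
    errors = []
    missing = action.preconditions - state_before
    if missing:
        errors.append(f"unsatisfied preconditions: {sorted(missing)}")
    expected = (state_before - action.del_effects) | action.add_effects
    if expected != state_after:
        errors.append(f"incorrect effects: expected {sorted(expected)}, got {sorted(state_after)}")
    return errors

# Example: a Blocksworld-style pickup action
pickup_a = Action(
    name="pickup(a)",
    preconditions=frozenset({"clear(a)", "ontable(a)", "handempty"}),
    add_effects=frozenset({"holding(a)"}),
    del_effects=frozenset({"clear(a)", "ontable(a)", "handempty"}),
)
s0 = {"clear(a)", "ontable(a)", "handempty"}
s1 = {"holding(a)"}
print(check_transition(s0, pickup_a, s1))  # [] -> this step is valid
```

A trace that fails such a check is exactly the kind of failure the “error education” step asks the model to explain.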

To ensure correctness, each step of a generated plan is verified by the external VAL plan validator. The system can receive either binary feedback (valid/invalid) or detailed feedback specifying which precondition or effect failed. Research indicated detailed feedback yielded the strongest performance gains. PDDL-INSTRUCT also utilizes a two-stage optimization process. The first stage optimizes the model’s reasoning chains by penalizing state-transition errors. The second stage then optimizes the final accuracy of the end-task plan, creating a systematic training regimen.
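As a rough sketch of how an external validation step might be wired into such a training loop, the snippet below shells out to a VAL-style plan validator and returns either binary or detailed feedback. The binary name, flag, and output string are assumptions for illustration, not the exact interface used in the paper.

```python
# Sketch: calling an external plan validator and extracting feedback.
# The "Validate" binary name, "-v" flag, and output parsing are assumptions.
import subprocess

def validate_plan(domain: str, problem: str, plan: str, detailed: bool = True) -> dict:
    """Run an external validator on (domain, problem, plan) PDDL files."""
    cmd = ["Validate"]
    if detailed:
        cmd.append("-v")  # verbose mode: report which precondition or effect failed
    cmd += [domain, problem, plan]
    result = subprocess.run(cmd, capture_output=True, text=True)
    valid = "Plan valid" in result.stdout
    return {
        "valid": valid,                                  # binary feedback
        "details": result.stdout if detailed else None,  # detailed feedback, if requested
    }

# A two-stage regimen, as described above, would first penalize invalid state
# transitions in the reasoning chain, then optimize end-task plan accuracy.
```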

The system was evaluated on the PlanBench benchmark, which includes the Blocksworld, Mystery Blocksworld, and Logistics planning domains. Mystery Blocksworld is particularly challenging as it obfuscates predicate names to prevent pattern-matching; prior models reported less than 5% validity on this task without tool support. With PDDL-INSTRUCT, a Llama-3-8B model achieved up to 94% valid plans on Blocksworld. On Mystery Blocksworld, the framework produced orders-of-magnitude improvements, reported as up to 64 times better than baseline models. Substantial increases in valid plans were also recorded in the Logistics domain.

Across all domains, the framework demonstrated up to a 66% absolute improvement in generating valid plans compared to untuned baselines. Performance was further enhanced by using detailed validator feedback and longer feedback budgets during training. This neuro-symbolic approach grounds an LLM’s reasoning in formal semantics that are checked automatically. Its current scope is limited to classical Planning Domain Definition Language (PDDL) domains and requires VAL as an external oracle. The method shows utility for agent pipelines that can accommodate a verifier, while extensions for temporal, numeric, and cost-sensitive planning remain open challenges.



Tags: Llama-3-8B, MIT, PDDL-INSTRUCT
