Dataconomy

MIT’s PDDL-INSTRUCT improves Llama-3-8B plan validity

Framework integrates logical reasoning, chain-of-thought prompting, and external validation to improve multi-step plan correctness.

by Emre Çıtak
September 22, 2025
in Artificial Intelligence

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory developed PDDL-INSTRUCT, a framework that uses logical reasoning and external validation to improve how large language models generate multi-step plans. It achieved up to 94% valid plans on the Blocksworld domain.

The framework targets a common failure of large language models (LLMs): plans that sound plausible but are logically invalid. PDDL-INSTRUCT counters this by pairing explicit state and action semantics with ground-truth checking. Through “error education,” models are trained to explain why a plan fails, whether because of unsatisfied preconditions, incorrect effects, frame violations (changes to facts the action does not affect), or an unreached goal. A logical chain-of-thought (CoT) prompting method also guides the model through step-by-step inference, producing explicit state-action-state traces of the form ⟨sᵢ, aᵢ₊₁, sᵢ₊₁⟩ grounded in formal semantics.
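To make the four error categories concrete, here is a minimal sketch of checking a state-action-state trace against STRIPS-style semantics. The `Action` class, the `check_trace` function, and the string encoding of facts such as `on(a,b)` are hypothetical illustrations, not PDDL-INSTRUCT's code. The sketch assumes positive preconditions only and a fully spelled-out claimed state at each step.

```python
from dataclasses import dataclass

# Hypothetical STRIPS-style action (positive preconditions only).
@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false

def check_trace(init, goal, trace):
    """Check <s_i, a_{i+1}, s_{i+1}> steps given as (action, claimed_next_state)
    pairs and return a list of error messages (empty if the trace is correct)."""
    errors = []
    state = frozenset(init)
    for step, (act, claimed) in enumerate(trace, start=1):
        claimed = frozenset(claimed)
        missing = act.pre - state
        if missing:
            errors.append(f"step {step} ({act.name}): unsatisfied precondition {sorted(missing)}")
            return errors  # the action is inapplicable, so later steps are meaningless
        true_next = (state - act.delete) | act.add  # STRIPS successor state
        # Add effects missing from the claim, or deleted facts the claim still keeps.
        wrong = (act.add - claimed) | ((act.delete - act.add) & claimed)
        if wrong:
            errors.append(f"step {step} ({act.name}): incorrect effects {sorted(wrong)}")
        # Facts the action does not touch must carry over unchanged.
        frame = (claimed ^ true_next) - (act.add | act.delete)
        if frame:
            errors.append(f"step {step} ({act.name}): frame violation {sorted(frame)}")
        state = true_next  # continue from the ground-truth state
    unmet = frozenset(goal) - state
    if unmet:
        errors.append(f"goal not reached: missing {sorted(unmet)}")
    return errors

# Tiny Blocksworld example: block a sits on block b; goal is both on the table.
unstack = Action("unstack(a,b)",
                 pre=frozenset({"on(a,b)", "clear(a)", "handempty"}),
                 add=frozenset({"holding(a)", "clear(b)"}),
                 delete=frozenset({"on(a,b)", "clear(a)", "handempty"}))
put_down = Action("put-down(a)",
                  pre=frozenset({"holding(a)"}),
                  add=frozenset({"ontable(a)", "clear(a)", "handempty"}),
                  delete=frozenset({"holding(a)"}))
init = {"on(a,b)", "ontable(b)", "clear(a)", "handempty"}
goal = {"ontable(a)", "ontable(b)"}
s1 = {"holding(a)", "clear(b)", "ontable(b)"}
s2 = {"ontable(a)", "clear(a)", "handempty", "clear(b)", "ontable(b)"}

print(check_trace(init, goal, [(unstack, s1), (put_down, s2)]))  # []
# This claimed s1 forgets ontable(b), which unstack does not affect.
print(check_trace(init, goal, [(unstack, s1 - {"ontable(b)"}), (put_down, s2)]))
# ["step 1 (unstack(a,b)): frame violation ['ontable(b)']"]
```

The checker keeps going from the true successor state after an effect or frame error, so one bad step does not cascade into spurious errors later in the trace.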

To ensure correctness, each step of a generated plan is checked by VAL, an external plan validator. Feedback can be either binary (valid/invalid) or detailed, naming the precondition or effect that failed; the researchers found that detailed feedback produced the largest performance gains. PDDL-INSTRUCT also uses a two-stage optimization process. The first stage optimizes the model’s reasoning chains by penalizing state-transition errors, and the second optimizes the accuracy of the final plan.
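The difference between the two feedback modes can be illustrated with a toy plan executor. This is a minimal sketch that stands in for an external validator such as VAL; it does not call VAL, and the `validate` function, its `detailed` flag, and the message wording are hypothetical.

```python
from typing import NamedTuple

class Action(NamedTuple):
    name: str
    pre: frozenset     # facts required before the action
    add: frozenset     # facts made true
    delete: frozenset  # facts made false

def validate(init, goal, plan, detailed=False):
    """Execute `plan` from `init` and return feedback.
    detailed=False -> only "valid"/"invalid"; detailed=True -> also says what failed.
    A stand-in for an external validator such as VAL; it does not call VAL."""
    state = frozenset(init)
    for step, act in enumerate(plan, start=1):
        missing = act.pre - state
        if missing:
            if not detailed:
                return "invalid"
            return (f"invalid: step {step} {act.name} has unsatisfied "
                    f"precondition(s) {', '.join(sorted(missing))}")
        state = (state - act.delete) | act.add  # apply STRIPS effects
    unmet = frozenset(goal) - state
    if unmet:
        if not detailed:
            return "invalid"
        return f"invalid: goal not reached, missing {', '.join(sorted(unmet))}"
    return "valid"

unstack = Action("unstack(a,b)", frozenset({"on(a,b)", "clear(a)", "handempty"}),
                 frozenset({"holding(a)", "clear(b)"}),
                 frozenset({"on(a,b)", "clear(a)", "handempty"}))
put_down = Action("put-down(a)", frozenset({"holding(a)"}),
                  frozenset({"ontable(a)", "clear(a)", "handempty"}),
                  frozenset({"holding(a)"}))
init = {"on(a,b)", "ontable(b)", "clear(a)", "handempty"}
goal = {"ontable(a)", "ontable(b)"}

print(validate(init, goal, [put_down, unstack]))                 # invalid
print(validate(init, goal, [put_down, unstack], detailed=True))
# invalid: step 1 put-down(a) has unsatisfied precondition(s) holding(a)
```

Binary mode gives the model a single bit per attempt, while detailed mode points at the exact step and fact to fix, which is consistent with the finding that detailed feedback helped most.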


The system was evaluated on the PlanBench benchmark, which includes the Blocksworld, Mystery Blocksworld, and Logistics planning domains. Mystery Blocksworld is particularly challenging because it obfuscates predicate names to prevent pattern-matching; previously reported models reached less than 5% validity on it without tool support. With PDDL-INSTRUCT, a Llama-3-8B model produced up to 94% valid plans on Blocksworld. On Mystery Blocksworld, the framework delivered improvements reported as up to 64 times better than baseline models. The Logistics domain also saw substantial increases in valid plans.
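The obfuscation idea behind Mystery Blocksworld can be sketched as a consistent, whole-word renaming of a domain's action and predicate names. The replacement tokens below are made up for illustration; they are not the names PlanBench uses, and `rename` is a hypothetical helper, not PlanBench's generator. The PDDL fragment is simplified.

```python
import re

def rename(text, mapping):
    """Consistently replace whole-word names in `text` according to `mapping`."""
    # Longest names first, so a short name never shadows a longer one.
    names = sorted(mapping, key=len, reverse=True)
    # Treat letters, digits, underscores and hyphens as part of a name,
    # so "on" never matches inside "ontable" and "pick-up" stays one token.
    pattern = re.compile(r"(?<![\w-])(" + "|".join(map(re.escape, names)) + r")(?![\w-])")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

# Made-up replacement tokens (hypothetical, not PlanBench's).
mapping = {"pick-up": "zorp", "clear": "glim", "ontable": "vask",
           "handempty": "trel", "holding": "quon"}
fragment = ("(:action pick-up :parameters (?x) "
            ":precondition (and (clear ?x) (ontable ?x) (handempty)) "
            ":effect (and (holding ?x) (not (ontable ?x))))")

hidden = rename(fragment, mapping)
print(hidden)
# The inverse mapping restores the original, so only surface names changed.
restored = rename(hidden, {v: k for k, v in mapping.items()})
```

Because the renaming is a bijection on names, preconditions, effects, and arity are untouched and any valid plan stays valid under it; only the lexical cues a model might pattern-match on disappear.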

Across all domains, the framework delivered up to a 66-percentage-point absolute improvement in valid plans over untuned baselines. Performance improved further with detailed validator feedback and longer feedback budgets during training. This neuro-symbolic approach grounds an LLM’s reasoning in formal semantics that are checked automatically. Its current scope is limited to classical Planning Domain Definition Language (PDDL) domains, and it requires VAL as an external oracle. The method suits agent pipelines that can accommodate a verifier, while extensions to temporal, numeric, and cost-sensitive planning remain open challenges.
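For agent pipelines that can accommodate a verifier, the pattern looks roughly like a generate-check-retry loop with a fixed feedback budget. The article describes feedback budgets as part of training; the loop below is an inference-time analogue offered only as an illustration, not the paper's procedure. `plan_with_verifier`, `toy_generate`, and `toy_verify` are hypothetical stand-ins for an LLM call and a validator such as VAL.

```python
from typing import Callable, List, Optional

def plan_with_verifier(generate: Callable[[str], List[str]],
                       verify: Callable[[List[str]], str],
                       task: str,
                       budget: int = 3) -> Optional[List[str]]:
    """Ask `generate` for a plan, check it with `verify`, and feed the
    validator's feedback into the next prompt until the plan is valid or
    `budget` attempts are used. Returns the valid plan, or None."""
    prompt = task
    for _ in range(budget):
        plan = generate(prompt)
        feedback = verify(plan)
        if feedback == "valid":
            return plan
        # Detailed feedback tells the generator what to repair next time.
        prompt = f"{task}\nPrevious plan: {plan}\nValidator feedback: {feedback}"
    return None

def toy_generate(prompt: str) -> List[str]:
    # Hypothetical stand-in for an LLM: it fixes its plan only after seeing feedback.
    if "unsatisfied precondition" in prompt:
        return ["unstack(a,b)", "put-down(a)"]
    return ["put-down(a)"]

def toy_verify(plan: List[str]) -> str:
    # Hypothetical stand-in for an external validator such as VAL.
    if plan == ["unstack(a,b)", "put-down(a)"]:
        return "valid"
    return "invalid: step 1 put-down(a) has unsatisfied precondition holding(a)"

print(plan_with_verifier(toy_generate, toy_verify, "Put block a on the table."))
# ['unstack(a,b)', 'put-down(a)']
```

With `budget=1` the same call returns `None`, since the toy generator never sees the feedback it needs; a larger budget trades extra validator calls for a higher chance of a valid plan.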


Tags: Llama-3-8B, MIT, PDDL-INSTRUCT


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
