MIT’s PDDL-INSTRUCT Improves Llama-3-8B Plan Validity

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory developed PDDL-INSTRUCT, a framework that combines logical reasoning with external validation to improve how large language models generate multi-step plans. It reached up to 94% plan validity on the Blocksworld domain of the PlanBench benchmark.

Large language models (LLMs) often produce plans that sound plausible but are logically invalid. PDDL-INSTRUCT counters this by combining explicit state and action semantics with ground-truth checking. Through “error education,” models are trained to explain why a plan fails: an unsatisfied precondition, an incorrectly applied effect, a frame violation (a change to a fact the action should have left untouched), or an unreached goal. A logical chain-of-thought (CoT) prompting method then guides the model through step-by-step inference, producing state-action-state traces of the form ⟨sᵢ, aᵢ₊₁, sᵢ₊₁⟩ grounded in formal semantics.
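To make the four failure categories concrete, here is a minimal sketch of a step-by-step trace checker for a two-block Blocksworld. It is an illustration under stated assumptions, not the authors' code: the `Action` class, the `apply` and `check_trace` helpers, and the string encoding of facts are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # Hypothetical STRIPS-style action: sets of ground facts as strings.
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset


def apply(state, action):
    """Return (next_state, missing); next_state is None if preconditions fail."""
    missing = action.preconditions - state
    if missing:
        return None, missing
    return (state - action.del_effects) | action.add_effects, frozenset()


def check_trace(initial, goal, steps):
    """Check a list of (action, claimed_next_state) pairs.

    Together with the state before it, each pair forms one
    <s_i, a_{i+1}, s_{i+1}> triple. Returns (ok, message).
    """
    state = initial
    for i, (action, claimed) in enumerate(steps, start=1):
        expected, missing = apply(state, action)
        if expected is None:
            return False, f"step {i}: unsatisfied preconditions {sorted(missing)}"
        if claimed != expected:
            # Facts the action is allowed to change versus facts it must
            # leave alone (the frame).
            touched = action.add_effects | action.del_effects
            wrong = claimed ^ expected
            kind = "incorrect effects" if wrong & touched else "frame violation"
            return False, f"step {i}: {kind} {sorted(wrong)}"
        state = expected
    if not goal <= state:
        return False, f"goal not reached, missing {sorted(goal - state)}"
    return True, "plan valid"


# Illustrative two-block problem (assumed encoding, not taken from PlanBench).
pickup_a = Action(
    "pickup(a)",
    frozenset({"clear(a)", "ontable(a)", "handempty"}),
    frozenset({"holding(a)"}),
    frozenset({"clear(a)", "ontable(a)", "handempty"}),
)
stack_a_b = Action(
    "stack(a,b)",
    frozenset({"holding(a)", "clear(b)"}),
    frozenset({"on(a,b)", "clear(a)", "handempty"}),
    frozenset({"holding(a)", "clear(b)"}),
)
s0 = frozenset({"clear(a)", "ontable(a)", "clear(b)", "ontable(b)", "handempty"})
s1 = frozenset({"holding(a)", "clear(b)", "ontable(b)"})
s2 = frozenset({"on(a,b)", "clear(a)", "handempty", "ontable(b)"})
goal = frozenset({"on(a,b)"})

print(check_trace(s0, goal, [(pickup_a, s1), (stack_a_b, s2)]))
print(check_trace(s0, goal, [(stack_a_b, s2)]))
```

The first call checks a correct two-step trace; the second skips the pickup, so the checker reports the missing `holding(a)` precondition at step 1.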

To ensure correctness, each step of a generated plan is checked by VAL, an external plan validator. Feedback to the model can be binary (valid or invalid) or detailed, specifying which precondition or effect failed; the researchers reported that detailed feedback yielded the strongest gains. PDDL-INSTRUCT also uses a two-stage optimization process. The first stage optimizes the model’s reasoning chains by penalizing state-transition errors; the second optimizes the accuracy of the final plan.
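The difference between the two feedback modes can be sketched as follows. This is a toy illustration, not the authors' training procedure and not VAL's actual output format: `ValidationResult`, `refine`, `toy_validate`, and `toy_generate` are hypothetical stand-ins, and the toy generator is deliberately built so that it can only repair its plan when the feedback names the failed precondition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResult:
    # Hypothetical container for a validator verdict.
    valid: bool
    failed_step: Optional[int] = None
    failed_action: Optional[str] = None
    reason: Optional[str] = None


def binary_feedback(result):
    """Only say whether the plan was valid."""
    return "valid" if result.valid else "invalid"


def detailed_feedback(result):
    """Also say which step failed and why."""
    if result.valid:
        return "valid"
    return (f"invalid: step {result.failed_step} "
            f"({result.failed_action}) failed: {result.reason}")


def refine(generate, validate, make_feedback, budget):
    """Generate, validate, and feed the verdict back, at most `budget` times."""
    feedback = None
    for attempt in range(1, budget + 1):
        plan = generate(feedback)
        result = validate(plan)
        if result.valid:
            return plan, attempt
        feedback = make_feedback(result)
    return None, budget


def toy_validate(plan):
    # Stand-in for an external validator: only one plan is accepted.
    if plan == ["pickup(a)", "stack(a,b)"]:
        return ValidationResult(valid=True)
    return ValidationResult(False, 1, plan[0],
                            "precondition holding(a) is not satisfied")


def toy_generate(feedback):
    # Stand-in for a model: it repairs the plan only when told what failed.
    if feedback and "holding(a)" in feedback:
        return ["pickup(a)", "stack(a,b)"]
    return ["stack(a,b)"]


print(refine(toy_generate, toy_validate, detailed_feedback, budget=3))
print(refine(toy_generate, toy_validate, binary_feedback, budget=3))
```

With detailed feedback the toy loop succeeds on the second attempt; with binary feedback it exhausts the budget. That outcome is a property of how the toy generator was written and is not evidence for the reported result.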

The system was evaluated on PlanBench, which includes the Blocksworld, Mystery Blocksworld, and Logistics planning domains. Mystery Blocksworld is particularly challenging because it obfuscates predicate names to defeat pattern-matching; prior models reportedly achieved less than 5% validity on it without tool support. With PDDL-INSTRUCT, a Llama-3-8B model produced up to 94% valid plans on Blocksworld. On Mystery Blocksworld, results were reported as up to 64 times better than baseline models. Substantial increases in valid plans were also recorded in the Logistics domain.
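The idea behind name obfuscation can be sketched in a few lines: the logical structure of a domain stays the same while every meaningful name is swapped for an unrelated token. The `obfuscate` helper and the mapping below are hypothetical and chosen only for illustration; they are not the names used by PlanBench.

```python
import re


def obfuscate(pddl_text, mapping):
    """Replace each whole-word name in `mapping` with its stand-in."""
    names = sorted(mapping, key=len, reverse=True)
    # Lookarounds keep matches on whole PDDL names, which may contain hyphens.
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(map(re.escape, names)) + r")(?![\w-])"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], pddl_text)


ACTION = """(:action pickup
  :parameters (?x)
  :precondition (and (clear ?x) (ontable ?x) (handempty))
  :effect (and (holding ?x) (not (clear ?x)) (not (ontable ?x)) (not (handempty))))"""

# Made-up replacement tokens with no planning-related meaning.
MAPPING = {
    "pickup": "zorp",
    "clear": "blick",
    "ontable": "quan",
    "handempty": "merl",
    "holding": "trune",
}

obfuscated = obfuscate(ACTION, MAPPING)
print(obfuscated)
```

A planner that reasons from preconditions and effects is unaffected by the renaming, whereas a model that relies on familiar words such as "pickup" or "clear" loses its cues.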

Across all domains, the framework showed up to a 66% absolute improvement (that is, 66 percentage points) in valid plans over untuned baselines. Detailed validator feedback and longer feedback budgets during training improved performance further. This neuro-symbolic approach grounds an LLM’s reasoning in formal semantics that are checked automatically. Its current scope is limited to classical Planning Domain Definition Language (PDDL) domains, and it requires VAL as an external oracle. The method is useful for agent pipelines that can accommodate a verifier; extensions to temporal, numeric, and cost-sensitive planning remain open challenges.
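The article quotes both an absolute gain and a "times better" figure, and the two measure different things. The short sketch below shows the distinction; the baseline and tuned rates are hypothetical values picked to make the arithmetic easy to follow and are not taken from the study.

```python
def absolute_gain(baseline_pct, tuned_pct):
    """Difference between two validity rates, in percentage points."""
    return tuned_pct - baseline_pct


def relative_gain(baseline_pct, tuned_pct):
    """Ratio of two validity rates, i.e. "N times better"."""
    return tuned_pct / baseline_pct


# Hypothetical rates: a moderate baseline and a very low one.
print(absolute_gain(20.0, 86.0), round(relative_gain(20.0, 86.0), 2))
print(absolute_gain(1.0, 64.0), relative_gain(1.0, 64.0))
```

A very low baseline can turn a modest absolute gain into a large multiple, which is why the two figures are best read side by side.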

Framework integrates logical reasoning, chain-of-thought prompting, and external validation to improve multi-step plan correctness.

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
