TL;DR: Decision-time planning scores a candidate only by how close its predicted final state is to the goal, never checking that the route there is realizable. ACID adds cycle action consistency: an inverse dynamics model infers the action behind each predicted transition, and its mismatch with the conditioning action becomes a per-step planning cost — improving planning across four world models and six tasks with substantially less compute.
Overall architecture of ACID. An MPC with CEM searches over candidate action sequences $a_{0:H-1}$ to minimize an augmented planning cost. The current observation is encoded to $z_0$, and the world model $F_\theta$ unrolls a latent trajectory. An inverse dynamics model $G_\phi$ then takes each predicted transition $(\hat{z}_t, \hat{z}_{t+1})$ and infers the action that would explain it. The augmented cost combines a goal cost (predicted final latent close to the goal) with an action consistency cost (the predicted trajectory is realizable in the environment).
Decision-time planning with action-conditioned world models has become a popular paradigm for embodied control. However, the standard planning cost judges a candidate solely by how close its predicted terminal state lies to the goal, leaving the realizability of the intermediate transitions unchecked — a predicted trajectory can look convincing while the environment rollout drifts away from it. In this paper, we propose ACID, a decision-time planning framework that introduces cycle action consistency: the action inferred backward from a predicted transition by an inverse dynamics model should recover the one that was conditioned on. We fold this per-step residual into the planning cost via a scale-invariant adaptive weight. Across four action-conditioned world models and six tasks spanning rigid and deformable manipulation, articulated control, and visual navigation, ACID consistently improves planning and matches the baseline's accuracy with substantially less planning compute.
Search-based planning samples candidate action sequences, simulates them through a world model, and executes the one that minimizes a planning cost. But that cost, defined solely by the proximity between the goal and the terminal state, cannot see whether each transition is realizable — whether the conditioning action could actually produce it in the environment. Visual plausibility at the end does not imply it: a planner can commit to an action sequence whose predicted goal-reaching cannot be reproduced.
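The sample, simulate, and select loop described above can be sketched as follows. This is a minimal illustration rather than the authors' planner: the `rollout` function is a toy stand-in for a learned world model (the state is simply the running sum of actions), and `cem_plan`, `goal_cost`, and all hyperparameters are hypothetical names and values chosen for the example.

```python
import numpy as np

def cem_plan(cost_fn, horizon, action_dim, n_samples=128, n_elites=16, n_iters=10, seed=0):
    """Minimal CEM: sample action sequences, keep the lowest-cost elites, refit a Gaussian."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Candidates have shape (n_samples, horizon, action_dim).
        noise = rng.standard_normal((n_samples, horizon, action_dim))
        candidates = mean + std * noise
        costs = cost_fn(candidates)  # shape (n_samples,)
        elites = candidates[np.argsort(costs)[:n_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy stand-in for a world model (an assumption): the state is the running sum of actions.
def rollout(z0, actions):
    return z0 + np.cumsum(actions, axis=1)  # (n_samples, horizon, state_dim)

z0 = np.zeros(2)
goal = np.array([1.0, -2.0])

def goal_cost(candidates):
    # Terminal-state cost only: distance between the last predicted state and the goal.
    final = rollout(z0, candidates)[:, -1]
    return np.linalg.norm(final - goal, axis=-1)

plan = cem_plan(goal_cost, horizon=4, action_dim=2)
final_state = rollout(z0, plan[None])[0, -1]
```

Note that `goal_cost` inspects only the last predicted state, so nothing in this loop checks whether the intermediate transitions are realizable.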
What makes realizability measurable is a property specific to embodied control: unlike a caption that admits many valid images, a pair of consecutive observations strongly constrains the action between them. So an inverse dynamics model (IDM) can tell whether each predicted step is consistent with its conditioning action — and we repurpose the IDM as a decision-time verifier rather than the offline action decoder or pseudo-labeler it has conventionally been.
ACID modifies only the planning cost. The world model, the IDM, and the CEM optimizer are left untouched, so ACID composes with any action-conditioned world model. The resulting cost scores the whole trajectory, not only its terminal state.
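If a predicted transition $\hat{z}_{t+1} = F_\theta(\hat{z}_t, a_t)$ truly reflects $a_t$, then the action an IDM infers from $(\hat{z}_t, \hat{z}_{t+1})$ should recover it. The residual between the inferred action $G_\phi(\hat{z}_t, \hat{z}_{t+1})$ and $a_t$ vanishes exactly when the transition is the one $a_t$ produces, and grows as the prediction drifts.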
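A toy example may make the cycle check concrete. The sketch below assumes a linear latent dynamics with a hypothetical matrix `B` and uses its pseudo-inverse as a stand-in IDM; the actual method uses learned models, and the squared-error residual here is an illustrative choice rather than the paper's definition.

```python
import numpy as np

# Hypothetical linear dynamics: a 2-D action moves a 3-D latent state.
B = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
B_pinv = np.linalg.pinv(B)

def world_model(z, a):
    # Stand-in for the world model: next latent from current latent and action.
    return z + B @ a

def inverse_dynamics(z, z_next):
    # Stand-in for the IDM: the action that best explains the transition.
    return B_pinv @ (z_next - z)

def consistency_residuals(latents, actions):
    """Per-step squared error between inferred and conditioning actions."""
    inferred = np.stack(
        [inverse_dynamics(latents[t], latents[t + 1]) for t in range(len(actions))]
    )
    return np.sum((inferred - actions) ** 2, axis=-1)

rng = np.random.default_rng(0)
actions = rng.standard_normal((5, 2))
latents = [np.zeros(3)]
for a in actions:
    latents.append(world_model(latents[-1], a))
latents = np.stack(latents)  # shape (6, 3): initial state plus five predictions

faithful = consistency_residuals(latents, actions)

# Corrupt one predicted state to mimic a prediction that drifted.
drifted_latents = latents.copy()
drifted_latents[3] += np.array([0.5, -0.5, 0.0])
drifted = consistency_residuals(drifted_latents, actions)
```

In this toy setting `faithful` is zero up to floating-point error, while `drifted` is positive only at the two transitions that touch the corrupted state, which is what lets the residual act as a per-step cost.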
Reuses the rollout already computed for the goal cost — no extra world-model rollout.
CEM ranks candidates by spread, not absolute scale, and the relative spread of the two costs
varies across world models, tasks, and even CEM iterations. We equalize their spread at every
iteration so neither term dominates the ranking, leaving a single λ to tune.
λ is tuned once per world model and transfers across tasks.
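One way to picture spread equalization is sketched below. The exact weighting rule is not given here, so the form used in `augmented_cost` (rescaling the action cost by the ratio of standard deviations across the current candidates, then applying λ) is an assumption made for illustration, as are the function name and the random costs.

```python
import numpy as np

def augmented_cost(goal_cost, action_cost, lam=0.05, eps=1e-8):
    """Combine two per-candidate costs after matching their spread.

    Assumed form (not taken from the paper): the action cost is rescaled by
    std(goal_cost) / std(action_cost), computed over the current candidates.
    """
    weight = lam * goal_cost.std() / (action_cost.std() + eps)
    return goal_cost + weight * action_cost, weight

rng = np.random.default_rng(0)
goal_cost = rng.uniform(0.0, 1.0, size=32)
action_cost = rng.uniform(0.0, 1.0, size=32)

total, weight = augmented_cost(goal_cost, action_cost)

# Rescaling the action cost by 1000x should leave the ranking unchanged,
# because the weight shrinks by the same factor.
total_rescaled, weight_rescaled = augmented_cost(goal_cost, 1000.0 * action_cost)
```

Under this assumed rule the combined cost, and therefore the CEM ranking, does not depend on the units of the action cost, which is the property that leaves a single λ to tune.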
The augmented cost simply reranks the goal cost by realizability: the planner prefers sequences whose predicted trajectory both reaches a goal-like state and remains step-by-step realizable.
Success rate (%) on rigid manipulation, articulated control, and contact-rich pushing with the Le-WM and PLDM JEPA-style world models. Original plans with the goal cost only; Ours adds the action consistency cost. Values in parentheses give the change over Original. Higher is better.
| World model | Planning cost | Cube | Reacher | PushT |
|---|---|---|---|---|
| Le-WM | Original | 70.0 | 76.0 | 96.0 |
| Le-WM | Ours | 74.0 (+4.0) | 88.0 (+12.0) | 100.0 (+4.0) |
| PLDM | Original | 58.0 | 76.0 | 72.0 |
| PLDM | Ours | 68.0 (+10.0) | 90.0 (+14.0) | 76.0 (+4.0) |
Deformable manipulation with DINO-WM (Chamfer distance) and goal-conditioned visual navigation with NWM trained with CompACT (ATE / RPE). Lower is better for both.
| DINO-WM | Rope | Granular |
|---|---|---|
| Original | 1.38 | 0.49 |
| Ours | 0.56 (−0.82) | 0.30 (−0.19) |

| NWM w/ CompACT | ATE | RPE (trans.) |
|---|---|---|
| Original | 1.3141 | 0.3831 |
| Ours | 1.2835 (−2.3%) | 0.3773 (−1.5%) |
Gains hold across four world models — three JEPA-style latent predictors (Le-WM, PLDM, DINO-WM) and one video generative model (NWM) — and six tasks, from rigid and deformable object manipulation to articulated control and visual navigation.
The complex dynamics of deformable and granular media make many CEM candidate trajectories non-realizable — precisely the failure mode the action consistency cost is designed for. Under the Original cost, the planned action sequence reaches the goal in the model's predicted trajectory but drifts away from it in the actual environment rollout, leaving the goal cost satisfied by an unreachable future. Adding the action consistency cost (Ours) removes this discrepancy: the real rollout closely tracks the predicted trajectory and reaches the goal.
Real vs. imagined rollouts. Executing the planned action sequence in the environment, the Baseline (goal cost only) drifts away from the goal, while Ours (with action consistency) keeps the real rollout on track and reaches the goal — across three JEPA-style latent world models (Le-WM, PLDM, DINO-WM) and their respective tasks.
Sweeping the CEM budget from 30 to 300 samples, Ours outperforms Original at every budget. Sweeping λ over a 20× range (0.005–0.1) yields consistent gains, so no careful per-task tuning is needed. In the weight ablation below, no single constant weight works well everywhere: $w_a = 1$ helps only Reacher, $w_a = 5$ lowers success on two tasks, and $w_a = 10$ gives only modest gains, whereas the scale-invariant adaptive weight yields the largest total improvement.
| CEM samples N | 30 | 50 | 150 | 300 |
|---|---|---|---|---|
| Le-WM — Original | 68 | 62 | 70 | 76 |
| Le-WM — Ours | 82 | 76 | 78 | 88 |
| PLDM — Original | 58 | 66 | 70 | 76 |
| PLDM — Ours | 78 | 84 | 82 | 84 |
On Le-WM and PLDM, Ours at the smallest budget (N=30) already surpasses the full-budget Original (N=300): 82 vs. 76 on Le-WM and 78 vs. 76 on PLDM.
| Task | Constant $w_a = 1$ | Constant $w_a = 5$ | Constant $w_a = 10$ | Ours (adaptive) |
|---|---|---|---|---|
| Cube | +0.0 | +0.0 | +2.0 | +4.0 |
| PushT | +0.0 | −2.0 | +2.0 | +4.0 |
| Reacher | +14.0 | −2.0 | +6.0 | +12.0 |
| Total Δ | +14.0 | −4.0 | +10.0 | +20.0 |
The best constant weight differs by task ($w_a = 1$ on Reacher, $w_a = 10$ on Cube and PushT); the adaptive weight removes this dependence and yields the largest total gain.
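The totals in the ablation table can be re-derived directly from its per-task entries. The snippet below copies the success-rate changes from the table; the dictionary layout and variable names are illustrative only.

```python
# Success-rate changes (percentage points) copied from the ablation table.
gains = {
    "w_a=1": {"Cube": 0.0, "PushT": 0.0, "Reacher": 14.0},
    "w_a=5": {"Cube": 0.0, "PushT": -2.0, "Reacher": -2.0},
    "w_a=10": {"Cube": 2.0, "PushT": 2.0, "Reacher": 6.0},
    "Ours": {"Cube": 4.0, "PushT": 4.0, "Reacher": 12.0},
}

# Sum over tasks for each weighting scheme.
totals = {name: sum(per_task.values()) for name, per_task in gains.items()}
best_total = max(totals, key=totals.get)

# For each task, the scheme with the largest gain.
tasks = ["Cube", "PushT", "Reacher"]
best_per_task = {task: max(gains, key=lambda name: gains[name][task]) for task in tasks}
```

The adaptive weight has the largest total and the largest gain on Cube and PushT, while the constant $w_a = 1$ is slightly ahead on Reacher alone.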
The verifier is lightweight: a single Euler step of the flow-matching IDM adds only a small fraction of the world-model forward latency per CEM iteration, and reuses the trajectory already rolled out for the goal cost. Despite this modest per-step overhead, ACID reaches the baseline's final Chamfer distance in less than half the planning steps — a net compute of roughly 0.7× to match baseline quality. Because total planning compute scales with both the number of CEM samples and planning steps, reducing either factor directly lowers cost, and Ours reaches a given quality with strictly less compute along both axes.
Mean Chamfer distance over planning steps on Granular and Rope. Ours reaches Original's late-step plateau several steps earlier on Granular, and already at the very first planning step on Rope.
@article{seo2026acid,
title = {ACID: Action Consistency via Inverse Dynamics for Planning with World Models},
author = {Seo, Gawon and Kim, Dongwon and Kwak, Suha},
journal = {arXiv preprint arXiv:2607.02403},
year = {2026}
}