
DiffuAgent

The Bitter Lesson of Diffusion Language Models for Agentic Workflows:
A Comprehensive Reality Check

Qingyu Lu¹˒³, Liang Ding², Kanjian Zhang², Jinxia Zhang¹, Dacheng Tao³

¹Southeast University, China  ²Alibaba  ³Nanyang Technological University, Singapore

Paper | Code

TL;DR

Failure of dLLMs as Agent Backbones

Figure: Failure comparison tables on embodied and tool-calling benchmarks.

We compare dLLMs with autoregressive LLMs on embodied (AgentBoard) and tool-calling (BFCL, the Berkeley Function-Calling Leaderboard) benchmarks. dLLMs lag behind on both embodied success/progress rates and tool-calling accuracy.

Systematic Failure Modes of dLLMs

Figure: Failure analysis of dLLMs: retry loops, imprecise tool calls, and the performance-efficiency mismatch.

(a) Replanning failures in embodied agents: dLLMs fall into retry loops significantly more often than autoregressive LLMs.
(b) Precision failures in tool-calling agents: dLLMs are more prone to producing malformed tool-call JSON (see the illustrative check below).
(c) Performance-efficiency trade-off: despite their higher inference efficiency, dLLMs do not deliver agentic performance comparable to autoregressive LLMs.
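
To make the precision failure in (b) concrete, below is a minimal, hypothetical validity check in the spirit of function-calling evaluation; the tool schema, argument names, and example outputs are illustrative assumptions, not taken from BFCL or the paper.

import json

# Hypothetical tool schema in the style of function-calling benchmarks;
# the tool name and argument types are illustrative, not from BFCL.
TOOL_SCHEMA = {
    "name": "get_weather",
    "required": ["city", "unit"],
    "types": {"city": str, "unit": str},
}

def check_tool_call(raw_output: str, schema: dict) -> list:
    """Return a list of problems found in a model-generated tool call."""
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc}"]  # e.g. truncated or unbalanced braces
    errors = []
    if call.get("name") != schema["name"]:
        errors.append("wrong or missing tool name")
    args = call.get("arguments", {})
    for key in schema["required"]:
        if key not in args:
            errors.append(f"missing required argument: {key}")
        elif not isinstance(args[key], schema["types"][key]):
            errors.append(f"wrong type for argument: {key}")
    return errors

# A well-formed call passes; a truncated one, the kind of output highlighted
# in (b), is flagged as malformed JSON.
good = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}'
bad  = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": '
print(check_tool_call(good, TOOL_SCHEMA))  # []
print(check_tool_call(bad, TOOL_SCHEMA))   # ['malformed JSON: ...']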

DiffuAgent: A Framework for Analyzing Agentic Behaviors in dLLMs

Figure: Overview of the DiffuAgent framework.

To better understand the agentic potential of dLLMs, we introduce DiffuAgent, a novel evaluation framework that treats dLLMs as plug-and-play cognitive modules for augmenting LLM agents.
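
As a rough illustration of the plug-and-play idea, the sketch below wires a role-specific module (backed by either a dLLM or an autoregressive LLM) into a toy agent loop; the generate(prompt) -> str interface, the role prompts, and the stopping convention are our assumptions here, not the paper's actual API.

from typing import Callable, List

# Any text-in/text-out backbone (autoregressive LLM or dLLM); assumed interface.
Generate = Callable[[str], str]

class CognitiveModule:
    """Wraps a backbone with a role prompt (memory, verifier, tool selector, ...)."""
    def __init__(self, backbone: Generate, role_prompt: str):
        self.backbone = backbone
        self.role_prompt = role_prompt

    def __call__(self, context: str) -> str:
        return self.backbone(f"{self.role_prompt}\n\n{context}")

def run_agent(planner: Generate, memory: CognitiveModule, verifier: CognitiveModule,
              task: str, max_steps: int = 5) -> List[str]:
    """Toy loop: the planner acts; the plug-in memory and verifier modules
    (either of which may be a dLLM) summarize history and decide when to stop."""
    trajectory: List[str] = []
    for _ in range(max_steps):
        summary = memory("\n".join(trajectory) or "(empty history)")
        action = planner(f"Task: {task}\nMemory: {summary}\nNext action:")
        trajectory.append(action)
        if "DONE" in verifier(f"Task: {task}\nTrajectory: {trajectory}"):
            break
    return trajectory

In a setup like this, comparing a dLLM-backed module against an LLM-backed one, with the rest of the loop fixed, isolates that module's contribution to overall agentic performance.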

Analysis of Agentic Behaviors in dLLMs

• dLLMs are competitive memory modules for memory-augmented agents.

• LLM verifiers tend to trigger premature early exits, whereas dLLM verifiers terminate more reliably.

• dLLMs are effective tool selectors but struggle as tool-call editors (see the sketch below).
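
To clarify the last bullet, the sketch below separates the two roles; the prompts and the assumed generate interface are illustrative, not the paper's exact protocol.

def select_tool(generate, task: str, tools: list) -> str:
    """Tool selection: pick one tool name from a closed candidate set,
    a short, structured output where dLLMs perform well."""
    prompt = f"Task: {task}\nAvailable tools: {', '.join(tools)}\nBest tool:"
    return generate(prompt).strip()

def edit_tool_call(generate, task: str, draft_call: str) -> str:
    """Tool-call editing: revise an existing call into valid JSON,
    a precise, token-level rewrite where dLLMs tend to struggle."""
    prompt = (f"Task: {task}\nDraft call: {draft_call}\n"
              "Return the corrected call as valid JSON only:")
    return generate(prompt).strip()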

Citation

@article{lu2026diffuagent,
  title   = {The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check},
  author  = {Lu, Qingyu and Ding, Liang and Liu, Xuebo and Zhang, Kanjian and Zhang, Jinxia and Tao, Dacheng},
  journal = {arXiv preprint},
  year    = {2026},
  url     = {https://arxiv.org/pdf/2601.12979}
}