Fast-dLLM

ICLR 2026

Fast-dLLM is a family of acceleration techniques for diffusion-based Large Language Models (dLLMs), Vision-Language Models (dVLMs), and Vision-Language-Action (VLA) models. This repository contains:

	Fast-dLLM v1	Fast-dLLM v2	Fast-dVLM	Fast-dDrive
Paper	Training-free Acceleration of Diffusion LLM	Efficient Block-Diffusion LLM	Block-Diffusion VLM via Direct Conversion	Efficient Block-Diffusion VLM for Autonomous Driving
Modality	Text	Text	Vision + Text	Vision + Text + Action (driving)
Approach	Training-free inference acceleration	Block diffusion with fine-tuning	Direct AR-to-diffusion VLM conversion	Section-aware block diffusion + scaffold speculative decoding
Backbone	Dream, LLaDA	Qwen2.5	Qwen2.5-VL	Qwen2.5-VL
Key Techniques	KV Cache + Parallel Decoding	Block Diffusion + Hierarchical Caching	Block-Size Annealing + Speculative Decoding	SASD Training + Scaffold Spec + Test-Time Inference Scaling
Code	`v1/`	`v2/`	`fast_dvlm/`	`fast_ddrive/`
Model	—	Fast_dLLM_v2_7B	Fast_dVLM_3B	Fast-dDrive

News

(🔥 New) [2026/05/26] Fast-dDrive is released! Section-Aware Structured Diffusion VLA for end-to-end autonomous driving on Waymo (WOD-E2E). Combines Scaffold Speculative Decoding with SASD training for SOTA ADE / RFS at over 200 TPS on a single H100 (up to 12x over the AR baseline with SGLang). Check out fast_ddrive/, the model, and the paper.
[2026/04/10] Fast-dVLM is released! Up to 6.18x speedup over AR baseline while matching quality across 11 benchmarks. Check out our webpage, model, and paper!
(🔥 New) [2026/01/26] Fast-dLLM v1/v2 is accepted by ICLR-2026. 🎉🎉🎉
[2025.10.08] We have open sourced Fast-dLLM v2. Have a look at our webpage, model, and paper!
[2025.08.01] Our new online demo of Fast-dLLM: https://fast-dllm.hanlab.ai/, welcome to try!
[2025.07.06] Added factor-based parallel strategy and LLaDA-1.5 evaluation in v1/llada/eval_gsm8k.sh.
[2025.07.04] We updated our paper with latest improvements and evaluation results.
[2025.06.30] Fast-dLLM has been integrated into LLaDA-V. With Fast-dLLM, it accelerates the inference latency from 60s to 6s! Have a try here!!

TODOs

[✅] Inference and evaluation code
[✅] Training code of Fast-dLLM v2
[✅] Fast-dVLM: Block-diffusion VLM
[✅] Fast-dDrive: Block-diffusion VLA for autonomous driving
[🚀] vLLM support

Project Structure

Fast-dLLM/
├── v1/                     # Fast-dLLM v1: Training-free acceleration (LLM)
│   ├── dream/              #   Dream model support
│   ├── llada/              #   LLaDA model support
│   ├── requirements.txt
│   └── README.md
├── v2/                     # Fast-dLLM v2: Block diffusion (LLM)
│   ├── src/                #   LMFlow training framework
│   ├── train_scripts/      #   Fine-tuning scripts
│   ├── configs/            #   DeepSpeed configs
│   ├── generation_functions.py
│   ├── eval.py / eval_script.sh
│   ├── app.py / run_chatbot.py
│   ├── requirements.txt
│   └── README.md
├── fast_dvlm/              # Fast-dVLM: Block-diffusion VLM (chatbot, optional finetune sample, VLMEval; see fast_dvlm/README.md)
├── fast_ddrive/            # Fast-dDrive: Block-diffusion VLA for autonomous driving on Waymo E2E (see fast_ddrive/README.md)
├── CONTRIBUTING.md
├── LICENSE
└── README.md               # This file

Quick Start

Fast-dLLM v1 (Training-free Acceleration)

cd v1
pip install -r requirements.txt

# LLaDA interactive chat
python llada/chat.py --gen_length 128 --steps 128 --block_size 32

# Dream evaluation
accelerate launch dream/eval.py --model dream \
    --model_args pretrained=Dream-org/Dream-v0-Base-7B,max_new_tokens=256,diffusion_steps=8,add_bos_token=true,alg=confidence_threshold,threshold=0.9,use_cache=true \
    --tasks gsm8k --num_fewshot 5 --batch_size 1

For full details, see v1/README.md.

Fast-dLLM v2 (Block Diffusion)

cd v2
pip install -e .

# Gradio web demo
python app.py

# Evaluation
bash eval_script.sh

For full details, see v2/README.md.

Fast-dVLM (Block-Diffusion VLM)

cd fast_dvlm
pip install -r requirements.txt

# Quick inference
python run_chatbot.py \
    --model-name Efficient-Large-Model/Fast_dVLM_3B \
    --image path/to/image.jpg \
    --prompt "Describe this image in detail."

# Interactive mode
python run_chatbot.py

Fine-tuning (optional example): multimodal MDM training uses DeepSpeed + the LMFlow fork under third_party/ (the launcher sets PYTHONPATH for you). Download ALLaVA-4V with fast_dvlm/data/download_example_dataset.sh, then run bash fast_dvlm/train_scripts/finetune_multimodal_example.sh from the repo root—see Fine-tuning (example launcher) in fast_dvlm/README.md.

For full details, see fast_dvlm/README.md.

Fast-dDrive (Block-Diffusion VLA for Autonomous Driving)

cd fast_ddrive
pip install -r requirements.txt

# Single-shot demo: Scaffold Spec decoding on one driving frame.
python run_chatbot.py \
    --model_path xiwenyoumu/Fast-dDrive \
    --image data/example/images/161_CAM_FRONT.jpg \
    --prompt "Describe the driving scene and produce a 5-second plan."

# Waymo E2E validation eval (paper canonical Scaffold Spec, multi-GPU).
MODEL_PATH=xiwenyoumu/Fast-dDrive EVAL_JSON=/path/to/waymo_val.json \
    IMAGE_ROOT=/path/to/image_root bash run_eval.sh

Three decoding paths are exposed via --mode / MODE: section_diffusion (SD), scaffold_spec (SS — paper canonical), and inference_scaling (SS multi-trajectory rollouts).

Fine-tuning (SASD): mirrors the fast_dvlm DeepSpeed launcher and reuses the same vendored LMFlow under third_party/ (with a small set of pure-addition SASD hooks). Provide a Waymo training JSON + image root, then:

DATASET_PATH=/path/to/waymo_train.json IMAGE_FOLDER=/path/to/image_root \
    bash fast_ddrive/train_scripts/train_waymo_sasd.sh

For full details, see fast_ddrive/README.md.

Contributing

Issues and Pull Requests are welcome! Please see CONTRIBUTING.md for details.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Citation

If you find this work useful, please cite our papers:

@misc{zhang2026fastddriveefficientblockdiffusionvlm,
      title={Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving},
      author={Kewei Zhang and Jin Wang and Sensen Gao and Chengyue Wu and Yulong Cao and Songyang Han and Boris Ivanovic and Langechuan Liu and Marco Pavone and Song Han and Daquan Zhou and Enze Xie},
      year={2026},
      eprint={2605.23163},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.23163},
}
@misc{wu2026fastdvlmefficientblockdiffusionvlm,
      title={Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM},
      author={Chengyue Wu and Shiyi Lan and Yonggan Fu and Sensen Gao and Jin Wang and Jincheng Yu and Jose M. Alvarez and Pavlo Molchanov and Ping Luo and Song Han and Ligeng Zhu and Enze Xie},
      year={2026},
      eprint={2604.06832},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.06832},
}
@misc{wu2025fastdllmv2efficientblockdiffusion,
      title={Fast-dLLM v2: Efficient Block-Diffusion LLM}, 
      author={Chengyue Wu and Hao Zhang and Shuchen Xue and Shizhe Diao and Yonggan Fu and Zhijian Liu and Pavlo Molchanov and Ping Luo and Song Han and Enze Xie},
      year={2025},
      eprint={2509.26328},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.26328}, 
}
@misc{wu2025fastdllmtrainingfreeaccelerationdiffusion,
      title={Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding}, 
      author={Chengyue Wu and Hao Zhang and Shuchen Xue and Zhijian Liu and Shizhe Diao and Ligeng Zhu and Ping Luo and Song Han and Enze Xie},
      year={2025},
      eprint={2505.22618},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.22618}, 
}

Acknowledgements

We would like to thank the authors of LLaDA and Dream for their excellent work and open-source contributions. We thank Qwen2.5 and Qwen2.5-VL for the base model architectures and LMFlow for the training framework.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fast-dLLM

ICLR 2026

News

TODOs

Project Structure

Quick Start

Fast-dLLM v1 (Training-free Acceleration)

Fast-dLLM v2 (Block Diffusion)

Fast-dVLM (Block-Diffusion VLM)

Fast-dDrive (Block-Diffusion VLA for Autonomous Driving)

Contributing

License

Citation

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 87 Commits
asset		asset
fast_ddrive		fast_ddrive
fast_dvlm		fast_dvlm
third_party		third_party
v1		v1
v2		v2
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Fast-dLLM

ICLR 2026

News

TODOs

Project Structure

Quick Start

Fast-dLLM v1 (Training-free Acceleration)

Fast-dLLM v2 (Block Diffusion)

Fast-dVLM (Block-Diffusion VLM)

Fast-dDrive (Block-Diffusion VLA for Autonomous Driving)

Contributing

License

Citation

Acknowledgements

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages