feat: fix the vLLM DP path by guyueh1 · Pull Request #2517 · NVIDIA-NeMo/RL

guyueh1 · 2026-05-18T04:46:25Z

What does this PR do ?

Previously, nemo-rl did not work with vLLM's native DP (EP > TP). This PR adds support for that case.

The following basic tests have passed; the nightly test is now being run.

# eval test
uv run examples/run_eval.py \
generation.model_name=Qwen/Qwen3-30B-A3B \
cluster.num_nodes=2 \
cluster.gpus_per_node=4 \
generation.vllm_cfg.tensor_parallel_size=4 \
generation.vllm_cfg.expert_parallel_size=8 \
generation.vllm_cfg.async_engine=true

# grpo test
uv run examples/run_grpo.py \
--config examples/configs/grpo_math_1B_megatron.yaml \
policy.model_name=Qwen/Qwen3-30B-A3B \
cluster.num_nodes=4 \
cluster.gpus_per_node=4 \
policy.generation.colocated.enabled=false \
policy.generation.colocated.resources.num_nodes=2 \
policy.generation.colocated.resources.gpus_per_node=4 \
policy.generation.vllm_cfg.tensor_parallel_size=4 \
policy.generation.vllm_cfg.expert_parallel_size=8 \
policy.generation.vllm_cfg.async_engine=true \
policy.megatron_cfg.expert_model_parallel_size=8 \
policy.sequence_packing.enabled=false
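
The relationship between the parallel sizes in the two commands above can be sketched as follows. This is a minimal illustration, assuming that the vLLM DP size is derived as EP / TP whenever EP > TP (which matches the GPU counts used here: 2 nodes * 4 GPUs = 8 = TP 4 * DP 2). The helper name `vllm_dp_size` is hypothetical and is not taken from the PR.

```python
def vllm_dp_size(tensor_parallel_size: int, expert_parallel_size: int) -> int:
    """Return the assumed vLLM data-parallel size for a TP/EP pair.

    Assumption (not confirmed by the PR text): EP = TP * DP, so DP > 1
    only when EP > TP.
    """
    if tensor_parallel_size < 1 or expert_parallel_size < 1:
        raise ValueError("parallel sizes must be positive")
    if expert_parallel_size <= tensor_parallel_size:
        # Experts fit inside one TP group, so no extra DP ranks are assumed.
        return 1
    if expert_parallel_size % tensor_parallel_size != 0:
        raise ValueError("expert_parallel_size must be a multiple of tensor_parallel_size")
    return expert_parallel_size // tensor_parallel_size


# Settings from the eval and GRPO commands: TP=4, EP=8 on 2 nodes * 4 GPUs.
dp = vllm_dp_size(tensor_parallel_size=4, expert_parallel_size=8)
total_gpus = 2 * 4
print(f"DP={dp}, GPUs needed={4 * dp}, GPUs available={total_gpus}")
```

Under the same assumption, the TP=2, EP=16 nightly configuration would correspond to a DP size of 8.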

New nightly test figures:
H100 with EP=8 async engine

https://wandb.ai/nvidia/nemo-rl/runs/4mcplb63

H100 with TP=2 EP=16 sync engine

https://wandb.ai/nvidia/nemo-rl/runs/b3mon8zg

Signed-off-by: Guyue Huang <guyueh@login-lyris01.lyris.clusters.nvidia.com>

copy-pr-bot · 2026-05-18T04:46:29Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

guyueh1 · 2026-05-18T04:52:40Z

/ok to test efc6fc2

guyueh1 · 2026-05-18T16:45:07Z

/ok to test 9f381f2

guyueh1 · 2026-05-19T17:33:47Z

Fast CI is failing with `uuidgen: command not found`; trying the full test set to see if it helps.

guyueh1 · 2026-05-22T15:38:14Z

The added nightly test pushes the total past the 1340 GPU-hour quota for nightly tests (it is now 1345). Should we increase the quota? @terrykong @chtruong814

guyueh1 · 2026-05-22T20:31:18Z

/ok to test ad114f0

guyueh1 · 2026-05-22T23:51:01Z

Ran a test on llama3-8B with vLLM TP=1 and PP=1 to study the impact of changing distributed_executor_backend from None to mp. There is no observable performance difference.
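
The two configurations compared in that test could be expressed as follows. This is only a sketch of the engine arguments, assuming they are forwarded as keyword arguments to vLLM's engine constructor together with the model name; the helper `vllm_engine_kwargs` is hypothetical, and no timing harness is shown because the comment does not describe one.

```python
from typing import Any, Optional


def vllm_engine_kwargs(backend: Optional[str]) -> dict[str, Any]:
    """Build engine kwargs for the TP=1, PP=1 comparison (illustrative only)."""
    kwargs: dict[str, Any] = {
        "tensor_parallel_size": 1,
        "pipeline_parallel_size": 1,
    }
    if backend is not None:
        # Leaving the key out keeps the previous behaviour (backend=None).
        kwargs["distributed_executor_backend"] = backend
    return kwargs


baseline = vllm_engine_kwargs(None)   # previous behaviour: backend left as None
candidate = vllm_engine_kwargs("mp")  # new behaviour: multiprocessing executor
print(baseline)
print(candidate)
```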

guyueh1 · 2026-05-24T00:53:38Z

/ok to test d03ceee

yuki-97 · 2026-05-25T07:04:26Z

+                os.environ["VLLM_DP_SIZE"] = str(vllm_dp_size)
+                os.environ["VLLM_DP_RANK"] = str(vllm_dp_rank)
+                # Always set local rank to 0 because we only expose GPUs belong to this DP rank to the worker; if we set it to the actual local rank, it will cause the worker to hang.
+                os.environ["VLLM_DP_RANK_LOCAL"] = str(0)


Will this cause issues when local DP > 1?

guyueh1 (2026-05-25): No. It works in the DP-only case (where, on one node, local DP = gpus_per_node), and I confirmed that the workers are placed on all GPUs. By contrast, setting it to rank % 8 causes a failure.

yuki-97 (2026-05-26): Assuming we have 2 nodes * 8 GPUs, does that mean os.environ["VLLM_DP_RANK_LOCAL"] = str(0) works for both DP8 and DP4? (I think for DP4 there should be 2 DP groups locally?)
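
One way to read the comment in the diff is sketched below: if each worker only sees the GPUs of its own DP rank, the first visible device is always index 0 from the worker's point of view, regardless of how many DP ranks share the node. The helper `dp_worker_env` is hypothetical. Restricting visibility through `CUDA_VISIBLE_DEVICES` and keeping each TP group within a single node are assumptions made for illustration; the PR does not state how GPUs are exposed, and this sketch does not settle the DP4 question raised in the thread.

```python
def dp_worker_env(dp_rank: int, dp_size: int, tp_size: int, gpus_per_node: int) -> dict[str, str]:
    """Build the environment for one vLLM DP rank (illustrative only)."""
    if not 0 <= dp_rank < dp_size:
        raise ValueError("dp_rank must be in [0, dp_size)")
    if gpus_per_node % tp_size != 0:
        raise ValueError("this sketch assumes each TP group fits within one node")
    ranks_per_node = gpus_per_node // tp_size
    # Physical GPUs on this node that belong to this DP rank.
    first_gpu = (dp_rank % ranks_per_node) * tp_size
    visible = ",".join(str(first_gpu + i) for i in range(tp_size))
    return {
        "CUDA_VISIBLE_DEVICES": visible,  # assumption: how GPUs are exposed
        "VLLM_DP_SIZE": str(dp_size),
        "VLLM_DP_RANK": str(dp_rank),
        # Always 0: the worker only sees its own GPUs, so they start at index 0.
        "VLLM_DP_RANK_LOCAL": "0",
    }


# 2 nodes * 8 GPUs with DP4 (TP=4): two DP ranks share each node.
for rank in range(4):
    env = dp_worker_env(rank, dp_size=4, tp_size=4, gpus_per_node=8)
    print(rank, rank // 2, env["CUDA_VISIBLE_DEVICES"], env["VLLM_DP_RANK_LOCAL"])
```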

yuki-97 · 2026-05-25T07:37:26Z

@@ -462,19 +462,41 @@ def _patch_vllm_hermes_tool_parser_thread_safety():
        os.environ["VLLM_ALLOW_INSECURE_SERIALIZATION"] = "1"



I see that the file has been removed from vLLM. Can we link to https://github.com/vllm-project/vllm/tree/v0.20.0/examples/rl instead?

guyueh1 (2026-05-25): Which file?

yuki-97 (2026-05-26): Ah, sorry, I commented on the wrong line. I meant line 463: # See details in https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/data_parallel.py

https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/data_parallel.py has since been removed from vLLM, so I think we should link to something else instead.

yuki-97 · 2026-05-25T07:40:38Z

@terrykong could you help review this as well?

guyueh1 · 2026-05-25T23:17:54Z

/ok to test 2272d47

guyueh1 · 2026-05-26T01:34:51Z

/ok to test 0ea633c

Support vllm dp

efc6fc2

Signed-off-by: Guyue Huang <guyueh@login-lyris01.lyris.clusters.nvidia.com>

guyueh1 requested review from a team as code owners May 18, 2026 04:46

guyueh1 requested review from yuki-97 and removed request for a team May 18, 2026 04:46

guyueh1 added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label May 18, 2026

Guyue Huang added 2 commits May 18, 2026 09:36

Add functional test to test suite

436f5ad

Signed-off-by: Guyue Huang <guyueh@login-lyris01.lyris.clusters.nvidia.com>

Fix lint

9f381f2

Signed-off-by: Guyue Huang <guyueh@login-lyris01.lyris.clusters.nvidia.com>

guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) labels May 19, 2026

Change nightly test GPU hour to 1380

ad114f0

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

guyueh1 added 2 commits May 22, 2026 16:57

Merge branch 'save' into support_vllm_dp

71c1c70

Fix

d03ceee

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

guyueh1 force-pushed the support_vllm_dp branch from d0e091a to d03ceee May 24, 2026 00:53

yuki-97 reviewed May 25, 2026

yuki-97 requested a review from terrykong May 25, 2026 07:40

guyueh1 added 2 commits May 25, 2026 11:06

Small fixes

1f7c46a

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

add nightly test, review comments

2272d47

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Add nightly test under H100 because Gb200 is broken

0ea633c

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
