Add NIXL to test suite #16

Open
diskdog wants to merge 1 commit into spectral-compute:release from diskdog:add-nixl

diskdog commented May 22, 2026

Summary

Adds clone, build, and example scripts for NIXL (e128059), the NVIDIA Inference Xfer Library, which handles GPU-to-GPU transfers across heterogeneous fabrics. The validation builds NIXL against a locally compiled UCX and runs the bundled C++ transfer example.

Dependencies

NIXL requires UCX with multi-thread support (--enable-mt). The Ubuntu-packaged UCX (1.16) is too old and lacks APIs that NIXL uses. 00-clone.sh therefore clones UCX (c982cef) alongside NIXL, and 01-build.sh builds and installs UCX from source before building NIXL.
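The pinning described above can be sketched as a small helper that clones a repository and checks out one exact commit. This is an illustration, not the contents of 00-clone.sh: the function name `clone_at_commit` is hypothetical, and the upstream URLs in the comment are assumptions, since the PR text does not state them.

```shell
#!/bin/sh
set -eu

# Clone a repository and check out one exact commit (detached HEAD), so that
# later upstream changes cannot alter what gets built.
#   $1 = repository URL or path
#   $2 = destination directory
#   $3 = commit hash (a short hash works if it is unambiguous)
clone_at_commit() {
    git clone --quiet "$1" "$2"
    git -C "$2" checkout --quiet --detach "$3"
}

# Intended use (URLs are assumptions, not taken from this PR):
#   clone_at_commit https://github.com/openucx/ucx.git    ucx  c982cef
#   clone_at_commit https://github.com/ai-dynamo/nixl.git nixl e128059
```

A full clone is used rather than a shallow one so that an arbitrary pinned commit is reachable without extra fetch options.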

Build notes

UCX is built first into ucx-install/ using autoconf. The --with-cuda flag points at the CUDA toolkit that test.sh provides via environment variables.
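For reviewers unfamiliar with the UCX build, the step above looks roughly like the following. It is a sketch under stated assumptions rather than a copy of 01-build.sh: the function names and the `DRY_RUN` switch are hypothetical, and the CUDA toolkit root is passed as an argument because the name of the variable that test.sh exports is not given here. The flags `--prefix`, `--enable-mt`, and `--with-cuda` are standard UCX configure options.

```shell
#!/bin/sh
set -eu

# Configure UCX with multi-thread and CUDA support.
#   $1 = install prefix (for example "$PWD/ucx-install")
#   $2 = CUDA toolkit root (however test.sh exposes it)
# With DRY_RUN=1 the flags are printed, one per line, instead of being
# passed to ./configure.
ucx_configure() {
    set -- "--prefix=$1" --enable-mt "--with-cuda=$2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$@"
    else
        ./configure "$@"
    fi
}

# Full build; run from inside the UCX checkout.
# autogen.sh generates ./configure from the autotools sources.
build_ucx() {
    ./autogen.sh
    ucx_configure "$1" "$2"
    make -j"$(nproc)"
    make install
}
```

Installing into a private prefix keeps the system UCX untouched, which is why the later steps have to point the build and the loader at ucx-install/ explicitly.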

NIXL is then built with Meson and Ninja. Two points to note:

  • PKG_CONFIG_PATH and LD_LIBRARY_PATH are updated before the NIXL build so that Meson discovers the locally-built UCX rather than the system one.
  • build_tests=false is set explicitly. NIXL's gtest suite did not build cleanly against the environment; this can be revisited once the test build is fixed.
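The two points above can be sketched as follows. The `UCX_PREFIX` variable and the `build_nixl` function are hypothetical names, the `lib/pkgconfig` location assumes UCX's default libdir, and passing `build_tests` as a Meson `-D` option is an assumption about how the script sets it.

```shell
#!/bin/sh
set -eu

# Install prefix of the locally built UCX; the default mirrors the
# ucx-install/ directory used by the build.
UCX_PREFIX="${UCX_PREFIX:-$PWD/ucx-install}"

# Prepend the local UCX so it is found before any system copy.
# ${VAR:+:$VAR} appends the old value only when VAR is non-empty, which
# avoids a trailing colon; in LD_LIBRARY_PATH an empty element means the
# current directory.
export PKG_CONFIG_PATH="$UCX_PREFIX/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
export LD_LIBRARY_PATH="$UCX_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Configure and build NIXL with the test suite disabled.
#   $1 = NIXL checkout; the build directory is created at $1/build
build_nixl() {
    meson setup "$1/build" "$1" -Dbuild_tests=false
    ninja -C "$1/build"
}
```

One way to confirm that Meson will see the intended UCX is to run `pkg-config --variable=prefix ucx` after the exports; it should print the ucx-install path rather than a system prefix.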

Validation

02-example.sh runs nixl/build/examples/cpp/nixl_example and checks that the output contains "Test done", which the example prints on a successful end-to-end transfer. Two environment variables are required at runtime:

  • NIXL_PLUGIN_DIR - points NIXL's plugin manager at the locally-built UCX backend (nixl/build/src/plugins/ucx).
  • LD_LIBRARY_PATH - includes ucx-install/lib so the loader finds the UCX shared libraries that NIXL was built against.
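The check described above amounts to running the example with both variables set and searching its output for the marker string. A minimal sketch follows; the function names are hypothetical, and treating a non-zero exit status as a failure is an extra assumption that goes beyond the output check described in this PR.

```shell
#!/bin/sh
set -eu

# Run a command, echo its combined stdout/stderr, and succeed only if the
# command exits 0 and the output contains the marker string.
#   $1 = marker, remaining arguments = command to run
expect_output() {
    marker=$1
    shift
    output=$("$@" 2>&1) || { printf '%s\n' "$output"; return 1; }
    printf '%s\n' "$output"
    printf '%s\n' "$output" | grep -qF -- "$marker"
}

# Run the bundled example against the locally built UCX.
#   $1 = directory containing nixl/ and ucx-install/
# The subshell keeps the exported variables from leaking into the caller.
run_nixl_example() {
    root=$1
    (
        export NIXL_PLUGIN_DIR="$root/nixl/build/src/plugins/ucx"
        export LD_LIBRARY_PATH="$root/ucx-install/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
        expect_output "Test done" "$root/nixl/build/examples/cpp/nixl_example"
    )
}
```

Echoing the captured output before matching keeps the full example log available in CI when the marker is missing.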

Status

Passes locally on sm_75 under both native CUDA 13.1 and SCALE 1.7.0. Results are marked "?" pending CI validation on the repo's target hardware.

diskdog marked this pull request as draft May 23, 2026 03:50
diskdog marked this pull request as ready for review May 30, 2026 18:43
@SiliconExarch

Contributor

This looks good, I've cherry-picked it onto master so it gets picked up by the next CI run. Thanks for the contribution!

diskdog commented Jun 12, 2026

Author

Thank you, that makes me so happy!

I can see the scripts and version pins are on master now. I noticed Geoff dropped the placeholder row I manually added to the README. Did I miss a step or does the row appear once there are actual results for nixl?

Either way, if there's anything I can do to make your lives easier, just let me know! ^^

@SiliconExarch

Contributor

Indeed, the row should appear once there are some results, probably around this time tomorrow. Nothing missed on your part ^^

If you could submit any future PRs against master that would be helpful; don't worry about rebasing the one you already have open for code_saturne though - I'll cherry-pick that once I've reviewed it.
