Skip to content

tests: enable GPU test suites for HIP builds and add ROCm-specific Python tests#2042

Open
T0nd3 wants to merge 6 commits into
OpenNMT:masterfrom
T0nd3:tests/rocm-hip-support
Open

tests: enable GPU test suites for HIP builds and add ROCm-specific Python tests#2042
T0nd3 wants to merge 6 commits into
OpenNMT:masterfrom
T0nd3:tests/rocm-hip-support

Conversation

@T0nd3
Copy link
Copy Markdown

@T0nd3 T0nd3 commented May 11, 2026

Problem

All C++ GPU test instantiations (OpDeviceTest, PrimitiveTest,
StorageViewDeviceTest, LayerDeviceFPTest, BiasedDecodingDeviceFPTest)
were guarded with #ifdef CT2_WITH_CUDA. HIP builds define CT2_USE_HIP
instead, so every GPU test case was silently skipped when running
ctranslate2_test against an AMD GPU.

Changes

C++ (tests/)

Changed the preprocessor guard in five test files from:

#ifdef CT2_WITH_CUDA

to:

#if defined(CT2_WITH_CUDA) || defined(CT2_USE_HIP)

Files changed:

  • tests/ops_test.cc
  • tests/primitives_test.cc
  • tests/storage_view_test.cc
  • tests/layers_test.cc
  • tests/translator_test.cc

HIP builds use the same Device::CUDA enum and the same code path for GPU
execution, so no test logic needs to change — only the guard.

Python (python/tests/)

  • test_utils.py: Added require_rocm marker (backed by
    get_cuda_device_count() — HIP exposes devices through the same API).
  • test_rocm.py (new): ROCm-specific test suite covering device
    detection, supported compute types (float32, float16, bfloat16,
    int8), and StorageView allocation/copy on a ROCm device.

Testing

Verified on AMD Radeon RX 7900 XTX (gfx1100, RDNA 3) with
CTranslate2 built with -DWITH_HIP=ON on Windows 11.

Relates to the ROCm/HIP support added in v4.7.0.

T0nd3 added 4 commits May 11, 2026 15:34
… tests

All C++ GPU test instantiations were guarded with `#ifdef CT2_WITH_CUDA`,
which excluded them from HIP builds even though HIP uses the same Device::CUDA
enum and code path. Change each guard to
`#if defined(CT2_WITH_CUDA) || defined(CT2_USE_HIP)` so the tests run on
both backends.

Add a `require_rocm` pytest marker to `test_utils.py` (currently backed by
`get_cuda_device_count()` since HIP exposes devices via the same API) and a
new `python/tests/test_rocm.py` with ROCm-specific device detection, compute
type, and StorageView tests.
Use lowercase enum values (DataType.float32, Device.cuda) and
to_device() instead of to() for device transfers, matching the
actual ctranslate2 Python API.
@jordimas
Copy link
Copy Markdown
Collaborator

jordimas commented May 12, 2026

Hello,

test_storageview_cuda_to_device and test_storageview_cuda already cover what the ROCm tests do. Since require_rocm and require_cuda have identical conditions (both check get_cuda_device_count() == 0), those two existing tests already run on ROCm — no changes needed.

The only new thing in test_rocm.py is test_float16_on_gpu. That could just be added to test_storage_view.py with @test_utils.require_cuda.

…iew feedback

@jordimas pointed out (OpenNMT#2042 (comment))
that the Python side of this PR was redundant: `require_rocm` had the
exact same body as `require_cuda` (both `get_cuda_device_count() == 0`,
which already returns the count of ROCm/HIP devices on HIP builds), and
`test_storageview_cuda` + `test_storageview_cuda_to_device` already cover
the GPU StorageView allocation / round-trip / dtype paths that
test_rocm.py would have exercised.

Removing both files.  The C++ side of this PR is unchanged — the
`#if defined(CT2_WITH_CUDA) || defined(CT2_USE_HIP)` guard changes in
ops_test.cc / primitives_test.cc / storage_view_test.cc / layers_test.cc /
translator_test.cc are still required because those files explicitly
test the CT2_WITH_CUDA preprocessor define rather than going through
the device-count API.
@T0nd3
Copy link
Copy Markdown
Author

T0nd3 commented May 12, 2026

You're right, thanks for the careful read. I removed both test_rocm.py and the require_rocm marker in bbc3d15f.

The C++ side is unchanged — those guard changes are still needed because the test files check CT2_WITH_CUDA at preprocess time rather than going through the device-count API at runtime, so HIP builds were silently skipping the whole OpDeviceTest, PrimitiveTest, etc. instantiations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants