tests: enable GPU test suites for HIP builds and add ROCm-specific Python tests by T0nd3 · Pull Request #2042 · OpenNMT/CTranslate2

T0nd3 · 2026-05-11T13:35:49Z

Problem

All C++ GPU test instantiations (OpDeviceTest, PrimitiveTest,
StorageViewDeviceTest, LayerDeviceFPTest, BiasedDecodingDeviceFPTest)
were guarded with #ifdef CT2_WITH_CUDA. HIP builds define CT2_USE_HIP
instead, so every GPU test case was silently skipped when running
ctranslate2_test against an AMD GPU.

Changes

C++ (`tests/`)

Changed the preprocessor guard in five test files from:

#ifdef CT2_WITH_CUDA

to:

#if defined(CT2_WITH_CUDA) || defined(CT2_USE_HIP)

Files changed:

tests/ops_test.cc
tests/primitives_test.cc
tests/storage_view_test.cc
tests/layers_test.cc
tests/translator_test.cc

HIP builds use the same Device::CUDA enum and the same code path for GPU
execution, so no test logic needs to change — only the guard.
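
As a rough illustration of why only the guard has to change, here is a minimal, self-contained sketch. It does not use GoogleTest or the real CTranslate2 headers: the `Device` enum is a local stand-in for the library's enum, and `devices_under_test()` is a hypothetical helper playing the role of the per-device test instantiations.

```cpp
#include <vector>

// Local stand-in for the library's device enum (assumption for this sketch).
// HIP builds reuse the CUDA enumerator, so there is no separate HIP value.
enum class Device { CPU, CUDA };

// Hypothetical helper: lists the devices a parameterized test suite
// would be instantiated for in this build.
std::vector<Device> devices_under_test() {
  std::vector<Device> devices = {Device::CPU};
#if defined(CT2_WITH_CUDA) || defined(CT2_USE_HIP)
  // Compiled in for both CUDA and HIP builds; with the old
  // `#ifdef CT2_WITH_CUDA` guard a HIP build would never reach this line.
  devices.push_back(Device::CUDA);
#endif
  return devices;
}
```

Compiling this with `-DCT2_USE_HIP` and without `-DCT2_WITH_CUDA` yields both entries; with the original guard the same build would list only the CPU.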

Python (`python/tests/`)

test_utils.py: Added require_rocm marker (backed by
get_cuda_device_count(), since HIP exposes devices through the same API).

test_rocm.py (new): ROCm-specific test suite covering device
detection, supported compute types (float32, float16, bfloat16,
int8), and StorageView allocation/copy on a ROCm device.

Testing

Verified on AMD Radeon RX 7900 XTX (gfx1100, RDNA 3) with
CTranslate2 built with -DWITH_HIP=ON on Windows 11.

Relates to the ROCm/HIP support added in v4.7.0.

… tests

All C++ GPU test instantiations were guarded with `#ifdef CT2_WITH_CUDA`, which excluded them from HIP builds even though HIP uses the same Device::CUDA enum and code path. Change each guard to `#if defined(CT2_WITH_CUDA) || defined(CT2_USE_HIP)` so the tests run on both backends.

Add a `require_rocm` pytest marker to `test_utils.py` (currently backed by `get_cuda_device_count()` since HIP exposes devices via the same API) and a new `python/tests/test_rocm.py` with ROCm-specific device detection, compute type, and StorageView tests.

jordimas · 2026-05-12T16:18:44Z

Hello,

test_storageview_cuda_to_device and test_storageview_cuda already cover what the ROCm tests do. Since require_rocm and require_cuda have identical conditions (both check get_cuda_device_count() == 0), those two existing tests already run on ROCm — no changes needed.

The only new thing in test_rocm.py is test_float16_on_gpu. That could just be added to test_storage_view.py with @test_utils.require_cuda.


…iew feedback

@jordimas pointed out (OpenNMT#2042 (comment)) that the Python side of this PR was redundant: `require_rocm` had the exact same body as `require_cuda` (both skip when `get_cuda_device_count() == 0`, and that function already returns the count of ROCm/HIP devices on HIP builds), and `test_storageview_cuda` + `test_storageview_cuda_to_device` already cover the GPU StorageView allocation / round-trip / dtype paths that test_rocm.py would have exercised. Removing both files.

The C++ side of this PR is unchanged: the `#if defined(CT2_WITH_CUDA) || defined(CT2_USE_HIP)` guard changes in ops_test.cc / primitives_test.cc / storage_view_test.cc / layers_test.cc / translator_test.cc are still required because those files explicitly test the CT2_WITH_CUDA preprocessor define rather than going through the device-count API.

T0nd3 · 2026-05-12T17:28:18Z

You're right, thanks for the careful read. I removed both test_rocm.py and the require_rocm marker in bbc3d15f.

The C++ side is unchanged — those guard changes are still needed because the test files check CT2_WITH_CUDA at preprocess time rather than going through the device-count API at runtime, so HIP builds were silently skipping the whole OpDeviceTest, PrimitiveTest, etc. instantiations.
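
The difference between the two kinds of check can be sketched as follows. This is an illustrative stand-in, not code from the test suite: `gpu_device_count()` is a hypothetical stub hard-coded to 0, and the two functions return strings only to make the outcome visible.

```cpp
#include <string>

// Hypothetical stub for a runtime device query (assumption for this sketch).
// It is fixed at 0 here to model a machine with no visible GPU.
int gpu_device_count() { return 0; }

// Runtime check: the test is always compiled and decides at run time
// whether to skip, so it works unchanged on any backend that reports devices.
std::string runtime_guarded_test() {
  if (gpu_device_count() == 0) return "skipped";
  return "ran";
}

// Compile-time check: the preprocessor decides before any device query
// happens. If neither macro is defined, the GPU branch is not in the binary.
std::string compile_time_guarded_test() {
#if defined(CT2_WITH_CUDA) || defined(CT2_USE_HIP)
  return "compiled in";
#else
  return "compiled out";
#endif
}
```

A runtime-guarded test reports a visible skip, whereas a compiled-out instantiation leaves no trace in the test output, which is consistent with the HIP test cases having been skipped silently.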

T0nd3 added 4 commits May 11, 2026 15:34

tests: fix StorageView and DataType API usage in test_rocm.py

b579505

Use lowercase enum values (DataType.float32, Device.cuda) and to_device() instead of to() for device transfers, matching the actual ctranslate2 Python API.

style: fix black formatting in test_utils.py (trailing comma in skipif)

d900007

style: fix isort import order in test_rocm.py

f9408ef

T0nd3 mentioned this pull request May 12, 2026

ops: native HIP Flash Attention kernels for AMD RDNA3 (gfx11) #2043

Open

6 tasks

Merge branch 'master' into tests/rocm-hip-support

cf92c90

T0nd3 wants to merge 6 commits into OpenNMT:master from T0nd3:tests/rocm-hip-support
