tests: enable GPU test suites for HIP builds and add ROCm-specific Python tests#2042
tests: enable GPU test suites for HIP builds and add ROCm-specific Python tests#2042T0nd3 wants to merge 6 commits into
Conversation
… tests All C++ GPU test instantiations were guarded with `#ifdef CT2_WITH_CUDA`, which excluded them from HIP builds even though HIP uses the same Device::CUDA enum and code path. Change each guard to `#if defined(CT2_WITH_CUDA) || defined(CT2_USE_HIP)` so the tests run on both backends. Add a `require_rocm` pytest marker to `test_utils.py` (currently backed by `get_cuda_device_count()` since HIP exposes devices via the same API) and a new `python/tests/test_rocm.py` with ROCm-specific device detection, compute type, and StorageView tests.
Use lowercase enum values (DataType.float32, Device.cuda) and to_device() instead of to() for device transfers, matching the actual ctranslate2 Python API.
|
Hello, test_storageview_cuda_to_device and test_storageview_cuda already cover what the ROCm tests do. Since require_rocm and require_cuda have identical conditions (both check get_cuda_device_count() == 0), those two existing tests already run on ROCm — no changes needed. The only new thing in test_rocm.py is test_float16_on_gpu. That could just be added to test_storage_view.py with @test_utils.require_cuda. |
…iew feedback @jordimas pointed out (OpenNMT#2042 (comment)) that the Python side of this PR was redundant: `require_rocm` had the exact same body as `require_cuda` (both `get_cuda_device_count() == 0`, which already returns the count of ROCm/HIP devices on HIP builds), and `test_storageview_cuda` + `test_storageview_cuda_to_device` already cover the GPU StorageView allocation / round-trip / dtype paths that test_rocm.py would have exercised. Removing both files. The C++ side of this PR is unchanged — the `#if defined(CT2_WITH_CUDA) || defined(CT2_USE_HIP)` guard changes in ops_test.cc / primitives_test.cc / storage_view_test.cc / layers_test.cc / translator_test.cc are still required because those files explicitly test the CT2_WITH_CUDA preprocessor define rather than going through the device-count API.
|
You're right, thanks for the careful read. I removed both The C++ side is unchanged — those guard changes are still needed because the test files check |
Problem
All C++ GPU test instantiations (
OpDeviceTest,PrimitiveTest,StorageViewDeviceTest,LayerDeviceFPTest,BiasedDecodingDeviceFPTest)were guarded with
#ifdef CT2_WITH_CUDA. HIP builds defineCT2_USE_HIPinstead, so every GPU test case was silently skipped when running
ctranslate2_testagainst an AMD GPU.Changes
C++ (
tests/)Changed the preprocessor guard in five test files from:
#ifdef CT2_WITH_CUDAto:
#if defined(CT2_WITH_CUDA) || defined(CT2_USE_HIP)Files changed:
tests/ops_test.cctests/primitives_test.cctests/storage_view_test.cctests/layers_test.cctests/translator_test.ccHIP builds use the same
Device::CUDAenum and the same code path for GPUexecution, so no test logic needs to change — only the guard.
Python (
python/tests/)test_utils.py: Addedrequire_rocmmarker (backed byget_cuda_device_count()— HIP exposes devices through the same API).test_rocm.py(new): ROCm-specific test suite covering devicedetection, supported compute types (
float32,float16,bfloat16,int8), andStorageViewallocation/copy on a ROCm device.Testing
Verified on AMD Radeon RX 7900 XTX (gfx1100, RDNA 3) with
CTranslate2 built with
-DWITH_HIP=ONon Windows 11.Relates to the ROCm/HIP support added in v4.7.0.