windows-rocm: enable GGML_HIP so ggml-hip.dll actually gets built by jeremyfowers · Pull Request #3 · lemonade-sdk/llama.cpp

jeremyfowers · 2026-05-15T19:40:08Z

Summary

The windows-rocm job in release.yml was producing zips labeled rocm that contained no HIP backend DLL — runtime callers (e.g. lemonade with rocm-stable) silently fell back to CPU.
Root cause: the cmake invocation passed CMAKE_HIP_COMPILER, HIP_PATH, and GPU_TARGETS but never set -DGGML_HIP=ON, so ggml's CMakeLists never added the ggml-hip subdirectory. CMake even printed a warning that the three HIP variables were "not used by the project" — buried in 8k lines of build output, nothing failed.
Fix: add -DGGML_HIP=ON, pass -DAMDGPU_TARGETS (canonical name on Windows) alongside the existing -DGPU_TARGETS, drop a duplicate -DCMAKE_BUILD_TYPE=Release, and add a post-build Verify HIP backend was built step that fails fast if ggml-hip*.dll is missing from build/bin/ so this can't regress unnoticed.

The Ubuntu rocm job already has -DGGML_HIP=ON (line 96 pre-patch) and produces working artifacts; this just brings the Windows job in line.

Test plan

Trigger Release workflow on this branch via workflow_dispatch with create_release=false.
Confirm CMake configure log no longer prints the Manually-specified variables were not used by the project: CMAKE_HIP_COMPILER / GPU_TARGETS / HIP_PATH warning.
Confirm Verify HIP backend was built step lists a ggml-hip*.dll in build/bin.
Download the llama-bin-win-rocm-7.13-x64.zip artifact and verify dumpbin /dependents llama-server.exe (or one of the ggml DLLs) references amdhip64.dll / rocblas.dll / hipblaslt.dll.
Point lemonade's llamacpp.rocm_bin config at the unpacked llama-server.exe and load a model with llamacpp_backend=rocm; verify llama-server's device_info: enumerates the AMD GPU (not just CPU).

The windows-rocm CI job was passing CMAKE_HIP_COMPILER, HIP_PATH, and GPU_TARGETS but never setting -DGGML_HIP=ON, so ggml's CMakeLists never added the ggml-hip subdirectory. CMake silently warned: Manually-specified variables were not used by the project: CMAKE_HIP_COMPILER GPU_TARGETS HIP_PATH The build completed, produced a zip labeled rocm, but contained zero HIP backend DLLs - runtime callers fell back to CPU. Fix: - Add -DGGML_HIP=ON so the ggml-hip target actually configures. - Also pass -DAMDGPU_TARGETS in addition to -DGPU_TARGETS (the former is the canonical name on Windows clang/hipcc; both are accepted). - Drop the duplicate -DCMAKE_BUILD_TYPE=Release. - Add a post-build "Verify HIP backend was built" step that lists build/bin and fails the job if ggml-hip*.dll is absent, so the silent CPU-only fallback can never reach release artifacts again.

superm1 · 2026-05-15T20:03:54Z

            -DHIP_PATH="${env:HIP_PATH}" `
-            -DCMAKE_BUILD_TYPE=Release `
            -DGGML_HIP_ROCWMMA_FATTN=ON `
+            -DAMDGPU_TARGETS="${{ matrix.gpu_targets }}" `


I don't think this line is necessary. That syntax is deprecated. See below:

-DGPU_TARGETS="${{ matrix.gpu_targets }}"

superm1 · 2026-05-15T20:04:12Z

            -DCMAKE_CXX_COMPILER="${env:HIP_PATH}\lib\llvm\bin\clang++.exe" `
            -DCMAKE_HIP_COMPILER="${env:HIP_PATH}\lib\llvm\bin\clang.exe" `
            -DHIP_PATH="${env:HIP_PATH}" `
-            -DCMAKE_BUILD_TYPE=Release `


we don't want a release build?

The build is still Release — -DCMAKE_BUILD_TYPE=Release is still set at line 234 just above (the original had it twice; I deduped). And cmake --build . --config Release --parallel ... on the next line reinforces it for multi-config generators. So no behavior change here, just dropping the duplicate.

The previous fix added -DGGML_HIP=ON and produced a working ggml-hip.dll, but at runtime llama.cpp failed model load with: llama_model_load: error loading model: make_cpu_buft_list: no CPU backend found srv main: exiting due to model loading error GGML needs a CPU backend even when the model runs on GPU - for non-GPU buffer types like host scratch buffers. The Windows job sets -DGGML_CPU=OFF and previously emitted per-microarch CPU plugin DLLs (ggml-cpu-zen4.dll etc.) via GGML_CPU_ALL_VARIANTS, but that variable defaults to OFF and was never explicitly set. The prior CPU-only labeled "rocm" zip happened to ship those plugins under different cmake conditions; once GGML_HIP=ON was added and the HIP build path actually configured, the CPU plugin path stopped being taken. Add -DGGML_CPU_ALL_VARIANTS=ON to bring the per-microarch CPU plugins back, matching what the older "rocm-stable" zip contained and what llama.cpp needs at load time.

The prior commit added -DGGML_CPU_ALL_VARIANTS=ON but kept -DGGML_CPU=OFF, which fails configure with: CMake Error at ggml/src/CMakeLists.txt:349 (ggml_add_cpu_backend_variant_impl): Unknown CMake command "ggml_add_cpu_backend_variant_impl". The function lives in ggml/src/ggml-cpu/CMakeLists.txt. That subdirectory is only added by ggml_add_backend(CPU), which is gated on GGML_CPU truthy. With GGML_CPU=OFF the subdir is never loaded so the function is undefined when GGML_CPU_ALL_VARIANTS tries to call it. Flip to GGML_CPU=ON; the GGML_CPU_ALL_VARIANTS branch takes precedence over the single-variant elseif (GGML_CPU) at ggml/src/CMakeLists.txt:377, so we still get the per-microarch ggml-cpu-*.dll plugins (matching the historical artifact layout) rather than a single ggml-cpu.dll.

…uants The CPU per-microarch variants (now enabled by GGML_CPU_ALL_VARIANTS=ON) fail to compile with TheRock 7.13's clang because upstream llama.cpp's ggml/src/ggml-cpu/arch/x86/quants.c calls _mm_prefetch with a const block_q4_0 * / const block_q8_0 * pointer instead of const char *. Clang treats this as a hard error by default: quants.c:782:22: error: incompatible pointer types passing 'const block_q4_0 *' to parameter of type 'const char *' [-Wincompatible-pointer-types] The cast is harmless at runtime - _mm_prefetch just consumes the address - so demote it back to a warning via -Wno-error=incompatible-pointer-types in CMAKE_C_FLAGS. This unblocks the CPU SSE42 / haswell / etc. variants.

Per PR review: GPU_TARGETS is the legacy name and AMDGPU_TARGETS is the canonical one consumed by both HIP's CMake module and ggml's HIP backend (ggml/src/ggml-hip/CMakeLists.txt:30 forwards AMDGPU_TARGETS to CMAKE_HIP_ARCHITECTURES). The earlier defensive belt-and-suspenders "set both" wasn't needed.

jeremyfowers marked this pull request as draft May 15, 2026 19:41

superm1 reviewed May 15, 2026

View reviewed changes

jeremyfowers added 4 commits May 15, 2026 17:36

jeremyfowers self-assigned this May 16, 2026

jeremyfowers marked this pull request as ready for review May 16, 2026 04:22

jeremyfowers merged commit d21d0ab into lemonade May 16, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

windows-rocm: enable GGML_HIP so ggml-hip.dll actually gets built#3

windows-rocm: enable GGML_HIP so ggml-hip.dll actually gets built#3
jeremyfowers merged 5 commits into
lemonadefrom
fix-windows-rocm-hip

jeremyfowers commented May 15, 2026 •

edited

Loading

Uh oh!

superm1 May 15, 2026

Uh oh!

jeremyfowers May 16, 2026

Uh oh!

superm1 May 15, 2026

Uh oh!

jeremyfowers May 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jeremyfowers commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

superm1 May 15, 2026

Choose a reason for hiding this comment

Uh oh!

jeremyfowers May 16, 2026

Choose a reason for hiding this comment

Uh oh!

superm1 May 15, 2026

Choose a reason for hiding this comment

Uh oh!

jeremyfowers May 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jeremyfowers commented May 15, 2026 •

edited

Loading