windows-rocm: enable GGML_HIP so ggml-hip.dll actually gets built #3
Merged
The windows-rocm CI job was passing CMAKE_HIP_COMPILER, HIP_PATH, and
GPU_TARGETS but never setting -DGGML_HIP=ON, so ggml's CMakeLists never
added the ggml-hip subdirectory. CMake silently warned:
Manually-specified variables were not used by the project:
CMAKE_HIP_COMPILER
GPU_TARGETS
HIP_PATH
The build completed and produced a zip labeled rocm, but the zip contained
zero HIP backend DLLs, so runtime callers fell back to CPU.
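Because the warning above does not fail the build, one way to surface it is to scan the configure log for it. The following is a minimal Python sketch, not part of this PR; the function name `unused_cmake_variables` is hypothetical, and it assumes the log keeps the header line followed by one variable name per line, as in the excerpt above.

```python
import re

# Header line of the CMake warning quoted in the commit message.
UNUSED_HEADER = "Manually-specified variables were not used by the project:"
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def unused_cmake_variables(log_text: str) -> list[str]:
    """Return the variable names listed under CMake's unused-variables warning."""
    names = []
    collecting = False
    block_has_names = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if UNUSED_HEADER in line:
            collecting, block_has_names = True, False
            continue
        if not collecting:
            continue
        if IDENT.fullmatch(line):
            names.append(line)
            block_has_names = True
        elif line or block_has_names:
            # Any other text, or a blank line after the names, ends the block.
            collecting = False
    return names
```

A CI step could fail when the returned list contains any of the HIP-related names (CMAKE_HIP_COMPILER, GPU_TARGETS, HIP_PATH), which would have caught the original misconfiguration at configure time.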
Fix:
- Add -DGGML_HIP=ON so the ggml-hip target actually configures.
- Also pass -DAMDGPU_TARGETS in addition to -DGPU_TARGETS (AMDGPU_TARGETS
is the canonical name on Windows clang/hipcc; both are accepted).
- Drop the duplicate -DCMAKE_BUILD_TYPE=Release.
- Add a post-build "Verify HIP backend was built" step that lists
build/bin and fails the job if ggml-hip*.dll is absent, so the
silent CPU-only fallback can never reach release artifacts again.
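The verification step described in the last bullet can be sketched as follows. This is an illustrative Python version, not the step that was added to the workflow; the helper names `find_hip_backend` and `verify_hip_backend` are hypothetical, and only the `ggml-hip*.dll` pattern and the `build/bin` location come from the commit message.

```python
from pathlib import Path


def find_hip_backend(bin_dir: str, pattern: str = "ggml-hip*.dll") -> list[str]:
    """Return the sorted names of files in bin_dir that match the HIP backend pattern."""
    return sorted(p.name for p in Path(bin_dir).glob(pattern) if p.is_file())


def verify_hip_backend(bin_dir: str) -> list[str]:
    """Exit with an error message if no HIP backend DLL was produced."""
    found = find_hip_backend(bin_dir)
    if not found:
        raise SystemExit(f"no ggml-hip*.dll in {bin_dir}: HIP backend was not built")
    return found
```

Calling `verify_hip_backend("build/bin")` after the build would stop the job with a non-zero exit status when the DLL is absent, instead of letting a CPU-only zip through.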
superm1
reviewed
May 15, 2026
-DHIP_PATH="${env:HIP_PATH}" `
-DCMAKE_BUILD_TYPE=Release `
-DGGML_HIP_ROCWMMA_FATTN=ON `
-DAMDGPU_TARGETS="${{ matrix.gpu_targets }}" `
Member
I don't think this line is necessary. That syntax is deprecated. See below:
-DGPU_TARGETS="${{ matrix.gpu_targets }}"
superm1
reviewed
May 15, 2026
-DCMAKE_CXX_COMPILER="${env:HIP_PATH}\lib\llvm\bin\clang++.exe" `
-DCMAKE_HIP_COMPILER="${env:HIP_PATH}\lib\llvm\bin\clang.exe" `
-DHIP_PATH="${env:HIP_PATH}" `
-DCMAKE_BUILD_TYPE=Release `
Member
Author
The build is still Release — -DCMAKE_BUILD_TYPE=Release is still set at line 234 just above (the original had it twice; I deduped). And cmake --build . --config Release --parallel ... on the next line reinforces it for multi-config generators. So no behavior change here, just dropping the duplicate.
The previous fix added -DGGML_HIP=ON and produced a working ggml-hip.dll,
but at runtime llama.cpp failed model load with:
llama_model_load: error loading model: make_cpu_buft_list: no CPU backend found
srv main: exiting due to model loading error
GGML needs a CPU backend even when the model runs on GPU - for non-GPU
buffer types like host scratch buffers. The Windows job sets -DGGML_CPU=OFF
and previously emitted per-microarch CPU plugin DLLs (ggml-cpu-zen4.dll etc.)
via GGML_CPU_ALL_VARIANTS, but that variable defaults to OFF and was never
explicitly set. The prior CPU-only zip labeled "rocm" happened to ship those
plugins under different cmake conditions; once GGML_HIP=ON was added and the
HIP build path actually configured, the CPU plugin path stopped being taken.
Add -DGGML_CPU_ALL_VARIANTS=ON to bring the per-microarch CPU plugins back,
matching what the older "rocm-stable" zip contained and what llama.cpp needs
at load time.
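Since a working artifact needs both a HIP backend DLL and at least one CPU backend DLL, a release zip can be checked for both by file name. The sketch below is a hypothetical helper (`backend_dlls`), not something this PR adds; it assumes the DLLs follow the `ggml-hip*.dll` and `ggml-cpu*.dll` naming used in these commit messages.

```python
import fnmatch
import zipfile


def backend_dlls(zip_path: str) -> dict[str, list[str]]:
    """Group ggml backend DLLs found in a release zip by kind, using file names only."""
    groups = {"hip": [], "cpu": []}
    with zipfile.ZipFile(zip_path) as zf:
        for entry in zf.namelist():
            # Compare on the lowercased base name so nested folders do not matter.
            name = entry.replace("\\", "/").rsplit("/", 1)[-1].lower()
            if fnmatch.fnmatchcase(name, "ggml-hip*.dll"):
                groups["hip"].append(name)
            elif fnmatch.fnmatchcase(name, "ggml-cpu*.dll"):
                groups["cpu"].append(name)
    return {kind: sorted(names) for kind, names in groups.items()}
```

An empty `hip` list corresponds to the original CPU-only zip, and an empty `cpu` list corresponds to the "no CPU backend found" failure described above.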
The prior commit added -DGGML_CPU_ALL_VARIANTS=ON but kept -DGGML_CPU=OFF,
which fails configure with:
CMake Error at ggml/src/CMakeLists.txt:349 (ggml_add_cpu_backend_variant_impl):
Unknown CMake command "ggml_add_cpu_backend_variant_impl".
The function lives in ggml/src/ggml-cpu/CMakeLists.txt. That subdirectory is
only added by ggml_add_backend(CPU), which is gated on GGML_CPU being truthy. With
GGML_CPU=OFF the subdir is never loaded so the function is undefined when
GGML_CPU_ALL_VARIANTS tries to call it.
Flip to GGML_CPU=ON; the GGML_CPU_ALL_VARIANTS branch takes precedence over
the single-variant elseif (GGML_CPU) at ggml/src/CMakeLists.txt:377, so we
still get the per-microarch ggml-cpu-*.dll plugins (matching the historical
artifact layout) rather than a single ggml-cpu.dll.
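The four flag combinations discussed across these commits can be summarised in a toy model. This is only a restatement of the behaviour described in the commit messages, written in Python for readability; it is not ggml's CMake logic, and the function name `cpu_backend_outcome` and the outcome strings are made up for illustration.

```python
def cpu_backend_outcome(ggml_cpu: bool, all_variants: bool) -> str:
    """Toy model of the CPU backend result for each GGML_CPU / GGML_CPU_ALL_VARIANTS pair."""
    if all_variants:
        if not ggml_cpu:
            # The ggml-cpu subdirectory is never added, so the helper is undefined.
            return "configure error: ggml_add_cpu_backend_variant_impl undefined"
        # The ALL_VARIANTS branch is taken before the single-variant elseif.
        return "per-microarch ggml-cpu-*.dll plugins"
    if ggml_cpu:
        return "single ggml-cpu.dll"
    return "no CPU backend"
```

Under this model, the configuration chosen here is `cpu_backend_outcome(True, True)`, while the earlier states of the job correspond to `(False, False)` and `(False, True)`.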
…uants
The CPU per-microarch variants (now enabled by GGML_CPU_ALL_VARIANTS=ON)
fail to compile with TheRock 7.13's clang because upstream llama.cpp's
ggml/src/ggml-cpu/arch/x86/quants.c calls _mm_prefetch with a
const block_q4_0 * / const block_q8_0 * pointer instead of const char *.
Clang treats this as a hard error by default:
quants.c:782:22: error: incompatible pointer types passing
'const block_q4_0 *' to parameter of type 'const char *'
[-Wincompatible-pointer-types]
The implicit conversion is harmless at runtime - _mm_prefetch just consumes
the address - so demote the error back to a warning via
-Wno-error=incompatible-pointer-types in CMAKE_C_FLAGS. This unblocks the
CPU SSE42 / haswell / etc. variants.
Per PR review: GPU_TARGETS is the legacy name and AMDGPU_TARGETS is the canonical one consumed by both HIP's CMake module and ggml's HIP backend (ggml/src/ggml-hip/CMakeLists.txt:30 forwards AMDGPU_TARGETS to CMAKE_HIP_ARCHITECTURES). The earlier defensive belt-and-suspenders "set both" wasn't needed.
Summary
- The `windows-rocm` job in `release.yml` was producing zips labeled `rocm` that contained no HIP backend DLL, so runtime callers (e.g. lemonade with `rocm-stable`) silently fell back to CPU.
- The job passed `CMAKE_HIP_COMPILER`, `HIP_PATH`, and `GPU_TARGETS` but never set `-DGGML_HIP=ON`, so ggml's CMakeLists never added the `ggml-hip` subdirectory. CMake even printed a warning that the three HIP variables were "not used by the project", but it was buried in 8k lines of build output and nothing failed.
- Fix: add `-DGGML_HIP=ON`, pass `-DAMDGPU_TARGETS` (canonical name on Windows) alongside the existing `-DGPU_TARGETS`, drop a duplicate `-DCMAKE_BUILD_TYPE=Release`, and add a post-build `Verify HIP backend was built` step that fails fast if `ggml-hip*.dll` is missing from `build/bin/` so this can't regress unnoticed.

The Ubuntu rocm job already has `-DGGML_HIP=ON` (line 96 pre-patch) and produces working artifacts; this just brings the Windows job in line.

Test plan
- Run the `Release` workflow on this branch via `workflow_dispatch` with `create_release=false`.
- Confirm the build log no longer shows the `Manually-specified variables were not used by the project: CMAKE_HIP_COMPILER / GPU_TARGETS / HIP_PATH` warning.
- Confirm the `Verify HIP backend was built` step lists a `ggml-hip*.dll` in `build/bin`.
- Download the `llama-bin-win-rocm-7.13-x64.zip` artifact and verify `dumpbin /dependents llama-server.exe` (or one of the ggml DLLs) references `amdhip64.dll` / `rocblas.dll` / `hipblaslt.dll`.
- Point the `llamacpp.rocm_bin` config at the unpacked `llama-server.exe` and load a model with `llamacpp_backend=rocm`; verify llama-server's `device_info:` enumerates the AMD GPU (not just CPU).
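The `dumpbin /dependents` check in the test plan could be automated along these lines. This is a rough Python sketch under the assumption that the captured output simply contains the DLL names somewhere in its text; it makes no claim about dumpbin's exact layout, and the helper name `missing_rocm_dependents` is hypothetical. The DLL names are the ones listed in the test plan.

```python
# DLL names taken from the test plan; matching is case-insensitive.
REQUIRED_DLLS = ("amdhip64.dll", "rocblas.dll", "hipblaslt.dll")


def missing_rocm_dependents(dumpbin_output: str, required=REQUIRED_DLLS) -> list[str]:
    """Return the required DLL names that do not appear in the captured dumpbin text."""
    text = dumpbin_output.lower()
    return [dll for dll in required if dll not in text]
```

The function only reports which names are absent; whether all three must be present for a given binary is left to the caller, since the test plan allows checking either `llama-server.exe` or one of the ggml DLLs.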