Skip to content

windows-rocm: enable GGML_HIP so ggml-hip.dll actually gets built#3

Merged
jeremyfowers merged 5 commits into
lemonadefrom
fix-windows-rocm-hip
May 16, 2026
Merged

windows-rocm: enable GGML_HIP so ggml-hip.dll actually gets built#3
jeremyfowers merged 5 commits into
lemonadefrom
fix-windows-rocm-hip

Conversation

@jeremyfowers
Copy link
Copy Markdown
Member

@jeremyfowers jeremyfowers commented May 15, 2026

Summary

  • The windows-rocm job in release.yml was producing zips labeled rocm that contained no HIP backend DLL — runtime callers (e.g. lemonade with rocm-stable) silently fell back to CPU.
  • Root cause: the cmake invocation passed CMAKE_HIP_COMPILER, HIP_PATH, and GPU_TARGETS but never set -DGGML_HIP=ON, so ggml's CMakeLists never added the ggml-hip subdirectory. CMake even printed a warning that the three HIP variables were "not used by the project" — buried in 8k lines of build output, nothing failed.
  • Fix: add -DGGML_HIP=ON, pass -DAMDGPU_TARGETS (canonical name on Windows) alongside the existing -DGPU_TARGETS, drop a duplicate -DCMAKE_BUILD_TYPE=Release, and add a post-build Verify HIP backend was built step that fails fast if ggml-hip*.dll is missing from build/bin/ so this can't regress unnoticed.

The Ubuntu rocm job already has -DGGML_HIP=ON (line 96 pre-patch) and produces working artifacts; this just brings the Windows job in line.

Test plan

  • Trigger Release workflow on this branch via workflow_dispatch with create_release=false.
  • Confirm CMake configure log no longer prints the Manually-specified variables were not used by the project: CMAKE_HIP_COMPILER / GPU_TARGETS / HIP_PATH warning.
  • Confirm Verify HIP backend was built step lists a ggml-hip*.dll in build/bin.
  • Download the llama-bin-win-rocm-7.13-x64.zip artifact and verify dumpbin /dependents llama-server.exe (or one of the ggml DLLs) references amdhip64.dll / rocblas.dll / hipblaslt.dll.
  • Point lemonade's llamacpp.rocm_bin config at the unpacked llama-server.exe and load a model with llamacpp_backend=rocm; verify llama-server's device_info: enumerates the AMD GPU (not just CPU).

The windows-rocm CI job was passing CMAKE_HIP_COMPILER, HIP_PATH, and
GPU_TARGETS but never setting -DGGML_HIP=ON, so ggml's CMakeLists never
added the ggml-hip subdirectory. CMake silently warned:

    Manually-specified variables were not used by the project:
      CMAKE_HIP_COMPILER
      GPU_TARGETS
      HIP_PATH

The build completed, produced a zip labeled rocm, but contained zero
HIP backend DLLs - runtime callers fell back to CPU.

Fix:
- Add -DGGML_HIP=ON so the ggml-hip target actually configures.
- Also pass -DAMDGPU_TARGETS in addition to -DGPU_TARGETS (the former
  is the canonical name on Windows clang/hipcc; both are accepted).
- Drop the duplicate -DCMAKE_BUILD_TYPE=Release.
- Add a post-build "Verify HIP backend was built" step that lists
  build/bin and fails the job if ggml-hip*.dll is absent, so the
  silent CPU-only fallback can never reach release artifacts again.
@jeremyfowers jeremyfowers marked this pull request as draft May 15, 2026 19:41
Comment thread .github/workflows/release.yml Outdated
-DHIP_PATH="${env:HIP_PATH}" `
-DCMAKE_BUILD_TYPE=Release `
-DGGML_HIP_ROCWMMA_FATTN=ON `
-DAMDGPU_TARGETS="${{ matrix.gpu_targets }}" `
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this line is necessary. That syntax is deprecated. See below:

            -DGPU_TARGETS="${{ matrix.gpu_targets }}"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

-DCMAKE_CXX_COMPILER="${env:HIP_PATH}\lib\llvm\bin\clang++.exe" `
-DCMAKE_HIP_COMPILER="${env:HIP_PATH}\lib\llvm\bin\clang.exe" `
-DHIP_PATH="${env:HIP_PATH}" `
-DCMAKE_BUILD_TYPE=Release `
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't want a release build?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build is still Release — -DCMAKE_BUILD_TYPE=Release is still set at line 234 just above (the original had it twice; I deduped). And cmake --build . --config Release --parallel ... on the next line reinforces it for multi-config generators. So no behavior change here, just dropping the duplicate.

The previous fix added -DGGML_HIP=ON and produced a working ggml-hip.dll,
but at runtime llama.cpp failed model load with:

    llama_model_load: error loading model: make_cpu_buft_list: no CPU backend found
    srv main: exiting due to model loading error

GGML needs a CPU backend even when the model runs on GPU - for non-GPU
buffer types like host scratch buffers. The Windows job sets -DGGML_CPU=OFF
and previously emitted per-microarch CPU plugin DLLs (ggml-cpu-zen4.dll etc.)
via GGML_CPU_ALL_VARIANTS, but that variable defaults to OFF and was never
explicitly set. The prior CPU-only labeled "rocm" zip happened to ship those
plugins under different cmake conditions; once GGML_HIP=ON was added and the
HIP build path actually configured, the CPU plugin path stopped being taken.

Add -DGGML_CPU_ALL_VARIANTS=ON to bring the per-microarch CPU plugins back,
matching what the older "rocm-stable" zip contained and what llama.cpp needs
at load time.
The prior commit added -DGGML_CPU_ALL_VARIANTS=ON but kept -DGGML_CPU=OFF,
which fails configure with:

    CMake Error at ggml/src/CMakeLists.txt:349 (ggml_add_cpu_backend_variant_impl):
      Unknown CMake command "ggml_add_cpu_backend_variant_impl".

The function lives in ggml/src/ggml-cpu/CMakeLists.txt. That subdirectory is
only added by ggml_add_backend(CPU), which is gated on GGML_CPU truthy. With
GGML_CPU=OFF the subdir is never loaded so the function is undefined when
GGML_CPU_ALL_VARIANTS tries to call it.

Flip to GGML_CPU=ON; the GGML_CPU_ALL_VARIANTS branch takes precedence over
the single-variant elseif (GGML_CPU) at ggml/src/CMakeLists.txt:377, so we
still get the per-microarch ggml-cpu-*.dll plugins (matching the historical
artifact layout) rather than a single ggml-cpu.dll.
…uants

The CPU per-microarch variants (now enabled by GGML_CPU_ALL_VARIANTS=ON)
fail to compile with TheRock 7.13's clang because upstream llama.cpp's
ggml/src/ggml-cpu/arch/x86/quants.c calls _mm_prefetch with a
const block_q4_0 * / const block_q8_0 * pointer instead of const char *.
Clang treats this as a hard error by default:

    quants.c:782:22: error: incompatible pointer types passing
        'const block_q4_0 *' to parameter of type 'const char *'
        [-Wincompatible-pointer-types]

The cast is harmless at runtime - _mm_prefetch just consumes the address -
so demote it back to a warning via -Wno-error=incompatible-pointer-types
in CMAKE_C_FLAGS. This unblocks the CPU SSE42 / haswell / etc. variants.
Per PR review: GPU_TARGETS is the legacy name and AMDGPU_TARGETS is the
canonical one consumed by both HIP's CMake module and ggml's HIP
backend (ggml/src/ggml-hip/CMakeLists.txt:30 forwards AMDGPU_TARGETS to
CMAKE_HIP_ARCHITECTURES). The earlier defensive belt-and-suspenders
"set both" wasn't needed.
@jeremyfowers jeremyfowers self-assigned this May 16, 2026
@jeremyfowers jeremyfowers marked this pull request as ready for review May 16, 2026 04:22
@jeremyfowers jeremyfowers merged commit d21d0ab into lemonade May 16, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants