Skip quadBroadcast/quadSwap split tests when subgroup_size < 8 #4656
Merged
jenatali
reviewed
Jun 2, 2026
When the implementation selects a subgroup size < 8 for the test's
workgroup, the split predicate `id < subgroupSize / 2` bisects the
only quad in the subgroup, leaving no fully active quad — which is
undefined behavior for quad operations. This is observed on WARP
(which selects its native D3D12 wave size: 4 on arm64 NEON, often 4
on x86 for small workgroups) and may occur on any implementation
that picks a small native subgroup size at runtime.
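The effect of the split predicate on quads can be checked with a small sketch. `classifyQuads` is a hypothetical helper written for illustration, not something in the CTS; it assumes quads are consecutive groups of four subgroup invocation ids and that subgroup sizes are powers of two.

```typescript
// Hypothetical helper (not part of the CTS): classifies each quad of a
// subgroup under the split predicate `id < subgroupSize / 2`.
// Assumption: quad N covers subgroup invocation ids 4N .. 4N+3.
function classifyQuads(subgroupSize: number): {
  full: number;
  partial: number;
  inactive: number;
} {
  // Mirrors unsigned integer division in the shader.
  const half = Math.floor(subgroupSize / 2);
  const counts = { full: 0, partial: 0, inactive: 0 };
  for (let base = 0; base < subgroupSize; base += 4) {
    let active = 0;
    for (let lane = base; lane < base + 4 && lane < subgroupSize; lane++) {
      if (lane < half) active++;
    }
    if (active === 4) counts.full++;
    else if (active === 0) counts.inactive++;
    else counts.partial++; // a bisected quad: the unsafe case
  }
  return counts;
}
```

With a subgroup size of 4 the single quad is counted as partial, while sizes of 8 and above yield only full and inactive quads, which is why 8 is the threshold.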
Two coordinated guards:
* In the shader, the quad call is wrapped in
`if subgroupSize >= 8u { ... }` so it never executes when the
split predicate would be unsafe.
* In the JS checker, the actual subgroupSize is read out of
metadata.subgroup_size[0] and the test is skipped with t.skip
when it is < 8, so the missing output doesn't get flagged as a
failure.
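The checker-side guard might look roughly like the sketch below. The `SplitMetadata` and `CheckResult` types and the `decideSplitCheck` function are hypothetical names introduced here for illustration; the actual CTS checker and its metadata layout may differ.

```typescript
// Hypothetical shapes for illustration; not the CTS's real types.
interface SplitMetadata {
  // Assumed: per-invocation copy of @builtin(subgroup_size) written by the shader.
  subgroup_size: Uint32Array;
}

type CheckResult =
  | { kind: 'skip'; reason: string }
  | { kind: 'check'; subgroupSize: number };

// Decides whether the split test's output should be validated at all.
function decideSplitCheck(metadata: SplitMetadata): CheckResult {
  const subgroupSize = metadata.subgroup_size[0];
  if (subgroupSize < 8) {
    // The shader guard skipped the quad call, so there is no output to validate.
    return {
      kind: 'skip',
      reason: `subgroup size ${subgroupSize} < 8: no fully active quad under the split predicate`,
    };
  }
  return { kind: 'check', subgroupSize };
}
```

In a test body, a `skip` result would be forwarded to `t.skip(reason)` and a `check` result would proceed to the normal output comparison.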
Querying GPUAdapterInfo.subgroupMinSize would not be sufficient: the
size the implementation actually selects depends on the shader (its
workgroup size, register pressure, etc.), not just the adapter's
minimum supported size. Reading subgroupSize from inside the test
shader itself is the only reliable signal.
Contributor
Author
I changed the check to use the reported subgroup size instead. Below are the test results for various configurations.
All three report adapter.subgroupMinSize=4 (WARP) or 32 (NVIDIA), but the runtime size for the same shader differs across:
jenatali
approved these changes
Jun 4, 2026
alan-baker
approved these changes
Jun 4, 2026
The compute,split tests use the predicate `id < subgroupSize / 2`, which deactivates the upper half of each subgroup. When subgroupSize is 8 or greater the boundary lands on a 4-lane multiple, so every quad is either fully active or fully inactive and the test exercises legitimate predicated quad operations. When subgroupSize is 4, the only quad gets bisected, leaving no fully active quad; calling quadBroadcast/quadSwap there is undefined behavior and the result is not meaningful to validate.

GPUAdapterInfo.subgroupMinSize is not a sufficient pre-check: the size the implementation actually selects depends on the compiled shader (workgroup shape, register pressure, vendor heuristics), not just the adapter's minimum. Two implementations from the same vendor can report identical {min,max} and pick different runtime sizes for the same shader. Reading `@builtin(subgroup_size)` from inside the test shader is the only reliable signal.
Issue: #4650
Requirements for PR author:
* All missing test coverage is tracked with "TODO" or `.unimplemented()`.
* New helpers are `/** documented */` and new helper files are found in `helper_index.txt`.
* Tests have been tested with compatibility mode validation enabled and behave as expected. (If not passing, explain above.)

Requirements for reviewer sign-off:
When landing this PR, be sure to make any necessary issue status updates.