24 commits
ceefbe8 Add array_type parameter and GPU extensions (hakkelt, Apr 3, 2026)
b6c3cff Refactor test infrastructure and add GPU tests (hakkelt, Apr 3, 2026)
15cce9d Refactor GPU tests to use GPUEnv and support multiple backends (hakkelt, Apr 21, 2026)
61c8d43 Refactor tests to use domain_array_type and codomain_array_type (hakkelt, Apr 21, 2026)
50fa0ce Enhance GPU support documentation across various operators and clarif… (hakkelt, Apr 21, 2026)
8f4cbbd Fix OperatorWrapper construction by removing unnecessary dimensions i… (hakkelt, Apr 21, 2026)
054140d Fix CI testing for Julia 1.10 (hakkelt, Apr 21, 2026)
3438692 skip persistent_tasks Aqua tests on Julia 1.10 & fix benchmarking and… (hakkelt, Apr 21, 2026)
394e0f4 rename *_storage_type to *_array_type in documentation and AI agent f… (hakkelt, Apr 21, 2026)
75a5ca7 fix documentation (hakkelt, Apr 27, 2026)
0e2f164 fix benchmarking CI action (hakkelt, Apr 27, 2026)
28b55e5 remove stale compat entry for AcceleratedDCTs (hakkelt, Apr 28, 2026)
ae50823 try zsoerenm/AirspeedVelocity.jl fork for benchmarking CI action (hakkelt, Apr 28, 2026)
723b354 fix commit on zsoerenm/AirspeedVelocity.jl in benchmarking CI action (hakkelt, Apr 28, 2026)
9e1590e fix benchmark CI job (hakkelt, Apr 28, 2026)
77ecefa replace AirspeedVelocity with a custom solution (hakkelt, May 19, 2026)
7520d93 add summary line to benchmark results post (hakkelt, May 19, 2026)
9d63d46 Merge branch 'custom-benchmarking' into gpu (hakkelt, May 19, 2026)
d8fbd76 fix regresssions (hakkelt, May 19, 2026)
3af156a perf(Xcorr): use tiled 8-wide FIR for CPU adjoint (hakkelt, May 19, 2026)
402dd25 refactor(Xcorr): extract adj FFT state into XcorrAdjFFT; skip for CPU (hakkelt, May 19, 2026)
54ce6e2 fix Xcorr: documenter and type issues (hakkelt, May 20, 2026)
2701027 Merge branch 'master' into gpu (hakkelt, May 29, 2026)
42cfa93 Revert the unintended change in compare.jl (hakkelt, May 29, 2026)
116 changes: 116 additions & 0 deletions .github/agents/julia.agent.md
@@ -0,0 +1,116 @@
---
description: "Use when improving Julia code quality with very long test suites, slow CI, TestItemRunner tagging/filtering, iterative fix-and-rerun loops, or flaky tests. Keywords: Julia, TestItemRunner, @testitem, tags, filter, long-running Julia process, code quality, assertions, source fixes."
name: "Julia Long-Test Quality"
tools: [read, search, edit, execute, todo]
user-invocable: true
---
You are a specialist for improving Julia code quality in repositories with long-running test suites.

## Mission
- Make tests reliable and informative without weakening test intent.
- Use TestItemRunner capabilities to speed iteration and triage by tags and filters.
- Iterate until the targeted test scope passes, then validate broader scopes.

## Hard Constraints
- Never remove assertions to make tests pass.
- If a failure reflects a real implementation bug, fix source code instead of loosening tests.
- Preserve operator names in tags exactly as implemented (CamelCase, no renamed variants).
- Keep changes minimal and localized; avoid unrelated refactors.

## Repository-Specific Engineering Rules
- Respect package structure and boundaries:
  - `src/linearoperators/` for concrete linear operators.
  - `src/nonlinearoperators/` for nonlinear operators.
  - `src/calculus/` for operator calculus/composition.
  - `src/batching/` for batch operators.
- For new or changed operators, ensure implementation completeness:
  - Struct with concrete, inference-friendly field types.
  - Constructors for dimension tuple and/or data-driven construction.
  - Forward path `mul!(y, op, x)` and adjoint path dispatch via `AdjointOperator`.
  - Trait and property behavior remains consistent (`is_linear`, `is_diagonal`, rank/invertibility traits).
  - Storage traits stay valid (`domain_array_type`, `codomain_array_type`) for CPU/GPU paths.
- Prefer `copy_operator(op; array_type=nothing, threaded=nothing)` behavior when changing copy semantics:
  - Deep-copy mutable working buffers only.
  - Share immutable and read-only references.
- Keep test files standalone-capable and aligned with TestItems setup modules.
- Preserve quality gates: JET, Aqua, and doctests should remain passing together.
- Use Runic formatting checks when editing Julia source or tests.
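
The sketch below shows the copy convention from the list above. `ScaledMatOp` and `copy_op` are hypothetical names and are not part of AbstractOperators.jl; the package's actual entry point is `copy_operator`. The read-only matrix is shared by reference. The scratch buffer is copied, so two copies can run concurrently without overwriting each other's intermediate results.

```julia
using LinearAlgebra

# Hypothetical operator: `A` is read-only data, `buf` is a mutable working buffer.
struct ScaledMatOp{M <: AbstractMatrix, V <: AbstractVector}
    A::M    # read-only: safe to share between copies
    buf::V  # scratch space written on every call: must be private to each copy
end

ScaledMatOp(A::AbstractMatrix) = ScaledMatOp(A, similar(A, size(A, 1)))

# Hypothetical copy helper following the convention: share `A`, copy `buf`.
copy_op(op::ScaledMatOp) = ScaledMatOp(op.A, copy(op.buf))

# y = 2 * A * x, using the operator's own buffer for the intermediate product.
function apply!(y, op::ScaledMatOp, x)
    mul!(op.buf, op.A, x)
    y .= 2 .* op.buf
    return y
end

op = ScaledMatOp(rand(3, 3))
op2 = copy_op(op)
op2.A === op.A      # true: read-only data is shared
op2.buf === op.buf  # false: each copy owns its buffer
```
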

## Julia Performance Playbook
- Put performance-critical code in functions, not top-level scope.
- Avoid untyped globals in hot paths; use function arguments and `const` globals where appropriate.
- Prefer concrete field/container types; avoid abstract fields like `Function`, `AbstractArray`, or `Integer` in performance-sensitive structs.
- Maintain type stability:
  - Avoid variable type changes within loops.
  - Use `zero(x)`, `oneunit(T)`, and stable return types.
  - Use function barriers for setup-vs-kernel separation.
- Measure, do not guess:
  - Use `BenchmarkTools` for benchmarks.
  - Track allocations (`@time`, `@allocated`) and treat unexpected allocations as defects to investigate.
  - Use `@code_warntype` and JET to diagnose inference issues.
- Minimize allocations in inner loops:
  - Preallocate outputs and favor `mul!`/in-place APIs.
  - Use broadcast fusion (`@.` / dotted ops) when beneficial.
  - Unfuse broadcasts when repeated subexpressions are recomputed unnecessarily.
  - Use `@views` for slicing when copy cost dominates.
- Iterate arrays in memory-friendly order (column-major access patterns).
- For threaded Julia code that also calls BLAS, avoid oversubscription (`OPENBLAS_NUM_THREADS=1` is often best with multithreaded Julia; validate this on your workload).
- Use performance annotations (`@inbounds`, `@simd`, `@fastmath`) only when correctness assumptions are explicitly validated.
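
The following plain-Julia sketch illustrates three of the rules above. The function names are illustrative and are not package API. It shows a type-stable accumulator seeded with `zero`, a function barrier that separates a type-unstable setup from the kernel, and a preallocated `mul!` loop. The `@inbounds` annotation is justified here because the loop iterates over `eachindex(x)`.

```julia
using LinearAlgebra

# Type-stable accumulation: `s` starts as the real element type and never changes type.
function sumsq(x::AbstractVector)
    s = zero(real(eltype(x)))
    @inbounds for i in eachindex(x)  # `eachindex` guarantees the indices are valid
        s += abs2(x[i])
    end
    return s
end

# Function barrier: `x` is either a Vector{Float64} or a Vector{ComplexF64} here,
# but `sumsq` is compiled separately for each concrete type it receives.
function run_kernel(n::Integer, use_complex::Bool)
    x = use_complex ? randn(ComplexF64, n) : randn(n)
    return sumsq(x)
end

# Preallocated output: `mul!` overwrites `y` instead of allocating `A * x` per iteration.
function apply_many!(y, A, x, iters::Integer)
    for _ in 1:iters
        mul!(y, A, x)
    end
    return y
end
```

To check the allocation rule, call `apply_many!` once so it compiles. Then inspect it with `@allocated` inside a function, or with BenchmarkTools' `@btime`. Any allocation reported there is worth investigating.
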

## Test Architecture Rules
- Prefer `@testitem` with explicit `tags` and optional `setup` modules.
- Use tags that encode both operator and test type.
- Mixed tests may include multiple operator tags when behavior genuinely spans operators.
- Test type tags should come from: `:linearoperator`, `:nonlinearoperator`, `:batching`, `:calculus`, `:jet`, `:quality`, `:misc`.
- Operator tags should use exact CamelCase names, for example: `:MatrixOp`, `:FiniteDiff`, `:Compose`, `:SpreadingBatchOp`.
- Use `@run_package_tests filter=ti->...` to run focused slices.
- For grouped runs, prefer strict type-tag exclusion filters (for example, `ti -> !(:jet in ti.tags)`).
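
Filter predicates are ordinary Julia functions over test-item metadata. The sketch below runs two of them against hand-written stand-ins. Each stand-in exposes the `name` and `tags` fields that the commands in this guide already use (`ti.name`, `ti.tags`). The item names themselves are hypothetical.

```julia
# Hypothetical stand-ins for the metadata TestItemRunner passes to `filter`.
items = [
    (name = "MatrixOp forward/adjoint", tags = [:MatrixOp, :linearoperator]),
    (name = "MatrixOp JET", tags = [:MatrixOp, :jet]),
    (name = "Compose of MatrixOps", tags = [:Compose, :MatrixOp, :calculus]),
]

# Focused slice: one operator tag AND one test-type tag.
focused = ti -> (:MatrixOp in ti.tags) && (:linearoperator in ti.tags)

# Grouped run: strict exclusion on a test-type tag.
no_jet = ti -> !(:jet in ti.tags)

focused_names = [ti.name for ti in items if focused(ti)]
no_jet_names = [ti.name for ti in items if no_jet(ti)]
```

In this example, `focused` keeps only the forward/adjoint item. `no_jet` keeps everything except the JET item, including the mixed `Compose` item that carries a `:MatrixOp` tag.
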

## JET.jl Requirements
- Treat JET coverage as mandatory for the entire public API.
- Ensure JET test coverage includes all three modes:
  - `JET.test_package(...)` for package-level inference/type diagnostics on exported/public API paths.
  - `@test_opt ...` for representative public operations and constructors.
  - `@test_call ...` for key public call signatures and runtime-like call paths.
- Do not accept partial JET migration: missing any of the three test modes is incomplete.
- When adding or changing public API, update JET tests in the same change.

## Fast Iteration Workflow
1. Start one long-running Julia REPL in the package test environment.
2. Load TestItemRunner once.
3. Run filtered test slices repeatedly (by operator/type tags).
4. Fix failures immediately; rerun the same filtered slice until green.
5. Expand to adjacent slices, then run full suite.
6. Capture outputs from each run into `.temp/` files for traceability.

Recommended REPL pattern:
```julia
using TestItemRunner
run_tests("test"; filter = ti -> (:MatrixOp in ti.tags) && (:linearoperator in ti.tags))
```

Recommended shell pattern for captured logs:
```sh
mkdir -p .temp
julia --project=test test/runtests.jl > .temp/test_runtests.log 2>&1
julia --project=test test/jet/test_package.jl > .temp/test_jet_package.log 2>&1
```

## Failure Triage
1. Read the exact failing assertion and stacktrace first.
2. Classify failure:
   - Test setup/import/tagging issue
   - Real source bug
   - Environment/performance instability
3. For real bugs, patch source and keep/assert expected behavior in tests.
4. For flaky perf tests, stabilize methodology (workload, sampling, thresholds) without dropping coverage.
5. Re-run the smallest relevant filtered subset before broad reruns.

## Output Requirements
- Report what was changed and why.
- List files touched.
- Provide exact filtered test commands used.
- State pass/fail counts for the final run.
- Call out remaining risks or follow-up items.
- **Important:** Store all temporary run outputs under `.temp/` inside the repository only; do not write temporary scripts or logs anywhere else.
- When performance work is included, report allocation deltas and the exact benchmark commands used.
@@ -18,7 +18,12 @@ applyTo: "src/**/*.jl"
- size/domain/codomain/storage traits,
- property traits such as linearity, diagonal structure, and rank-related predicates.
- The `check` utility function must be called in every effective `mul!` path to ensure consistent argument validation and error messages.
- Preserve `domain_storage_type` and `codomain_storage_type` semantics and dispatch compatibility.
- Preserve `domain_array_type` and `codomain_array_type` semantics and dispatch compatibility.
- Constructors should expose an `array_type` keyword where storage backend selection is meaningful.
- `domain_array_type`/`codomain_array_type` must remain consistent with constructor-selected storage.
- When storage checks become stricter, fix operator traits and tests instead of relaxing `check`.
- Prefer behavior-preserving refactors: extract helpers, separate setup from kernels, reduce method size, but do not weaken checks.
- If modifying copy semantics, preserve the package convention that immutable/read-only arrays are shared while mutable working buffers are copied deliberately.
- Keep source formatted with Runic-compatible Julia style.
- GPU extensions live under `ext/GpuExt/` (triggered by `GPUArrays`). Override `mul!` there for any operator whose base implementation uses scalar indexing loops (`@nloops`, `@nref`, `@inbounds y[i] = b[j]`); replace with broadcast-over-view (`y .= view(b, idx...)`).
- When overriding a threaded operator (e.g. `Variation{..., true}`) for GPU, delegate to the non-threaded variant (`Variation{..., false}`); the threading strategy is CPU-only.
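
Here is a CPU-runnable sketch of the scalar-loop-to-broadcast rewrite described above. The helper names are hypothetical. The loop reads `b` one element at a time, which is scalar indexing that GPUArrays treats as an error or a slow-path warning. The broadcast over a `view` expresses the same copy as a single array operation, which a GPU backend can run as one kernel.

```julia
# Scalar-indexing loop: fine on CPU, but each `b[j]` is a separate element access.
function gather_loop!(y::AbstractVector, b::AbstractVector, idx::AbstractUnitRange)
    for (i, j) in enumerate(idx)
        y[i] = b[j]
    end
    return y
end

# Broadcast over a view: one array-level copy that works for any number of dimensions.
function gather_bcast!(y::AbstractArray, b::AbstractArray, idx...)
    y .= view(b, idx...)
    return y
end
```

Both functions produce the same result on CPU arrays. Only the broadcast form carries over unchanged to GPU array types.
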
2 changes: 2 additions & 0 deletions .github/instructions/julia-performance.instructions.md
@@ -24,3 +24,5 @@ applyTo: "src/**/*.jl,benchmark/**/*.jl"
- benchmark representative workloads,
- inspect allocations,
- use JET and `@code_warntype` for inference issues.
- For benchmark harnesses, derive element types robustly when operator type traits may return wrapped array types.
- Keep benchmark setup deterministic (`Random.seed!(0)`) and validate key benchmark states with one smoke `mul!` path before full runs.
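
This sketch shows element-type normalization for a benchmark harness. It assumes a trait may return either a scalar type (such as `Float64`) or an array type (such as `Vector{ComplexF32}`). `scalar_eltype` is a hypothetical helper, not package API. Operators whose traits return tuples of types would need an extra method.

```julia
using Random

# Hypothetical helper: unwrap (possibly nested) array types down to a scalar element type.
scalar_eltype(::Type{T}) where {T <: Number} = T
scalar_eltype(::Type{A}) where {A <: AbstractArray} = scalar_eltype(eltype(A))

Random.seed!(0)                        # deterministic benchmark inputs
T = scalar_eltype(Vector{ComplexF32})  # e.g. a trait that returned an array type
x = randn(T, 16)
y = zeros(T, 16)
```
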
10 changes: 10 additions & 0 deletions .github/instructions/julia-testing-and-jet.instructions.md
@@ -19,3 +19,13 @@ applyTo: "test/**/*.jl,docs/**/*.md"
- Keep Aqua and doctests passing alongside functional tests.
- Never remove assertions to force green tests.
- All temporary test and benchmark outputs must go under `.temp/` only.
- If GPU tests are backend-specific, keep them in separate `@testitem`s and use the `:gpu` tag.
- When `VERB` is enabled, print each running testitem name at test-runner filter time.
- For local coverage, mirror CI with `julia --project=test --code-coverage=user test/runtests.jl`, then process `*.cov` / `*.info` artifacts into `lcov.info` if needed.
- Subpackages (DSPOperators, FFTWOperators, NFFTOperators, WaveletOperators) have no standalone `test/` directory; they are tested and their coverage is gathered exclusively through the parent package's `test/` project. Do not attempt a separate subpackage coverage run.
- Extension coverage should be gathered through the parent-package tests that load the relevant trigger packages; do not assume a separate extension-only coverage run exists.
- JET `@test_opt` flags `array_type::Type` (unparameterized keyword) as a source of runtime dispatch. Use `array_type::Type{<:AbstractArray}` and avoid kwarg-to-kwarg forwarding; use a typed positional-arg helper (e.g., `_make_eye(T, dims, S)`) so JET can resolve dispatch statically.
- When Aqua reports "Unexpected Pass" on a `@test_broken`/`broken=true` check, the underlying issue is now fixed — remove the workaround and use `Aqua.test_all(pkg)` unconditionally.
- Agent sub-tasks frequently generate `Eye(T, dims, array_type)` (3 positional args) instead of `Eye(T, dims; array_type=...)` (keyword). Always verify agent output for this pattern.
- Stochastic test assertions such as `op * randn(n) ≈ other_op * (op * randn(n))` are wrong because the two `randn(n)` calls draw different vectors; draw the input once into a variable (e.g. `x = randn(n)`) and reuse it on both sides.
- When testing GPU storage-type propagation, add `@test domain_array_type(op) <: CUDA.CuArray` / `<: AMDGPU.ROCArray` assertions directly in the per-operator CUDA/AMDGPU `@testitem`.
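
The stochastic-assertion pitfall is shown below, with plain matrices standing in for operators. With two separate `randn(n)` calls, the two sides are evaluated on different inputs. The comparison then fails even though the identity `(B * A) * x == B * (A * x)` holds for every `x`.

```julia
using Random, Test

Random.seed!(0)
n = 4
A = randn(n, n)  # stands in for `op`
B = randn(n, n)  # stands in for `other_op`

# Wrong: each `randn(n)` is a new vector, so the two sides see different inputs.
# @test (B * A) * randn(n) ≈ B * (A * randn(n))

# Right: draw the input once, then use it on both sides.
x = randn(n)
@test (B * A) * x ≈ B * (A * x)
```
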
50 changes: 50 additions & 0 deletions .github/skills/julia-gpu-implementation/SKILL.md
@@ -0,0 +1,50 @@
---
name: julia-gpu-implementation
description: 'Use for GPU operator implementations, GPU extension fixes, backend-specific testitems, and GPU benchmark validation in AbstractOperators.jl.'
argument-hint: 'Describe the operator, GPU backend, or benchmark you want to implement or validate'
user-invocable: true
---

# Julia GPU Implementation

## When To Use

- Implementing or fixing GPU overrides under `ext/GpuExt/`.
- Adding or updating CUDA/AMDGPU testitems.
- Debugging backend-specific dispatch, storage traits, or array conversion issues.
- Extending benchmark coverage for GPU behavior.
- Checking whether a CPU operator should get a GPU path or stay CPU-only.

## Implementation Rules

- Julia package extensions can only `import` the parent package, trigger package(s), and stdlib; if extension code needs a parent dependency API, expose it from the parent module first.
- For FFT plans, prefer `inv(plan)` (AbstractFFTs-generic) over backend-specific `FFTW.plan_inv(...)` to keep CUDA/AMDGPU compatibility.
- With JLArrays/GPUArrays, avoid `copyto!(gpu, cpu_view)` where the source is a `SubArray`; materialize first, for example with `src[1:n]`, or copy from a plain array.
- Preserve backend storage semantics and trait dispatch when adding GPU methods.
- Keep CPU-only implementation details out of GPU overrides unless the backend truly supports them.
- For GPU `GetIndex` overrides, keep boolean-mask and integer-vector fancy indexing in CPU paths unless the backend support is verified.
- When overriding a threaded operator for GPU, delegate to the non-threaded variant; threading strategy is CPU-only.
- Prefer direct `CuArray(arr)` / `CUDA.zeros(...)` / `AMDGPU.ROCArray(arr)` / `AMDGPU.zeros(...)` calls over intermediate conversion variables.
- Benchmark setup code should normalize wrapped domain and codomain type traits to scalar element types before calling `randn` or `zeros`.
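
This dispatch sketch illustrates the "delegate to the non-threaded variant" rule. It uses a hypothetical `FwdDiff{Threaded}` type as a stand-in for an operator like `Variation{..., Threaded}`. The threaded method accepts only plain `Vector`s. Every other storage type falls back to the serial broadcast kernel. GPU arrays would land in that fallback; in the real extension this would be a method under `ext/GpuExt/`.

```julia
# Hypothetical operator with a threading flag in its type (stand-in for Variation{..., Threaded}).
struct FwdDiff{Threaded} end

# Serial kernel written as broadcasts over views: backend-agnostic.
function apply!(y::AbstractVector, ::FwdDiff{false}, x::AbstractVector)
    n = length(x)
    y .= view(x, 2:n) .- view(x, 1:(n - 1))
    return y
end

# CPU-only threaded kernel with scalar indexing; restricted to plain Vectors.
function apply!(y::Vector, ::FwdDiff{true}, x::Vector)
    length(y) == length(x) - 1 || throw(DimensionMismatch("length(y) must be length(x) - 1"))
    Threads.@threads for i in eachindex(y)
        @inbounds y[i] = x[i + 1] - x[i]  # safe: the length check above bounds i + 1
    end
    return y
end

# Any other storage (e.g. GPU arrays): ignore the threading flag and delegate.
apply!(y::AbstractVector, ::FwdDiff{true}, x::AbstractVector) = apply!(y, FwdDiff{false}(), x)
```

Because `Vector` is more specific than `AbstractVector`, plain CPU arrays still reach the threaded kernel. All other array types reuse the serial path without a separate implementation.
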

## Testing Rules

- For honest GPU coverage, keep JLArray checks separate from real device checks and add backend-specific tags such as `:cuda` and `:amdgpu` plus runtime skip guards.
- In `test/runtests.jl`, filter backend-tagged testitems when the runtime is unavailable, but keep per-test safety checks too.
- Add explicit tests for `domain_array_type` and `codomain_array_type`, and verify that `op * x` allocates on the active backend.
- When adding CUDA/AMDGPU companion tests, prefer direct backend array construction instead of temporary conversion variables.
- For GPU `GetIndex` tests, restrict indices to ranges, colons, and scalar integers; bool-mask and integer-vector `view` forms are not universally supported across GPU backends.
- Migrate GPU-backend storage-type assertions from central quality files into each operator's own CUDA/AMDGPU `@testitem` so they run with the functional tests.
- Use direct `import CUDA` / `import AMDGPU` plus `functional()` guards in testitems; avoid try/catch gating.
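
This is a sketch of runner-level backend filtering. The availability flags are hard-coded here as an assumption. In the real `test/runtests.jl` they would come from `CUDA.functional()` / `AMDGPU.functional()` after `import CUDA` / `import AMDGPU`, and each `@testitem` would still keep its own guard. The item names are hypothetical.

```julia
# Hypothetical availability table; real values would come from `functional()` checks.
backend_available = Dict(:cuda => false, :amdgpu => true)

# Keep an item unless it carries a backend tag whose runtime is unavailable.
# Tags that are not backend tags default to `true` and never exclude an item.
backend_ok(ti) = all(tag -> get(backend_available, tag, true), ti.tags)

items = [
    (name = "MatrixOp (CPU)", tags = [:MatrixOp, :linearoperator]),
    (name = "MatrixOp (CUDA)", tags = [:MatrixOp, :gpu, :cuda]),
    (name = "MatrixOp (AMDGPU)", tags = [:MatrixOp, :gpu, :amdgpu]),
]

runnable = [ti.name for ti in items if backend_ok(ti)]
```

The same `backend_ok` predicate can be combined with the tag filters shown earlier in this guide using `&&`.
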

## Benchmarking Rules

- Benchmark scripts under `benchmark/` must prefer local workspace package paths over registry-installed copies, otherwise GPU fixes in sibling packages can be silently skipped.
- Use representative large inputs for GPU crossover studies and keep the measurement setup deterministic.
- Capture benchmark logs and generated reports under `.temp/`.

## Tooling Reminders

- Agent sub-tasks frequently generate `Eye(T, dims, array_type)` with three positional arguments instead of `Eye(T, dims; array_type=...)` with a keyword; verify this pattern.
- JET `@test_opt` catches runtime dispatch from `array_type::Type` when it is unparameterized; use `array_type::Type{<:AbstractArray}` and avoid kwarg-to-kwarg forwarding by routing through an internal helper.
- When fixing an "unexpected pass" Aqua error, remove the workaround and use `Aqua.test_all(pkg)` once the underlying issue is fixed.
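
This minimal sketch shows the keyword pattern from the JET reminder. The hypothetical `make_buffer` / `_make_buffer` names stand in for constructors like `Eye`. The keyword is typed `Type{<:AbstractArray}`, and the public method forwards it positionally to an internal helper. That helper takes the storage type as a `::Type{S}` argument, so its body is compiled for each concrete storage type. The sketch assumes `array_type` is an unparameterized array type such as `Array` or `Vector`.

```julia
# Public constructor: typed keyword, forwarded positionally (no kwarg-to-kwarg forwarding).
function make_buffer(::Type{T}, dims::Dims; array_type::Type{<:AbstractArray} = Array) where {T}
    return _make_buffer(T, dims, array_type)
end

# Internal helper: `S` is a type parameter, so the storage type is part of dispatch.
function _make_buffer(::Type{T}, dims::Dims, ::Type{S}) where {T, S <: AbstractArray}
    return fill!(S{T}(undef, dims), zero(T))
end

b = make_buffer(Float32, (2, 3))                     # Matrix{Float32} of zeros
v = make_buffer(Float64, (4,); array_type = Vector)  # Vector{Float64} of zeros
```
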
76 changes: 75 additions & 1 deletion .github/skills/julia-long-test-workflow/SKILL.md
@@ -29,6 +29,24 @@ user-invocable: true

## Common Commands

Main package coverage:

```sh
julia --project=test --code-coverage=user test/runtests.jl
```

Subpackage coverage (DSPOperators, FFTWOperators, NFFTOperators, WaveletOperators have **no** standalone `test/` directory):

> All subpackage code and their GPU extensions are exercised by the parent package's
> `test/` project. Run the same coverage command above; the `.cov` files under each
> subpackage's `src/` will be populated automatically.

Process coverage after a local run:

```sh
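julia -e 'using Coverage; dirs = filter(isdir, [joinpath(p, d) for p in ("", "DSPOperators", "FFTWOperators", "NFFTOperators", "WaveletOperators") for d in ("src", "ext")])
Coverage.LCOV.writefile("lcov.info", reduce(vcat, Coverage.process_folder.(dirs)))'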
```

Filtered test run:

```julia
@@ -37,9 +55,13 @@ TestItemRunner.run_tests(pwd(); filter = ti -> :MatrixOp in ti.tags) # example o
TestItemRunner.run_tests(pwd(); filter = ti -> ti.name == "DCT") # example of filtering by test name instead of tags
```

AirSpeedVelocity comparison:
### Local benchmark comparison with AirspeedVelocity

AirspeedVelocity works well for local branch-vs-master comparisons and is the
recommended tool for interactive performance investigation:

```sh
mkdir -p .temp/asv
benchpkg \
--path . \
--rev master,dirty \
@@ -48,6 +70,20 @@ benchpkg \
--exeflags="--threads=4"
```

Filtered AirspeedVelocity comparison for a single benchmark family:

```sh
mkdir -p .temp/asv
benchpkg \
--path . \
--rev master,dirty \
--script benchmark/benchmarks.jl \
--output-dir .temp/asv \
--exeflags="--threads=4" \
--add RecursiveArrayTools \
--filter MIMOFilt
```

Render a comparison table:

```sh
@@ -59,6 +95,44 @@ benchpkgtable \
--mode time,memory
```

> **Note:** Use AirspeedVelocity with an explicit `--script` path when comparing
> against revisions that do not yet contain the benchmark file.

### CI benchmark comparison (GitHub Actions)

The GitHub Actions benchmark CI does **not** use the AirspeedVelocity action
because the root-level Julia workspace (`[workspace]` in `Project.toml`) causes
that action's revision management to mis-resolve the monorepo subprojects.
Instead, two workflows implement a fork-safe two-stage approach:

- **`benchmark.yml`** – unprivileged `pull_request` job that checks out both
the base and head revisions, runs `benchmark/compare.jl` against explicit
worktree paths, and uploads `body.md`, `pr_number.txt`, and
`julia_version.txt` as an artifact.
- **`post_benchmark_comment.yml`** – privileged `workflow_run` job that
downloads the artifact and creates or updates the PR comment.

The comparison table mirrors AirspeedVelocity output with separate Time and
Memory sections, base/head columns, a ratio column, and emoji indicators:
- 🚀 significant speedup: `ratio − ratio_err > 1.2` (time) or `ratio < 0.5` (memory)
- 🐢 significant slowdown: `ratio + ratio_err < 0.8` (time) or `ratio > 1.5` (memory)

To run the comparison locally with the same script used by CI:

```sh
# Check out base separately, e.g. in a worktree:
git worktree add .temp/base master

julia --project=benchmark benchmark/compare.jl \
--base-dir .temp/base \
--head-dir . \
--output-dir .temp/bench-compare \
--pr 0 \
--julia-version "$(julia -e 'print(VERSION)')"

cat .temp/bench-compare/body.md
```

## Done Criteria

- Targeted tests pass.
1 change: 1 addition & 0 deletions .github/workflows/tests.yml
@@ -36,6 +36,7 @@ jobs:
run: |
julia --project=test -e '
using Pkg
Pkg.rm(["DSPOperators", "FFTWOperators", "NFFTOperators", "WaveletOperators"])
Pkg.develop(path = pwd())
for pkg in ("DSPOperators", "FFTWOperators", "NFFTOperators", "WaveletOperators")
Pkg.develop(path = joinpath(pwd(), pkg))
2 changes: 2 additions & 0 deletions .gitignore
@@ -18,3 +18,5 @@ docs/Manifest.toml
Manifest.toml
Manifest-v*.toml
.temp/
test/gpu_env/
benchmark/gpu_env/
25 changes: 25 additions & 0 deletions AGENTS.md
@@ -0,0 +1,25 @@
# AGENTS.md

This repository uses layered guidance. Follow it in this order:

1. Read this file first.
2. Read any applicable files under `.github/instructions/` whose `applyTo` pattern matches the files you will edit.
3. Read the matching skill under `.github/skills/` when the task clearly matches a skill's scope.
4. Then inspect the target source files before editing.

## How to choose guidance

- Use `julia-operator-engineering.instructions.md` for changes under `src/**/*.jl`.
- Use `julia-performance.instructions.md` for code under `src/**/*.jl` and `benchmark/**/*.jl` when performance is relevant.
- Use `julia-testing-and-jet.instructions.md` for `test/**/*.jl` and docs-backed test guidance.
- Use `.github/skills/julia-long-test-workflow/SKILL.md` for long Julia test runs, filtered `TestItemRunner` work, JET triage, and AirspeedVelocity comparisons.
- Use `.github/skills/julia-gpu-implementation/SKILL.md` for GPU operator implementations, GPU extensions, GPU-specific tests, and GPU benchmark validation.

## Working rules

- Prefer the smallest skill and instruction set that fully covers the task.
- Do not ignore a matching instruction file because a skill also exists; use both when they apply.
- If multiple instruction files match, combine them rather than choosing only one.
- If a task touches both implementation and tests, read both the source and test instruction files before editing.
- Keep temporary artifacts under `.temp/`.
- When in doubt, inspect the relevant files before making changes.