feat(intervention): isolated GPU-worker execution backend for model.trace() #676

Open: khaiwang wants to merge 30 commits into main from worktree-mediator-sandbox
@khaiwang (Contributor)

Run a trace's user interventions in a spawned, GPU-enabled worker process so
footguns in intervention code (infinite loops, OOM allocations, device-side
asserts, host-object pokes) are contained to the worker while the model server
keeps serving. Results are bit-identical to in-process execution.

The six-event Mediator protocol (VALUE/SWAP/SKIP/BARRIER/END/EXCEPTION) is left
unchanged; isolation is an outer harness that spawns the worker and routes the
existing protocol over a CUDA-IPC bounce-buffer channel (tensors stay on the
GPU, ~0.6 ms/hook, size-independent) instead of a shared Python frame.

Two shared-memory assumptions of the in-process path become explicit harness
steps:

  • host-side hook registration: the worker has no real module, so on the first
    event for a requester the host registers the matching one-shot hook on the
    real module (resolved from the requester string, for the specific step).
  • worker->host saves transmission: .save()'d values live in the worker frame +
    Globals.saves; the worker bundles them into the END event and the host
    injects them into the real user frame.

New sources:

  • transport.py: CUDA-IPC codec + host/worker channels (clone-on-receive,
    per-wait timeout, host->worker live-meta piggyback, worker->host push field,
    cuda.synchronize ordering guard).
  • isolation.py: isolate_mediators() context, spawn_isolated_worker, _worker_main,
    on-demand host hook registration, worker interleaver stub + dummy-module map,
    barrier/variable-store wiring, transmissible-exception degrade.
  • _sandbox.py: seccomp lock_down for fs/net/exec containment.

Seam edits route the protocol through the channel when isolation is on:
interleaver.py (isolated start branch, on-demand registration in handle, saves
injection at END, host-side barrier counting, _iso/cancel teardown), hooks.py
(per-step iteration param on output_hook/input_hook), tracer.py (isolated
Barrier branch).

Covered, each bit-identical and independently reviewed: read / swap / .save() /
multi-invoke / skip / exception / timeout / seccomp lockdown; multi-token
iteration (iter[N], iter[:], per-step swap); cross-invoke barrier + variable
sharing; non-standard-named models. Not yet built: tracer.cache() (returns an
empty CacheDict under isolation), backward/grad (autograd graph is host-side),
warm worker pool. See docs/developing/mediator-gpu-trace-integration.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

khaiwang and others added 30 commits June 7, 2026 21:37
Measure the cost a warm worker pool would amortize: under isolation each
model.trace() spawns a fresh GPU worker. On gpt2 (A100) an isolated trace takes
~4.5 s vs ~12 ms in-process (~370x). The bring-up decomposes to ~4.2 s: cold
import torch (1.3 s) + import nnsight (2.3 s) + CUDA context init (0.4 s) +
warmup; host-side mediator serialization is only ~3 ms. The tax is essentially
model-independent (weights are not shipped), so it is a flat per-request cost.
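
As a quick sanity check on the figures above, the snippet below recomputes the slowdown and the warmup remainder from the rounded numbers quoted in this message. It uses only those quoted values; the variable names are illustrative.

```python
# Rounded figures quoted in the commit message (gpt2, A100).
import_torch_s = 1.3
import_nnsight_s = 2.3
cuda_init_s = 0.4
bring_up_s = 4.2       # decomposed bring-up total
isolated_s = 4.5       # one isolated trace
in_process_s = 0.012   # one in-process trace

# Sum of the three named components; the rest of bring-up is warmup.
named_s = import_torch_s + import_nnsight_s + cuda_init_s
warmup_s = bring_up_s - named_s
slowdown = isolated_s / in_process_s

print(f"named components: {named_s:.1f} s")
print(f"implied warmup:   {warmup_s:.1f} s")
print(f"slowdown:         {slowdown:.0f}x")
```

From the rounded inputs the ratio comes out near 375x, in line with the quoted ~370x, and warmup accounts for roughly 0.2 s of the bring-up.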

- perf_spawn_cost.py: decomposed synthetic bring-up + real isolated-vs-inprocess.
- perf_spawn_split.py: splits the spawn slice into host serialize vs start().

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Amortize the ~4.5 s per-request spawn cost of isolated execution (cold import
torch + import nnsight + CUDA context init, measured model-independent). A
worker is now generic rather than mediator-bound: _pool_worker_main warms
CUDA/imports/mount once, sends a one-time "ready" ack, then loops serving
("job", payload, extras, opts) messages — deserializing a fresh mediator
against fresh dummies per job (only the ~3 ms payload changes per request). The
CUDA context, kernels, bounce buffer, and channel persist across jobs. This
unifies the cold and pooled paths; the worker always loops, the host decides
recycle-vs-kill.

Host side: a process-global, thread-safe _WorkerPool persists across traces.
acquire_isolated_worker pulls an idle worker, lazily grows the pool up to the
pool_size cap, or falls back to a cold one-shot worker past the cap so a trace
never blocks. It then ships the job and re-points the channel's
meta_provider/on_push at this mediator. Mediator.cancel calls
release_isolated_worker.

Recycle-safety: only a cleanly-ended worker is reused. handle_end_event sets
_iso.clean when an END is consumed; release recycles iff clean & poolable &
alive & not dirty. A worker drained mid-protocol with a Cancelation (pipe
unbalanced), a timeout/death (spinning, not idle), or a cold one-shot worker is
retired and the pool re-warms lazily. Recycle resets the host channel
(CudaIpcHostChannel.reset) + per-job hook-registration state; the worker
rebuilds its interleaver/dummies and clears Globals.saves per job, so no
cross-trace state leaks.
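
The recycle rule above can be read as a single predicate. The sketch below is a minimal illustration, assuming a hypothetical `WorkerState` record with the four flags named in the text; it is not the actual worker-handle class.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    """Hypothetical flag record; field names follow the commit text."""
    clean: bool      # an END was consumed, so the pipe is balanced
    poolable: bool   # False for a cold one-shot worker
    alive: bool      # the worker process is still running
    dirty: bool      # drained mid-protocol (e.g. by a Cancelation)


def should_recycle(w: WorkerState) -> bool:
    # release recycles iff clean & poolable & alive & not dirty
    return w.clean and w.poolable and w.alive and not w.dirty


ended = WorkerState(clean=True, poolable=True, alive=True, dirty=False)
cold = WorkerState(clean=True, poolable=False, alive=True, dirty=False)
hung = WorkerState(clean=False, poolable=True, alive=True, dirty=False)

print(should_recycle(ended), should_recycle(cold), should_recycle(hung))
```

Any worker that fails the predicate would be retired rather than returned to the idle list.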

Opt-in: isolate_mediators(..., pool_size=N) routes through the pool (pool_size=0
is the unchanged cold path); warm_worker_pool(N) pre-warms at startup and
shutdown_worker_pool() tears down. Pool sizing is a GPU-memory budget: each warm
worker costs ~0.55 GiB of GPU memory per GPU touched (model-weight-independent,
not reduced by MPS), ceiling = batch size.

Verified (test_isolated_pool.py, gpt2/A100): reuse bit-identical (max|Δ|=0) at
~21x faster once warm (4.57 s -> 0.22 s) with PIDs reused; 3-invoke trace draws 3
distinct workers; hung worker retired + pool re-warms; non-standard-named model
works. Cold path stays bit-identical across read/swap/save/multi/exception/hang/
multitoken/cross-invoke/barrier/nonstd. See docs §14.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
An adversarial review (no Critical issues — the cross-request data invariant
holds: per-job fresh interleaver/dummies/Globals.saves, host channel reset
before re-bind) surfaced robustness gaps, now fixed:

- Dead idle worker: acquire skipped the liveness check, so a worker that died
  while idle (OOM-killed by a neighbor, crash) was handed out -> broken-pipe
  trace failure AND the dead worker was never forgotten (permanent cap erosion).
  acquire now skips/forgets dead idle workers and re-spawns; acquire_isolated_worker
  retries once through the pool if send_job hits a dead worker.
- Multi-device aliasing (silent corruption): the pool's device was frozen at
  first warm, so a second model on another GPU drew a worker whose bounce buffer
  lived on the first GPU -> cross-device copy. The pool is now keyed per
  (device, arena_bytes, gpu_mem_fraction, lockdown) signature.
- Exception re-warm tax: clean was set only on END, so a user-exception worker
  (alive, pipe balanced) was retired -> every erroring trace paid a ~4 s
  re-warm. handle_exception_event now marks the isolated worker clean so it is
  recycled (cancel's dirty check still retires a mid-protocol worker).
- First-event hang-containment: a recycled worker's first event used the cold
  180 s startup_timeout; it now uses timeout + a deserialize margin, since
  spawn/warm completes before the "ready" ack.
- Over-provision: the grow slot is reserved under the lock so concurrent
  acquires can't exceed the cap.
- Resource cleanup: close() now closes the pipe fd + drops the GPU buffer; a
  _shutting_down flag stops a shutdown/release race from orphaning a worker.
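
Three of the fixes above (dead idle workers, per-signature keying, and reserving the grow slot under the lock) can be sketched together. This is a toy model under stated assumptions: `KeyedPool` and `FakeWorker` are hypothetical names, and a real worker would be a spawned process rather than a flag object.

```python
import threading
from collections import defaultdict


class FakeWorker:
    """Stand-in for a spawned GPU worker; only liveness is modelled."""

    def __init__(self, key):
        self.key = key
        self.alive = True

    def is_alive(self):
        return self.alive


class KeyedPool:
    """Toy pool keyed by a warm-time signature tuple."""

    def __init__(self, cap, spawn=FakeWorker):
        self._cap = cap
        self._spawn = spawn
        self._lock = threading.Lock()
        self._idle = defaultdict(list)
        self._count = defaultdict(int)   # live + reserved workers per key

    def acquire(self, key):
        with self._lock:
            idle = self._idle[key]
            while idle:
                worker = idle.pop()
                if worker.is_alive():
                    return worker
                self._count[key] -= 1    # forget a worker that died while idle
            if self._count[key] >= self._cap:
                return None              # caller falls back to a cold worker
            self._count[key] += 1        # reserve the slot before the slow spawn
        try:
            return self._spawn(key)
        except Exception:
            with self._lock:
                self._count[key] -= 1    # give the reserved slot back
            raise

    def release(self, key, worker):
        with self._lock:
            self._idle[key].append(worker)


pool = KeyedPool(cap=1)
key0 = ("cuda:0", 1 << 20, 0.1, False)
first = pool.acquire(key0)
print(pool.acquire(key0))   # None: the cap for this signature is reached
```

Because the count is incremented while the lock is held, two concurrent acquires cannot both pass the cap check, and a worker for one device signature is never handed to a trace with another.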

Tests: test_isolated_pool.py gains dead-idle, exception-recycle, and (2-GPU)
multi-device cases — all 7 pass bit-identical. Cold path (pool_size=0) still
bit-identical across trace/acceptance(names,multi,exc,hang)/cross-invoke. Doc §14
updated with the hardening notes + lockdown cold-vs-pooled divergence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tracer.cache() registers persistent hooks (mediator_idx=inf) that fill a
.save()'d CacheDict during the forward; in the worker those hooks landed on the
dummy modules and never fired, so the user got an empty cache. Now:

- Worker cache() (isolated): ship the spec (token, module-paths, device, dtype,
  detach, include_output, include_inputs, rename, alias) via a new Events.CACHE
  request instead of registering dummy hooks; return a token-tagged placeholder
  CacheDict the user binds + .save()s.
- Host handle_cache_event: resolve paths to the real envoys, register the real
  cache_output/input_hook into a host Cache keyed by token (Mediator._iso_caches),
  set_user_cache, ack. Hooks live on the host mediator and are dropped at teardown
  by remove_hooks, like in-process.
- handle_cache_event acks + returns True, so the host loop processes CACHE then
  END consecutively at Mediator.start (before the forward). handle_end_event then
  swaps the host CacheDict reference in for the worker's empty placeholder (matched
  by token); the forward fills that same object in-place, so the user's variable IS
  the forward-filled host cache. No separate post-forward injection step. The
  substitution is gated on _iso_caches, so non-cache traces are untouched.
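
The token-matched substitution can be illustrated without any GPU code. The sketch below is a simplified model, assuming hypothetical `CacheDict`, `worker_cache`, and `HostMediator` stand-ins; plain lists take the place of activations and the event channel is replaced by direct calls.

```python
import uuid


class CacheDict(dict):
    """Stand-in cache container; the token ties a placeholder to a host cache."""

    def __init__(self, token):
        super().__init__()
        self.token = token


def worker_cache(paths):
    # Worker side: ship a spec instead of registering hooks, and hand the
    # user an empty, token-tagged placeholder to bind and save.
    token = uuid.uuid4().hex
    spec = {"token": token, "paths": list(paths)}
    return spec, CacheDict(token)


class HostMediator:
    def __init__(self):
        self.iso_caches = {}   # token -> (host CacheDict, module paths)

    def handle_cache_event(self, spec):
        self.iso_caches[spec["token"]] = (CacheDict(spec["token"]), spec["paths"])
        return True            # ack, so the loop goes on to the next event

    def handle_end_event(self, saves):
        # Swap each worker placeholder for the host cache with the same token.
        out = {}
        for name, value in saves.items():
            token = getattr(value, "token", None)
            if token in self.iso_caches:
                value = self.iso_caches[token][0]
            out[name] = value
        return out

    def forward(self, activations):
        # The forward fills the host caches in place.
        for cache, paths in self.iso_caches.values():
            for path in paths:
                cache[path] = activations[path]


host = HostMediator()
spec, placeholder = worker_cache(["transformer.h.0"])
host.handle_cache_event(spec)
user_frame = host.handle_end_event({"cache": placeholder, "n": 1})
host.forward({"transformer.h.0": [1.0, 2.0]})
print(user_frame["cache"])
```

Since the user frame holds a reference to the host object before the forward runs, the later in-place fill is visible to the user without a second injection step.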

Verified (test_isolated_cache.py, gpt2/A100): single module, multi-module, and
include_inputs=True all bit-identical (max|Δ|=0, keys match in-process). Full
isolated regression (trace/acceptance/multitoken) and cold path unchanged. The
test now derives cache keys from envoy.path instead of a hardcoded prefix. See
docs §15.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The backward case hooked `.grad` on a GPT2 block's `.output[0]` — an off-the-
backward-path index into the block's tuple output, whose grad hook never fires
(a usage gotcha, confirmed via a manual register_hook). So its in-process
control ALSO errored, making the test useless as a gap demonstration.

Switch to `model.transformer.ln_f.output` — a tensor-output module ON the
autograd path — so the in-process control is valid. Backward now succeeds
in-process and fails cleanly under isolation, which is the gap the test exists
to characterize (host-only autograd graph, detached worker clones,
id(tensor)-keyed grads). requires_grad_(True) turned out to be a red herring
(ln_f works with or without it); the discriminator is on-path tensor-output vs
off-path tuple-element view.

Verified: backward in-process=ok, isolated=fails-cleanly; cache=bit-identical
(no longer a gap).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… split

`with tensor.backward()` now works in an isolated GPU worker for the
read-then-backward case: the worker tags delivered activation clones with
requester-string provenance and computes dL/d(clone) on its local tape as
seeds; a new Events.BACKWARD ships them to the host, which continues
torch.autograd.grad on the real graph over the retained on-graph
activations and returns gradients keyed by provenance path. The backward
block's .grad reads are served from that dict; .grad on user-derived
tensors and .grad assignment raise clear errors.

Verified bit-identical (max|delta|=0) on gpt2 ln_f.output and on a
renamed model (final_norm/output_projection). Scalar loss only; swaps,
batched traces, and multi-token backward remain unsupported (documented
in the integration doc's new backward section).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…graph

Characterized multi-token (generate + iter) backward with an in-process
control first: generate() runs the forward without gradient tracking, so
the first .grad read fails in-process ("cannot register a hook on a
tensor that doesn't require gradient") — multi-token backward is
unsupported on both paths and no silent-wrong is possible (there is no
graph at all; the earlier per-step retention-overwrite concern is moot).

The isolated path failed at the same user line but blamed the wrong
cause ("off the backward path from the loss"). The host now signals the
no-graph case — handle_backward_event returns a marker when no retained
activation requires grad — and the worker's .grad error names the
grad-less forward and points at model.trace(). Characterization script
kept as a regression test asserting that message; single-pass backward
unaffected (max|delta|=0).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…o WorkerMediator

_run_one_job built the worker mediator by monkeypatching the deserialized
instance (end/exception closures, a request wrapper when backward is
active) alongside a module-global backward-context dict with its own
reset choreography. The job mediator is now adopted into a
WorkerMediator(Mediator) subclass via __class__ swap: the closures become
method overrides (end ships Globals.saves-filtered locals on END,
exception degrades to a picklable form, request tags delivered clones
with requester provenance when the trace differentiates), the meta/push
piggyback callbacks become methods bound to the channel, and the backward
context collapses to instance attributes plus a single current-mediator
pointer read by worker_backward_context(). _run_one_job is now
deserialize -> adopt -> wire -> run.

Behavior-preserving: full isolated suite (trace, acceptance, multi-token
iteration, cross-invoke, warm pool, cache, backward, multi-token-backward
characterization, renamed-model) all pass; in-process regression 51 passed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…d_job

The backward detection (".backward(" in the intervention source) ran
independently on the host (_wire_host_channel) and in the worker
(_run_one_job), and _build_job recomputed the cross_invoker gate that
Mediator.start had already decided. Both decisions now happen once in
_build_job and ride worker_opts: the host reads backward_active when
wiring the channel (gating real-activation retention), the worker reads
it at adopt time (gating delivered-clone tagging), and cross_invoker
reuses the mediator's already-set value. This is now the single place to
tighten the substring detection (it can false-positive in comments,
costing only needless tagging).
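
To make the false-positive concrete, the snippet below compares the substring check with one possible tightening based on Python's `ast` module. The AST variant is an assumption for illustration only; this commit keeps the substring check and merely centralizes it.

```python
import ast

COMMENT_ONLY = '''
# loss.backward() would go here, but this trace only reads
out = hidden * 2
'''

REAL_BACKWARD = '''
loss = out.sum()
with loss.backward():
    grad = hidden.grad
'''


def substring_detect(source: str) -> bool:
    # The check described in the commit message.
    return ".backward(" in source


def ast_detect(source: str) -> bool:
    # Hypothetical alternative: look for an actual call to a method named
    # "backward"; comments never reach the syntax tree.
    tree = ast.parse(source)
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "backward"
        for node in ast.walk(tree)
    )


print(substring_detect(COMMENT_ONLY), ast_detect(COMMENT_ONLY))
print(substring_detect(REAL_BACKWARD), ast_detect(REAL_BACKWARD))
```

Neither check sees a backward call hidden inside a helper function defined elsewhere, so both remain heuristics on the trace body alone.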

Isolated trace/backward/cross-invoke/multi-token tests all pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…y map

Per-job worker-handle state was reset twice: at release (reset_for_release)
and again at the next acquire (_wire_host_channel). Acquire now owns the
authoritative reset (it runs unconditionally for both pooled and cold
workers); release keeps only reference-dropping so an idle worker doesn't
pin the last trace's hook set, path map, or — via channel.reset() — the
meta/push callbacks closing over its interleaver.

The {path: envoy} resolution map, previously built ad-hoc in two places
(cached by host-side hook registration, rebuilt from scratch on every
CACHE event), is now one lazy helper cached per job on the worker handle.

Pool recycle, cache, and trace isolated tests all pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e gap test

- A frozen IsoOptions dataclass replaces the four hand-copied option
  dicts (_STATE fields, _base_opts(), _WorkerPool._key(),
  warm_worker_pool's rebuild). pool_key lives on it, making the
  warm-time (device/arena/mem-fraction/lockdown) vs per-job (timeout)
  split explicit; the phantom never-set "startup_timeout" option becomes
  the _WARM_STARTUP_TIMEOUT constant.
- The CACHE event spec crosses the wire as a keyword dict instead of a
  9-field positional tuple, so adding a cache option can't silently
  shift fields.
- The gap-characterization test is retired: both gaps it proved are
  closed and its assertions duplicate test_isolated_cache.py /
  test_isolated_backward.py (weaker, in the cache case).
- The doc's duplicate feature-map and support-matrix tables fold into
  one table carrying mechanism + status.
- Doc records a PRE-EXISTING break found while re-running the full
  suite: lockdown has been broken since the warm-pool unification (the
  worker locks down before its first job-recv, and unpickling the job's
  tokenizer extras needs a new transformers submodule import that
  seccomp blocks). Reproduced on the pre-refactor commit 8d09195; needs
  a separate fix decision.
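
The warm-time vs per-job split described in the first bullet might look roughly like the sketch below. Field names follow the commit text, but the defaults and the exact shape of `pool_key` are assumptions, not the actual IsoOptions definition.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class IsoOptions:
    # Warm-time options: fixed when a worker is warmed, so they define
    # which workers are interchangeable. Default values are illustrative.
    device: str = "cuda:0"
    arena_bytes: int = 1 << 28
    gpu_mem_fraction: float = 0.1
    lockdown: bool = False
    # Per-job option: may differ between jobs on the same warm worker.
    timeout: float = 30.0

    @property
    def pool_key(self):
        return (self.device, self.arena_bytes, self.gpu_mem_fraction, self.lockdown)


base = IsoOptions()
slow_job = replace(base, timeout=120.0)
other_gpu = replace(base, device="cuda:1")

print(base.pool_key == slow_job.pool_key)    # same pool, different timeout
print(base.pool_key == other_gpu.pool_key)   # different device, different pool

try:
    base.timeout = 5.0
except FrozenInstanceError:
    print("IsoOptions is immutable")
```

Keeping the key derivation on the options object means a new warm-time field only has to be added in one place to take part in pool matching.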

Warm-pool suite passes after the test helper moved to IsoOptions.pool_key
(reuse/concurrent/retire/dead-idle/exception-recycle/renamed-model, plus
trace/cache/backward and the rest of the isolated suite earlier in the
stack); in-process regression 51 passed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ncel

After an isolated backward trace ended, the host mediator kept its
references to every retained on-graph activation — pinning those tensors
and the autograd graph behind them until the mediator was GC'd. cancel()
already drops the mediator's other ephemeral state (history, iteration
tracker, worker handle); now it also clears the retention map. Safe
because every BACKWARD event precedes the END/exception that triggers
cancel. Found by a four-angle cleanup pass over the backward + refactor
stack; the other findings were judged false positives or already-
documented accepted costs.

Backward + multi-token-backward isolated tests and the in-process
regression (51 passed) stay green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
__setstate__ rebuilds the transient isolation fields (_iso,
_isolated_worker, _iso_backward, _iso_grad_reals) but missed _iso_caches,
so a deserialized mediator (the NDIF/vLLM server path constructs
mediators via __setstate__) running under isolate_mediators() would hit
AttributeError in handle_cache_event the first time a trace used
tracer.cache(). Latent locally (host mediators come from __init__);
found by the high-effort review pass.

Isolated cache test and in-process regression (51 passed) green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ions

Process isolation contains footguns by running each intervention in a
spawned GPU worker — but that worker holds a weightless path-only mirror
of the model, so the interp majority (logit lens, steering, ablation,
activation patching, attribution) cannot run isolated at all: they read
the host model's real weights (F.linear(x, head.weight)) and call its
final-norm / unembed modules. The fast lane is the tier where the real
weights live.

Adds a third execution tier under isolate_mediators(). A fail-closed,
default-deny static classifier (fastlane.py) walks the EFFECTIVE code of
each mediator — the trace body plus every user closure it calls, resolved
through the frame / function globals / closure cells (the harness wraps
real compute in build()/capture() closures, so a walk of the with-block
alone would see only an opaque call). Verdicts: FAST (only whitelisted
ops / host-model access / nnsight primitives -> run in-process at full
speed and full model access), ISOLATE (anything unconfirmable -> the
existing GPU worker), REJECT (an introspection escape -> raise). The
conservative default is ISOLATE; the gate is a footgun selector, not a
malice boundary, so it is cordoned to trust="local" provenance and a
CONFIG.APP.FAST_LANE flag.
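
A heavily reduced sketch of a default-deny walk with the three verdicts is shown below. It inspects a single source string only and does not follow closures, frames, or globals as the real classifier is described to do; the node whitelist, the call whitelist, and the introspection attribute list are all hypothetical.

```python
import ast
from enum import Enum


class Verdict(Enum):
    FAST = "fast"
    ISOLATE = "isolate"
    REJECT = "reject"


# Hypothetical whitelists, for illustration only.
ALLOWED_CALLS = {"linear", "softmax", "save"}
INTROSPECTION = {"__class__", "__globals__", "__subclasses__", "__mro__", "__dict__"}
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.AugAssign, ast.Name, ast.Attribute,
    ast.Subscript, ast.Slice, ast.Constant, ast.Tuple, ast.List, ast.BinOp,
    ast.UnaryOp, ast.Call, ast.keyword, ast.expr_context, ast.operator,
    ast.unaryop,
)


def classify(source: str) -> Verdict:
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return Verdict.ISOLATE           # cannot confirm, so isolate
    verdict = Verdict.FAST
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and node.attr in INTROSPECTION:
            return Verdict.REJECT        # introspection escape
        if not isinstance(node, ALLOWED_NODES):
            verdict = Verdict.ISOLATE    # imports, while loops, and so on
        elif isinstance(node, ast.Call):
            func = node.func
            name = getattr(func, "attr", None) or getattr(func, "id", None)
            if name not in ALLOWED_CALLS:
                verdict = Verdict.ISOLATE    # unresolved or unlisted call
    return verdict


print(classify("logits = F.linear(h, head.weight)"))
print(classify("import os"))
print(classify("h.__class__.__mro__"))
```

The walk keeps going after an ISOLATE finding so that a later introspection attribute can still escalate the verdict to REJECT.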

Default behavior is preserved: isolation off never consults the gate;
isolation on now fast-lanes the confirmed-safe majority and isolates the
rest. A best-effort wall-clock watchdog restores loop-containment for the
one footgun the static walk cannot bound (a huge bounded range); its
injected FastLaneTimeout rides the intervention body's existing
try/except, so the host re-raises it cleanly. The classifier's
closure-aware backward detection also replaces the old `.backward(`
source substring (blind to a backward hidden in a closure) for the
isolated job's grad-retention flag.

Deferred (documented): the process-global sys.addaudithook backstop (its
leaked-flag failure mode can abort the model's own forward — net-negative
under a static default-deny gate); the five declarative tracer primitives
(unembed/steer/patch/ablate/capture) that would let weight-reading cells
also run on the isolated tier via host event handlers (a cache-shaped
build).

Verified: classifier units 17/17 (logit-lens/steering/patching/attribution
shapes + renamed structures classify FAST; imports/while/unresolved-call/
open ISOLATE; introspection REJECT; flag detection). Fast-lane e2e 6/6 on
gpt2 + a renamed model: weight-reading lens bit-identical on the fast lane
(max|Δ|=0) AND raises under forced isolation; in-place steering
bit-identical; footgun routes off the fast lane, host survives;
introspection rejected; runaway loop killed by the watchdog, host
survives. Existing isolated WORKER path (pinned with fast_lane=False) 9/9
still bit-identical; in-process core 51 passed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… note

docs/developing/fast-lane.md: why isolation could not run the weight-
reading interp majority (weightless worker), the three-tier design, the
classifier rules + threat-model contract, the watchdog, the prior art it
borrows from (Cloudflare Workers / RestrictedPython / SES / fx+JAX /
gVisor / Firecracker / the pysandbox negative result), the designed
part-2 declarative primitives (next increment), deferred items, and the
verification matrix. Cross-linked from the integration doc's support
matrix, which now notes the FAST/ISOLATE/REJECT tiering.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The weight-reading interp readout — F.linear(norm(residual), head.weight),
done by every logit-lens / steering-direction / attribution-metric cell —
cannot run in the isolated worker because its dummy modules are weightless.
tracer.unembed closes that on the isolated tier without putting weights in
the worker: the worker ships the residual VALUE plus the norm/head module
PATHS via a new Events.UNEMBED request; the host's handle_unembed_event
resolves the real envoys, runs the real norm + unembed on the real
weights, and ships back only the logits (bounce-buffer round trip,
clone-on-receive). Weights never cross the boundary — so this neither
binds the generic warm worker to a model nor places host weight memory in
the less-trusted worker (the two costs that ruled out shipping/sharing
weights). In-process / on the fast lane it just runs the real modules
directly. Shaped exactly like Events.CACHE / handle_cache_event.

This is the first of the part-2 declarative primitives; it also means a
deployment that forces pure isolation (fast_lane=False, e.g. for OOM
containment) can still run weight-reading workloads if they are written
with tracer.unembed.

Verified (test_isolated_unembed.py, all under forced isolation, gpt2 +
renamed model): single-layer / 3-layer-interleaved / formulation="module"
/ norm=None / renamed-model readouts isolated-vs-in-process max|Δ|=0;
tracer.unembed == the manual F.linear it replaces. Isolated trace/cache,
fast-lane e2e, in-process core (31), and classifier units (17) unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… isolated tier

Add tracer.steer(envoy, direction, alpha), the next part-2 declarative primitive
after tracer.unembed. It adds alpha*direction to a module's output residual via a
*replacement* boundary write (assign envoy.output), which routes through the
eproperty setter -> Events.SWAP and ships the steered value back on either tier.

Steering touches no host weights — only the delivered activation — so unlike
unembed it needs no host round-trip and no isolated/in-process branch: the same
method is correct in-process, on the fast lane, and in the isolated worker.

The point is the replacement swap. The hand-written additive form is in-place
(block.output[:, -1, :] += direction); under isolation that mutates only the
worker's delivered clone, no SWAP fires, the host's real activation is untouched,
and the steering silently no-ops. tracer.steer makes it cross the boundary by
construction. Tuple outputs (attention modules) are replaced whole, steering
element [0] and carrying the tail (incl. a None) through pack_cuda untouched.
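
The silent no-op is easy to reproduce in a toy model. In the sketch below, plain Python lists stand in for tensors, `Host.deliver` stands in for clone-on-receive, and a property setter stands in for the eproperty setter that emits Events.SWAP; all class and function names are hypothetical.

```python
class Host:
    """Owns the real activation; the worker only ever sees a clone."""

    def __init__(self, activation):
        self.activation = list(activation)

    def deliver(self):
        return list(self.activation)     # clone-on-receive

    def swap(self, value):
        self.activation = list(value)    # stands in for handling a SWAP


class WorkerEnvoy:
    def __init__(self, host):
        self._host = host
        self._clone = None

    @property
    def output(self):
        if self._clone is None:
            self._clone = self._host.deliver()
        return self._clone

    @output.setter
    def output(self, value):
        self._clone = value
        self._host.swap(value)           # assignment crosses the boundary


def steer(envoy, direction, alpha):
    # Replacement write: build a new value and assign it.
    envoy.output = [x + alpha * d for x, d in zip(envoy.output, direction)]


direction = [1.0, 0.0]

host_a = Host([1.0, 2.0])
envoy_a = WorkerEnvoy(host_a)
for i, d in enumerate(direction):
    envoy_a.output[i] += 2.0 * d         # in-place: mutates only the clone

host_b = Host([1.0, 2.0])
steer(WorkerEnvoy(host_b), direction, alpha=2.0)

print(host_a.activation, host_b.activation)
```

The item-level `+=` reads the property and mutates the returned clone, so the setter never runs; only the whole-value assignment in `steer` reaches the host.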

The classifier already treats tracer.steer as a trusted nnsight primitive (its
__module__ is nnsight.*), so no gate change is needed.

Verified (test_isolated_steer.py, gpt2 + a renamed model), all max|Δ|=0:
steering a block, an attention tuple output, and three blocks at once are
isolated-vs-in-process bit-identical and propagate through later layers;
tracer.steer equals the manual whole-tuple replacement; and the crux — under
forced isolation the in-place form leaves the downstream residual at the
unsteered baseline (silent no-op) while tracer.steer takes effect and matches the
in-process result. Classifier units still 17/17.

Doc: docs/developing/fast-lane.md §6 (steer marked built + subsection), §7, §8.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…del coverage

Add a `preimport=` option to isolate_mediators()/warm_worker_pool() that loads
modules at worker warm time, BEFORE seccomp lockdown freezes new file opens
(import == open()). This brings user-facing import capability under lockdown to
parity with an in-process module whitelist without weakening containment: the
model's own kernels (incl. triton) run host-side and are unaffected.

- thread `preimport` through _STATE -> _base_opts -> the pool key (now per
  device/arena/mem/lockdown/preimport signature) and consume it in
  _pool_worker_main warmup, before lock_down().
- test_isolated_triton_model.py: a @triton.jit-kernel model traced under
  isolation+lockdown is bit-identical to in-process (host compiles triton while
  the worker is fully locked down); worker-side triton in the intervention is
  blocked. Requires GPU + triton.
- docs §16: the triton deployment motivation, the strictly-better-than-upstream
  module-restriction comparison, and the verified timeout-directionality and
  cold-vs-pool lockdown-ordering facts. Fix the stale docstring claiming the cold
  path deserializes before lockdown (the unified worker locks down before both).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The host read worker frames with mp.Pipe.recv() (pickle) — and the worker runs
UNTRUSTED user code, so a crafted __reduce__ gadget over the control plane was a
host-side RCE that would bypass every other isolation layer (seccomp, namespaces,
row-bounding). The design hardened the inbound user payload (unpickled inside the
worker) but not the outbound results. Close it:

- worker->host is now tensor-free (tensors already ride the GPU buffer /
  safetensors) and the small remaining structure is decoded with a RESTRICTED
  unpickler (transport._RestrictedUnpickler / _safe_loads): find_class allows ONLY
  torch dtype/device and refuses every other class/function. find_class resolves a
  global before the REDUCE that would call it, so a gadget is refused before it can
  execute.
- the event rides as its string .value (no enum class) and exceptions as a
  (type-name, message) sentinel (no class), so the allowlist stays {torch dtype,
  device}. host->worker stays normal pickle (host-authored, trusted).
- this also fixes a real correctness gap a prior hand-rolled JSON codec had: the
  Events.CACHE spec carries torch.dtype/torch.device, which the tagged codec
  rejected; pickle handles all plain nested structures natively (no per-type
  enumeration), and anything un-allowlisted fails loud at decode with the class name.
- capability narrowing: .save() of an arbitrary object / numpy / framework type
  (e.g. ModelOutput) is no longer transmittable from a worker — save a tensor instead.
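
The find_class mechanism described above is part of the standard `pickle` module and can be shown in isolation. The sketch below is not the transport.py codec: the allowlist is left empty as a stand-in for the torch dtype/device entries, and the `Events` values, `encode_frame`, and `safe_loads` are illustrative names.

```python
import io
import pickle
from enum import Enum


class Events(Enum):
    # Stand-in values; only the "send .value, not the enum" idea matters here.
    VALUE = "value"
    END = "end"
    EXCEPTION = "exception"


# Stand-in allowlist of (module, name) pairs. The commit allows only torch
# dtype/device; it is empty here so the example needs no torch install.
ALLOWED = set()


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called to resolve a global before any REDUCE could invoke it.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"refused global {module}.{name}")


def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


def encode_frame(event, payload=None, exc=None):
    # Event rides as its string value, exceptions as (type-name, message).
    frame = {"event": event.value, "payload": payload}
    if exc is not None:
        frame["exception"] = (type(exc).__name__, str(exc))
    return pickle.dumps(frame)


class Gadget:
    def __reduce__(self):
        # A plain pickle.loads would call eval on this string.
        return (eval, ("__import__('os').getpid()",))


ok = safe_loads(encode_frame(Events.END, {"step": 3}))
err = safe_loads(encode_frame(Events.EXCEPTION, exc=ValueError("bad index")))
print(ok, err["exception"])

try:
    safe_loads(pickle.dumps(Gadget()))
except pickle.UnpicklingError as refusal:
    print(refusal)
```

Plain dicts, lists, tuples, strings, and numbers never reach find_class, which is why the frame structure decodes without any allowlist entry while the gadget is refused by name.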

test_isolated_codec_security.py: fidelity for VALUE/SWAP/END/CACHE(dtype,device)/
EXCEPTION/push, and a genuine __reduce__ gadget refused at decode without executing.
CPU is enough (needs torch). Legacy AF_UNIX socket channels are unused by
isolate_mediators and still plain-unpickle — noted in the module header.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Capture the security analysis behind the GPU-worker backend: the attacker model,
the asset list (host integrity / fs / net / cross-tenant host+GPU memory / DoS /
deser-RCE), the R0-R4 configuration ladder with the cost coupling (closing a deeper
threat forces a slower data path; the cliff is R2->R3, i.e. leaving the GPU), which
sandbox controls are compatible vs incompatible with the shared-GPU CUDA-IPC design,
the co-batch tenant-isolation invariant (the empty-invoke full-batch hole; the
Batcher/Interleaver = tenant boundary), and the worker->host restricted-unpickler fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pickler codec + warm-time preimport

Integrate the parallel isolation security/compat work (8f88986, f72a397,
151e5a5) with the local backward/fast-lane/unembed/steer line.

What came in from the remote:
- Worker->host restricted-unpickler codec (transport._RestrictedUnpickler /
  _safe_loads): worker-authored frames never plain-pickle.loads on the trusted
  host; find_class allows only torch dtype/device, the event rides as its string
  .value, exceptions as a (type-name, message) sentinel. Closes a worker->host
  RCE. transport.py auto-merged clean.
- Warm-time preimport: load deployment-allowed modules before seccomp lockdown
  freezes new file opens (import == open()) — the mitigation for the documented
  lockdown break and import-parity with the in-process whitelist.
- mediator-threat-models.md (R0-R4 ladder + cost coupling + co-batch tenant
  isolation + the codec fix) and the Triton-model deployment motivation.
- Triton-model + codec-security prototype tests.

Conflict resolution (isolation.py, integration doc):
- The remote added preimport to the OLD flat _STATE dict; this line had already
  refactored to the frozen IsoOptions dataclass. Kept IsoOptions and folded
  preimport into it (new field + added to pool_key, since the preimport set is
  warm-time and defines pool interchangeability). Dropped the remote's _base_opts
  / _WorkerPool._key in favor of _STATE["opts"] / IsoOptions.pool_key.
- Worker bootstrap runs the preimport loop off worker_iso_opts.preimport.
- Adopted the remote's corrected lockdown-timing wording (cold and pooled share
  the unified worker, which locks down before any job deserializes — the
  cold-vs-pool difference is recycle-vs-retire, not lockdown timing), replacing
  this line's now-inaccurate "cold deserializes before lockdown" claim.
- Both sides added a "§16"; kept §16 = backward read-path (already referenced by
  the committed §11 title and support matrix), renumbered the Triton section to
  §17, and fixed the two Triton matrix rows + the threat-models doc's two
  external §16 references. Merged the two support-matrix versions (kept the
  fast-lane/3-tier matrix, added the Triton rows + an unembed/steer row).

Also fixed a bug in the pulled codec-security test (never run per the doc's own
open-items list): the non-allowlisted probe class was function-local, so pickle
could not even ENCODE the frame (Can't get local object), masking the decode-time
refusal it meant to test — moved it to module scope.
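
The masking effect can be reproduced in a few lines. This is an illustrative sketch, not the test file: `make_local_probe` and `LocalProbe` are hypothetical names. The exact exception type differs across Python versions, so both are caught.

```python
import pickle


def make_local_probe():
    class LocalProbe:  # qualified name contains "<locals>", so pickle cannot reference it
        pass

    return LocalProbe()


try:
    pickle.dumps(make_local_probe())
    local_encoded = True
except (AttributeError, pickle.PicklingError):
    # The frame is never produced, so a decode-time refusal is never exercised.
    local_encoded = False
```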

Verified post-merge (hf-serve, A100): classifier 17/17; codec security PASS
(fidelity incl. CACHE dtype/device + gadget/non-allowlisted refusal); and
isolated trace (read/swap), unembed (UNEMBED frame through the new codec), steer
(SWAP through the codec), and acceptance (names/multi/exception-sentinel/hang)
all bit-identical max|Δ|=0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lgebra codec

The worker->host (untrusted) direction no longer runs pickle's VM on worker
bytes. Instead the boundary transmits a CLOSED VALUE ALGEBRA — never live
objects — so there is no opcode that can call a function in the first place:

  Value = None | bool | int | float | str | bytes
        | list | tuple | dict | set  of Value
        | torch.dtype | torch.device
        | Array(...)  (torch tensors AND numpy arrays, OUT-OF-BAND)

This makes the boundary value-semantic, and makes "safe + correct" a property of
the type rather than a bet on a restricted unpickler:

- SAFE by construction: _codec_loads is pure data assembly (no globals, no
  find_class, no REDUCE), plus a size cap + bounds-checked reads for decode-bomb
  DoS. The previous restricted unpickler is removed entirely.
- FAITHFUL: pack_cuda's Array leaf is generalized from torch.is_tensor to also
  cover numpy.ndarray (bridged through torch, re-materialized host-side as an
  ndarray), so numpy `.save()`s now cross — they were silently refused before.
- HONEST contract: a value outside the algebra (custom object / framework type)
  is rejected at the WORKER, at ENCODE, with a clear BoundaryValueError naming it
  at its source — not an encode-ok / decode-refuse split.
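
To make the "pure data assembly" claim concrete, here is a small sketch of a closed-algebra codec over the scalar and container cases only. It is an assumption-laden toy, not transport._codec_dumps / _codec_loads: the tag bytes, the `MAX_FRAME` cap, the `dumps`/`loads` names and the wire layout are all invented for illustration, and set, torch.dtype/device and the out-of-band Array leaf are omitted.

```python
import struct

MAX_FRAME = 1 << 20  # hypothetical size cap for one frame


class BoundaryValueError(Exception):
    """Stand-in: raised at encode time for a value outside the algebra."""


def _sized(tag: bytes, raw: bytes) -> bytes:
    return tag + struct.pack("<I", len(raw)) + raw


def dumps(v) -> bytes:
    t = type(v)  # exact type match: subclasses are outside the algebra
    if v is None:
        return b"N"
    if t is bool:
        return b"T" if v else b"F"
    if t is int:
        return _sized(b"I", str(v).encode())
    if t is float:
        return b"D" + struct.pack("<d", v)
    if t is str:
        return _sized(b"S", v.encode("utf-8"))
    if t is bytes:
        return _sized(b"B", v)
    if t is list or t is tuple:
        tag = b"L" if t is list else b"U"
        return tag + struct.pack("<I", len(v)) + b"".join(dumps(x) for x in v)
    if t is dict:
        body = b"".join(dumps(k) + dumps(x) for k, x in v.items())
        return b"M" + struct.pack("<I", len(v)) + body
    raise BoundaryValueError(f"{t.__name__} is outside the value algebra")


def _take(buf, pos, n):
    # Bounds-checked read: never slice past the end of the frame.
    if pos + n > len(buf):
        raise ValueError("truncated frame")
    return buf[pos:pos + n], pos + n


def _decode(buf, pos):
    tag, pos = _take(buf, pos, 1)
    if tag == b"N":
        return None, pos
    if tag in (b"T", b"F"):
        return tag == b"T", pos
    if tag == b"D":
        raw, pos = _take(buf, pos, 8)
        return struct.unpack("<d", raw)[0], pos
    if tag in (b"I", b"S", b"B", b"L", b"U", b"M"):
        raw, pos = _take(buf, pos, 4)
        n = struct.unpack("<I", raw)[0]
        if tag in (b"I", b"S", b"B"):
            body, pos = _take(buf, pos, n)
            if tag == b"I":
                return int(body.decode()), pos
            return (body.decode("utf-8") if tag == b"S" else body), pos
        items = []
        for _ in range(n * (2 if tag == b"M" else 1)):
            item, pos = _decode(buf, pos)  # each item consumes at least one byte
            items.append(item)
        if tag == b"L":
            return items, pos
        if tag == b"U":
            return tuple(items), pos
        try:
            return dict(zip(items[0::2], items[1::2])), pos
        except TypeError:
            raise ValueError("unhashable dict key") from None
    raise ValueError(f"unknown tag {tag!r}")


def loads(buf: bytes):
    if len(buf) > MAX_FRAME:
        raise ValueError("frame exceeds size cap")
    value, pos = _decode(buf, 0)
    if pos != len(buf):
        raise ValueError("trailing bytes after value")
    return value
```

The decoder only builds built-in values from tags and lengths; there is no branch that looks up a name or calls a caller-supplied function, which is the sense in which the boundary has no opcode that can call anything. A production codec would also need a nesting-depth limit, which this sketch leaves out.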

tracer.cache() shipped a live CacheDict placeholder, which is not a value; the
worker now ships its token as a `{_ISO_CACHE_TAG: token}` marker (same shape as
the EXCEPTION sentinel) and the host swaps in its forward-filled cache by token.
(This also fixes cache under isolation, which the merged restricted unpickler had
broken — it refused CacheDict the same way.)

host->worker stays plain pickle (host-authored, trusted).

Verified (nnsight-tf: py3.11 / torch 2.11 / transformers 5.12):
- codec unit (test_isolated_codec_security.py): fidelity over the algebra incl.
  numpy + dtype/device; a __reduce__ gadget and a custom object rejected at encode
  before any __reduce__ runs; malformed/oversized/unknown-tag raise cleanly.
- full isolated GPU suite bit-identical max|Δ|=0: trace, unembed, steer, cache,
  backward, multitoken-iter, cross-invoke, pool, acceptance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mitives for the isolated tier

The two remaining single-write part-2 primitives, structural twins of tracer.steer:
both are replacement boundary writes that touch no host weights, so they ride the
existing Events.SWAP with no new event, no host handler, and no isolated/in-process
branch. Done in place under isolation, each is a silent no-op (the worker mutates only its
delivered clone, so no SWAP fires); the replacement swap makes them cross the boundary by
construction.

- tracer.patch(envoy, value): transplant a precomputed value into a module's output
  residual (activation patching / resampling). Cast to the residual's dtype/device so a
  value precomputed on CPU — the isolation case — transplants cleanly. Whole-tuple
  replacement (element [0]).
- tracer.ablate(envoy, mode="zero"): zero/mean knockout. mode="mean" is the self-contained
  within-sequence mean; reference-distribution (dataset) mean ablation is a precomputed
  value transplanted via tracer.patch — not derivable from a single forward, so kept
  distinct to avoid silent wrong-mean semantics. Unknown mode raises ValueError.
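
The difference between an in-place edit and a replacement write can be sketched without a GPU. This is a toy model under stated assumptions: `run_isolated`, `in_place_zero`, `ablate_zero` and `patch` are hypothetical helpers, plain lists stand in for tensors, and `copy.deepcopy` stands in for clone-on-receive.

```python
import copy


def run_isolated(host_value, intervention):
    """Toy harness: deliver a clone, apply a SWAP only if the worker returns one."""
    delivered = copy.deepcopy(host_value)  # the worker never holds the host object
    swap = intervention(delivered)         # None models "no SWAP event was sent"
    return host_value if swap is None else swap


def in_place_zero(residual):
    for i in range(len(residual)):
        residual[i] = 0.0                  # mutates the worker's clone only
    return None


def ablate_zero(residual):
    return [0.0] * len(residual)           # replacement value, shipped as a SWAP


def patch(value):
    # Transplant a precomputed value regardless of what was delivered.
    return lambda residual: list(value)


after_in_place = run_isolated([1.0, 2.0], in_place_zero)
after_ablate = run_isolated([1.0, 2.0], ablate_zero)
after_patch = run_isolated([1.0, 2.0], patch([9.0, 9.0]))
```

In this model the in-place form leaves the host value untouched, while both replacement forms take effect, mirroring the "crux" case the tests assert.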

Verified bit-identical (max|Δ|=0) under forced isolation vs in-process on gpt2 + a renamed
model: test_isolated_patch.py 6/6, test_isolated_ablate.py 7/7, including the crux that the
in-place form is a silent no-op under isolation while the primitive takes effect. No
regression in steer/unembed/trace/acceptance/cache/cross_invoke/backward. Docs:
docs/developing/fast-lane.md §6 (built), new subsections, §7/§8 updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014ZUUF44B2tfuKBNFhDFedR
… .carry() primitive

model.session() carries values across its inner traces in-process (each inner trace
pushes its locals up to the session frame; the session's exit-push surfaces only saves).
Under isolation each inner trace runs in a worker that shipped only its .save()'d locals
home, so the two-hop push was broken two ways:
  - a SAVED value used cross-trace was written into the session frame but its host id was
    never re-registered in Globals.saves, so the session's exit-push dropped it
    (UnboundLocalError);
  - a NON-saved value used cross-trace was never shipped at all (NameError).

The realized form of the last part-2 primitive (the run<->run handoff originally specced
as tracer.capture, which collided with the existing Tracer.capture(frame) AST method):

- Saved-case fix: when the isolated END target is a nested/session frame, the host writes
  the worker's values into it AND re-registers the saved values' host ids in Globals.saves,
  so the session's exit-push keeps them. Root (single-trace) writeback is unchanged. Makes
  the documented `hs = x.save()` -> use `hs` session pattern work under isolation.
- .carry() (universal value method, like .save(); plus nnsight.carry(x)): hand a value to a
  later trace in the session WITHOUT surfacing it as an output. The worker end() now ships
  saved-union-carried locals as (values, saved_names); the host writes all to the session
  frame (next trace sees them) but registers only the saved ones, so carried values drop at
  session exit — exactly in-process non-saved semantics, made explicit. With no .carry() in
  use the payload equals the prior saved-only one, so the single-trace path is unchanged.
  .carry() is portable: harmless in-process, load-bearing under isolation.
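
The write-all, register-only-saved rule can be sketched with dictionaries. The functions `host_end` and `session_exit` and the use of `id()` as the registration key are illustrative assumptions, not the isolation.py implementation.

```python
def host_end(session_frame: dict, saves: set, values: dict, saved_names: set):
    """Toy host-side END handling for a nested/session target frame."""
    for name, value in values.items():
        session_frame[name] = value  # the next trace in the session sees every shipped value
        if name in saved_names:
            saves.add(id(value))     # only saved values are registered for the exit-push


def session_exit(session_frame: dict, saves: set) -> dict:
    # The exit-push surfaces only registered (saved) values to the caller.
    return {k: v for k, v in session_frame.items() if id(v) in saves}


frame, saves = {}, set()
shipped = {"hs": [0.1, 0.2], "tmp": [0.3]}  # worker END payload: saved-union-carried locals
host_end(frame, saves, shipped, saved_names={"hs"})
surfaced = session_exit(frame, saves)
```

Here `tmp` plays the role of a carried value: visible to a later trace through `frame`, absent from `surfaced`.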

Root cause confirmed by host-side instrumentation (saved value reaches the session frame but
host Globals.saves stays empty -> session root-push drops it).

Verified (nnsight-tf, GPU7, gpt2 + renamed model): test_isolated_session_handoff.py 6/6 all
max|Δ|=0 (saved + carried handoff isolated==in-process; nnsight.carry==method; carried value
not surfaced to caller while saved is; .carry() in-process==isolated). No regression across
the isolated suite (trace/cache/backward/multitoken-iter/cross-invoke/acceptance/steer/patch/
ablate/unembed/pool) and in-process core test_lm.py 75/75.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014ZUUF44B2tfuKBNFhDFedR
…ialize

The isolated worker installed its seccomp filter before the job loop (warm time), so the
very first conn.recv() — which unpickles the host's job message with standard pickle —
triggered a lazy transformers submodule import (transformers loads modeling submodules only
at unpickle time, so preimport=("transformers",) did NOT help) → open() → EPERM under
seccomp. EPERM is an OSError, which the recv loop's `except (EOFError, OSError): break`
swallowed → os._exit(0) → the host saw a pipe EOF and reported "worker died during
execution." This broke every lockdown=True trace (root cause confirmed by worker-side
instrumentation: death at conn.recv, importing transformers/models/gpt2/__init__.py).
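
Why the failure was silent can be shown with the standard library alone. The sketch below mimics the shape of the described loop; `recv_loop` and `blocked_import` are hypothetical names, and the raised error only imitates what a seccomp-denied open() looks like from Python.

```python
import errno


def recv_loop(recv):
    """Returns the name of the exception that ended the loop."""
    while True:
        try:
            recv()
        except (EOFError, OSError) as exc:  # PermissionError is an OSError subclass
            return type(exc).__name__


def blocked_import():
    # An EPERM failure surfaces as OSError(errno.EPERM, ...), i.e. PermissionError.
    raise OSError(errno.EPERM, "Operation not permitted")


reason = recv_loop(blocked_import)
```

Because the handler was written for a closed pipe, a permission failure during unpickling takes the same exit path and is indistinguishable from a clean shutdown.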

The job message and mediator payload are host-authored, TRUSTED data; only the user
intervention code is untrusted. So lockdown belongs after deserialization, before user code
— exactly what _sandbox.lock_down's own docstring already stated. Move lock_down() out of
_pool_worker_main's warm section into _run_one_job, installed once (guarded by a worker
global) after the first job's payload is deserialized and before its intervention runs:
- the first conn.recv runs unlocked, so a fresh worker's first job needs no preimport=;
- one-way + once, so a warm pool's later jobs run under the first job's lockdown — a
  homogeneous model needs nothing (already imported), a different model needs preimport=;
- cold (pool_size=0) and pooled share the path, so both deserialize their first job first.
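
The ordering above can be sketched as a once-guarded call placed between deserialization and user code. This is a simplified model: `lock_down` here is a stub that only records that it ran, `run_one_job` and `_STATE` are illustrative names, and `events` exists purely to make the order observable.

```python
import pickle

_STATE = {"locked": False}  # worker-wide guard: lockdown is one-way and installed once
events = []                 # records ordering for illustration only


def lock_down():
    """Stub; a real implementation would install the seccomp filter here."""
    events.append("lock_down")


def run_one_job(raw: bytes, run_user_code):
    payload = pickle.loads(raw)  # host-authored, trusted: deserialize while still unlocked
    events.append("deserialize")
    if not _STATE["locked"]:
        lock_down()              # before any untrusted intervention code runs
        _STATE["locked"] = True
    events.append("user_code")
    return run_user_code(payload)


first = run_one_job(pickle.dumps({"x": 1}), lambda p: p["x"] + 1)
second = run_one_job(pickle.dumps({"x": 2}), lambda p: p["x"] + 1)
```

The second job deserializes under the first job's lockdown, which is why a pooled worker serving a different model would still need its modules preimported.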

Containment is unchanged: user-code open/socket/exec under lockdown are still blocked and
now surface as a clean NNsightException (shipped via the EXCEPTION path) rather than a silent
death.

Verified (nnsight-tf, GPU7, gpt2): test_isolated_lockdown_safety.py 4/4 — read under
lockdown max|Δ|=0, fs/net blocked, and a NEW warm-pool case (3 traces on one pooled worker
under lockdown, all bit-identical). No regression: trace/cache/pool/session_handoff pass
(the lockdown=False default path is untouched). Docs: mediator-gpu-trace-integration.md
support matrix + lockdown-ordering notes updated (break -> fixed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014ZUUF44B2tfuKBNFhDFedR
…es, session handoff, backward state

Bring docs/developing/mediator-gpu-trace-integration.md §8 (and the §10 cross-trace note) up to
date with the features landed this session (the detail already lived in fast-lane.md §6):
- support matrix: the part-2 primitive row now lists all of unembed/steer/patch/ablate; a new row
  documents session cross-trace handoff (.save() used in a later trace, and .carry()/nnsight.carry).
- backward row: reflects the current state — read-path bit-identical; multi-token backward is a
  clean-fail (in-process doesn't support it either); grad-through-a-swap cleanly errors (the swapped
  value is a host-side leaf, severing the host graph at the seam) — the next backward increment.
- §10 cross-trace note: the per-job reset clears Globals.shared too, and clarifies that the no-leak
  property is about UNRELATED traces — intentional in-session handoff is a separate supported path.

Docs-only; no code change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014ZUUF44B2tfuKBNFhDFedR
… backward seam

An isolated SWAP installs the worker-computed value on the host as a detached leaf
(clone-on-receive strips grad_fn), severing the host autograd graph at the swap point. So a
downstream loss differentiated w.r.t. an UPSTREAM activation dead-ended at the swap ("no
gradient available ... off the backward path"), while in-process gradients flow through
swaps. The read-path backward split the chain rule once at the worker→host seam; a swap adds
a second seam that splits it the other way (host downstream → worker swap tape → host
pre-swap).

Fix: iterate the existing Events.BACKWARD exchange to a fixpoint over swap seams.
- Host (interleaver.py): handle_swap_event, under _iso_backward, makes the swap leaf
  requires_grad_(True) and retains it (_iso_grad_swaps) so the downstream forward tracks it
  and it is a backward target; handle_backward_event adds swap leaves to its targets and
  returns dL/d(swap leaf) under a reserved key (kept separate from reals so a read-then-
  swapped module sharing one requester path doesn't collide).
- Worker (isolation.py): WorkerMediator.swap keeps the worker-tape swap value (with grad_fn);
  reset alongside the other _bwd state.
- Worker backward (backwards.py): loop — send seeds, receive dL/d(swap leaf), backprop it
  through the swap tape to dL/d(delivered clone), re-seed the pre-swap graph, repeat; a read
  reached both directly and through a swap SUMS its gradient across rounds. With no swaps the
  loop is exactly the prior single exchange.
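
A scalar worked example of the seam-stitching, with hand-written derivatives instead of autograd. The loss and swap function are invented for illustration: the swap computes s = 2h on the worker, and the host's downstream loss is L = s² + 3h, so h is reached both directly and through the swap.

```python
def downstream_grads(s):
    """Host side: for L = s**2 + 3*h, dL/ds = 2*s and the direct dL/dh = 3."""
    return 2.0 * s, 3.0


def swap_tape_backward(grad_s):
    """Worker side: the swap computed s = 2*h, so dL/dh via the swap is 2 * dL/ds."""
    return 2.0 * grad_s


h = 1.5
s = 2.0 * h                                   # worker-computed value installed as the swap leaf
grad_s, grad_h_direct = downstream_grads(s)   # host returns dL/d(swap leaf) and the direct term
grad_h = grad_h_direct + swap_tape_backward(grad_s)  # the two paths sum
```

Substituting s = 2h gives L = 4h² + 3h, whose derivative 8h + 3 equals 15 at h = 1.5, matching the stitched result.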

Verified (nnsight-tf, GPU7, gpt2 + renamed): test_isolated_grad_through_swap.py 5/5 all
max|Δ|=0 — grad through h*2, h+vec, tracer.steer, TWO chained swaps (loop fixpoint), and the
renamed model, isolated == in-process. No regression across the isolated suite 13/13
(read-path backward, multi-token-backward clean-fail parity, trace/steer/patch/ablate swaps
without backward, multitoken-iter/cross-invoke/session-handoff/cache/lockdown/acceptance/pool).
Docs: mediator-gpu-trace-integration.md §16 + support matrix (grad-through-swap DONE).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014ZUUF44B2tfuKBNFhDFedR
…downgrade two overclaims

A deep audit found doc-vs-code drift across the isolation docs. Code is correct; the docs
lagged. Doc-only changes (verified against transport.py / isolation.py / _sandbox.py /
fastlane.py):

- threat-models §7 + line 11: the worker->host fix is the shipped closed value-algebra codec
  (transport._codec_dumps / _codec_loads), not a restricted unpickler. The documented
  _RestrictedUnpickler / _safe_loads do not exist (replaced in 043196c). Rewrote the Fix
  paragraph to the codec (closed algebra, pure data assembly, no find_class/REDUCE/pickle VM,
  size cap + bounds checks, BoundaryValueError at encode); deleted the
  "restricted-unpickler vs hand-rolled codec" subsection and replaced it with why a closed
  codec is stronger (no opcode can call anything, so the restricted-unpickler bypass class,
  e.g. CVE-2025-32434, cannot exist). Legacy-socket remediation pointer _safe_loads ->
  _codec_loads.
- integration §14: pool_key is the 5-tuple (device, arena_bytes, gpu_mem_fraction, lockdown,
  preimport), not a 4-tuple (preimport was missing).
- integration §3/§7/§9: back-patch renamed symbols ensure_provider -> ensure_isolated_provider,
  spawn_isolated_worker -> _spawn_worker, _worker_main -> _pool_worker_main (kept the one
  "previously _worker_main, now _pool_worker_main" history line intact).
- integration: clarify set_per_process_memory_fraction caps the allocator pool (the 20 GB
  footgun), distinct from the ~0.55 GiB CUDA-context cost it does not reduce.
- mediator-isolation-sandbox.md + gpu-sandbox.md: SUPERSEDED / pre-integration banners pointing
  to the authoritative integration + threat-model docs; dropped topk from the op list and noted
  "capture" shipped as .carry().

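For the pool_key item above, a sketch of what a frozen options dataclass with a 5-field key could look like. Only the five field names come from the commit; the default values, the field types, and the choice to expose `pool_key` as a property are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsoOptions:
    device: str = "cuda:0"          # placeholder default
    arena_bytes: int = 1 << 26      # placeholder default
    gpu_mem_fraction: float = 0.5   # placeholder default
    lockdown: bool = False
    preimport: tuple = ()           # warm-time module set, so it is part of the key

    @property
    def pool_key(self):
        # Two workers are interchangeable only if every warm-time setting matches.
        return (self.device, self.arena_bytes, self.gpu_mem_fraction,
                self.lockdown, self.preimport)


base = IsoOptions()
with_preimport = IsoOptions(preimport=("transformers",))
```

With preimport left out of the key, `base` and `with_preimport` would share a pool even though their workers were warmed differently.
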
Posture claims downgraded to match the shipped footgun-containment model:
- threat-models §4/§5: the shipped seccomp is a default-ALLOW denylist of 7 fs/net/exec
  syscalls (plus GPU mem-fraction), i.e. R1 + footgun containment. Full R2 (allowlist-default
  seccomp + ptrace/clone/fork + namespaces + cgroups) is designed, not built; R2's
  "determined adversary" closure is the roadmap target, matching gpu-sandbox.md.
- fast-lane §7: the static gate rejects import/open/exec/socket AST nodes, but the fast lane
  runs in-process and allowlists numpy/torch calls, which is a footgun selector, not an
  adversarial boundary. Safety rests on the trust="local" cordon (unreachable for non-author
  code), made explicit.
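
The distinction between a footgun selector and an adversarial boundary shows up in even a tiny AST gate. The sketch below is not the fastlane.py gate; `static_gate` and `_BLOCKED_CALLS` are hypothetical, and the check is deliberately naive.

```python
import ast

_BLOCKED_CALLS = {"open", "exec", "socket"}


def static_gate(source: str) -> list:
    """Return a list of rejected constructs found in the source."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append(f"line {node.lineno}: import")
        elif isinstance(node, ast.Call):
            fn = node.func
            # Match both bare names (open) and attribute calls (socket.socket).
            name = fn.id if isinstance(fn, ast.Name) else getattr(fn, "attr", None)
            if name in _BLOCKED_CALLS:
                problems.append(f"line {node.lineno}: call to {name}")
    return problems


honest_mistake = static_gate("import os\nf = open('x')")
benign = static_gate("y = h * 2")
evasive = static_gate("f = getattr(__builtins__, 'op' + 'en')")
```

The gate catches the accidental import and open, but the string-built lookup passes untouched, which is why safety here has to rest on the trust cordon rather than on the gate.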

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014ZUUF44B2tfuKBNFhDFedR
…ulti-invoke framing

Investigating the "batched backward" open item showed it splits two ways, and checking the
in-process baseline first reframed it:

- Batched backward via LIST input (model.trace([A, B, ...])) already works under isolation,
  bit-identical. It is one mediator over the padded batch (batch_group=None, no per-invoke
  narrowing), so the worker's delivered clone and the host's retained real are both full-batch
  and shapes match; the read-path and grad-through-swap seam-stitch run unchanged on a
  (batch, seq, hidden) tensor. Added test_isolated_batched_backward.py (2-row, 3-row,
  upstream-block, batched grad-through-swap, renamed): all isolated-vs-in-process max|Δ|=0.
- Backward inside MULTIPLE tracer.invoke(...) contexts raises MissedProviderError IN-PROCESS
  too (the .grad provider is never registered across invoke contexts), so it is a core nnsight
  limitation, not an isolation gap. The prior doc framing ("cryptic shape mismatch / needs
  narrowed retention") predated checking the in-process baseline; same category as multi-token
  backward (parity, not a gap).

Doc-only + new test; no code change. mediator-gpu-trace-integration.md §16 + support-matrix
backward row corrected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014ZUUF44B2tfuKBNFhDFedR