Skip to content

feat: fold llmproxy duplicate turns by passive pair detection#22

Merged
github-actions[bot] merged 28 commits into
mainfrom
feat/llmproxy-pair-detection
May 21, 2026
Merged

feat: fold llmproxy duplicate turns by passive pair detection#22
github-actions[bot] merged 28 commits into
mainfrom
feat/llmproxy-pair-detection

Conversation

@vaderyang
Copy link
Copy Markdown
Collaborator

Summary

User-visible fix for the Agent Turns list showing duplicate rows for one logical LLM call when traffic crosses an llmproxy (LiteLLM / haproxy_glm5). The duplicates come from two distinct phenomena, both passively detected and folded:

  1. Real proxy hops — client → proxy_container → upstream_container both traverse captured docker bridges → two records with the proxy_in leg strictly enclosing the proxy_out leg in event time.
  2. Multi-interface double-capture — libpcap on any interface sees the same packet on br0 and docker0 with different IP attribution → near-identical timestamps, differing 5-tuple.

Both fold via the same pairing rule (same session + same model + same token counts + same path/status + req_t within 100ms + differing (client_ip, server_ip)) — content+timing is the rule that survives docker SNAT. Role assignment:

  • mirror (same packet, different interface) when start AND end times agree within 500us → primary/secondary
  • strict nesting otherwise → proxy_in/proxy_out

Pure-passive: no llmproxy cooperation. Verified against the verified haproxy_glm5 turn pair on wuneng (019e3a95-bb7c-7eb3-8240-d3ecacb0c583 / …-d3d6fdd76249, same session gen-b93380c5210ed98a, 11345/128 tokens, start_gap 2ms / end_gap 1ms).

Mechanics

  • Zero schema migrationagent_turns.metadata is already a VARCHAR/JSON column. Pair info lands as metadata.proxy.{role, pair_id, peer_turn_id}.
  • Background sweeper (ts-storage::pair_sweeper) polls recently-finalized turns whose metadata.proxy.role IS NULL, runs ts_turn::pair_all, writes patches back via update_turn_metadata. Default 2s cadence, 30min lookback. Idempotent — already-paired turns are excluded from every sweep.
  • API filter/api/agent-turns hides proxy_out / mirror_secondary by default; ?include_proxy_hops=true returns every row. Each item gains optional proxy_role and proxy_peer_turn_id.
  • UIProxyBadge next to the Time column on paired rows (subtle "↔ via proxy" / "mirrored" indicator with hover tooltip pointing at the peer turn_id). Filter-bar checkbox "Show proxy hops" persists in URL state (?show_hops=1).

Trade-off (known)

The pair algorithm is currently greedy: each turn participates in at most one pair. For the haproxy_glm5 case which actually emits 3 captured legs (host-IP view + docker-IP view + real upstream forward), the sweeper folds the (host, docker) mirror pair but leaves the upstream-forward leg unpaired. User sees 2 rows instead of 3 — an improvement, but not yet 1-row-per-logical-call. Multi-hop pair chaining is a follow-up.

Test plan

  • ts-turn::proxy_pair — 10 unit tests cover the classification rule (real-data nesting case, mirror case, cross-session reject, same-view reject, time-gap exceeded, token-mismatch reject, ambiguous-non-nesting reject, metadata patch shape, multi-pair-per-session, lone-turn).
  • ts-storage::pair_sweeper — 3 unit tests (matched pair runs end-to-end, role assignment matches real wuneng shape, lone turn left alone).
  • ts-storage-duckdb::turnsquery_pair_candidates_returns_only_unpaired + update_turn_metadata_merges_into_existing_object + query_turns_hides_proxy_hops_by_default_and_surfaces_them_with_flag.
  • Workspace test suite green (800+ unit tests).
  • Deployed on wuneng — pair sweeper successfully classified 1 mirror pair in last 30min of live traffic; proxy_role populates on the API; UI bundle serves the new bundle.

🤖 Generated with Claude Code

Vader Yang and others added 24 commits May 15, 2026 17:01
…d link

Builds on the previous PR (selected id in URL): copying a list page
URL like `?preset=15m&selected=<id>` and opening it half an hour
later would compute `start=now-15m, end=now` from scratch — the
selected item is no longer in that window. The detail panel still
loaded (it queries by id) but the list behind it showed an unrelated
slice, the row had no highlight, prev/next disabled.

Fix the window without changing the original tab's behaviour:

1. List pages also write `?selected_at=<unix_s>` when an item is
   selected — taken from the item's start_time (agent turns) or
   request_time (llm calls / http exchanges). Cleared together with
   `selected` when the panel is closed.

2. `useToolbarUrlSync` reads `selected_at` during hydration. If the
   anchor falls outside the preset-derived window, override:
   - keep the preset's *duration* (the original user's "show me this
     much context" signal),
   - slide so `end = anchor + 60s` (small breathing pad keeps the
     item from sitting flush at the edge in a desc-by-time list),
   - promote `preset` to `custom` so subsequent URL writes carry
     absolute start/end and the shift survives navigation.

   No-op when the anchor is inside the window, absent, unparseable,
   or future-dated relative to a window that already includes it.

3. Pure helper `applySelectedAtAnchor` lives in its own module
   (`selected-at-anchor.ts`, no `@/` aliases) so it's directly
   testable under bun without the toolbar-store / react-router
   runtime chain. 7 unit tests cover the no-op cases, the stale-
   preset shift, default-1h fallback, and clock-skew anchors.

Effects:
- Original tab: relative preset still ticks `now` as usual; no
  surprise switch to `custom`.
- Fresh URL load: window auto-widens / slides to bracket the
  shared item; list, highlight, prev/next all work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ar logo

Three small UI changes batched into one branch — none of them touch
backend or data shapes:

* Overview "Avg TPOT" KPI surfaces as "Avg TPS" with units of tok/s
  (= 1000 / tpot_avg_ms). TPOT itself is what the backend stores;
  the conversion is one division at render time. "Generation speed"
  reads better in a glance than "milliseconds per token".

* Models table column "TPOT" → "Generation TPS", same unit swap.
  Sort key still points at tpot_avg under the hood but getSortValue
  inverts to 1000/tpot_avg so clicking the column desc gives
  fastest-first — matches what someone clicking "Generation TPS"
  expects.

* Agent Turns table column order rewritten around how operators
  actually triage a turn: Time, Agent, Client, Calls, Status, In,
  Out, then the less-frequently-scanned dimensions (Model, Wire
  API, Server, Duration) and the long User Input preview last.

* New TokenScope brand mark replaces the bare panel-toggle button
  at the top-left of the sidebar:
  - Expanded: wordmark on the left, collapse button on the right.
  - Collapsed: icon-only mark; click toggles to expand (the icon
    doubles as the expand affordance — discoverable, saves a row).
  Both variants share the same glyph (rounded "scope" frame
  containing three decreasing token bars) so they line up
  visually as the sidebar opens/closes. Stroke uses currentColor
  for dark-mode and theme inheritance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every chart had its own copy of:

  function formatAxisTime(epoch) {
    const d = new Date(epoch * 1000)
    return `${HH}:${MM}`
  }

Result: a 7-day window rendered ticks as a wrapping clock face
("00:00", "12:00", "00:00", "12:00", ...) with no day attached.
Same problem at 24h. Easy to mis-read.

Centralize the formatter in lib/format as `formatAxisTime(epoch, span)`
and have it pick the right shape based on the visible window:

  span < 24h       →  HH:MM           (5m / 15m / 1h / 6h presets)
  24h ≤ span < 7d  →  MM-DD HH:MM     (24h preset)
  span ≥ 7d        →  MM-DD           (7d preset; time-of-day is noise
                                       when ticks come ~daily)

Each chart derives span from its data (last timestamp − first), so the
formatter requires no toolbar dependency and naturally handles partial
ranges (e.g. tail of a 7d window after retention trimmed the head).

Replaces the inline copies in:
  - timeseries-line-chart  (Overview latency, Models, Performance)
  - request-volume-chart   (Overview)
  - latency-overview-chart (Overview)
  - stacked-bar-chart      (Performance, Traffic)

6 unit tests in lib/format.test.ts cover each duration bucket plus the
24h / 7d inclusive boundaries and the single-point fallback (span = 0
→ HH:MM). Tests assert *shape* not literal values so they pass under
any TZ.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ProxyPair, ProxyRole, PairCandidate, PairAssignment with the
classify_pair / pair_all entry points. No call sites yet — this is the
pure-data foundation that the storage sweeper + API filter will build on.

Pairing rule (verified against the haproxy_glm5 turn pair on wuneng:
turns 019e3a95-bb7c-7eb3-8240-d3ecacb0c583 / d3d6fdd76249, same session
gen-b93380c5210ed98a, 11345/128 tokens, start_gap 2ms / end_gap 1ms):

  - same session_id / agent_kind / wire_api
  - same call_count, total_input_tokens, total_output_tokens
  - same final_finish_reason and primary model
  - differing (client_ip, server_ip) view
  - |start_time gap| ≤ 100ms

Role:
  - mirror (same packet on br0 + docker0) when both start and end times
    agree within 500us
  - strict nesting (real proxy hop) when outer.start ≤ inner.start and
    outer.end ≥ inner.end
  - else: ambiguous, no pair

10 unit tests cover both real-data scenarios and the non-pair cases
(cross-session, same view, time-gap exceeded, tokens differ, ambiguous
non-nesting).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the background sweeper that scans recently-finalized turns,
classifies pairs via ts_turn::pair_all, and writes pair_id/role/peer
back via update_turn_metadata. Spawned alongside the storage sink in
pipeline.rs — one sweeper per process, owns its own Arc<dyn
StorageBackend>.

StorageBackend trait gains two methods with safe defaults so mock
backends don't need to change:

  - query_pair_candidates(start_us, end_us) → light projection of
    agent_turns rows whose metadata.proxy.role is unset (idempotent
    sweep guard)
  - update_turn_metadata(turn_id, patch) → shallow top-level JSON merge
    into agent_turns.metadata (no schema change; metadata is already a
    VARCHAR holding JSON)

DuckDB implementation:
  - SELECT projects via json_extract_string(metadata, '$.proxy.role')
  - UPDATE is read-modify-write to preserve any pre-existing metadata
    keys; no-op when turn_id is absent (sweeper races finalization)

Default schedule: 2s interval, 5min lookback. The lookback comfortably
exceeds tracker grace (1s) + storage flush jitter (~100ms) so neither
leg of a pair can land late enough to miss its peer.

Tests:
  - ts-storage pair_sweeper: 3/3 (matched pair, role assignment matches
    real wuneng haproxy_glm5 shape, lone turn ignored)
  - ts-storage-duckdb turns: pair_candidates returns only unpaired,
    update_turn_metadata merges with existing keys, noop on missing
    row

Workspace: 815+ unit tests all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
API surface for pair-folded turn list. By default, /api/agent-turns
hides the leg the pair sweeper marked hidden (proxy_out /
mirror_secondary) — one logical call collapses to one row. Pass
?include_proxy_hops=true to surface every captured row for diagnostics.

  - TurnListItem gains proxy_role + proxy_peer_turn_id (skip_serializing
    when absent → direct turns serialize unchanged)
  - TurnsQuery + TurnsParams gain include_proxy_hops: bool (default
    false)
  - query_turns DuckDB SELECT projects metadata; row reader parses
    metadata.proxy.{role, peer_turn_id}
  - WHERE clause adds the hide-by-default filter via
    json_extract_string(metadata, '$.proxy.role')

Tests: new query_turns_hides_proxy_hops_by_default_and_surfaces_them_with_flag
exercises both default-hide and include-flag, asserting field
propagation and total-count consistency. Workspace test suite stays
green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-visible fold for llmproxy duplicates. The Agent Turns list now:

  - Renders a small inline badge next to the Time column on rows the
    backend marked as proxy_in / mirror_primary (e.g. "↔ via proxy").
    Hover shows the peer turn_id for navigation.
  - Adds a "Show proxy hops" checkbox in the filter bar. Off by default
    (collapsed view = single row per logical call); when on, the hidden
    proxy_out / mirror_secondary peer surfaces too, getting its own
    "proxy hop" / "mirror copy" badge.
  - Sticky in the URL as ?show_hops=1 so a shared link preserves the
    user's view choice.

AgentTurnListItem in types/api.ts gains optional proxy_role /
proxy_peer_turn_id matching the backend additions; useAgentTurns hook
forwards includeProxyHops to the API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Field-tuning the default after deploying on wuneng. The metadata.proxy.role
IS NULL filter keeps already-paired turns out of every sweep so a wider
lookback has bounded per-tick cost — the only thing 30min buys us is
backfilling pairs that took a turn to flush from one shard before the
peer landed in another. 5min was tight enough to miss real haproxy_glm5
peers spread across shards in production traffic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the 2-member pair model with arbitrary-size ProxyGroup so the
haproxy_glm5 case — host-IP view + docker-IP view + real upstream
forward, all three captured under the same session — collapses into ONE
row in the default list. Previously the greedy "closest peer first"
rule paired the 0ms mirror and left the real-hop leg unpaired.

ts-turn proxy_pair rewritten:
  - PairAssignment → ProxyGroup{members: Vec<GroupMember>}
  - pair_all → group_all: bucket by content fingerprint, time-cluster
    within 100ms, pick canonical = widest-span (lex tiebreak), assign
    per-member roles (mirror_secondary for time-tied peers, proxy_out
    for nested peers, ambiguous-time peers dropped). Canonical role
    upgrades to proxy_in whenever the group contains any proxy_out;
    falls back to mirror_primary for pure-mirror groups.
  - metadata_for emits both peer_turn_ids (full list, sorted lex) and
    peer_turn_id (first peer, for pre-N-leg API consumers).

ts-storage pair_sweeper: SweepStats now reports both pairs_assigned
(group count = duplicate calls folded) and turns_tagged (per-row
metadata writes — distinguishes "1 fat 3-leg group" from "3 mirror
pairs" in metrics).

API:
  - TurnListItem gains proxy_peer_turn_ids: Option<Vec<String>>;
    proxy_peer_turn_id retained as the first peer for backward compat.
  - DuckDB row reader extracts both forms.

Console:
  - AgentTurnListItem mirrors the schema.
  - ProxyBadge tooltip lists every peer; label shows "(+N hops)" when
    the group has more than one peer.

Tests:
  - ts-turn proxy_pair: 11 unit tests including the verified
    haproxy_three_leg_collapses_into_single_group scenario (a_br0
    canonical = proxy_in, b_dock0 = mirror_secondary, c_hop =
    proxy_out, all sharing one group_id).
  - ts-storage pair_sweeper: 4 unit tests including 3-leg
    metadata-patch correctness.
  - Workspace test suite: green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds GET /api/agent-turns/{id}/proxy-view and a "Proxy View" tab on the
Agent Turn detail panel, gated on the turn being part of a proxy group
(metadata.proxy.role set).

The endpoint aggregates every member of the group:
  - Per-member snapshot (client/server IP, ports, role, e2e latency,
    request_model, wire_api, raw request + response headers parsed
    from the stored JSON blob).
  - Header diff across legs, with three kinds:
      * common — same (name, value) in every leg (collapsed in UI)
      * modified — every leg sent it but the proxy rewrote the value
        (e.g. Host)
      * per_leg — only some legs carry it (e.g. x-litellm-call-id on
        proxy_in, anthropic-request-id on proxy_out)
    Names match case-insensitively; canonical-case spelling preserved.
  - Optional model_rewrite when the canonical and upstream legs'
    request bodies advertise different `model` field values.
  - Latency breakdown: client_observed_ms − upstream_observed_ms =
    proxy_overhead_ms when both are available.

UI (proxy-view-tab.tsx) renders, in order:
  - Topology row per leg with role chip + IP:port + e2e latency
  - Latency breakdown 3-stat card
  - Model rewrite banner when present
  - Response header diff (modified + per-leg expanded by default, common
    collapsed under <details>)
  - Request header diff (secondary; usually just Host rewrite)

Backend tests (7): header diff classification (common/modified/per_leg),
case-insensitive header matching, model rewrite detect/none, latency
breakdown happy + mirror-only-without-overhead path, body model
extraction edge cases, headers JSON parse round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous proxy-view commit added the handler but missed the
.route(...) registration in lib.rs, so the endpoint fell through to
the SPA index. Adds the missing line right next to /calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaces "<N>-leg via proxy" / "mirrored" chip under the duration in
the GanttNav header whenever the turn is part of a proxy group. Tells
the user upfront — without opening the Proxy view tab — that the
timeline they're looking at is one captured vantage point of a larger
group.

Extracted readProxyMeta / proxyGroupSize into lib/proxy-meta.ts so the
same JSON-walking logic serves both the detail panel tab gate and the
GanttNav badge. ProxyBadge in the list page intentionally keeps reading
the flat proxy_role field (it's already projected by the list API; no
need to re-parse metadata).

Tooltip on the chip lists every peer turn_id so the user can copy one
out and navigate to it manually.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same logical LLM call captured twice — once at the LiteLLM listener
(client_port:LITELLM_PORT, e.g. :4000) and once at LiteLLM's outbound
to the real upstream (client_port:UPSTREAM_PORT, e.g. :9008) — both
landed in the same agent_turn as separate llm_calls rows. The turn
detail panel rendered all of them, so a 12-call agent run showed 24
steps in the timeline and 24 CallCards on the right.

Adds a client-side grouping in lib/call-pair.ts that mirrors the
backend turn-level rule (same fingerprint + ≤100ms time window +
distinct (client:port, server:port)), surfaces the canonical leg as
the visible row, hides the proxy hops by default. A 'Show proxy hops
(N)' toggle in the tab bar flips back to the raw view. Canonical
CallCards get a small '+N' chip in the header.

State is lifted to AgentTurnDetailPanel so GanttNav and the CallCard
list stay in sync — the timeline bars match the cards.

No backend / schema change: llm_calls has no metadata column today,
and adding one for purely-presentational folding would be heavy. The
trade-off is that agent_turns.call_count still reports the raw count;
surfacing a 'logical' count is a follow-up if it matters.

Tests: 8 unit tests in lib/call-pair.test.ts covering the 2-leg
client→litellm pair (using the user's verified data shape), 3-leg
haproxy br0+docker0+upstream, time-gap rejection, content-fingerprint
rejection, same-view rejection, order preservation, and pure direct
calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…URLs

Live wuneng data showed every captured pair failed to fold because the
client SDK sent /v1/chat/completions to LiteLLM (port 4000) while
LiteLLM forwarded the bare /chat/completions to the upstream (port
9008). Including the path in the content fingerprint dropped the pair
rate to ~0.

Tokens + model + wire_api + status + finish + stream-flag is
sufficient content equivalence — matches what the backend
proxy_pair::group_all rule on turns has always used.

Regression test added in call-pair.test.ts using the exact path-pair
shape (/chat/completions vs /v1/chat/completions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related additions when the turn isn't itself part of a backend
proxy group but its calls were captured at multiple vantages:

GanttNav (Timeline sidebar)
  - Canonical bars with folded hops now carry a thin blue underline
    sized to the same span as the main bar — reads as a 'shadow' of
    the leg.
  - The latency column shows a small Layers icon next to the ms count
    on the canonical row.
  - Border-left flips blue (low-prio relative to slow/error tones) to
    catch the eye in long timelines.

Proxy view tab (re-enabled for in-turn case)
  - Tab gate widened from `proxyRole` only to `proxyRole || hopCount > 0`.
  - ProxyViewTab takes `hasBackendPair` + `canonicalCalls` +
    `hopsByCanonical`. When the backend hasn't paired the turn but
    the client-side fold caught duplicates, it renders the new
    InTurnProxyView instead of fetching /proxy-view.
  - InTurnProxyView lays out one card per canonical-with-hops,
    showing each leg's 5-tuple + e2e latency + per-hop overhead delta
    (canonical e2e − hop e2e) + model-rewrite chip when the model
    field differs.
  - Header-diff (response x-litellm-* etc.) deferred for in-turn —
    would require parsing the stored headers JSON client-side; v1
    surfaces topology + timing + model which covers the user's most
    common question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…llmproxy-pair-detection

# Conflicts:
#	console/src/components/layout/sidebar.tsx
In-session clicks on agent-turn rows write ?selected_at=<unix_s> to the
URL so a subsequent share-link recipient can recover the item's window.
But useToolbarUrlSync was running applySelectedAtAnchor on EVERY
searchParams change — every click → URL update → URL→store effect
re-runs → helper sees that 'now' has advanced a few seconds → the
just-clicked item falls outside the (slightly-newer) preset window
→ window auto-shifts → list goes empty.

Gate the anchor with a useRef so it fires once per mount of the
AppLayout (which mounts useToolbarUrlSync). External shared links still
get the rescue behavior — the helper runs on the FIRST hydration of
that fresh load. After that, the URL → store sync no longer touches the
toolbar window in response to selected_at changes.

The existing applySelectedAtAnchor unit tests cover the rescue
semantic and still pass; this fix is purely about when the helper
gets called.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live wuneng data shows a 4-leg topology where LiteLLM advertises
`glm5` (alias) to the client and rewrites it to `GLM-5.1` for the
upstream. Leg 1 carries the alias; legs 2-4 carry the rewritten name.
With `model` in the content key, leg 1 never clusters with the
others — the user still sees the alias-leg as a duplicate row.

Drop `model` from contentKey (same fix as `request_path` earlier).
Tokens + wire_api + finish + status + stream-flag is sufficient
content equivalence. Model rewrite is intentionally NOT pairing-key
material because it's exactly what the Proxy view tab exists to
display per-leg.

Tests: + pairs-even-when-model-differs (the 2-leg shape) and the
full 4-leg topology from the user's reported case
(019e3edf-…/seq=1..4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rint

LiteLLM (and similar LLM proxies) translate API styles across the
client/upstream boundary. Live wuneng setups include the Anthropic
→ OpenAI bridge: client SDK speaks /v1/messages with finish_reason=
end_turn, LiteLLM forwards /v1/chat/completions with finish_reason=
stop. All three of wire_api, final_finish_reason, and primary_model
translate alongside each other, so requiring them to match dropped
the pair rate on those topologies to zero.

Frontend lib/call-pair.ts::contentKey: drop wire_api + finish_reason
(model + request_path were already out). Remaining keys: is_stream,
status_code, input_tokens, output_tokens. Combined with the 100ms
time window and the distinct-5-tuple requirement, false positives
are still effectively nil — these are the API-format-invariant
fields proxies pass through unchanged.

Backend ts-turn::proxy_pair::content_fingerprint: drop wire_api,
final_finish_reason, primary_model. Remaining keys: session_id (the
strongest signal — agent profiles content-hash on first user
message), agent_kind, call_count, total_input_tokens,
total_output_tokens.

Tests: + frontend pairs-across-api-styles (Anthropic ingress, OpenAI
upstream), + backend pairs_across_api_style_translation matching the
same scenario at turn level.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Agent run failed (exit 1). See workflow logs.


🤖 Reviewed by viviworkflow run

github-actions[bot]
github-actions Bot previously approved these changes May 21, 2026
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I have enough information to write the review.

Summary

PR #22 introduces a comprehensive LLM proxy pair-detection system with a background sweeper, multi-leg folding, and a proxy-view tab that surfaces what proxies mutated across captured legs. The implementation is thorough—spanning Rust storage layer, API routes, and React components—with strong test coverage for the pairing algorithm. The design cleanly handles the haproxy 3-leg case (mirror pair + upstream hop) and correctly excludes translating fields from the content fingerprint. One schema drift between Rust and TypeScript must be fixed before merge. APPROVE after fixing the blocking issue.

Blocking

  • console/src/types/api.ts:90AgentTurnListItem missing source_id field. The Rust TurnListItem (server/ts-storage/src/query.rs:230) includes pub source_id: String, and the API handler (server/ts-storage-duckdb/src/turns.rs:322-323) populates it from the database row. The TypeScript interface silently ignores this field, but it should be mirrored for type safety. Add: source_id: string after turn_id: string in the interface.

Suggestions

  • server/ts-api/src/routes/agent_turns.rs:347-357 — The proxy_view endpoint fetches each peer's role individually via query_turn_by_id. For groups with 3+ members this is fine, but it could batch-fetch all peer IDs in one query (similar to query_calls_by_ids). Low priority since proxy groups are tiny in practice.

  • server/ts-storage-duckdb/src/turns.rs:204-254 — SQL string interpolation for filter lists uses manual quote escaping via replace('\'', "''"). While safe for current inputs, consider using parameterized queries with unnest(?::varchar[]) for the IN lists. Not urgent—DuckDB's prepared statement cache makes this acceptable.

  • console/src/lib/call-pair.ts:173-174MIRROR_TIME_TOLERANCE_MS is declared but intentionally unused (silenced via void). The comment explains it's kept for future role-naming work, but an unused constant with explicit silencing is confusing. Either implement the mirror-role classification or remove the constant until needed.

Questions

  • Why does the TS AgentTurnListItem omit source_id? The Rust struct includes it, and the DB query returns it (turns.rs:284-289, column 1). Was this an intentional omission or an oversight during schema sync?

Verified

  • Schema mirror: Verified TurnListItem Rust↔TS alignment. Found missing source_id in TS interface.
  • SQL injection safety: Manual quote escaping in filter construction is safe for current inputs (turns.rs:204-254).
  • No body-scan smell: Confirmed no LENGTH(body), MAX(body), or arg_max(body, ...) patterns in storage layer.
  • Query key completeness: useAgentTurns includes includeProxyHops in queryKey (use-agent-turns.ts:32). All other hooks include their filter params.
  • Route registration: /settings route registered in app.tsx:44.
  • Pair sweeper lookback: Default is 1800s (30 min) per pair_sweeper.rs:63, matching the commit message.
  • Caller compatibility: query_pair_candidates and update_turn_metadata added to StorageBackend trait with default implementations (backend.rs:121-142)—existing in-memory test backends still compile.

🤖 Reviewed by viviworkflow run

@vaderyang vaderyang dismissed github-actions[bot]’s stale review May 21, 2026 08:55

The merge-base changed after approval.

Vader Yang and others added 2 commits May 21, 2026 17:06
…tection

# Conflicts:
#	console/src/hooks/use-url-sync.ts
#	console/src/pages/agent-turns.tsx
#	console/src/pages/llm-calls.tsx
…torage

PR#19 added `query_distinct_agent_kinds` to the `StorageBackend`
trait for the agent-kind filter dropdown. After merging main
into this branch the pair_sweeper test stub was missing it, so
`cargo test` failed with E0046.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vader Yang and others added 2 commits May 21, 2026 18:46
…tection

# Conflicts:
#	console/src/components/turn-detail/call-card.tsx
#	console/src/pages/agent-turn-detail-panel.tsx
PR#23 added an `include_bodies: bool` parameter to
`StorageBackend::query_turn_calls` and `query_calls_by_ids` for
lite-mode bodies. After merging main, the pair_sweeper test stub
still had the old 2-arg signatures (E0050).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR adds proxy duplicate detection and folding across 21 commits, spanning backend (Rust), storage (DuckDB), and console (React). The core feature is identifying LLM calls captured at multiple network vantage points (haproxy/LiteLLM, multi-interface capture) and collapsing them into a single user-visible row with an optional "Proxy View" tab showing header diffs, model rewrites, and latency breakdown. The implementation is thorough, well-tested, and follows the repo's patterns. The only substantive concern is stale JSDoc. APPROVE with one minor fix.

Blocking

None.

Suggestions

  • console/src/lib/call-pair.ts:68-70 — JSDoc says the content fingerprint includes "wire_api, model, tokens, finish, stream flag, status code, path" but the actual implementation (lines 84-89) excludes wire_api, model, finish_reason, and request_path (all dropped in commits 15, 18, 19). Update the comment to match the final fingerprint: is_stream, status_code, input_tokens, output_tokens. Stale docs mislead maintainers.

Questions

  • server/ts-turn/src/proxy_pair.rs:228-229 — The time_clusters function measures the gap from the latest member's start_time, not the first (set[i].start_time_us - last_start <= MAX_REQ_TIME_GAP_US). This is intentional per the docstring ("grow a cluster as long as the next turn falls within MAX_REQ_TIME_GAP_US of the latest member's start_time"), but it permits a cluster to span >100ms if a chain of turns arrives sequentially (e.g., t0=0, t1=90, t2=180 → cluster includes all three because each adjacent pair is ≤100ms apart). Is this the intended greedy behavior for dense call patterns? If so, no change needed — the tests validate it.

Verified

  • Schema mirror: Rust TurnListItem (ts-storage/src/query.rs:255-269) matches TS AgentTurnListItem (types/api.ts:114-121). All three proxy fields (proxy_role, proxy_peer_turn_id, proxy_peer_turn_ids) align with correct types and serde annotations.
  • SQL injection: agent_kind IN and client_ip IN clauses use format!("'{}'", s.replace('\'', "''")) — standard DuckDB escaping, safe.
  • queryKey: useAgentTurnProxyView uses ["agent-turn-proxy-view", id] — includes the varying input.
  • Caller compatibility: The new include_proxy_hops param has a safe default (false) and only affects /api/agent-turns list queries; detail endpoints unaffected.
  • API route: /api/agent-turns/{id}/proxy-view registered in ts-api/src/lib.rs:138-141.

🤖 Reviewed by viviworkflow run

@github-actions github-actions Bot merged commit e84ad05 into main May 21, 2026
1 check passed
@github-actions github-actions Bot deleted the feat/llmproxy-pair-detection branch May 21, 2026 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant