feat(services): Path view + Overview agent charts (deploy roll-up) by vaderyang · Pull Request #27 · Netis/TokenScope

vaderyang · 2026-05-20T07:34:57Z

Integration / deploy branch that stacks the open Services-page work
(PR #25) and the LLM-proxy pair-detection work (PR #22), then adds
six new commits on top that make up this PR's reviewable change:

| Commit | What |
|---|---|
| `6882cd3` | Path view: new tab on `/services` that renders the service→service topology as a directed SVG graph. Backend `GET /api/services/topology` returns `{nodes, edges}` (proxy edges from pair sweeper + synthetic `__clients__` edges into entry-point services). |
| `7b2f0aa` | Inferred edges: when an inbound `client_ip` matches the `server_ip` of a known service (e.g. LiteLLM forwarding without a pair-sweeper match), draw the edge from that service instead of the anonymous clients node. Dashed-blue line, distinct from solid-blue proxy edges. |
| `bf4887f` | Perf: 7d window 10× faster. `arg_max(body, LENGTH(body))` over a 7-day window scanned 5+ GB of bodies (17 s on prod); replaced with a clipped 24 h window + `ROW_NUMBER() OVER (...) WHERE rn <= 5` top-N sampling + body-shape filtering in Rust. Services 17.8 s → 1.5 s; topology 12.9 s → 1.3 s. |
| `fea1d83` | Drop weak classifier rule. "uvicorn + ≥ 3 distinct models → litellm" was window-width-sensitive: a vLLM serving Qwen3.5-35B picks up stray model names over 7 d and gets flipped to LiteLLM. Removed; real LiteLLM still detected via `x-litellm-*` header. |
| `3b35166` | Model view as a tab in Services, sidebar tidy. Models entry removed (now reachable as Services → Model tab); `/models` route still resolves. Sidebar "Traffic" relabelled "Usage" (route unchanged). |
| `8e191a1` | Overview agent charts. New endpoints `GET /api/agent-turns/summary` and `GET /api/agent-turns/activity` aggregate `agent_turns` by `agent_kind`. Two recharts on Overview: stacked-area activity timeseries and horizontal-bar distribution. |

Stack note

The 31 commits below 6882cd3 come from the open base PRs:

- PR #22, feat: fold llmproxy duplicate turns by passive pair detection (provides `metadata.proxy.{role,pair_id,peer_turn_ids}` used by the Path view's proxy edges)
- PR #25, feat(services): per-endpoint Services page (server_ip:port → models + perf) (the Table view this PR's Path tab sits alongside)

Once #22 and #25 land, this PR's effective diff against main will be the six commits above.

Verification (live on wuneng)

| Endpoint | Latency | Output |
|---|---|---|
| `GET /api/services?start&end` (7d) | 1.5 s (was 17.8 s) | 22 service rows |
| `GET /api/services/topology?start&end` (7d) | 1.3 s (was 12.9 s) | 10 nodes, 13 edges |
| `GET /api/agent-turns/summary?start&end` (1d) | < 100 ms | 3 agent_kinds (generic 68 538, hermes 105, openclaw 84) |
| `GET /api/agent-turns/activity?start&end` (1d) | < 200 ms | 108 buckets |

Path view edge break-down on prod (last hour):

proxy   (5)  127.0.0.1:4000 (litellm) → multiple sglang/vllm backends
proxy   (1)  172.16.103.81:9000 → 172.17.0.4:30000  (haproxy → docker sglang)
inferred(2)  127.0.0.1:4000 (litellm) → 127.0.0.1:9000 (sglang)   etc.
client  (6)  __clients__ → entry-point services

Test plan

- Open `/services` → Table view loads in ~1.5 s on 7d window.
- Switch to Path tab: graph renders with three edge styles + count labels.
- Switch to Model tab: embedded ModelsPage works as before.
- Sidebar shows "Usage" (not "Traffic"); Models entry gone.
- Overview shows Agent Activity (stacked area) and Agent Distribution (horizontal bar) between the latency row and the model panels.
- `cargo test -p ts-storage-duckdb apps` passes (21 tests, classifier fallback removed).

🤖 Generated with Claude Code

…d link

Builds on the previous PR (selected id in URL): copying a list page URL like `?preset=15m&selected=<id>` and opening it half an hour later would compute `start=now-15m, end=now` from scratch, so the selected item is no longer in that window. The detail panel still loaded (it queries by id) but the list behind it showed an unrelated slice, the row had no highlight, and prev/next were disabled.

Fix the window without changing the original tab's behaviour:

1. List pages also write `?selected_at=<unix_s>` when an item is selected, taken from the item's start_time (agent turns) or request_time (llm calls / http exchanges). Cleared together with `selected` when the panel is closed.
2. `useToolbarUrlSync` reads `selected_at` during hydration. If the anchor falls outside the preset-derived window, override:
   - keep the preset's *duration* (the original user's "show me this much context" signal),
   - slide so `end = anchor + 60s` (small breathing pad keeps the item from sitting flush at the edge in a desc-by-time list),
   - promote `preset` to `custom` so subsequent URL writes carry absolute start/end and the shift survives navigation.

   No-op when the anchor is inside the window, absent, unparseable, or future-dated relative to a window that already includes it.
3. Pure helper `applySelectedAtAnchor` lives in its own module (`selected-at-anchor.ts`, no `@/` aliases) so it's directly testable under bun without the toolbar-store / react-router runtime chain. 7 unit tests cover the no-op cases, the stale-preset shift, default-1h fallback, and clock-skew anchors.

Effects:
- Original tab: relative preset still ticks `now` as usual; no surprise switch to `custom`.
- Fresh URL load: window auto-widens / slides to bracket the shared item; list, highlight, prev/next all work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
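The anchor rule in step 2 can be sketched as a pure function. This is a minimal illustration, not the code in `selected-at-anchor.ts`: the `ToolbarWindow` shape, the constant names, and the exact signature are assumptions, and the clock-skew handling mentioned in the commit is not modeled.

```typescript
// Hypothetical window shape; the real toolbar store types are not shown in the commit.
interface ToolbarWindow {
  preset: string; // e.g. "15m", "1h", or "custom"
  start: number;  // unix seconds
  end: number;    // unix seconds
}

const ANCHOR_PAD_S = 60;           // "end = anchor + 60s"
const DEFAULT_DURATION_S = 3600;   // default-1h fallback when no usable duration exists

function applySelectedAtAnchor(
  win: ToolbarWindow,
  selectedAt: string | null,
): ToolbarWindow {
  // Absent or unparseable anchor: no-op.
  if (selectedAt === null || selectedAt.trim() === "") return win;
  const anchor = Number(selectedAt);
  if (!Number.isFinite(anchor)) return win;

  // Anchor already inside the window: no-op, the preset keeps ticking.
  if (anchor >= win.start && anchor <= win.end) return win;

  // Keep the duration, slide the window, and pin it as an absolute range.
  const span = win.end - win.start;
  const duration = span > 0 ? span : DEFAULT_DURATION_S;
  const end = anchor + ANCHOR_PAD_S;
  return { preset: "custom", start: end - duration, end };
}
```

Returning the same object on every no-op path keeps the helper cheap to call during hydration and makes "nothing changed" easy to detect with a reference comparison.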

…ar logo

Three small UI changes batched into one branch; none of them touch backend or data shapes:

* Overview "Avg TPOT" KPI surfaces as "Avg TPS" with units of tok/s (= 1000 / tpot_avg_ms). TPOT itself is what the backend stores; the conversion is one division at render time. "Generation speed" reads better at a glance than "milliseconds per token".
* Models table column "TPOT" → "Generation TPS", same unit swap. Sort key still points at tpot_avg under the hood but getSortValue inverts to 1000/tpot_avg so clicking the column desc gives fastest-first, which matches what someone clicking "Generation TPS" expects.
* Agent Turns table column order rewritten around how operators actually triage a turn: Time, Agent, Client, Calls, Status, In, Out, then the less-frequently-scanned dimensions (Model, Wire API, Server, Duration) and the long User Input preview last.
* New TokenScope brand mark replaces the bare panel-toggle button at the top-left of the sidebar:
  - Expanded: wordmark on the left, collapse button on the right.
  - Collapsed: icon-only mark; click toggles to expand (the icon doubles as the expand affordance: discoverable, saves a row).

  Both variants share the same glyph (rounded "scope" frame containing three decreasing token bars) so they line up visually as the sidebar opens/closes. Stroke uses currentColor for dark-mode and theme inheritance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
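The TPOT → TPS conversion and the inverted sort value might look like the sketch below. The `ModelRow` shape and both function names are hypothetical, and treating missing or non-positive TPOT as "no value" is an assumption of this sketch rather than something the commit states.

```typescript
// TPOT is milliseconds per output token; TPS is tokens per second.
function tpotToTps(tpotAvgMs: number | null | undefined): number | null {
  // Assumption: missing or non-positive TPOT has no meaningful TPS.
  if (tpotAvgMs == null || !(tpotAvgMs > 0)) return null;
  return 1000 / tpotAvgMs;
}

// Hypothetical row shape; only the tpot_avg field name comes from the commit message.
interface ModelRow {
  model: string;
  tpot_avg: number | null;
}

// Sort value for a "Generation TPS" column: higher is faster, so a descending
// sort yields fastest-first. Rows without a value sink to the end.
function generationTpsSortValue(row: ModelRow): number {
  return tpotToTps(row.tpot_avg) ?? Number.NEGATIVE_INFINITY;
}
```

A 20 ms TPOT therefore displays as 50 tok/s, and sorting descending by `generationTpsSortValue` puts the lowest-TPOT model first.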

Every chart had its own copy of:

    function formatAxisTime(epoch) {
      const d = new Date(epoch * 1000)
      return `${HH}:${MM}`
    }

Result: a 7-day window rendered ticks as a wrapping clock face ("00:00", "12:00", "00:00", "12:00", ...) with no day attached. Same problem at 24h. Easy to mis-read.

Centralize the formatter in lib/format as `formatAxisTime(epoch, span)` and have it pick the right shape based on the visible window:

    span < 24h        → HH:MM        (5m / 15m / 1h / 6h presets)
    24h ≤ span < 7d   → MM-DD HH:MM  (24h preset)
    span ≥ 7d         → MM-DD        (7d preset; time-of-day is noise when ticks come ~daily)

Each chart derives span from its data (last timestamp − first), so the formatter requires no toolbar dependency and naturally handles partial ranges (e.g. tail of a 7d window after retention trimmed the head).

Replaces the inline copies in:
- timeseries-line-chart (Overview latency, Models, Performance)
- request-volume-chart (Overview)
- latency-overview-chart (Overview)
- stacked-bar-chart (Performance, Traffic)

6 unit tests in lib/format.test.ts cover each duration bucket plus the 24h / 7d inclusive boundaries and the single-point fallback (span = 0 → HH:MM). Tests assert *shape* not literal values so they pass under any TZ.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
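A span-aware formatter following the three buckets above could be sketched like this. It assumes `epoch` and `span` are both in seconds and formats in the local time zone; the actual `lib/format` implementation may differ in those details.

```typescript
const DAY_S = 24 * 60 * 60;

// Zero-pad to two digits without relying on newer string helpers.
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// epoch: unix seconds of the tick; span: visible window length in seconds (assumed units).
function formatAxisTime(epoch: number, span: number): string {
  const d = new Date(epoch * 1000);
  const hhmm = `${pad2(d.getHours())}:${pad2(d.getMinutes())}`;
  const mmdd = `${pad2(d.getMonth() + 1)}-${pad2(d.getDate())}`;

  if (span < DAY_S) return hhmm;                  // sub-day windows: time only
  if (span < 7 * DAY_S) return `${mmdd} ${hhmm}`; // 24h up to (not including) 7d
  return mmdd;                                    // 7d and wider: date only
}

// Each chart can derive span from its own data, as the commit describes.
function spanOf(timestamps: number[]): number {
  if (timestamps.length < 2) return 0; // single point falls back to HH:MM
  return timestamps[timestamps.length - 1] - timestamps[0];
}
```

Because the output depends on the local time zone, checks against it are best written as shape assertions (for example a regular expression per bucket) rather than literal strings.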

Adds ProxyPair, ProxyRole, PairCandidate, PairAssignment with the classify_pair / pair_all entry points. No call sites yet; this is the pure-data foundation that the storage sweeper + API filter will build on.

Pairing rule (verified against the haproxy_glm5 turn pair on wuneng: turns 019e3a95-bb7c-7eb3-8240-d3ecacb0c583 / d3d6fdd76249, same session gen-b93380c5210ed98a, 11345/128 tokens, start_gap 2ms / end_gap 1ms):
- same session_id / agent_kind / wire_api
- same call_count, total_input_tokens, total_output_tokens
- same final_finish_reason and primary model
- differing (client_ip, server_ip) view
- |start_time gap| ≤ 100ms

Role:
- mirror (same packet on br0 + docker0) when both start and end times agree within 500us
- strict nesting (real proxy hop) when outer.start ≤ inner.start and outer.end ≥ inner.end
- else: ambiguous, no pair

10 unit tests cover both real-data scenarios and the non-pair cases (cross-session, same view, time-gap exceeded, tokens differ, ambiguous non-nesting).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
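To make the rule concrete, here is a sketch of the pairing and role decision. The real `classify_pair` is Rust in `ts-turn`; this TypeScript version is only an illustration, and the field names, the `PairResult` shape, and the role labels are assumptions.

```typescript
// Hypothetical candidate shape; times are microseconds since epoch.
interface PairCandidate {
  turnId: string;
  sessionId: string;
  agentKind: string;
  wireApi: string;
  callCount: number;
  totalInputTokens: number;
  totalOutputTokens: number;
  finalFinishReason: string;
  primaryModel: string;
  clientIp: string;
  serverIp: string;
  startUs: number;
  endUs: number;
}

// For "mirror" the outer/inner order simply follows the argument order.
interface PairResult {
  role: "mirror" | "nested";
  outerId: string;
  innerId: string;
}

const MAX_START_GAP_US = 100_000; // 100 ms
const MIRROR_TOLERANCE_US = 500;  // 500 us

function classifyPair(a: PairCandidate, b: PairCandidate): PairResult | null {
  const sameContent =
    a.sessionId === b.sessionId &&
    a.agentKind === b.agentKind &&
    a.wireApi === b.wireApi &&
    a.callCount === b.callCount &&
    a.totalInputTokens === b.totalInputTokens &&
    a.totalOutputTokens === b.totalOutputTokens &&
    a.finalFinishReason === b.finalFinishReason &&
    a.primaryModel === b.primaryModel;
  const sameView = a.clientIp === b.clientIp && a.serverIp === b.serverIp;
  if (!sameContent || sameView) return null;

  const startGap = Math.abs(a.startUs - b.startUs);
  const endGap = Math.abs(a.endUs - b.endUs);
  if (startGap > MAX_START_GAP_US) return null;

  // Same packet seen on two interfaces: both edges agree within tolerance.
  if (startGap <= MIRROR_TOLERANCE_US && endGap <= MIRROR_TOLERANCE_US) {
    return { role: "mirror", outerId: a.turnId, innerId: b.turnId };
  }
  // Real proxy hop: one turn's interval strictly contains the other's.
  if (a.startUs <= b.startUs && a.endUs >= b.endUs) {
    return { role: "nested", outerId: a.turnId, innerId: b.turnId };
  }
  if (b.startUs <= a.startUs && b.endUs >= a.endUs) {
    return { role: "nested", outerId: b.turnId, innerId: a.turnId };
  }
  return null; // overlapping but not nested: ambiguous, no pair
}
```

With the gaps quoted above (start 2 ms, end 1 ms), the mirror tolerance is exceeded, so a pair like that is classified by the nesting check.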

Adds the background sweeper that scans recently-finalized turns, classifies pairs via ts_turn::pair_all, and writes pair_id/role/peer back via update_turn_metadata. Spawned alongside the storage sink in pipeline.rs: one sweeper per process, owns its own Arc<dyn StorageBackend>.

StorageBackend trait gains two methods with safe defaults so mock backends don't need to change:
- query_pair_candidates(start_us, end_us) → light projection of agent_turns rows whose metadata.proxy.role is unset (idempotent sweep guard)
- update_turn_metadata(turn_id, patch) → shallow top-level JSON merge into agent_turns.metadata (no schema change; metadata is already a VARCHAR holding JSON)

DuckDB implementation:
- SELECT projects via json_extract_string(metadata, '$.proxy.role')
- UPDATE is read-modify-write to preserve any pre-existing metadata keys; no-op when turn_id is absent (sweeper races finalization)

Default schedule: 2s interval, 5min lookback. The lookback comfortably exceeds tracker grace (1s) + storage flush jitter (~100ms) so neither leg of a pair can land late enough to miss its peer.

Tests:
- ts-storage pair_sweeper: 3/3 (matched pair, role assignment matches real wuneng haproxy_glm5 shape, lone turn ignored)
- ts-storage-duckdb turns: pair_candidates returns only unpaired, update_turn_metadata merges with existing keys, noop on missing row

Workspace: 815+ unit tests all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
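The "shallow top-level JSON merge" semantics of `update_turn_metadata` can be illustrated with an in-memory stand-in. The real implementation is a Rust read-modify-write against DuckDB; the `Map`-backed store and the function shape below are assumptions made purely for illustration.

```typescript
type JsonObject = { [key: string]: unknown };

// store: turn_id → metadata column value (a string holding JSON).
// Returns false (no-op) when the row does not exist yet.
function updateTurnMetadata(
  store: Map<string, string>,
  turnId: string,
  patch: JsonObject,
): boolean {
  const raw = store.get(turnId);
  if (raw === undefined) return false; // sweeper raced finalization: nothing to update

  // Read: tolerate an empty column; anything that is not a JSON object is treated as {}.
  const parsed: unknown = raw.trim() === "" ? {} : JSON.parse(raw);
  const existing: JsonObject =
    parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)
      ? (parsed as JsonObject)
      : {};

  // Modify + write: top-level keys in the patch replace existing ones wholesale,
  // every other pre-existing key is preserved.
  store.set(turnId, JSON.stringify({ ...existing, ...patch }));
  return true;
}
```

Because the merge is shallow, a patch carrying a `proxy` key replaces the whole `proxy` object rather than merging into it, which suits a sweeper that writes the complete `proxy` block in one go.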

API surface for pair-folded turn list. By default, /api/agent-turns hides the leg the pair sweeper marked hidden (proxy_out / mirror_secondary), so one logical call collapses to one row. Pass ?include_proxy_hops=true to surface every captured row for diagnostics.

- TurnListItem gains proxy_role + proxy_peer_turn_id (skip_serializing when absent → direct turns serialize unchanged)
- TurnsQuery + TurnsParams gain include_proxy_hops: bool (default false)
- query_turns DuckDB SELECT projects metadata; row reader parses metadata.proxy.{role, peer_turn_id}
- WHERE clause adds the hide-by-default filter via json_extract_string(metadata, '$.proxy.role')

Tests: new query_turns_hides_proxy_hops_by_default_and_surfaces_them_with_flag exercises both default-hide and include-flag, asserting field propagation and total-count consistency. Workspace test suite stays green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

User-visible fold for llmproxy duplicates. The Agent Turns list now:
- Renders a small inline badge next to the Time column on rows the backend marked as proxy_in / mirror_primary (e.g. "↔ via proxy"). Hover shows the peer turn_id for navigation.
- Adds a "Show proxy hops" checkbox in the filter bar. Off by default (collapsed view = single row per logical call); when on, the hidden proxy_out / mirror_secondary peer surfaces too, getting its own "proxy hop" / "mirror copy" badge.
- Sticky in the URL as ?show_hops=1 so a shared link preserves the user's view choice.

AgentTurnListItem in types/api.ts gains optional proxy_role / proxy_peer_turn_id matching the backend additions; useAgentTurns hook forwards includeProxyHops to the API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Field-tuning the default after deploying on wuneng. The metadata.proxy.role IS NULL filter keeps already-paired turns out of every sweep so a wider lookback has bounded per-tick cost — the only thing 30min buys us is backfilling pairs that took a turn to flush from one shard before the peer landed in another. 5min was tight enough to miss real haproxy_glm5 peers spread across shards in production traffic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the 2-member pair model with arbitrary-size ProxyGroup so the haproxy_glm5 case (host-IP view + docker-IP view + real upstream forward, all three captured under the same session) collapses into ONE row in the default list. Previously the greedy "closest peer first" rule paired the 0ms mirror and left the real-hop leg unpaired.

ts-turn proxy_pair rewritten:
- PairAssignment → ProxyGroup{members: Vec<GroupMember>}
- pair_all → group_all: bucket by content fingerprint, time-cluster within 100ms, pick canonical = widest-span (lex tiebreak), assign per-member roles (mirror_secondary for time-tied peers, proxy_out for nested peers, ambiguous-time peers dropped). Canonical role upgrades to proxy_in whenever the group contains any proxy_out; falls back to mirror_primary for pure-mirror groups.
- metadata_for emits both peer_turn_ids (full list, sorted lex) and peer_turn_id (first peer, for pre-N-leg API consumers).

ts-storage pair_sweeper: SweepStats now reports both pairs_assigned (group count = duplicate calls folded) and turns_tagged (per-row metadata writes; distinguishes "1 fat 3-leg group" from "3 mirror pairs" in metrics).

API:
- TurnListItem gains proxy_peer_turn_ids: Option<Vec<String>>; proxy_peer_turn_id retained as the first peer for backward compat.
- DuckDB row reader extracts both forms.

Console:
- AgentTurnListItem mirrors the schema.
- ProxyBadge tooltip lists every peer; label shows "(+N hops)" when the group has more than one peer.

Tests:
- ts-turn proxy_pair: 11 unit tests including the verified haproxy_three_leg_collapses_into_single_group scenario (a_br0 canonical = proxy_in, b_dock0 = mirror_secondary, c_hop = proxy_out, all sharing one group_id).
- ts-storage pair_sweeper: 4 unit tests including 3-leg metadata-patch correctness.
- Workspace test suite: green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds GET /api/agent-turns/{id}/proxy-view and a "Proxy View" tab on the Agent Turn detail panel, gated on the turn being part of a proxy group (metadata.proxy.role set).

The endpoint aggregates every member of the group:
- Per-member snapshot (client/server IP, ports, role, e2e latency, request_model, wire_api, raw request + response headers parsed from the stored JSON blob).
- Header diff across legs, with three kinds:
  * common: same (name, value) in every leg (collapsed in UI)
  * modified: every leg sent it but the proxy rewrote the value (e.g. Host)
  * per_leg: only some legs carry it (e.g. x-litellm-call-id on proxy_in, anthropic-request-id on proxy_out)

  Names match case-insensitively; canonical-case spelling preserved.
- Optional model_rewrite when the canonical and upstream legs' request bodies advertise different `model` field values.
- Latency breakdown: client_observed_ms − upstream_observed_ms = proxy_overhead_ms when both are available.

UI (proxy-view-tab.tsx) renders, in order:
- Topology row per leg with role chip + IP:port + e2e latency
- Latency breakdown 3-stat card
- Model rewrite banner when present
- Response header diff (modified + per-leg expanded by default, common collapsed under <details>)
- Request header diff (secondary; usually just Host rewrite)

Backend tests (7): header diff classification (common/modified/per_leg), case-insensitive header matching, model rewrite detect/none, latency breakdown happy + mirror-only-without-overhead path, body model extraction edge cases, headers JSON parse round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
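The three header-diff kinds can be sketched as follows. The backend does this in Rust; the TypeScript below is an illustration only. The `HeaderDiffEntry` shape is hypothetical, "canonical-case spelling" is assumed to mean the first spelling encountered, and a header repeated within one leg keeps its first value.

```typescript
type Header = [string, string]; // [name, value]
type DiffKind = "common" | "modified" | "per_leg";

interface HeaderDiffEntry {
  name: string;               // spelling as first seen (assumed meaning of canonical case)
  kind: DiffKind;
  values: (string | null)[];  // one slot per leg; null when the leg lacks the header
}

function diffHeaders(legs: Header[][]): HeaderDiffEntry[] {
  const byName = new Map<string, { name: string; values: (string | null)[] }>();

  legs.forEach((headers, legIdx) => {
    for (const [name, value] of headers) {
      const key = name.toLowerCase(); // case-insensitive matching
      let entry = byName.get(key);
      if (!entry) {
        entry = { name, values: legs.map(() => null) };
        byName.set(key, entry);
      }
      // Assumption: the first value wins if a leg repeats a header.
      if (entry.values[legIdx] === null) entry.values[legIdx] = value;
    }
  });

  return Array.from(byName.values()).map(({ name, values }) => {
    const present = values.filter((v): v is string => v !== null);
    let kind: DiffKind;
    if (present.length < legs.length) kind = "per_leg";               // some legs lack it
    else if (present.every((v) => v === present[0])) kind = "common"; // identical everywhere
    else kind = "modified";                                           // everywhere, but rewritten
    return { name, kind, values };
  });
}
```

Keeping one value slot per leg lets a UI render `modified` and `per_leg` rows as a small per-leg table while collapsing `common` rows.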

The previous proxy-view commit added the handler but missed the .route(...) registration in lib.rs, so the endpoint fell through to the SPA index. Adds the missing line right next to /calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…at/llmproxy-pair-detection

…into feat/llmproxy-pair-detection

…at/llmproxy-pair-detection

…eat/llmproxy-pair-detection

Surfaces "<N>-leg via proxy" / "mirrored" chip under the duration in the GanttNav header whenever the turn is part of a proxy group. Tells the user upfront — without opening the Proxy view tab — that the timeline they're looking at is one captured vantage point of a larger group. Extracted readProxyMeta / proxyGroupSize into lib/proxy-meta.ts so the same JSON-walking logic serves both the detail panel tab gate and the GanttNav badge. ProxyBadge in the list page intentionally keeps reading the flat proxy_role field (it's already projected by the list API; no need to re-parse metadata). Tooltip on the chip lists every peer turn_id so the user can copy one out and navigate to it manually. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Same logical LLM call captured twice, once at the LiteLLM listener (client_port:LITELLM_PORT, e.g. :4000) and once at LiteLLM's outbound to the real upstream (client_port:UPSTREAM_PORT, e.g. :9008); both landed in the same agent_turn as separate llm_calls rows. The turn detail panel rendered all of them, so a 12-call agent run showed 24 steps in the timeline and 24 CallCards on the right.

Adds a client-side grouping in lib/call-pair.ts that mirrors the backend turn-level rule (same fingerprint + ≤100ms time window + distinct (client:port, server:port)), surfaces the canonical leg as the visible row, and hides the proxy hops by default. A 'Show proxy hops (N)' toggle in the tab bar flips back to the raw view. Canonical CallCards get a small '+N' chip in the header.

State is lifted to AgentTurnDetailPanel so GanttNav and the CallCard list stay in sync: the timeline bars match the cards.

No backend / schema change: llm_calls has no metadata column today, and adding one for purely-presentational folding would be heavy. The trade-off is that agent_turns.call_count still reports the raw count; surfacing a 'logical' count is a follow-up if it matters.

Tests: 8 unit tests in lib/call-pair.test.ts covering the 2-leg client→litellm pair (using the user's verified data shape), 3-leg haproxy br0+docker0+upstream, time-gap rejection, content-fingerprint rejection, same-view rejection, order preservation, and pure direct calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…URLs Live wuneng data showed every captured pair failed to fold because the client SDK sent /v1/chat/completions to LiteLLM (port 4000) while LiteLLM forwarded the bare /chat/completions to the upstream (port 9008). Including the path in the content fingerprint dropped the pair rate to ~0. Tokens + model + wire_api + status + finish + stream-flag is sufficient content equivalence — matches what the backend proxy_pair::group_all rule on turns has always used. Regression test added in call-pair.test.ts using the exact path-pair shape (/chat/completions vs /v1/chat/completions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two related additions when the turn isn't itself part of a backend proxy group but its calls were captured at multiple vantages:

GanttNav (Timeline sidebar)
- Canonical bars with folded hops now carry a thin blue underline sized to the same span as the main bar; it reads as a 'shadow' of the leg.
- The latency column shows a small Layers icon next to the ms count on the canonical row.
- Border-left flips blue (low-prio relative to slow/error tones) to catch the eye in long timelines.

Proxy view tab (re-enabled for in-turn case)
- Tab gate widened from `proxyRole` only to `proxyRole || hopCount > 0`.
- ProxyViewTab takes `hasBackendPair` + `canonicalCalls` + `hopsByCanonical`. When the backend hasn't paired the turn but the client-side fold caught duplicates, it renders the new InTurnProxyView instead of fetching /proxy-view.
- InTurnProxyView lays out one card per canonical-with-hops, showing each leg's 5-tuple + e2e latency + per-hop overhead delta (canonical e2e − hop e2e) + model-rewrite chip when the model field differs.
- Header-diff (response x-litellm-* etc.) deferred for in-turn: it would require parsing the stored headers JSON client-side; v1 surfaces topology + timing + model, which covers the user's most common question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…llmproxy-pair-detection # Conflicts: # console/src/components/layout/sidebar.tsx

In-session clicks on agent-turn rows write ?selected_at=<unix_s> to the URL so a subsequent share-link recipient can recover the item's window. But useToolbarUrlSync was running applySelectedAtAnchor on EVERY searchParams change — every click → URL update → URL→store effect re-runs → helper sees that 'now' has advanced a few seconds → the just-clicked item falls outside the (slightly-newer) preset window → window auto-shifts → list goes empty. Gate the anchor with a useRef so it fires once per mount of the AppLayout (which mounts useToolbarUrlSync). External shared links still get the rescue behavior — the helper runs on the FIRST hydration of that fresh load. After that, the URL → store sync no longer touches the toolbar window in response to selected_at changes. The existing applySelectedAtAnchor unit tests cover the rescue semantic and still pass; this fix is purely about when the helper gets called. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Live wuneng data shows a 4-leg topology where LiteLLM advertises `glm5` (alias) to the client and rewrites it to `GLM-5.1` for the upstream. Leg 1 carries the alias; legs 2-4 carry the rewritten name. With `model` in the content key, leg 1 never clusters with the others — the user still sees the alias-leg as a duplicate row. Drop `model` from contentKey (same fix as `request_path` earlier). Tokens + wire_api + finish + status + stream-flag is sufficient content equivalence. Model rewrite is intentionally NOT pairing-key material because it's exactly what the Proxy view tab exists to display per-leg. Tests: + pairs-even-when-model-differs (the 2-leg shape) and the full 4-leg topology from the user's reported case (019e3edf-…/seq=1..4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rint

LiteLLM (and similar LLM proxies) translate API styles across the client/upstream boundary. Live wuneng setups include the Anthropic → OpenAI bridge: client SDK speaks /v1/messages with finish_reason=end_turn, LiteLLM forwards /v1/chat/completions with finish_reason=stop. All three of wire_api, final_finish_reason, and primary_model translate alongside each other, so requiring them to match dropped the pair rate on those topologies to zero.

Frontend lib/call-pair.ts::contentKey: drop wire_api + finish_reason (model + request_path were already out). Remaining keys: is_stream, status_code, input_tokens, output_tokens. Combined with the 100ms time window and the distinct-5-tuple requirement, false positives are still effectively nil: these are the API-format-invariant fields proxies pass through unchanged.

Backend ts-turn::proxy_pair::content_fingerprint: drop wire_api, final_finish_reason, primary_model. Remaining keys: session_id (the strongest signal, since agent profiles content-hash on first user message), agent_kind, call_count, total_input_tokens, total_output_tokens.

Tests:
+ frontend pairs-across-api-styles (Anthropic ingress, OpenAI upstream),
+ backend pairs_across_api_style_translation matching the same scenario at turn level.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
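After this change the frontend fingerprint is down to four fields. A sketch of that key plus the surrounding pair test is shown below; the `CallLite` shape, the `viewKey` helper, and the choice of request time as the compared timestamp are assumptions, since the commit only names the fields and the 100ms window.

```typescript
// Hypothetical projection of an llm_calls row.
interface CallLite {
  id: string;
  isStream: boolean;
  statusCode: number;
  inputTokens: number;
  outputTokens: number;
  clientIp: string;
  clientPort: number;
  serverIp: string;
  serverPort: number;
  requestTimeMs: number; // assumed timestamp used for the time window
}

const HOP_WINDOW_MS = 100;

// Only API-format-invariant fields: no path, model, wire_api, or finish_reason.
function contentKey(c: CallLite): string {
  return [c.isStream ? 1 : 0, c.statusCode, c.inputTokens, c.outputTokens].join("|");
}

// The capture vantage: two legs of one call must differ here.
function viewKey(c: CallLite): string {
  return `${c.clientIp}:${c.clientPort}->${c.serverIp}:${c.serverPort}`;
}

function isSameLogicalCall(a: CallLite, b: CallLite): boolean {
  return (
    contentKey(a) === contentKey(b) &&
    viewKey(a) !== viewKey(b) &&
    Math.abs(a.requestTimeMs - b.requestTimeMs) <= HOP_WINDOW_MS
  );
}
```

Dropping the translated fields means an Anthropic-style ingress leg and an OpenAI-style upstream leg produce the same key, while the time window and the distinct-vantage check keep unrelated calls apart.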

Clicking on a turn with hundreds of agentic iterations would freeze the browser. The /api/agent-turns/{id}/calls endpoint returned every call's full request_body + response_body + headers; an 878-call turn on real data lands a 168 MB JSON response that the browser can't parse or render.

Fix is in two parts that ship together:

**Server** (StorageBackend trait + DuckDB impl + API route)
- `query_turn_calls(turn_id, include_bodies: bool)` and `query_calls_by_ids(call_ids, include_bodies: bool)` now accept a flag. When false, the SQL projection selects `NULL::VARCHAR` for the four heavy fields, so DuckDB never reads the body pages off disk and they don't transfer to Rust as Strings.
- New `?lite=1` query param on `GET /api/agent-turns/{id}/calls` flips `include_bodies = false`. Default behavior unchanged for every existing caller.
- `tokens_estimated` derivation falls back to `false` in lite mode (it inspects response_body); documented on the trait.

**Console** (auto-opt-in for large turns + lazy-load on expand)
- `useAgentTurnCalls(id, lite)` passes `?lite=1` when caller asks.
- `AgentTurnDetailPanel` watches `turn.call_count`; above 200 it flips lite mode on. Renders a small amber banner so the user knows bodies are being lazy-loaded.
- `CallCard` lazy-fetches `/api/llm-calls/{id}` only when the user expands a card whose inline bodies are null. Gated on `expanded` so a mega-turn with 800 collapsed cards doesn't fire 800 background requests at mount.
- Tools index / classifier already null-safe; no extra changes.

Real-world impact on the 878-call turn observed in production: list response shrinks from 168 MB to under 1 MB; detail page now loads in well under a second; expanding any single call fetches its ~190 KB of bodies independently.

Tests:
- ts-storage-duckdb: extended `query_turn_calls_orders_and_sequences` to assert lite mode strips all four heavy fields and preserves every other field byte-for-byte.
- console: 111 existing tests pass, no behavior change for small-turn workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
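The console-side decisions described above reduce to two small predicates. The sketch below uses hypothetical helper names and a hypothetical `CallBodies` shape; only the 200-call threshold, the `?lite=1` parameter, and the two endpoint paths come from the commit message.

```typescript
const LITE_CALL_THRESHOLD = 200;

// Lite mode flips on above the threshold.
function useLiteMode(callCount: number): boolean {
  return callCount > LITE_CALL_THRESHOLD;
}

// URL for the per-turn call list; the default URL is unchanged for small turns.
function turnCallsUrl(turnId: string, callCount: number): string {
  const base = `/api/agent-turns/${encodeURIComponent(turnId)}/calls`;
  return useLiteMode(callCount) ? `${base}?lite=1` : base;
}

// Hypothetical subset of a call row as returned in lite mode.
interface CallBodies {
  id: string;
  request_body: string | null;
  response_body: string | null;
}

// A card fetches its own bodies only when it is expanded AND they were stripped.
function shouldLazyFetch(call: CallBodies, expanded: boolean): boolean {
  return expanded && call.request_body === null && call.response_body === null;
}

function callDetailUrl(callId: string): string {
  return `/api/llm-calls/${encodeURIComponent(callId)}`;
}
```

Gating on `expanded` is what keeps a turn with hundreds of collapsed cards from issuing one request per card at mount.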

The three Agent Session list hooks (`useAgentSessions`, `useAgentSessionDetail`, `useSessionTurns`) were missing the `placeholderData: (prev) => prev` setting that every other list hook in the app uses (`useAgentTurns`, `useLlmCalls`, `useHttpExchanges`, `useMetrics`, etc.). Without it, every auto-refresh tick / toolbar key change wipes the query cache to undefined before the new response lands: react-query renders the loading skeleton, then the new data, and the user sees a full-page flash on every refresh while other list pages do a frame-perfect swap.

Setting `placeholderData: (prev) => prev` keeps the last-known data visible while a background refetch is in flight. New data drops in when the response arrives; no skeleton, no blanked-out list.

Caught by user: "The Agent Session page redraws the whole page on every refresh, instead of refreshing with barely visible changes like the other pages do".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… perf)

New "Services" page that aggregates llm_calls by the actual serving endpoint (server_ip, server_port), answering "what's 172.16.103.81:9000 serving, and how is it performing?".

Why not reuse `llm_metrics`? Its pre-aggregated grouping sets stop at `server_ip` and don't carry server_port, so two vLLM instances on the same host (port 8000 / 9000) would collapse into one row.

## Backend

- `ts_storage::query::ServiceRow` + `ServicesQuery` (one row per endpoint with distinct models, wire APIs, call/error counts, TTFT/E2E avg + p95, total tokens, first/last seen).
- `StorageBackend::query_services` trait method + DuckDB impl. Query is `GROUP BY (server_ip, server_port)` on `llm_calls`; models / wire_apis come back as `list_distinct(array_agg(...))`, bridged to Rust as JSON strings (DuckDB rust bindings have no `FromSql for Vec<String>`).
- `GET /api/services?start=&end=&sort_by=&sort_order=&limit=` serves it. `sort_by` whitelist matches the table column names.

## Console

- Sidebar adds "Services" between "Models" and "Agent Sessions" with a `Server` icon.
- `ServicesPage` table: Endpoint • Models (chips) • Wire APIs • Calls (+stream %) • Error % • TTFT avg/p95 • E2E avg/p95 • In/Out tokens • Last seen (relative). Headers click-to-sort in-place, with no refetch on resort.
- `useServices` hook follows the same `placeholderData: prev` pattern as every other list hook (no flash on refresh).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…p/litellm)

Adds an App column to the Services page that classifies each endpoint into one of a fixed enum from cheap wire-traffic signals.

## Signals used (highest-confidence first)

| App | Signal |
|---|---|
| `ollama` | path `/api/chat` / `/api/generate` / `/api/tags` |
| `llamacpp` | path `/completion` / `/tokenize` / `/props` (root-level) |
| `litellm` | response header `x-litellm-*` OR `Server: litellm` |
| `openai` | request `Host: api.openai.com` |
| `anthropic` | request `Host: api.anthropic.com` |
| `gemini` | request `Host: generativelanguage.googleapis.com` |
| `openai-compat` | `Server: uvicorn` (vLLM and SGLang both; a body-sample follow-up will disambiguate) |
| `litellm` | tiebreaker: an `openai-compat` endpoint serving ≥ 3 distinct models (real signal from wuneng's 127.0.0.1:4000) |
| (none) | nothing matches; UI shows muted "unknown" badge |

## Implementation

- `ts-storage-duckdb/src/apps.rs`: pure-function classifier with 12 unit tests covering each rule + edge cases (Ollama compat mode serving `/v1/chat/completions`, multi-model uvicorn tiebreaker, path-wins-over-uvicorn precedence, header-absent fallback).
- SQL aggregate now also pulls `arg_min(response_headers, LENGTH(...))` and the matching request_headers as a per-group sample plus `list_distinct(array_agg(request_path))[1:16]`. `arg_min` picks the shortest non-null blob deterministically, small enough that streaming it to Rust costs nothing.
- New fields on `ServiceRow`: `app`, `server_header`, `request_paths`.
- Console renders a colored `AppBadge` per row with a `title=Server:` tooltip so the user can sanity-check the label.

## What ships vs. follow-up

vLLM and SGLang both run under uvicorn and don't have a distinctive custom header. Today they both label as `openai-compat`. A follow-up will pull one small response body per group and look for `chatcmpl-tool-<hex>` (vLLM's tool_call_id pattern, observed in production) vs. SGLang's distinct response shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Services-page aggregate uses `arg_min(headers, LENGTH(headers))` to pick one representative header sample per endpoint. Without a shape filter it picks ANY shortest non-null value — including rows where the response parser stashed an empty/corrupted string. That fed `null` (or similar) to the classifier and dropped four real endpoints (the GLM-5.1 cluster on port 9000) to `unknown` even though every other call from those endpoints carries a clean `Server: uvicorn` blob. Restrict the sample to JSON arrays of at least 30 chars (`[%` pattern). The shortest real header list captured in production is ~140 chars; 30 is a comfortable floor that excludes literal `null`, `[]`, `{}`, and any other malformed short response without losing genuine samples. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`arg_min(headers, LENGTH(headers))` was still returning NULL for endpoints with mixed-header data (e.g. SSE/streaming calls where the parser captured something the LIKE filter doesn't catch). Switch to `MAX(response_headers)` — lexicographic on a column whose values all start with `[[` makes it a stable arbitrary pick AND it doesn't have arg_min's failure mode of picking anomalously short malformed values. Filter to `[%` to guarantee the picked sample is shaped like a JSON array (drops literal "null", "{}", etc.).

Per the user's ask: every endpoint must land on a concrete label. Replace the `openai-compat` placeholder by stacking up cheap signals already present in `llm_calls`:

**New SQL aggregates** (alongside the existing header / paths sample):

- `list_distinct(array_agg(finish_reason))[1:32]`: distinct finish_reasons in the window
- `arg_max(request_body, LENGTH(request_body))`: largest captured request body (deepest agentic history; only materialises once, length comparison is u64-cheap)
- `arg_max(response_body, LENGTH(response_body))`: largest captured response body (capped at 8 KB so streamed/oversized rows don't bloat the read)

**New classifier signals** (in order, highest confidence first):

1. SGLang-specific paths (`/generate`, `/health_generate`, `/get_server_info`, `/flush_cache`, `/encode`, profile endpoints).
2. vLLM-specific paths (`/version`, `/v1/score`).
3. SGLang-exclusive finish_reasons (`matched_stop`, `matched_eos`, `stop_str`). Works even when responses are SSE-streamed, since finish_reason is captured from the final SSE event regardless.
4. Response body fingerprint:
   - `"id":"chatcmpl-tool-…"` (vLLM's tool_call_id format)
   - `"system_fingerprint":"fp_…"` (vLLM only; SGLang leaves it null)
5. Request body fingerprint: `chatcmpl-tool-` substring. Agentic replays carry assistant.tool_calls history back to the server, and the previous round's tool_call_id reveals vLLM.
6. Uvicorn fallback:
   - ≥3 models → LiteLLM (multi-model tiebreaker, real wuneng signal)
   - Model starts with `glm` / `deepseek` → SGLang (reference deployment)
   - Otherwise → vLLM (more common)

Console: drop the `openai-compat` badge color since the label is no longer emitted by the classifier.

22 classifier tests (was 12) covering every new rule + the beats-the-heuristic precedence cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
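The precedence chain above could look roughly like this as a pure function. It is a sketch, not the code in `apps.rs`: the `Signals` struct, the `App` enum, and `classify_uvicorn` are hypothetical names, the unspecified SGLang "profile endpoints" are left out, and the case-insensitive model-prefix match is an assumption. The ≥3-models step reflects this commit only; a later commit in this PR removes it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum App {
    Litellm,
    Sglang,
    Vllm,
}

/// Hypothetical per-endpoint evidence bundle for a `Server: uvicorn` endpoint.
#[derive(Debug, Clone, Copy)]
struct Signals<'a> {
    request_paths: &'a [&'a str],
    finish_reasons: &'a [&'a str],
    response_body: Option<&'a str>,
    request_body: Option<&'a str>,
    models: &'a [&'a str],
}

const SGLANG_PATHS: [&str; 5] =
    ["/generate", "/health_generate", "/get_server_info", "/flush_cache", "/encode"];
const VLLM_PATHS: [&str; 2] = ["/version", "/v1/score"];
const SGLANG_FINISH_REASONS: [&str; 3] = ["matched_stop", "matched_eos", "stop_str"];

/// Walk the signals from highest to lowest confidence; the first match wins.
fn classify_uvicorn(s: &Signals) -> App {
    // 1-2. Engine-specific request paths.
    if s.request_paths.iter().any(|p| SGLANG_PATHS.contains(p)) {
        return App::Sglang;
    }
    if s.request_paths.iter().any(|p| VLLM_PATHS.contains(p)) {
        return App::Vllm;
    }
    // 3. finish_reason values only SGLang emits.
    if s.finish_reasons.iter().any(|r| SGLANG_FINISH_REASONS.contains(r)) {
        return App::Sglang;
    }
    // 4. Response body fingerprints attributed to vLLM.
    if let Some(body) = s.response_body {
        if body.contains(r#""id":"chatcmpl-tool-"#)
            || body.contains(r#""system_fingerprint":"fp_"#)
        {
            return App::Vllm;
        }
    }
    // 5. Request body replaying an earlier vLLM tool_call_id.
    if s.request_body.map_or(false, |b| b.contains("chatcmpl-tool-")) {
        return App::Vllm;
    }
    // 6. Fallback heuristics (weakest evidence).
    if s.models.len() >= 3 {
        return App::Litellm;
    }
    let sglang_family = s.models.iter().any(|m| {
        let m = m.to_ascii_lowercase(); // assumption: prefix match ignores case
        m.starts_with("glm") || m.starts_with("deepseek")
    });
    if sglang_family { App::Sglang } else { App::Vllm }
}
```

Ordering the checks as early returns keeps the "beats-the-heuristic" precedence explicit: a vLLM body fingerprint on an endpoint serving a GLM model still resolves to vLLM because step 4 runs before the model-name fallback.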

Adds a tab switcher to the Services page (default Table, alternative Path). The Path view fetches a new GET /api/services/topology endpoint and renders a directed SVG graph:

* Nodes are real (server_ip:server_port) endpoints, colored by app class, plus one synthetic "clients" node aggregating all upstream callers.
* Edges come in two kinds:
  - `proxy` (solid blue): definitive hops confirmed by the pair_sweeper (litellm -> sglang, haproxy -> docker backend, …).
  - `client` (dashed grey): synthetic edges from the clients node into every service that receives non-proxy_out traffic. So even endpoints without a paired upstream still render connected.

Layout is a BFS-by-depth column layout from the clients node; sibling order within a column is stable (call_count desc). Edge stroke width scales with turn_count so the hot paths stand out.

Backend pieces:

* ServicesTopologyQuery / TopologyNode / TopologyEdge / ServicesTopology types in ts-storage::query.
* DuckDB impl in metrics.rs: two SQL passes (proxy edges from pair_sweeper-written metadata.proxy.pair_id; client entry edges from any non-proxy_out turn) joined to llm_calls for server_port (agent_turns doesn't carry it).
* GET /api/services/topology route, time-windowed by ?start&end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
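As an illustration of the BFS-by-depth column layout described above, here is a minimal sketch. The actual layout lives in the console (TypeScript); this Rust version only shows the idea. The function name `bfs_columns`, the plain string node ids, and the alphabetical tie-break are assumptions.

```rust
use std::collections::{HashMap, VecDeque};

/// Assign every node reachable from `root` to a column equal to its BFS depth,
/// then order siblings in each column by call_count descending
/// (ties broken by node id so the order stays stable).
fn bfs_columns<'a>(
    root: &'a str,
    edges: &[(&'a str, &'a str)],
    call_count: &HashMap<&'a str, u64>,
) -> Vec<Vec<&'a str>> {
    // Adjacency list of outgoing edges.
    let mut adjacency: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        adjacency.entry(from).or_default().push(to);
    }

    // Standard BFS: the first visit fixes a node's depth.
    let mut depth: HashMap<&str, usize> = HashMap::new();
    depth.insert(root, 0);
    let mut queue = VecDeque::from([root]);
    while let Some(node) = queue.pop_front() {
        let d = depth[node];
        if let Some(nexts) = adjacency.get(node) {
            for &next in nexts {
                if !depth.contains_key(next) {
                    depth.insert(next, d + 1);
                    queue.push_back(next);
                }
            }
        }
    }

    // Group nodes by depth, one Vec per column.
    let max_depth = depth.values().copied().max().unwrap_or(0);
    let mut columns: Vec<Vec<&str>> = vec![Vec::new(); max_depth + 1];
    for (&node, &d) in &depth {
        columns[d].push(node);
    }

    // Stable sibling order: busiest endpoint first.
    for col in &mut columns {
        col.sort_by(|a, b| {
            let ca = call_count.get(a).copied().unwrap_or(0);
            let cb = call_count.get(b).copied().unwrap_or(0);
            cb.cmp(&ca).then_with(|| a.cmp(b))
        });
    }
    columns
}
```

Nodes with no path from the root are simply omitted here. Since every service in the Path view receives either a proxy edge or a synthetic client edge, the real graph should not contain such nodes, but that is an inference from the description rather than something this sketch checks.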

When pair_sweeper hasn't paired a turn but the inbound `client_ip` matches the server_ip of an existing service node (the canonical "LiteLLM accepts a user call and immediately forwards to a backend" pattern), draw an "inferred" edge from the caller-service to the destination instead of routing the traffic into the anonymous `__clients__` super-node.

Resolution rule when caller_ip has multiple services: prefer `litellm` first, then any other proxy-class app, then highest call_count. Skip resolution entirely when the *target* itself is a proxy (litellm/haproxy/nginx): a co-hosted vllm is the destination's neighbour, not its caller, and attributing inbound litellm traffic to vllm produced backwards edges before this guard was added.

Inferred edges render as a dashed mid-blue line with the count label; proxy stays solid blue, client stays dashed grey. Legend updated to spell out which is which.

Live data after deploy:

* `127.0.0.1:4000 (litellm) → 127.0.0.1:9000 (sglang)` inferred 11
* `127.0.0.1:4000 (litellm) → 127.0.0.1:9008 (vllm)` inferred 1
* `__clients__:0 → 172.16.103.81:4210 (litellm)` client 3221 (the bulk of real user traffic into the main LiteLLM; correctly stays a client edge because the *target* is litellm)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
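The resolution rule above can be expressed as a small ranking function. This is a sketch under stated assumptions: `ServiceNode` and `resolve_caller` are hypothetical names, `is_proxy_app` is modelled on the helper the review below mentions (litellm, haproxy, nginx), and excluding the target itself from the candidates is an added assumption to avoid self-loops.

```rust
use std::cmp::Reverse;

#[derive(Debug, Clone, PartialEq, Eq)]
struct ServiceNode {
    ip: String,
    port: u16,
    app: Option<String>,
    call_count: u64,
}

/// Proxy-class apps, per the commit message: litellm / haproxy / nginx.
fn is_proxy_app(app: Option<&str>) -> bool {
    matches!(app, Some("litellm" | "haproxy" | "nginx"))
}

/// Find the service that most plausibly originated traffic from `client_ip`
/// into `target`. Returns None when the edge should stay a plain client edge.
fn resolve_caller<'a>(
    client_ip: &str,
    target: &ServiceNode,
    services: &'a [ServiceNode],
) -> Option<&'a ServiceNode> {
    // Guard: never attribute traffic *into* a proxy to a co-hosted service.
    if is_proxy_app(target.app.as_deref()) {
        return None;
    }
    services
        .iter()
        .filter(|s| s.ip == client_ip)
        // Assumption: the target cannot be its own caller.
        .filter(|s| !(s.ip == target.ip && s.port == target.port))
        .min_by_key(|s| {
            // Lower rank wins: litellm, then other proxies, then everything else.
            let rank = match s.app.as_deref() {
                Some("litellm") => 0u8,
                app if is_proxy_app(app) => 1,
                _ => 2,
            };
            // Within a rank, prefer the highest call_count.
            (rank, Reverse(s.call_count))
        })
}
```

For the live data listed above, a call into `127.0.0.1:9000 (sglang)` from `client_ip = 127.0.0.1` resolves to the litellm node on port 4000, while a call whose target is a litellm node returns `None` and stays a client edge.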

A 7-day Services / Path query was taking 17 s on prod because the per-endpoint aggregation included `arg_max(request_body, LENGTH(request_body)) FILTER (LENGTH(body) BETWEEN ...)`. DuckDB materializes every body in the window (5+ GB on prod) just to pick one short sample per (server_ip, server_port).

Split body sampling out of the main aggregation into `fetch_app_samples`:

* Window clipped to last 24 h of the user's range. App classification doesn't change over the wider window, and most views are "now" so this is a no-op in practice.
* Top-5 most-recent rows per endpoint via `ROW_NUMBER() OVER (PARTITION BY server_ip, server_port ORDER BY request_time DESC)` + `WHERE rn <= 5`. DuckDB ranks in place and only emits 5 rows per group, so there is no full body scan.
* Body / header shape filtering (`LIKE '{%'`, length bounds) moved to Rust. The 5 returned rows give us plenty of candidates to find a representative sample.
* Distinct request_paths / finish_reasons come from a separate cheap dim query over the same clipped window.

Wall-clock on prod (7 d window, 662 k llm_calls rows):

| | services | topology |
|--------|--------|--------|
| before | 17.8 s | 12.9 s |
| after | 1.5 s | 1.3 s |

App classification output unchanged for endpoints with active recent traffic (verified against the prior 1 h baseline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
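To make the two sampling steps above concrete, here is an in-memory sketch of the same semantics. In the PR both steps run in DuckDB SQL, so this is only an illustration of what the query computes. `SAMPLE_WINDOW_US` follows the constant quoted in the review below; `clip_sample_window`, `CallRow`, and `top_n_recent` are hypothetical names, and microsecond timestamps are assumed from that constant's suffix.

```rust
use std::collections::HashMap;

/// 24 hours in microseconds, as quoted in the review of metrics.rs.
const SAMPLE_WINDOW_US: i64 = 24 * 60 * 60 * 1_000_000;

/// Clip the user's [start, end] range to its last 24 h.
/// Ranges already shorter than 24 h are returned unchanged.
fn clip_sample_window(start_us: i64, end_us: i64) -> (i64, i64) {
    (start_us.max(end_us - SAMPLE_WINDOW_US), end_us)
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct CallRow {
    server_ip: String,
    server_port: u16,
    request_time_us: i64,
}

/// In-memory equivalent of
/// `ROW_NUMBER() OVER (PARTITION BY server_ip, server_port
///                     ORDER BY request_time DESC) ... WHERE rn <= n`.
fn top_n_recent(rows: Vec<CallRow>, n: usize) -> HashMap<(String, u16), Vec<CallRow>> {
    let mut groups: HashMap<(String, u16), Vec<CallRow>> = HashMap::new();
    for row in rows {
        groups
            .entry((row.server_ip.clone(), row.server_port))
            .or_default()
            .push(row);
    }
    for group in groups.values_mut() {
        // Newest first, then keep only the first n rows of each partition.
        group.sort_by(|a, b| b.request_time_us.cmp(&a.request_time_us));
        group.truncate(n);
    }
    groups
}
```

The point of the change is that at most `n` candidate rows per endpoint reach the Rust-side shape filter, rather than every body in the window.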

The "uvicorn endpoint serving ≥ 3 distinct models = LiteLLM" rule turned out to be window-width-sensitive: a vLLM serving one real model picks up 2-4 stray model names from misconfigured clients (`text-embedding-ada-002`, `test`, …) over a 7-day window and gets flipped to litellm.

Verified misclassifications on prod:

| Endpoint | Before | After | Deployment |
|----------|--------|-------|------------|
| 172.17.0.7:8000 | vllm ↛ litellm | vllm ✓ | Qwen3.5-35B |
| 172.17.0.9:9000 | sglang ↛ litellm | sglang ✓ | GLM-5.1 haproxy |
| 172.17.0.4:30000 | sglang ↛ litellm | sglang ✓ | GLM-5.1 docker |

Real LiteLLM endpoints (.81:4210, 127.0.0.1:4000) still classify via the `x-litellm-*` response header rule. The GLM/DeepSeek model-name heuristic still routes those families to sglang on uvicorn. The fallback was the weakest signal in the chain; body / path / header evidence already weighed above is enough.

Removes the rule, its dedicated unit test (sglang_via_glm_model_heuristic already covers the case the test was using), and the table-of-rules doc-comment row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Services page gets a third tab "Model" that renders the existing ModelsPage component. The standalone `/models` route still resolves for shared links, but the sidebar entry is removed. Models was a service-level cross-cut; sitting one click deeper under Services makes the IA more honest.

Sidebar tweaks:

* Drop Models entry (now reachable via Services → Model tab).
* Rename "Traffic" → "Usage": clearer label for what the page actually shows (token throughput, byte volume, etc.). Route stays /traffic so existing bookmarks keep working.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two new charts on the Overview page so the operator can see, at a glance:

* Agent Activity: stacked-area timeseries of agent_turn counts, bucketed by a server-chosen window (1-min for ~1h ranges, 30-min for 1d, 4h for 30d), split by agent_kind.
* Agent Distribution: horizontal bar chart of total turns per agent_kind in the selected window.

Backend:

* `AgentSummaryQuery` / `AgentKindSummary` and `AgentActivityQuery` / `AgentActivityPoint` types in `ts-storage::query`.
* DuckDB impls in metrics.rs: `query_agent_summary` (one row per agent_kind), `query_agent_activity` (per-bucket counts split by agent_kind, bucket size auto-picked from window width).
* Routes `GET /api/agent-turns/summary` + `GET /api/agent-turns/activity` (with optional `?bucket=` override).

Console:

* Types `AgentKindSummary` / `AgentActivityPoint` and the two response shapes.
* Hooks `useAgentSummary` / `useAgentActivity` keyed on the toolbar window (placeholderData keeps prior data during refetch, so no flash).
* Charts `AgentActivityChart` (recharts AreaChart stacked) and `AgentDistributionChart` (recharts BarChart horizontal).
* Inserted as a new row between the existing "middle row" and the model panels, with the same width budget as the rest of Overview.

Live data after deploy (1 d window): generic=68538, hermes=105, openclaw=84 turns; 108 activity buckets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
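A rough sketch of the bucket auto-pick and per-bucket counting described above. Only the three anchor points (1 min for ~1 h, 30 min for 1 d, 4 h for 30 d) come from the commit message; the cut-over thresholds between them, the millisecond units, and the names `pick_bucket_ms` and `bucket_turns` are assumptions for illustration.

```rust
use std::collections::BTreeMap;

const MINUTE_MS: i64 = 60_000;
const HOUR_MS: i64 = 60 * MINUTE_MS;
const DAY_MS: i64 = 24 * HOUR_MS;

/// Choose a bucket width from the window width.
/// The 2 h and 2 d cut-overs are hypothetical thresholds.
fn pick_bucket_ms(window_ms: i64) -> i64 {
    if window_ms <= 2 * HOUR_MS {
        MINUTE_MS
    } else if window_ms <= 2 * DAY_MS {
        30 * MINUTE_MS
    } else {
        4 * HOUR_MS
    }
}

/// Count turns per (bucket_start_ms, agent_kind).
/// A BTreeMap keeps the points sorted by time, then by agent_kind.
fn bucket_turns(turns: &[(i64, &str)], bucket_ms: i64) -> BTreeMap<(i64, String), u64> {
    let mut counts = BTreeMap::new();
    for &(timestamp_ms, agent_kind) in turns {
        // Floor the timestamp to the start of its bucket.
        let bucket_start = timestamp_ms.div_euclid(bucket_ms) * bucket_ms;
        *counts
            .entry((bucket_start, agent_kind.to_string()))
            .or_insert(0u64) += 1;
    }
    counts
}
```

Each map entry corresponds to one `AgentActivityPoint` (`timestamp_ms`, `agent_kind`, `turn_count`); buckets with no turns for a given agent_kind are simply absent, which may explain why a bucket count need not be a multiple of the number of agent kinds.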

Both list pages now accept a CSV `Server Port` filter in the head row alongside the existing `Client IP`. URL-serialized as `?server_port=4210,9000` so shared links carry the filter.

* `llm_calls`: direct `WHERE server_port IN (...)`, which is fast.
* `agent_turns`: `agent_turns` has no `server_port` column, so we resolve through the turn's first `call_ids` entry against `llm_calls` via an EXISTS subquery, the same shortcut the topology query uses. A turn's calls almost always hit one endpoint in practice, so first-call resolution is a safe approximation.

Verified live (1h window, prod):

* `GET /api/llm-calls?server_port=9000` → all returned calls have server_port=9000 (GLM-5.1 sglang).
* `GET /api/agent-turns?server_port=4210` → returned turns are litellm-bound (server_ip 172.16.103.81).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
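Parsing the CSV `server_port` parameter and turning it into the `llm_calls` predicate might look like the sketch below. The function names `parse_server_ports` and `server_port_clause` are hypothetical, and skipping empty items and de-duplicating ports are assumed behaviours; the route handler in the PR may treat these differently or bind parameters instead of inlining them.

```rust
/// Parse `?server_port=4210,9000` into a list of ports.
/// Empty items are skipped and duplicates dropped; anything that is not
/// a valid u16 is reported as an error.
fn parse_server_ports(raw: &str) -> Result<Vec<u16>, String> {
    let mut ports = Vec::new();
    for part in raw.split(',') {
        let part = part.trim();
        if part.is_empty() {
            continue;
        }
        let port: u16 = part
            .parse()
            .map_err(|_| format!("invalid server_port: {part:?}"))?;
        if !ports.contains(&port) {
            ports.push(port);
        }
    }
    Ok(ports)
}

/// Build the `WHERE` fragment for `llm_calls`. Inlining is safe here only
/// because every value has already been parsed as a u16.
fn server_port_clause(ports: &[u16]) -> Option<String> {
    if ports.is_empty() {
        return None;
    }
    let list = ports
        .iter()
        .map(|p| p.to_string())
        .collect::<Vec<_>>()
        .join(", ");
    Some(format!("server_port IN ({list})"))
}
```

An empty filter yields `None`, so the caller can omit the predicate entirely rather than emit `IN ()`.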

github-actions

Summary

Agent run failed (exit 1). See workflow logs.

🤖 Reviewed by vivi • workflow run

github-actions

Based on my review of PR #27, here are my findings:

Summary

PR #27 adds a Services Path view with topology graph, Overview agent activity/distribution charts, and delivers a critical 10× performance fix for 7-day window queries. The performance fix correctly replaces the problematic arg_max(body, LENGTH(body)) pattern with a ROW_NUMBER-based top-N approach that clips body sampling to 24h. The window-width-sensitive classifier rule was correctly removed. APPROVE — no blocking issues.

Verified

Body-scan fix: fetch_app_samples at metrics.rs:788-955 uses ROW_NUMBER() OVER (PARTITION BY server_ip, server_port ORDER BY request_time DESC) WHERE rn <= 5 + clipped 24h window — avoids the 5+ GB body materialization that caused the 17s stall. This is the canonical fix referenced in the repo's bite-history.
Schema mirror: ServiceRow Rust (query.rs:54-90) ↔ TS (api.ts:70-94) matches field-for-field: server_ip, server_port, models, wire_apis, request_paths, call_count, error_count, stream_count, total_input_tokens, total_output_tokens, ttft_avg_ms, ttft_p95_ms, e2e_avg_ms, e2e_p95_ms, first_seen_ms, last_seen_ms, app, server_header all align.
Schema mirror: AgentKindSummary Rust (query.rs:153-160) ↔ TS (api.ts:133-141) matches: agent_kind, turn_count, total_input_tokens, total_output_tokens, avg_duration_ms, last_seen_ms.
Schema mirror: AgentActivityPoint Rust (query.rs:173-178) ↔ TS (api.ts:149-153) matches: timestamp_ms, agent_kind, turn_count.
Route registration: /api/agent-turns/summary, /api/agent-turns/activity, and /api/agent-turns/{id}/proxy-view are all registered in lib.rs:139-157.
queryKey correctness: useAgentSummary, useAgentActivity, useServicesTopology all include {start, end} in their queryKeys (use-agent-overview.ts:10,21; use-services-topology.ts:11).
Classifier removal: The window-width-sensitive "uvicorn + ≥3 models → litellm" rule is removed with a clear comment at apps.rs:192-198 explaining the misclassification risk at 7d windows.

Suggestions

metrics.rs:811 — The 24h sample window clip (SAMPLE_WINDOW_US = 24 * 60 * 60 * 1_000_000) is hardcoded. If app classification ever becomes time-sensitive (e.g., an endpoint that switches serving software), this would need to expand. Currently fine — the comment at 803-804 notes app classification doesn't change over wider windows.
metrics.rs:1414-1416 — The is_proxy_app helper matches only litellm, haproxy, nginx. If another proxy type emerges (e.g., envoy), it should be added here to prevent the self-loop attribution issue described at 1420-1426.

Questions

Why does the sidebar relabel "Traffic" → "Usage" without changing the route? The commit message says "route unchanged" for bookmark compatibility, which makes sense — but the label "Usage" for /traffic might confuse users who bookmark the old label. Not a merge blocker, just a UX consistency note.

🤖 Reviewed by vivi • workflow run

…and-flash # Conflicts: # console/src/hooks/use-llm-calls.ts # console/src/hooks/use-url-sync.ts # console/src/pages/agent-turns.tsx # console/src/pages/llm-calls.tsx # console/src/pages/overview.tsx # server/ts-api/src/routes/llm_calls.rs

Vader Yang and others added 30 commits May 15, 2026 17:01

Merge remote-tracking branch 'origin/feat/selected-id-in-url' into fe…

8d6d506

…at/llmproxy-pair-detection

Merge remote-tracking branch 'origin/feat/ui-tps-and-column-reorder' …

a0dbbe2

…into feat/llmproxy-pair-detection

Merge remote-tracking branch 'origin/feat/selected-at-anchor' into fe…

33115d8

…at/llmproxy-pair-detection

Merge remote-tracking branch 'origin/feat/axis-time-multi-day' into f…

17c0da9

…eat/llmproxy-pair-detection

fix(console): drop unused ContentKey interface (vite tsc strict)

1cc5aa7

Merge remote-tracking branch 'origin/feat/settings-in-app' into feat/…

90abdfc

…llmproxy-pair-detection # Conflicts: # console/src/components/layout/sidebar.tsx

Vader Yang and others added 7 commits May 20, 2026 12:13

vaderyang mentioned this pull request May 20, 2026

feat(ci): headless PR review agent (phase 1) #28

Merged

5 tasks

github-actions Bot reviewed May 21, 2026

View reviewed changes

github-actions Bot previously approved these changes May 21, 2026

View reviewed changes

vaderyang dismissed github-actions[bot]’s stale review via d1ff46e May 21, 2026 09:10

vaderyang wants to merge 39 commits into main from feat/deploy-services-and-flash
