feat(services): Path view + Overview agent charts (deploy roll-up) #27
vaderyang wants to merge 39 commits into
…d link
Builds on the previous PR (selected id in URL): copying a list page
URL like `?preset=15m&selected=<id>` and opening it half an hour
later would compute `start=now-15m, end=now` from scratch — the
selected item is no longer in that window. The detail panel still
loaded (it queries by id) but the list behind it showed an unrelated
slice, the row had no highlight, prev/next disabled.
Fix the window without changing the original tab's behaviour:
1. List pages also write `?selected_at=<unix_s>` when an item is
selected — taken from the item's start_time (agent turns) or
request_time (llm calls / http exchanges). Cleared together with
`selected` when the panel is closed.
2. `useToolbarUrlSync` reads `selected_at` during hydration. If the
anchor falls outside the preset-derived window, override:
- keep the preset's *duration* (the original user's "show me this
much context" signal),
- slide so `end = anchor + 60s` (small breathing pad keeps the
item from sitting flush at the edge in a desc-by-time list),
- promote `preset` to `custom` so subsequent URL writes carry
absolute start/end and the shift survives navigation.
No-op when the anchor is inside the window, absent, unparseable,
or future-dated relative to a window that already includes it.
3. Pure helper `applySelectedAtAnchor` lives in its own module
(`selected-at-anchor.ts`, no `@/` aliases) so it's directly
testable under bun without the toolbar-store / react-router
runtime chain. 7 unit tests cover the no-op cases, the stale-
preset shift, default-1h fallback, and clock-skew anchors.
Effects:
- Original tab: relative preset still ticks `now` as usual; no
surprise switch to `custom`.
- Fresh URL load: window auto-widens / slides to bracket the
shared item; list, highlight, prev/next all work.
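The override rule in step 2 can be sketched as a pure function. This is a minimal illustration, not the contents of `selected-at-anchor.ts`: the `ToolbarWindow` shape, the constant names, and the treatment of a non-positive anchor as unparseable are assumptions.

```typescript
// Hypothetical window shape; the real store types are not shown in this PR.
interface ToolbarWindow {
  preset: string; // e.g. "15m", "1h", or "custom"
  start: number;  // unix seconds
  end: number;    // unix seconds
}

const PAD_S = 60;            // breathing pad placed after the anchor
const DEFAULT_SPAN_S = 3600; // assumed 1h fallback when the window has no usable span

function applySelectedAtAnchor(win: ToolbarWindow, selectedAt: string | null): ToolbarWindow {
  // Absent or unparseable anchor: leave the window alone.
  if (selectedAt === null || selectedAt.trim() === "") return win;
  const anchor = Number(selectedAt);
  if (!Number.isFinite(anchor) || anchor <= 0) return win;

  // Anchor already inside the window: no-op.
  if (anchor >= win.start && anchor <= win.end) return win;

  // Keep the duration, slide the window, and switch to absolute bounds.
  const span = win.end > win.start ? win.end - win.start : DEFAULT_SPAN_S;
  const end = anchor + PAD_S;
  return { preset: "custom", start: end - span, end };
}
```

Returning the same object in the no-op cases lets a caller detect "nothing changed" with a reference comparison.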
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ar logo
Three small UI changes batched into one branch — none of them touch
backend or data shapes:
* Overview "Avg TPOT" KPI surfaces as "Avg TPS" with units of tok/s
(= 1000 / tpot_avg_ms). TPOT itself is what the backend stores;
the conversion is one division at render time. "Generation speed"
reads better in a glance than "milliseconds per token".
* Models table column "TPOT" → "Generation TPS", same unit swap.
Sort key still points at tpot_avg under the hood but getSortValue
inverts to 1000/tpot_avg so clicking the column desc gives
fastest-first — matches what someone clicking "Generation TPS"
expects.
* Agent Turns table column order rewritten around how operators
actually triage a turn: Time, Agent, Client, Calls, Status, In,
Out, then the less-frequently-scanned dimensions (Model, Wire
API, Server, Duration) and the long User Input preview last.
* New TokenScope brand mark replaces the bare panel-toggle button
at the top-left of the sidebar:
- Expanded: wordmark on the left, collapse button on the right.
- Collapsed: icon-only mark; click toggles to expand (the icon
doubles as the expand affordance — discoverable, saves a row).
Both variants share the same glyph (rounded "scope" frame
containing three decreasing token bars) so they line up
visually as the sidebar opens/closes. Stroke uses currentColor
for dark-mode and theme inheritance.
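The TPOT to TPS swap from the first two bullets is a single division. A minimal sketch, assuming a nullable `tpot_avg` field in milliseconds; the function names and the choice to sort missing values last are illustrative, not taken from the console code.

```typescript
// Convert average time-per-output-token (ms) into tokens per second.
// Returns null when TPOT is missing or non-positive, so the UI can show a dash.
function tpotToTps(tpotAvgMs: number | null | undefined): number | null {
  if (tpotAvgMs == null || !Number.isFinite(tpotAvgMs) || tpotAvgMs <= 0) return null;
  return 1000 / tpotAvgMs;
}

// Sort value for a "Generation TPS" column: sorting descending on this value
// puts the fastest model first. Rows without TPOT sink to the bottom (assumption).
function generationTpsSortValue(row: { tpot_avg: number | null }): number {
  const tps = tpotToTps(row.tpot_avg);
  return tps === null ? -Infinity : tps;
}
```

For example, a TPOT of 20 ms per token renders as 50 tok/s.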
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every chart had its own copy of:
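    function formatAxisTime(epoch) {
      const d = new Date(epoch * 1000)
      // Zero-padded local hours and minutes; no date component.
      const HH = String(d.getHours()).padStart(2, "0")
      const MM = String(d.getMinutes()).padStart(2, "0")
      return `${HH}:${MM}`
    }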
Result: a 7-day window rendered ticks as a wrapping clock face
("00:00", "12:00", "00:00", "12:00", ...) with no day attached.
Same problem at 24h. Easy to mis-read.
Centralize the formatter in lib/format as `formatAxisTime(epoch, span)`
and have it pick the right shape based on the visible window:
span < 24h → HH:MM (5m / 15m / 1h / 6h presets)
24h ≤ span < 7d → MM-DD HH:MM (24h preset)
span ≥ 7d → MM-DD (7d preset; time-of-day is noise
when ticks come ~daily)
Each chart derives span from its data (last timestamp − first), so the
formatter requires no toolbar dependency and naturally handles partial
ranges (e.g. tail of a 7d window after retention trimmed the head).
Replaces the inline copies in:
- timeseries-line-chart (Overview latency, Models, Performance)
- request-volume-chart (Overview)
- latency-overview-chart (Overview)
- stacked-bar-chart (Performance, Traffic)
6 unit tests in lib/format.test.ts cover each duration bucket plus the
24h / 7d inclusive boundaries and the single-point fallback (span = 0
→ HH:MM). Tests assert *shape* not literal values so they pass under
any TZ.
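A sketch of the span-aware formatter described above. It assumes both `epoch` and `span` are in seconds and that ticks use local time; the `spanOf` helper name is hypothetical and only shows how a chart might derive the span from its own data.

```typescript
const DAY_S = 24 * 60 * 60;
const pad2 = (n: number): string => String(n).padStart(2, "0");

// Pick the tick label shape from the visible span (seconds).
function formatAxisTime(epoch: number, spanSeconds: number): string {
  const d = new Date(epoch * 1000);
  const hhmm = `${pad2(d.getHours())}:${pad2(d.getMinutes())}`;
  const mmdd = `${pad2(d.getMonth() + 1)}-${pad2(d.getDate())}`;
  if (spanSeconds < DAY_S) return hhmm;                   // 5m .. 6h presets
  if (spanSeconds < 7 * DAY_S) return `${mmdd} ${hhmm}`;  // 24h preset
  return mmdd;                                            // 7d preset
}

// Span of a series: last timestamp minus first; 0 for empty or single-point data.
function spanOf(timestamps: number[]): number {
  if (timestamps.length < 2) return 0;
  return timestamps[timestamps.length - 1] - timestamps[0];
}
```

A chart would call `formatAxisTime(t, spanOf(xs))` per tick, so a single-point series falls back to `HH:MM`.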
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ProxyPair, ProxyRole, PairCandidate, PairAssignment with the
classify_pair / pair_all entry points. No call sites yet — this is the
pure-data foundation that the storage sweeper + API filter will build on.
Pairing rule (verified against the haproxy_glm5 turn pair on wuneng:
turns 019e3a95-bb7c-7eb3-8240-d3ecacb0c583 / d3d6fdd76249, same session
gen-b93380c5210ed98a, 11345/128 tokens, start_gap 2ms / end_gap 1ms):
- same session_id / agent_kind / wire_api
- same call_count, total_input_tokens, total_output_tokens
- same final_finish_reason and primary model
- differing (client_ip, server_ip) view
- |start_time gap| ≤ 100ms
Role:
- mirror (same packet on br0 + docker0) when both start and end times
agree within 500us
- strict nesting (real proxy hop) when outer.start ≤ inner.start and
outer.end ≥ inner.end
- else: ambiguous, no pair
10 unit tests cover both real-data scenarios and the non-pair cases
(cross-session, same view, time-gap exceeded, tokens differ, ambiguous
non-nesting).
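The timing half of the rule can be illustrated as follows. The real classifier lives in the Rust `ts-turn` crate; this TypeScript sketch only restates the thresholds from the message above (100ms start gap, 500us mirror tolerance). The type and function names are hypothetical, and it assumes the content checks (session, tokens, view) already passed.

```typescript
interface TurnSpan { startUs: number; endUs: number; } // microseconds

type PairTiming =
  | { kind: "mirror" }                      // same packet seen on two interfaces
  | { kind: "nested"; outer: "a" | "b" }    // real proxy hop; outer wraps inner
  | { kind: "none" };                       // gap too large or ambiguous overlap

const MAX_START_GAP_US = 100000; // 100ms
const MIRROR_TOLERANCE_US = 500; // 500us

function classifyPairTiming(a: TurnSpan, b: TurnSpan): PairTiming {
  const startGap = Math.abs(a.startUs - b.startUs);
  const endGap = Math.abs(a.endUs - b.endUs);
  if (startGap > MAX_START_GAP_US) return { kind: "none" };
  // Mirror is checked first because a mirrored pair can also satisfy nesting.
  if (startGap <= MIRROR_TOLERANCE_US && endGap <= MIRROR_TOLERANCE_US) return { kind: "mirror" };
  if (a.startUs <= b.startUs && a.endUs >= b.endUs) return { kind: "nested", outer: "a" };
  if (b.startUs <= a.startUs && b.endUs >= a.endUs) return { kind: "nested", outer: "b" };
  return { kind: "none" };
}
```

With the gaps quoted above (start 2ms, end 1ms, outer leg starting first and ending last) this yields a nested pair rather than a mirror.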
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the background sweeper that scans recently-finalized turns,
classifies pairs via ts_turn::pair_all, and writes pair_id/role/peer
back via update_turn_metadata. Spawned alongside the storage sink in
pipeline.rs — one sweeper per process, owns its own Arc<dyn
StorageBackend>.
StorageBackend trait gains two methods with safe defaults so mock
backends don't need to change:
- query_pair_candidates(start_us, end_us) → light projection of
agent_turns rows whose metadata.proxy.role is unset (idempotent
sweep guard)
- update_turn_metadata(turn_id, patch) → shallow top-level JSON merge
into agent_turns.metadata (no schema change; metadata is already a
VARCHAR holding JSON)
DuckDB implementation:
- SELECT projects via json_extract_string(metadata, '$.proxy.role')
- UPDATE is read-modify-write to preserve any pre-existing metadata
keys; no-op when turn_id is absent (sweeper races finalization)
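The "shallow top-level JSON merge" semantics can be shown in a few lines. The backend does this in Rust against DuckDB; the sketch below is TypeScript for illustration only, the function name is hypothetical, and treating unparseable or non-object metadata as empty is an assumption.

```typescript
// Merge a patch into the stored metadata string at the top level only.
// Keys in the patch replace same-named keys wholesale; other keys survive.
function mergeTurnMetadata(existing: string | null, patch: Record<string, unknown>): string {
  let base: Record<string, unknown> = {};
  if (existing) {
    try {
      const parsed: unknown = JSON.parse(existing);
      if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
        base = parsed as Record<string, unknown>;
      }
    } catch {
      // Assumption: malformed metadata is treated as empty rather than failing the write.
    }
  }
  return JSON.stringify({ ...base, ...patch });
}
```

Because the merge is shallow, patching `{ proxy: {...} }` replaces any previous `proxy` object rather than merging into it.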
Default schedule: 2s interval, 5min lookback. The lookback comfortably
exceeds tracker grace (1s) + storage flush jitter (~100ms) so neither
leg of a pair can land late enough to miss its peer.
Tests:
- ts-storage pair_sweeper: 3/3 (matched pair, role assignment matches
real wuneng haproxy_glm5 shape, lone turn ignored)
- ts-storage-duckdb turns: pair_candidates returns only unpaired,
update_turn_metadata merges with existing keys, noop on missing
row
Workspace: 815+ unit tests all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
API surface for pair-folded turn list. By default, /api/agent-turns
hides the leg the pair sweeper marked hidden (proxy_out /
mirror_secondary) — one logical call collapses to one row. Pass
?include_proxy_hops=true to surface every captured row for diagnostics.
- TurnListItem gains proxy_role + proxy_peer_turn_id (skip_serializing
when absent → direct turns serialize unchanged)
- TurnsQuery + TurnsParams gain include_proxy_hops: bool (default
false)
- query_turns DuckDB SELECT projects metadata; row reader parses
metadata.proxy.{role, peer_turn_id}
- WHERE clause adds the hide-by-default filter via
json_extract_string(metadata, '$.proxy.role')
Tests: new query_turns_hides_proxy_hops_by_default_and_surfaces_them_with_flag
exercises both default-hide and include-flag, asserting field
propagation and total-count consistency. Workspace test suite stays
green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-visible fold for llmproxy duplicates. The Agent Turns list now:
- Renders a small inline badge next to the Time column on rows the
backend marked as proxy_in / mirror_primary (e.g. "↔ via proxy").
Hover shows the peer turn_id for navigation.
- Adds a "Show proxy hops" checkbox in the filter bar. Off by default
(collapsed view = single row per logical call); when on, the hidden
proxy_out / mirror_secondary peer surfaces too, getting its own
"proxy hop" / "mirror copy" badge.
- Sticky in the URL as ?show_hops=1 so a shared link preserves the
user's view choice.
AgentTurnListItem in types/api.ts gains optional proxy_role /
proxy_peer_turn_id matching the backend additions; useAgentTurns hook
forwards includeProxyHops to the API.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Field-tuning the default after deploying on wuneng. The
metadata.proxy.role IS NULL filter keeps already-paired turns out of
every sweep so a wider lookback has bounded per-tick cost; the only
thing 30min buys us is backfilling pairs that took a turn to flush from
one shard before the peer landed in another. 5min was tight enough to
miss real haproxy_glm5 peers spread across shards in production traffic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the 2-member pair model with arbitrary-size ProxyGroup so the
haproxy_glm5 case — host-IP view + docker-IP view + real upstream
forward, all three captured under the same session — collapses into ONE
row in the default list. Previously the greedy "closest peer first"
rule paired the 0ms mirror and left the real-hop leg unpaired.
ts-turn proxy_pair rewritten:
- PairAssignment → ProxyGroup{members: Vec<GroupMember>}
- pair_all → group_all: bucket by content fingerprint, time-cluster
within 100ms, pick canonical = widest-span (lex tiebreak), assign
per-member roles (mirror_secondary for time-tied peers, proxy_out
for nested peers, ambiguous-time peers dropped). Canonical role
upgrades to proxy_in whenever the group contains any proxy_out;
falls back to mirror_primary for pure-mirror groups.
- metadata_for emits both peer_turn_ids (full list, sorted lex) and
peer_turn_id (first peer, for pre-N-leg API consumers).
ts-storage pair_sweeper: SweepStats now reports both pairs_assigned
(group count = duplicate calls folded) and turns_tagged (per-row
metadata writes — distinguishes "1 fat 3-leg group" from "3 mirror
pairs" in metrics).
API:
- TurnListItem gains proxy_peer_turn_ids: Option<Vec<String>>;
proxy_peer_turn_id retained as the first peer for backward compat.
- DuckDB row reader extracts both forms.
Console:
- AgentTurnListItem mirrors the schema.
- ProxyBadge tooltip lists every peer; label shows "(+N hops)" when
the group has more than one peer.
Tests:
- ts-turn proxy_pair: 11 unit tests including the verified
haproxy_three_leg_collapses_into_single_group scenario (a_br0
canonical = proxy_in, b_dock0 = mirror_secondary, c_hop =
proxy_out, all sharing one group_id).
- ts-storage pair_sweeper: 4 unit tests including 3-leg
metadata-patch correctness.
- Workspace test suite: green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds GET /api/agent-turns/{id}/proxy-view and a "Proxy View" tab on the
Agent Turn detail panel, gated on the turn being part of a proxy group
(metadata.proxy.role set).
The endpoint aggregates every member of the group:
- Per-member snapshot (client/server IP, ports, role, e2e latency,
request_model, wire_api, raw request + response headers parsed
from the stored JSON blob).
- Header diff across legs, with three kinds:
* common — same (name, value) in every leg (collapsed in UI)
* modified — every leg sent it but the proxy rewrote the value
(e.g. Host)
* per_leg — only some legs carry it (e.g. x-litellm-call-id on
proxy_in, anthropic-request-id on proxy_out)
Names match case-insensitively; canonical-case spelling preserved.
- Optional model_rewrite when the canonical and upstream legs'
request bodies advertise different `model` field values.
- Latency breakdown: client_observed_ms − upstream_observed_ms =
proxy_overhead_ms when both are available.
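The three header-diff kinds can be sketched as below. The endpoint is implemented in Rust; this TypeScript version is illustrative only. It assumes headers arrive as `[name, value]` pairs per leg and that the first value wins when a leg repeats a header, neither of which is stated in the PR.

```typescript
type Header = [string, string];
type DiffKind = "common" | "modified" | "per_leg";

interface HeaderDiffEntry {
  name: string;               // spelling from the first leg that carried it
  kind: DiffKind;
  values: (string | null)[];  // one slot per leg, null when the leg lacks the header
}

function diffHeaders(legs: Header[][]): HeaderDiffEntry[] {
  const byName = new Map<string, { name: string; values: (string | null)[] }>();
  legs.forEach((headers, legIdx) => {
    for (const [name, value] of headers) {
      const key = name.toLowerCase(); // case-insensitive match
      let entry = byName.get(key);
      if (!entry) {
        entry = { name, values: legs.map(() => null) };
        byName.set(key, entry);
      }
      if (entry.values[legIdx] === null) entry.values[legIdx] = value;
    }
  });
  return Array.from(byName.values()).map(({ name, values }) => {
    const inEveryLeg = values.every((v) => v !== null);
    const kind: DiffKind = !inEveryLeg
      ? "per_leg"
      : values.every((v) => v === values[0]) ? "common" : "modified";
    return { name, kind, values };
  });
}
```

A rewritten `Host` comes out as `modified`, while a header present on only one leg (such as `x-litellm-call-id`) comes out as `per_leg`.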
UI (proxy-view-tab.tsx) renders, in order:
- Topology row per leg with role chip + IP:port + e2e latency
- Latency breakdown 3-stat card
- Model rewrite banner when present
- Response header diff (modified + per-leg expanded by default, common
collapsed under <details>)
- Request header diff (secondary; usually just Host rewrite)
Backend tests (7): header diff classification (common/modified/per_leg),
case-insensitive header matching, model rewrite detect/none, latency
breakdown happy + mirror-only-without-overhead path, body model
extraction edge cases, headers JSON parse round-trip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous proxy-view commit added the handler but missed the
.route(...) registration in lib.rs, so the endpoint fell through to the
SPA index. Adds the missing line right next to /calls.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…at/llmproxy-pair-detection
…into feat/llmproxy-pair-detection
…at/llmproxy-pair-detection
…eat/llmproxy-pair-detection
Surfaces "<N>-leg via proxy" / "mirrored" chip under the duration in the
GanttNav header whenever the turn is part of a proxy group. Tells the
user upfront, without opening the Proxy view tab, that the timeline
they're looking at is one captured vantage point of a larger group.
Extracted readProxyMeta / proxyGroupSize into lib/proxy-meta.ts so the
same JSON-walking logic serves both the detail panel tab gate and the
GanttNav badge. ProxyBadge in the list page intentionally keeps reading
the flat proxy_role field (it's already projected by the list API; no
need to re-parse metadata).
Tooltip on the chip lists every peer turn_id so the user can copy one
out and navigate to it manually.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same logical LLM call captured twice, once at the LiteLLM listener
(client_port:LITELLM_PORT, e.g. :4000) and once at LiteLLM's outbound to
the real upstream (client_port:UPSTREAM_PORT, e.g. :9008): both landed
in the same agent_turn as separate llm_calls rows. The turn detail panel
rendered all of them, so a 12-call agent run showed 24 steps in the
timeline and 24 CallCards on the right.
Adds a client-side grouping in lib/call-pair.ts that mirrors the backend
turn-level rule (same fingerprint + ≤100ms time window + distinct
(client:port, server:port)), surfaces the canonical leg as the visible
row, hides the proxy hops by default. A 'Show proxy hops (N)' toggle in
the tab bar flips back to the raw view. Canonical CallCards get a small
'+N' chip in the header.
State is lifted to AgentTurnDetailPanel so GanttNav and the CallCard
list stay in sync: the timeline bars match the cards.
No backend / schema change: llm_calls has no metadata column today, and
adding one for purely-presentational folding would be heavy. The
trade-off is that agent_turns.call_count still reports the raw count;
surfacing a 'logical' count is a follow-up if it matters.
Tests: 8 unit tests in lib/call-pair.test.ts covering the 2-leg
client→litellm pair (using the user's verified data shape), 3-leg
haproxy br0+docker0+upstream, time-gap rejection, content-fingerprint
rejection, same-view rejection, order preservation, and pure direct
calls.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…URLs
Live wuneng data showed every captured pair failed to fold because the
client SDK sent /v1/chat/completions to LiteLLM (port 4000) while
LiteLLM forwarded the bare /chat/completions to the upstream (port
9008). Including the path in the content fingerprint dropped the pair
rate to ~0.
Tokens + model + wire_api + status + finish + stream-flag is sufficient
content equivalence; this matches what the backend
proxy_pair::group_all rule on turns has always used.
Regression test added in call-pair.test.ts using the exact path-pair
shape (/chat/completions vs /v1/chat/completions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related additions when the turn isn't itself part of a backend
proxy group but its calls were captured at multiple vantages:
GanttNav (Timeline sidebar)
- Canonical bars with folded hops now carry a thin blue underline
sized to the same span as the main bar — reads as a 'shadow' of
the leg.
- The latency column shows a small Layers icon next to the ms count
on the canonical row.
- Border-left flips blue (low-prio relative to slow/error tones) to
catch the eye in long timelines.
Proxy view tab (re-enabled for in-turn case)
- Tab gate widened from `proxyRole` only to `proxyRole || hopCount > 0`.
- ProxyViewTab takes `hasBackendPair` + `canonicalCalls` +
`hopsByCanonical`. When the backend hasn't paired the turn but
the client-side fold caught duplicates, it renders the new
InTurnProxyView instead of fetching /proxy-view.
- InTurnProxyView lays out one card per canonical-with-hops,
showing each leg's 5-tuple + e2e latency + per-hop overhead delta
(canonical e2e − hop e2e) + model-rewrite chip when the model
field differs.
- Header-diff (response x-litellm-* etc.) deferred for in-turn —
would require parsing the stored headers JSON client-side; v1
surfaces topology + timing + model which covers the user's most
common question.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…llmproxy-pair-detection
# Conflicts:
# console/src/components/layout/sidebar.tsx
In-session clicks on agent-turn rows write ?selected_at=<unix_s> to the
URL so a subsequent share-link recipient can recover the item's window.
But useToolbarUrlSync was running applySelectedAtAnchor on EVERY
searchParams change: every click → URL update → URL→store effect
re-runs → helper sees that 'now' has advanced a few seconds → the
just-clicked item falls outside the (slightly-newer) preset window →
window auto-shifts → list goes empty.
Gate the anchor with a useRef so it fires once per mount of the
AppLayout (which mounts useToolbarUrlSync). External shared links still
get the rescue behavior: the helper runs on the FIRST hydration of that
fresh load. After that, the URL → store sync no longer touches the
toolbar window in response to selected_at changes.
The existing applySelectedAtAnchor unit tests cover the rescue semantic
and still pass; this fix is purely about when the helper gets called.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live wuneng data shows a 4-leg topology where LiteLLM advertises `glm5`
(alias) to the client and rewrites it to `GLM-5.1` for the upstream.
Leg 1 carries the alias; legs 2-4 carry the rewritten name. With `model`
in the content key, leg 1 never clusters with the others; the user still
sees the alias-leg as a duplicate row.
Drop `model` from contentKey (same fix as `request_path` earlier).
Tokens + wire_api + finish + status + stream-flag is sufficient content
equivalence. Model rewrite is intentionally NOT pairing-key material
because it's exactly what the Proxy view tab exists to display per-leg.
Tests: + pairs-even-when-model-differs (the 2-leg shape) and the full
4-leg topology from the user's reported case (019e3edf-…/seq=1..4).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rint
LiteLLM (and similar LLM proxies) translate API styles across the
client/upstream boundary. Live wuneng setups include the Anthropic →
OpenAI bridge: client SDK speaks /v1/messages with
finish_reason=end_turn, LiteLLM forwards /v1/chat/completions with
finish_reason=stop. All three of wire_api, final_finish_reason, and
primary_model translate alongside each other, so requiring them to match
dropped the pair rate on those topologies to zero.
Frontend lib/call-pair.ts::contentKey: drop wire_api + finish_reason
(model + request_path were already out). Remaining keys: is_stream,
status_code, input_tokens, output_tokens. Combined with the 100ms time
window and the distinct-5-tuple requirement, false positives are still
effectively nil: these are the API-format-invariant fields proxies pass
through unchanged.
Backend ts-turn::proxy_pair::content_fingerprint: drop wire_api,
final_finish_reason, primary_model. Remaining keys: session_id (the
strongest signal: agent profiles content-hash on first user message),
agent_kind, call_count, total_input_tokens, total_output_tokens.
Tests: + frontend pairs-across-api-styles (Anthropic ingress, OpenAI
upstream), + backend pairs_across_api_style_translation matching the
same scenario at turn level.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
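After this change the frontend fold reduces to four content fields plus a time window and a distinct-view check. The sketch below is not the contents of `lib/call-pair.ts`: the `LlmCall` shape, the `foldProxyHops` name, and the choice of the first-seen leg as canonical are assumptions made for illustration.

```typescript
interface LlmCall {
  id: string;
  is_stream: boolean;
  status_code: number;
  input_tokens: number;
  output_tokens: number;
  request_time_ms: number;
  client: string; // "ip:port" (hypothetical flattened form)
  server: string; // "ip:port"
}

interface FoldedCall { canonical: LlmCall; hops: LlmCall[]; }

// API-format-invariant fields only; path, model, wire_api and finish_reason are excluded.
function contentKey(c: LlmCall): string {
  return [c.is_stream ? 1 : 0, c.status_code, c.input_tokens, c.output_tokens].join("|");
}

const viewOf = (c: LlmCall): string => `${c.client}>${c.server}`;

function foldProxyHops(calls: LlmCall[], windowMs = 100): FoldedCall[] {
  const groups: FoldedCall[] = [];
  for (const call of calls) {
    const home = groups.find((g) => {
      const members = [g.canonical, ...g.hops];
      return contentKey(g.canonical) === contentKey(call)
        && Math.abs(g.canonical.request_time_ms - call.request_time_ms) <= windowMs
        && members.every((m) => viewOf(m) !== viewOf(call)); // must be a new vantage point
    });
    if (home) home.hops.push(call);
    else groups.push({ canonical: call, hops: [] }); // input order is preserved
  }
  return groups;
}
```

Two legs with identical token counts captured 2ms apart at `:4000` and `:9008` fold into one group; the same two legs seen from an identical client/server view stay separate.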
Clicking on a turn with hundreds of agentic iterations would freeze
the browser. The /api/agent-turns/{id}/calls endpoint returned every
call's full request_body + response_body + headers; an 878-call turn
on real data lands a 168 MB JSON response that the browser can't
parse or render.
Fix is in two parts that ship together:
**Server** (StorageBackend trait + DuckDB impl + API route)
- `query_turn_calls(turn_id, include_bodies: bool)` and
`query_calls_by_ids(call_ids, include_bodies: bool)` now accept a
flag. When false, the SQL projection selects `NULL::VARCHAR` for
the four heavy fields — DuckDB never reads the body pages off disk
and they don't transfer to Rust as Strings.
- New `?lite=1` query param on `GET /api/agent-turns/{id}/calls`
flips `include_bodies = false`. Default behavior unchanged for
every existing caller.
- `tokens_estimated` derivation falls back to `false` in lite mode
(it inspects response_body); documented on the trait.
**Console** (auto-opt-in for large turns + lazy-load on expand)
- `useAgentTurnCalls(id, lite)` passes `?lite=1` when caller asks.
- `AgentTurnDetailPanel` watches `turn.call_count`; above 200 it
flips lite mode on. Renders a small amber banner so the user
knows bodies are being lazy-loaded.
- `CallCard` lazy-fetches `/api/llm-calls/{id}` only when the user
expands a card whose inline bodies are null. Gated on `expanded`
so a mega-turn with 800 collapsed cards doesn't fire 800
background requests at mount.
- Tools index / classifier already null-safe — no extra changes.
Real-world impact on the 878-call turn observed in production:
list response shrinks from 168 MB to under 1 MB; detail page now
loads in well under a second; expanding any single call fetches its
~190 KB of bodies independently.
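The console-side decisions are small enough to sketch as pure helpers. The 200-call threshold and the `?lite=1` parameter come from the description above; the helper names are hypothetical, and the lazy-fetch check looks only at the two body fields for brevity.

```typescript
const LITE_CALL_THRESHOLD = 200;

// Lite mode turns on above the threshold, not at it.
function shouldUseLite(callCount: number): boolean {
  return callCount > LITE_CALL_THRESHOLD;
}

// Build the calls URL, adding ?lite=1 only when requested.
function turnCallsUrl(turnId: string, lite: boolean): string {
  const base = `/api/agent-turns/${encodeURIComponent(turnId)}/calls`;
  return lite ? `${base}?lite=1` : base;
}

// Fetch /api/llm-calls/{id} only for an expanded card whose inline bodies are null.
function needsLazyBodyFetch(
  expanded: boolean,
  call: { request_body: string | null; response_body: string | null },
): boolean {
  return expanded && call.request_body === null && call.response_body === null;
}
```

Gating on `expanded` is what keeps a turn with hundreds of collapsed cards from firing one request per card at mount.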
Tests:
- ts-storage-duckdb: extended `query_turn_calls_orders_and_sequences`
to assert lite mode strips all four heavy fields and preserves
every other field byte-for-byte.
- console: 111 existing tests pass, no behavior change for
small-turn workflow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three Agent Session list hooks (`useAgentSessions`,
`useAgentSessionDetail`, `useSessionTurns`) were missing the
`placeholderData: (prev) => prev` setting that every other list hook in
the app uses (`useAgentTurns`, `useLlmCalls`, `useHttpExchanges`,
`useMetrics`, etc.). Without it, every auto-refresh tick / toolbar key
change wipes the query cache to undefined before the new response lands
(react-query renders the loading skeleton, then the new data) and the
user sees a full-page flash on every refresh while other list pages do a
frame-perfect swap.
Setting `placeholderData: (prev) => prev` keeps the last-known data
visible while a background refetch is in flight. New data drops in when
the response arrives; no skeleton, no blanked-out list.
Caught by user: "The Agent Session page repaints the whole page on every
refresh, instead of showing only a small visible update like the other
pages."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… perf)
New "Services" page that aggregates llm_calls by the actual serving
endpoint (server_ip, server_port), answering "what's 172.16.103.81:9000
serving, and how is it performing?".
Why not reuse `llm_metrics`? Its pre-aggregated grouping sets stop at
`server_ip` and don't carry server_port, so two vLLM instances on the
same host (port 8000 / 9000) would collapse into one row.
## Backend
- `ts_storage::query::ServiceRow` + `ServicesQuery` (one row per
  endpoint with distinct models, wire APIs, call/error counts, TTFT/E2E
  avg + p95, total tokens, first/last seen).
- `StorageBackend::query_services` trait method + DuckDB impl. Query is
  `GROUP BY (server_ip, server_port)` on `llm_calls`; models / wire_apis
  come back as `list_distinct(array_agg(...))`, bridged to Rust as JSON
  strings (DuckDB rust bindings have no `FromSql for Vec<String>`).
- `GET /api/services?start=&end=&sort_by=&sort_order=&limit=` serves it.
  `sort_by` whitelist matches the table column names.
## Console
- Sidebar adds "Services" between "Models" and "Agent Sessions" with a
  `Server` icon.
- `ServicesPage` table: Endpoint • Models (chips) • Wire APIs • Calls
  (+stream %) • Error % • TTFT avg/p95 • E2E avg/p95 • In/Out tokens •
  Last seen (relative). Headers click-to-sort in place, with no refetch
  on resort.
- `useServices` hook follows the same `placeholderData: prev` pattern as
  every other list hook (no flash on refresh).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…p/litellm)
Adds an App column to the Services page that classifies each endpoint
into one of a fixed enum from cheap wire-traffic signals.
## Signals used (highest-confidence first)
| App             | Signal                                                        |
|-----------------|---------------------------------------------------------------|
| `ollama`        | path `/api/chat` / `/api/generate` / `/api/tags`              |
| `llamacpp`      | path `/completion` / `/tokenize` / `/props` (root-level)      |
| `litellm`       | response header `x-litellm-*` OR `Server: litellm`            |
| `openai`        | request `Host: api.openai.com`                                |
| `anthropic`     | request `Host: api.anthropic.com`                             |
| `gemini`        | request `Host: generativelanguage.googleapis.com`             |
| `openai-compat` | `Server: uvicorn` (vLLM and SGLang both; body sample follow-up will disambiguate) |
| `litellm`       | tiebreaker: an `openai-compat` endpoint serving ≥ 3 distinct models (real signal from wuneng's 127.0.0.1:4000) |
| (none)          | nothing matches; UI shows muted "unknown" badge               |
## Implementation
- `ts-storage-duckdb/src/apps.rs`: pure-function classifier with 12 unit
  tests covering each rule + edge cases (Ollama compat mode serving
  `/v1/chat/completions`, multi-model uvicorn tiebreaker,
  path-wins-over-uvicorn precedence, header-absent fallback).
- SQL aggregate now also pulls `arg_min(response_headers, LENGTH(...))`
  and the matching request_headers as a per-group sample plus
  `list_distinct(array_agg(request_path))[1:16]`. `arg_min` picks the
  shortest non-null blob deterministically, small enough that streaming
  it to Rust costs nothing.
- New fields on `ServiceRow`: `app`, `server_header`, `request_paths`.
- Console renders a colored `AppBadge` per row with a `title=Server:`
  tooltip so the user can sanity-check the label.
## What ships vs. follow-up
vLLM and SGLang both run under uvicorn and don't have a distinctive
custom header. Today they both label as `openai-compat`. A follow-up
will pull one small response body per group and look for
`chatcmpl-tool-<hex>` (vLLM's tool_call_id pattern, observed in
production) vs. SGLang's distinct response shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Services-page aggregate uses `arg_min(headers, LENGTH(headers))`
to pick one representative header sample per endpoint. Without a
shape filter it picks ANY shortest non-null value — including rows
where the response parser stashed an empty/corrupted string. That
fed `null` (or similar) to the classifier and dropped four real
endpoints (the GLM-5.1 cluster on port 9000) to `unknown` even
though every other call from those endpoints carries a clean
`Server: uvicorn` blob.
Restrict the sample to JSON arrays of at least 30 chars (`[%`
pattern). The shortest real header list captured in production is
~140 chars; 30 is a comfortable floor that excludes literal `null`,
`[]`, `{}`, and any other malformed short response without losing
genuine samples.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`arg_min(headers, LENGTH(headers))` was still returning NULL for
endpoints with mixed-header data (e.g. SSE/streaming calls where the
parser captured something the LIKE filter doesn't catch).
Switch to `MAX(response_headers)` — lexicographic on a column whose
values all start with `[[` makes it a stable arbitrary pick AND it
doesn't have arg_min's failure mode of picking anomalously short
malformed values. Filter to `[%` to guarantee the picked sample is
shaped like a JSON array (drops literal "null", "{}", etc.).
Per the user's ask: every endpoint must land on a concrete label.
Replace the `openai-compat` placeholder by stacking up cheap signals
already present in `llm_calls`:
**New SQL aggregates** (alongside the existing header / paths sample):
- `list_distinct(array_agg(finish_reason))[1:32]`: distinct
  finish_reasons in the window
- `arg_max(request_body, LENGTH(request_body))`: largest captured
  request body (deepest agentic history; only materialises once, length
  comparison is u64-cheap)
- `arg_max(response_body, LENGTH(response_body))`: largest captured
  response body (capped at 8 KB so streamed/oversized rows don't bloat
  the read)
**New classifier signals** (in order, highest confidence first):
1. SGLang-specific paths (`/generate`, `/health_generate`,
   `/get_server_info`, `/flush_cache`, `/encode`, profile endpoints).
2. vLLM-specific paths (`/version`, `/v1/score`).
3. SGLang-exclusive finish_reasons (`matched_stop`, `matched_eos`,
   `stop_str`); works even when responses are SSE-streamed, since
   finish_reason is captured from the final SSE event regardless.
4. Response body fingerprint:
   - `"id":"chatcmpl-tool-…"` (vLLM's tool_call_id format)
   - `"system_fingerprint":"fp_…"` (vLLM only; SGLang leaves it null)
5. Request body fingerprint: `chatcmpl-tool-` substring; agentic replays
   carry assistant.tool_calls history back to the server, and the
   previous round's tool_call_id reveals vLLM.
6. Uvicorn fallback:
   - ≥3 models → LiteLLM (multi-model tiebreaker, real wuneng signal)
   - Model starts with `glm` / `deepseek` → SGLang (reference deployment)
   - Otherwise → vLLM (more common)
Console: drop the `openai-compat` badge color since the label is no
longer emitted by the classifier.
22 classifier tests (was 12) covering every new rule + the
beats-the-heuristic precedence cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
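The precedence of the six signals can be illustrated for an endpoint already known to sit behind uvicorn. The real classifier is the Rust module `ts-storage-duckdb/src/apps.rs`; this TypeScript sketch is only an illustration. The `EndpointSample` shape, exact-path matching, the compact-JSON substring checks, and the case-insensitive model prefix are all assumptions, and the "profile endpoints" paths are omitted because they are not listed above.

```typescript
interface EndpointSample {
  requestPaths: string[];
  finishReasons: string[];
  responseBody: string | null;
  requestBody: string | null;
  models: string[];
}

type UvicornApp = "sglang" | "vllm" | "litellm";

const SGLANG_PATHS = ["/generate", "/health_generate", "/get_server_info", "/flush_cache", "/encode"];
const VLLM_PATHS = ["/version", "/v1/score"];
const SGLANG_FINISH_REASONS = ["matched_stop", "matched_eos", "stop_str"];

function classifyUvicornEndpoint(s: EndpointSample): UvicornApp {
  // 1-2. Framework-specific paths.
  if (s.requestPaths.some((p) => SGLANG_PATHS.indexOf(p) >= 0)) return "sglang";
  if (s.requestPaths.some((p) => VLLM_PATHS.indexOf(p) >= 0)) return "vllm";
  // 3. SGLang-exclusive finish reasons.
  if (s.finishReasons.some((f) => SGLANG_FINISH_REASONS.indexOf(f) >= 0)) return "sglang";
  // 4. Response body fingerprints.
  const resp = s.responseBody === null ? "" : s.responseBody;
  if (resp.indexOf('"id":"chatcmpl-tool-') >= 0 || resp.indexOf('"system_fingerprint":"fp_') >= 0) return "vllm";
  // 5. Request body fingerprint from replayed tool_call ids.
  const req = s.requestBody === null ? "" : s.requestBody;
  if (req.indexOf("chatcmpl-tool-") >= 0) return "vllm";
  // 6. Fallback heuristics.
  if (new Set(s.models).size >= 3) return "litellm";
  if (s.models.some((m) => /^(glm|deepseek)/i.test(m))) return "sglang";
  return "vllm";
}
```

Because the steps run top to bottom, a concrete fingerprint always beats the model-name heuristic in step 6.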
Adds a tab switcher to the Services page (default Table, alternative
Path). The Path view fetches a new GET /api/services/topology endpoint
and renders a directed SVG graph:
* Nodes are real (server_ip:server_port) endpoints — colored by app
class — plus one synthetic "clients" node aggregating all upstream
callers.
* Edges come in two kinds:
- `proxy` (solid blue) — definitive hops confirmed by the
pair_sweeper (litellm -> sglang, haproxy -> docker backend, …).
- `client` (dashed grey) — synthetic edges from the clients node
into every service that receives non-proxy_out traffic. So
even endpoints without a paired upstream still render
connected.
Layout is a BFS-by-depth column layout from the clients node; sibling
order within a column is stable (call_count desc). Edge stroke width
scales with turn_count so the hot paths stand out.
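A BFS-by-depth column layout of this kind can be sketched as follows. It is not the console's layout code: the `TopoNode` / `TopoEdge` shapes and the function name are hypothetical, and placing nodes unreachable from the root in column 1 and breaking call-count ties by id are assumptions.

```typescript
interface TopoNode { id: string; callCount: number; }
interface TopoEdge { from: string; to: string; }
interface GridPos { col: number; row: number; }

function layoutColumns(nodes: TopoNode[], edges: TopoEdge[], rootId: string): Map<string, GridPos> {
  // Adjacency list of outgoing edges.
  const out = new Map<string, string[]>();
  for (const e of edges) {
    const list = out.get(e.from) || [];
    list.push(e.to);
    out.set(e.from, list);
  }
  // BFS from the root: column index = shortest hop count from the clients node.
  const depth = new Map<string, number>();
  depth.set(rootId, 0);
  const queue: string[] = [rootId];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    const d = depth.get(id) as number;
    for (const next of out.get(id) || []) {
      if (!depth.has(next)) {
        depth.set(next, d + 1);
        queue.push(next);
      }
    }
  }
  // Bucket nodes per column; unreachable nodes default to column 1 (assumption).
  const columns = new Map<number, TopoNode[]>();
  for (const n of nodes) {
    const d = depth.get(n.id);
    const col = d === undefined ? 1 : d;
    const bucket = columns.get(col) || [];
    bucket.push(n);
    columns.set(col, bucket);
  }
  // Stable sibling order: call_count descending, then id.
  const pos = new Map<string, GridPos>();
  columns.forEach((bucket, col) => {
    bucket.sort((a, b) => b.callCount - a.callCount || a.id.localeCompare(b.id));
    bucket.forEach((n, row) => pos.set(n.id, { col, row }));
  });
  return pos;
}
```

The renderer can then map `col` and `row` to x and y pixel offsets independently of the graph traversal.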
Backend pieces:
* ServicesTopologyQuery / TopologyNode / TopologyEdge /
ServicesTopology types in ts-storage::query.
* DuckDB impl in metrics.rs — two SQL passes (proxy edges from
pair_sweeper-written metadata.proxy.pair_id; client entry edges
from any non-proxy_out turn) joined to llm_calls for server_port
(agent_turns doesn't carry it).
* GET /api/services/topology route, time-windowed by ?start&end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When pair_sweeper hasn't paired a turn but the inbound `client_ip`
matches the server_ip of an existing service node (the canonical
"LiteLLM accepts a user call and immediately forwards to a backend"
pattern), draw an "inferred" edge from the caller-service to the
destination instead of routing the traffic into the anonymous
`__clients__` super-node.
Resolution rule when caller_ip has multiple services: prefer
`litellm` first, then any other proxy-class app, then highest
call_count. Skip resolution entirely when the *target* itself is a
proxy (litellm/haproxy/nginx) — co-host vllm is the destination's
neighbour, not its caller, and attributing inbound litellm traffic
to vllm produced backwards edges before this guard was added.
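The resolution rule reads roughly like the sketch below. `ServiceNode`, `rank`, and `resolve_caller` are hypothetical names; only `is_proxy_app` and its three app names come from this PR. Excluding the target itself from the candidates and using call_count as the tiebreak within each preference tier are assumptions.

```rust
use std::cmp::Reverse;

/// Hypothetical service node; the PR's real type is `TopologyNode`.
#[derive(Debug, Clone, PartialEq)]
pub struct ServiceNode {
    pub ip: String,
    pub port: u16,
    pub app: String,
    pub call_count: u64,
}

fn is_proxy_app(app: &str) -> bool {
    matches!(app, "litellm" | "haproxy" | "nginx")
}

/// Preference tier: litellm first, then other proxies, then everything else.
fn rank(app: &str) -> u8 {
    if app == "litellm" {
        0
    } else if is_proxy_app(app) {
        1
    } else {
        2
    }
}

/// Pick the service on `client_ip` most likely to be the caller of `target`,
/// or None when the traffic should stay on the anonymous clients node.
pub fn resolve_caller<'a>(
    client_ip: &str,
    target: &ServiceNode,
    services: &'a [ServiceNode],
) -> Option<&'a ServiceNode> {
    // Guard: inbound traffic to a proxy is never attributed to a co-hosted service.
    if is_proxy_app(&target.app) {
        return None;
    }
    services
        .iter()
        .filter(|s| s.ip == client_ip)
        // Assumption: a node cannot be its own caller.
        .filter(|s| !(s.ip == target.ip && s.port == target.port))
        .min_by_key(|s| (rank(&s.app), Reverse(s.call_count)))
}
```

With the live data listed in this commit, a call arriving at `127.0.0.1:9000 (sglang)` from `127.0.0.1` would resolve to the co-hosted litellm on port 4000, while calls into a litellm target return `None` and stay client edges.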
Inferred edges render as a dashed mid-blue line with the count
label; proxy stays solid blue, client stays dashed grey. Legend
updated to spell out which is which.
Live data after deploy:
* `127.0.0.1:4000 (litellm) → 127.0.0.1:9000 (sglang)` inferred 11
* `127.0.0.1:4000 (litellm) → 127.0.0.1:9008 (vllm)` inferred 1
* `__clients__:0 → 172.16.103.81:4210 (litellm)` client 3221
(the bulk of real user traffic into the main LiteLLM —
correctly stays a client edge because the *target* is litellm)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A 7-day Services / Path query was taking 17 s on prod because the
per-endpoint aggregation included `arg_max(request_body,
LENGTH(request_body)) FILTER (LENGTH(body) BETWEEN ...)` — DuckDB
materializes every body in the window (5+ GB on prod) just to pick
one short sample per (server_ip, server_port).
Split body sampling out of the main aggregation into
`fetch_app_samples`:
* Window clipped to last 24 h of the user's range. App
classification doesn't change over the wider window, and most
views are "now" so this is a no-op in practice.
* Top-5 most-recent rows per endpoint via `ROW_NUMBER() OVER
(PARTITION BY server_ip, server_port ORDER BY request_time DESC)`
+ `WHERE rn <= 5`. DuckDB ranks in place and only emits 5 rows
per group — no full body scan.
* Body / header shape filtering (`LIKE '{%'`, length bounds) moved
to Rust. The 5 returned rows give us plenty of candidates to find
a representative sample.
* Distinct request_paths / finish_reasons come from a separate
cheap dim query over the same clipped window.
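The two Rust-side pieces of this split, the window clip and the body-shape filter, might look like the sketch below. `SAMPLE_WINDOW_US` matches the constant the review quotes from metrics.rs; `clip_sample_window`, `pick_sample`, and the caller-supplied length bounds are hypothetical, since the commit message elides the actual bounds.

```rust
/// 24 h in microseconds.
pub const SAMPLE_WINDOW_US: i64 = 24 * 60 * 60 * 1_000_000;

/// Clip the sampling window to the last 24 h of the user's range.
/// Ranges already shorter than 24 h pass through unchanged.
pub fn clip_sample_window(start_us: i64, end_us: i64) -> (i64, i64) {
    (start_us.max(end_us - SAMPLE_WINDOW_US), end_us)
}

/// From the top-N most recent bodies of one endpoint, return the first that
/// looks like a JSON object and fits the length bounds. This replaces the
/// `LIKE '{%'` and length checks that used to run inside SQL.
pub fn pick_sample<'a>(bodies: &[&'a str], min_len: usize, max_len: usize) -> Option<&'a str> {
    bodies
        .iter()
        .copied()
        .find(|b| b.starts_with('{') && (min_len..=max_len).contains(&b.len()))
}
```

Filtering after the top-5 cut means an endpoint whose five newest rows are all SSE streams yields no sample; the commit message accepts that trade-off in exchange for never scanning bodies in bulk.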
Wall-clock on prod (7 d window, 662 k llm_calls rows):
before: services=17.8 s topology=12.9 s
after: services= 1.5 s topology= 1.3 s
App classification output unchanged for endpoints with active recent
traffic (verified against the prior 1 h baseline).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "uvicorn endpoint serving ≥ 3 distinct models = LiteLLM" rule
turned out to be window-width-sensitive: a vLLM serving one real
model picks up 2-4 stray model names from misconfigured clients
(`text-embedding-ada-002`, `test`, …) over a 7-day window and gets
flipped to litellm.
Verified misclassifications on prod:
  endpoint            before             after        model
  172.17.0.7:8000     vllm ↛ litellm     → vllm ✓     Qwen3.5-35B
  172.17.0.9:9000     sglang ↛ litellm   → sglang ✓   GLM-5.1 haproxy
  172.17.0.4:30000    sglang ↛ litellm   → sglang ✓   GLM-5.1 docker
Real LiteLLM endpoints (.81:4210, 127.0.0.1:4000) still classify via
the `x-litellm-*` response header rule. The GLM/DeepSeek model-name
heuristic still routes those families to sglang on uvicorn.
The fallback was the weakest signal in the chain; the body / path /
header evidence already weighed above is enough.
Removes the rule, its dedicated unit test
(sglang_via_glm_model_heuristic already covers the case the test was
using), and the table-of-rules doc-comment row.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Services page gets a third tab "Model" that renders the existing
ModelsPage component. The standalone `/models` route still resolves
for shared links, but the sidebar entry is removed — Models was a
service-level cross-cut, sitting one click deeper under Services
makes the IA more honest.
Sidebar tweaks:
* Drop Models entry (now reachable via Services → Model tab).
* Rename "Traffic" → "Usage" — clearer label for what the page
actually shows (token throughput, byte volume, etc.). Route
stays /traffic so existing bookmarks keep working.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new charts on the Overview page so the operator can see, at a
glance:
* Agent Activity — stacked-area timeseries of agent_turn counts,
bucketed by a server-chosen window (1-min for ~1h ranges, 30-min
for 1d, 4h for 30d), split by agent_kind.
* Agent Distribution — horizontal bar chart of total turns per
agent_kind in the selected window.
Backend:
* `AgentSummaryQuery` / `AgentKindSummary` and
`AgentActivityQuery` / `AgentActivityPoint` types in
`ts-storage::query`.
* DuckDB impls in metrics.rs: `query_agent_summary` (one row per
agent_kind), `query_agent_activity` (per-bucket counts split by
agent_kind, bucket size auto-picked from window width).
* Routes `GET /api/agent-turns/summary` + `GET
/api/agent-turns/activity` (with optional `?bucket=` override).
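A sketch of how the bucket size could be auto-picked from the window width. Only the three bucket sizes (1 min, 30 min, 4 h) come from the description above; the 2 h and 2 d cut-over thresholds and the names `pick_bucket_secs` / `bucket_start_ms` are assumptions for illustration.

```rust
/// Choose a bucket width in seconds. An explicit `?bucket=` value wins;
/// otherwise the width follows the window size.
pub fn pick_bucket_secs(window_secs: u64, requested: Option<u64>) -> u64 {
    if let Some(b) = requested {
        // Guard against a zero bucket, which would divide by zero later.
        return b.max(1);
    }
    const HOUR: u64 = 3600;
    const DAY: u64 = 24 * HOUR;
    if window_secs <= 2 * HOUR {
        60 // 1-min buckets for ~1 h ranges
    } else if window_secs <= 2 * DAY {
        30 * 60 // 30-min buckets for ~1 d ranges
    } else {
        4 * HOUR // 4 h buckets for ~30 d ranges
    }
}

/// Floor a millisecond timestamp to the start of its bucket.
pub fn bucket_start_ms(ts_ms: u64, bucket_secs: u64) -> u64 {
    let width_ms = bucket_secs * 1000;
    ts_ms - ts_ms % width_ms
}
```

Picking the bucket on the server keeps the point count bounded regardless of the toolbar range, so the stacked-area chart never has to downsample client-side.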
Console:
* Types `AgentKindSummary` / `AgentActivityPoint` and the two
response shapes.
* Hooks `useAgentSummary` / `useAgentActivity` keyed on the
toolbar window (placeholderData keeps prior data during
refetch — no flash).
* Charts `AgentActivityChart` (recharts AreaChart stacked) and
`AgentDistributionChart` (recharts BarChart horizontal).
* Inserted as a new row between the existing "middle row" and
the model panels — same width budget as the rest of Overview.
Live data after deploy (1 d window): generic=68538, hermes=105,
openclaw=84 turns; 108 activity buckets.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both list pages now accept a CSV `Server Port` filter in the head row
alongside the existing `Client IP`. URL-serialized as
`?server_port=4210,9000` so shared links carry the filter.
* `llm_calls`: direct `WHERE server_port IN (...)`, which is fast.
* `agent_turns`: the table has no `server_port` column, so we resolve
through the turn's first `call_ids` entry against `llm_calls` via an
EXISTS subquery, the same shortcut the topology query uses. A turn's
calls almost always hit one endpoint in practice, so first-call
resolution is a safe approximation.
Verified live (1h window, prod):
* `GET /api/llm-calls?server_port=9000` → all returned calls have
server_port=9000 (GLM-5.1 sglang).
* `GET /api/agent-turns?server_port=4210` → returned turns are
litellm-bound (server_ip 172.16.103.81).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
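Parsing the CSV `server_port` parameter could be as small as the sketch below. `parse_server_ports` and `in_clause` are hypothetical helpers; silently skipping malformed tokens and de-duplicating are assumptions, and the real handler may instead reject bad input.

```rust
/// Parse `?server_port=4210,9000` into a sorted, de-duplicated port list.
/// Tokens that are empty or not valid u16 values are skipped.
pub fn parse_server_ports(raw: &str) -> Vec<u16> {
    let mut ports: Vec<u16> = raw
        .split(',')
        .filter_map(|tok| tok.trim().parse().ok())
        .collect();
    ports.sort_unstable();
    ports.dedup();
    ports
}

/// Build a parameterised `IN (...)` fragment, or None when no filter applies.
/// Values are bound separately rather than spliced into the SQL text.
pub fn in_clause(ports: &[u16]) -> Option<String> {
    if ports.is_empty() {
        return None;
    }
    let marks = vec!["?"; ports.len()].join(", ");
    Some(format!("server_port IN ({marks})"))
}
```

Returning `None` for an empty list lets the caller omit the predicate entirely instead of emitting an `IN ()` that would match nothing.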
Based on my review of PR #27, here are my findings:
Summary
PR #27 adds a Services Path view with topology graph, Overview agent activity/distribution charts, and delivers a critical 10× performance fix for 7-day window queries. The performance fix correctly replaces the problematic arg_max(body, LENGTH(body)) pattern with a ROW_NUMBER-based top-N approach that clips body sampling to 24h. The window-width-sensitive classifier rule was correctly removed. APPROVE — no blocking issues.
Verified
- Body-scan fix: `fetch_app_samples` at metrics.rs:788-955 uses `ROW_NUMBER() OVER (PARTITION BY server_ip, server_port ORDER BY request_time DESC) WHERE rn <= 5` + clipped 24h window, which avoids the 5+ GB body materialization that caused the 17s stall. This is the canonical fix referenced in the repo's bite-history.
- Schema mirror: `ServiceRow` Rust (query.rs:54-90) ↔ TS (api.ts:70-94) matches field-for-field: `server_ip`, `server_port`, `models`, `wire_apis`, `request_paths`, `call_count`, `error_count`, `stream_count`, `total_input_tokens`, `total_output_tokens`, `ttft_avg_ms`, `ttft_p95_ms`, `e2e_avg_ms`, `e2e_p95_ms`, `first_seen_ms`, `last_seen_ms`, `app`, `server_header` all align.
- Schema mirror: `AgentKindSummary` Rust (query.rs:153-160) ↔ TS (api.ts:133-141) matches: `agent_kind`, `turn_count`, `total_input_tokens`, `total_output_tokens`, `avg_duration_ms`, `last_seen_ms`.
- Schema mirror: `AgentActivityPoint` Rust (query.rs:173-178) ↔ TS (api.ts:149-153) matches: `timestamp_ms`, `agent_kind`, `turn_count`.
- Route registration: `/api/agent-turns/summary`, `/api/agent-turns/activity`, and `/api/agent-turns/{id}/proxy-view` are all registered in lib.rs:139-157.
- queryKey correctness: `useAgentSummary`, `useAgentActivity`, `useServicesTopology` all include `{start, end}` in their queryKeys (use-agent-overview.ts:10,21; use-services-topology.ts:11).
- Classifier removal: The window-width-sensitive "uvicorn + ≥3 models → litellm" rule is removed with a clear comment at apps.rs:192-198 explaining the misclassification risk at 7d windows.
Suggestions
- metrics.rs:811: The 24h sample window clip (`SAMPLE_WINDOW_US = 24 * 60 * 60 * 1_000_000`) is hardcoded. If app classification ever becomes time-sensitive (e.g., an endpoint that switches serving software), this would need to expand. Currently fine; the comment at 803-804 notes app classification doesn't change over wider windows.
- metrics.rs:1414-1416: The `is_proxy_app` helper matches only `litellm`, `haproxy`, `nginx`. If another proxy type emerges (e.g., `envoy`), it should be added here to prevent the self-loop attribution issue described at 1420-1426.
Questions
- Why does the sidebar relabel "Traffic" → "Usage" without changing the route? The commit message says "route unchanged" for bookmark compatibility, which makes sense, but the label "Usage" for `/traffic` might confuse users who bookmark the old label. Not a merge blocker, just a UX consistency note.
🤖 Reviewed by vivi • workflow run
…and-flash
# Conflicts:
#	console/src/hooks/use-llm-calls.ts
#	console/src/hooks/use-url-sync.ts
#	console/src/pages/agent-turns.tsx
#	console/src/pages/llm-calls.tsx
#	console/src/pages/overview.tsx
#	server/ts-api/src/routes/llm_calls.rs
Integration / deploy branch that stacks the open Services-page work
(PR #25) and the LLM-proxy pair-detection work (PR #22), then adds
six new commits on top that make up this PR's reviewable change:
* `6882cd3`: Path view on `/services` that renders the service→service topology as a directed SVG graph. Backend `GET /api/services/topology` returns `{nodes, edges}` (proxy edges from pair sweeper + synthetic `__clients__` edges into entry-point services).
* `7b2f0aa`: When an inbound `client_ip` matches the `server_ip` of a known service (e.g. LiteLLM forwarding without a pair-sweeper match), draw the edge from that service instead of the anonymous clients node. Dashed-blue line, distinct from solid-blue proxy edges.
* `bf4887f`: `arg_max(body, LENGTH(body))` over a 7-day window scanned 5+ GB of bodies (17 s on prod); replaced with a clipped 24 h window + `ROW_NUMBER() OVER (...) WHERE rn <= 5` top-N sampling + body-shape filtering in Rust. Services 17.8 s → 1.5 s; topology 12.9 s → 1.3 s.
* `fea1d83`: Removes the window-width-sensitive "uvicorn + ≥3 models → litellm" classifier rule; real LiteLLM endpoints still classify via the `x-litellm-*` header.
* `3b35166`: Services page gains a "Model" tab; the standalone `/models` route still resolves. Sidebar "Traffic" relabelled "Usage" (route unchanged).
* `8e191a1`: `GET /api/agent-turns/summary` and `GET /api/agent-turns/activity` aggregate `agent_turns` by `agent_kind`. Two recharts on Overview: stacked-area activity timeseries and horizontal-bar distribution.
Stack note
The 31 commits below `6882cd3` come from the open base PRs:
* #25: Services-page work
* #22: LLM-proxy pair-detection work (`metadata.proxy.{role,pair_id,peer_turn_ids}` used by the Path view's proxy edges)
Once #22 and #25 land, this PR's effective diff against `main` will be the six commits above.
Verification (live on wuneng)
* `GET /api/services?start&end` (7d)
* `GET /api/services/topology?start&end` (7d)
* `GET /api/agent-turns/summary?start&end` (1d)
* `GET /api/agent-turns/activity?start&end` (1d)
Path view edge break-down on prod (last hour):
Test plan
* `/services` → Table view loads in ~1.5 s on 7d window.
* `cargo test -p ts-storage-duckdb apps` passes (21 tests, classifier fallback removed).
🤖 Generated with Claude Code