
feat(server): add request-ID middleware for request correlation #932

@sauagarwa

Problem Statement

The openshell-server gained request-level logging via TraceLayer (#892), but every request is anonymous — there is no unique identifier tying a request's log lines together. Under concurrent load, interleaved log output from multiple requests sharing the same method and path is indistinguishable:

INFO request{method=POST path=/openshell.v1.OpenShell/CreateSandbox}: response status=200 latency_ms=45
INFO request{method=POST path=/openshell.v1.OpenShell/CreateSandbox}: response status=200 latency_ms=312

Which CreateSandbox call took 312ms? Without a request ID, operators have no way to:

  • Correlate logs across the middleware stack, handler, and downstream services for a single request
  • Reference requests in bug reports — clients cannot quote a request ID when reporting failures
  • Trace requests end-to-end — the sandbox inference proxy already passes through x-request-id headers (tested at crates/openshell-sandbox/src/l7/inference.rs:508), but the gateway never generates or propagates one
  • Build toward distributed tracing — a request ID is the minimum unit of correlation before adopting full OpenTelemetry

This is a natural follow-up to #892, completing the request observability story.

Proposed Design

Add a request-ID middleware using tower-http's request-id feature that generates a UUID for each inbound request (or preserves a client-supplied one), records it in the tracing span, and returns it in the response.

Implementation sketch

1. Enable the request-id feature on tower-http

In workspace Cargo.toml:

tower-http = { version = "0.6", features = ["cors", "trace", "request-id"] }

2. Implement MakeRequestId using the existing uuid crate

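use http::{HeaderValue, Request};
use tower_http::request_id::{MakeRequestId, RequestId};

/// Generates a random UUID v4 for each request that arrives without an x-request-id header.
#[derive(Clone, Default)]
struct UuidRequestId;

impl MakeRequestId for UuidRequestId {
    fn make_request_id<B>(&mut self, _req: &Request<B>) -> Option<RequestId> {
        // A hyphenated UUID is only hex digits and '-', so this conversion cannot fail.
        let id = uuid::Uuid::new_v4().to_string();
        HeaderValue::from_str(&id).ok().map(RequestId::new)
    }
}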
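Why converting the UUID string into a HeaderValue cannot fail in practice: the std-only sketch below builds the canonical hyphenated form of a UUID v4 from 16 bytes, forcing the version and variant bits as RFC 9562 specifies. It is an illustration, not the uuid crate's code; format_uuid_v4 and is_uuid_charset are hypothetical names, and the random bytes are passed in rather than generated. The output contains only lowercase hex digits and '-', which are always valid header-value bytes.

```rust
/// Hypothetical helper: renders 16 random bytes as a hyphenated UUID v4,
/// forcing the version (4) and RFC variant (0b10) bits first.
fn format_uuid_v4(mut bytes: [u8; 16]) -> String {
    bytes[6] = (bytes[6] & 0x0f) | 0x40; // high nibble of byte 6 = version 4
    bytes[8] = (bytes[8] & 0x3f) | 0x80; // top two bits of byte 8 = variant 10
    let hex: String = bytes.iter().map(|b| format!("{b:02x}")).collect();
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}

/// Hypothetical check: true if every byte is a hex digit or '-', a subset of
/// the bytes an HTTP header value may contain.
fn is_uuid_charset(s: &str) -> bool {
    s.bytes().all(|b| b.is_ascii_hexdigit() || b == b'-')
}
```

For example, sixteen zero bytes render as 00000000-0000-4000-8000-000000000000: the "4" and "8" are the forced version and variant nibbles.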

3. Layer ordering in multiplex.rs

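In a ServiceBuilder, layers added first are outermost and see the request first. SetRequestIdLayer must therefore be added before TraceLayer so the span captures the ID. PropagateRequestIdLayer sits inside both and, on the response path, copies the ID into the response headers.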

use http::HeaderName;
use tower::ServiceBuilder;
use tower_http::request_id::{PropagateRequestIdLayer, SetRequestIdLayer};
use tower_http::trace::TraceLayer;

let x_request_id = HeaderName::from_static("x-request-id");

let grpc_service = ServiceBuilder::new()
    .layer(SetRequestIdLayer::new(x_request_id.clone(), UuidRequestId))
    .layer(
        TraceLayer::new_for_http()
            .make_span_with(make_request_span)
            .on_request(())
            .on_response(log_response),
    )
    .layer(PropagateRequestIdLayer::new(x_request_id.clone()))
    .service(grpc_service);

// Same for http_service
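The ordering argument can be checked without tower at all. The sketch below is a hypothetical, std-only model: headers are a HashMap, and each "layer" is a plain function that wraps the next, so the nesting mirrors a ServiceBuilder in which the first layer added sees the request first. set_request_id, trace, propagate, handler, and call_stack are illustrative names standing in for SetRequestIdLayer, TraceLayer, PropagateRequestIdLayer, and the inner service; they are not tower-http APIs.

```rust
use std::collections::HashMap;

/// Header map stand-in: lowercase header name -> value.
type Headers = HashMap<String, String>;

const X_REQUEST_ID: &str = "x-request-id";

/// Outermost layer, modeling SetRequestIdLayer: keep an existing ID,
/// otherwise insert a generated one, then call the next layer.
fn set_request_id<F>(req: &mut Headers, log: &mut Vec<String>, next: F) -> Headers
where
    F: FnOnce(&mut Headers, &mut Vec<String>) -> Headers,
{
    req.entry(X_REQUEST_ID.to_string())
        .or_insert_with(|| "generated-id".to_string());
    log.push("set".to_string());
    next(req, log)
}

/// Middle layer, modeling TraceLayer: the span reads the header on the way in
/// and the response is logged on the way out.
fn trace<F>(req: &mut Headers, log: &mut Vec<String>, next: F) -> Headers
where
    F: FnOnce(&mut Headers, &mut Vec<String>) -> Headers,
{
    let id = req.get(X_REQUEST_ID).cloned().unwrap_or_else(|| "-".to_string());
    log.push(format!("span request_id={id}"));
    let resp = next(&mut *req, &mut *log);
    log.push(format!("response request_id={id}"));
    resp
}

/// Innermost layer, modeling PropagateRequestIdLayer: copy the request's ID
/// onto the response if the handler did not set one.
fn propagate<F>(req: &mut Headers, log: &mut Vec<String>, next: F) -> Headers
where
    F: FnOnce(&mut Headers, &mut Vec<String>) -> Headers,
{
    let id = req.get(X_REQUEST_ID).cloned();
    let mut resp = next(req, log);
    if let Some(id) = id {
        resp.entry(X_REQUEST_ID.to_string()).or_insert(id);
    }
    resp
}

/// The wrapped service: records that it ran and returns empty headers.
fn handler(_req: &mut Headers, log: &mut Vec<String>) -> Headers {
    log.push("handler".to_string());
    Headers::new()
}

/// Composes the layers in ServiceBuilder order: set, then trace, then propagate.
fn call_stack(req: &mut Headers, log: &mut Vec<String>) -> Headers {
    set_request_id(req, log, |r, l| trace(r, l, |r, l| propagate(r, l, handler)))
}
```

Running call_stack on an empty header map logs the span with the generated ID, and a request that already carries x-request-id keeps it on both the span and the response. Nesting set_request_id inside trace instead would make the span record "-", which is exactly the failure the ordering rule prevents.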

4. Record request ID in the tracing span

Update make_request_span to extract the ID from the request header (which SetRequestIdLayer has already inserted):

use http::Request;
use tracing::Span;

fn make_request_span<B>(req: &Request<B>) -> Span {
    let path = req.uri().path();
    let request_id = req
        .headers()
        .get("x-request-id")
        .and_then(|v| v.to_str().ok())
        .unwrap_or("-");

    if matches!(path, "/health" | "/healthz" | "/readyz") {
        tracing::debug_span!(
            "request",
            method = %req.method(),
            path,
            request_id,
        )
    } else {
        tracing::info_span!(
            "request",
            method = %req.method(),
            path,
            request_id,
        )
    }
}
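The two decisions make_request_span makes (which level to open the span at, and what to record when the header is missing or unprintable) can be isolated as pure functions for unit testing. This is a hypothetical std-only sketch: SpanLevel, span_level, and request_id_field are illustrative names, and request_id_field approximates HeaderValue::to_str by accepting only visible ASCII and tab.

```rust
/// Level a request span is opened at.
#[derive(Debug, PartialEq)]
enum SpanLevel {
    Debug,
    Info,
}

/// Health-check paths get debug-level spans, as in make_request_span.
fn span_level(path: &str) -> SpanLevel {
    if matches!(path, "/health" | "/healthz" | "/readyz") {
        SpanLevel::Debug
    } else {
        SpanLevel::Info
    }
}

/// Value recorded in the span's request_id field: the header as text if it is
/// visible ASCII (or tab), otherwise the "-" placeholder.
fn request_id_field(header: Option<&[u8]>) -> &str {
    header
        .filter(|v| v.iter().all(|&b| b == b'\t' || (32..127).contains(&b)))
        .and_then(|v| std::str::from_utf8(v).ok())
        .unwrap_or("-")
}
```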

Expected log output

INFO request{method=POST path=/openshell.v1.OpenShell/CreateSandbox request_id=a1b2c3d4-...}: response status=200 latency_ms=45
INFO request{method=POST path=/openshell.v1.OpenShell/CreateSandbox request_id=e5f6a7b8-...}: response status=200 latency_ms=312

Client-supplied IDs

If a client sends x-request-id: my-correlation-id, SetRequestIdLayer preserves it: the layer only calls MakeRequestId when the header is absent. This lets CLI and SDK callers trace their own requests without server coordination.

Scope boundaries

  • Header name: x-request-id — the de facto standard, and already used by the sandbox inference proxy
  • Health check endpoints: /health, /healthz, and /readyz receive request IDs like any other request; make_request_span only lowers their span level to debug. The health listener on the separate unauthenticated port (health_router() in lib.rs:195) does not get this middleware, since it has no Tower layer stack
  • gRPC metadata: gRPC clients can set x-request-id as metadata; tonic maps custom metadata to HTTP/2 headers transparently
  • Response header: The response always includes x-request-id, whether server-generated or client-supplied
  • ID format: UUID v4 (128 bits, of which 122 are random, giving ~2^122 possible values), so the probability of a collision across instances is negligible without any coordination
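A rough number behind the ID-format point: with 122 random bits, the birthday bound gives a collision probability of about n^2 / (2 · 2^122) for n generated IDs, assuming a uniformly random generator. The hypothetical helper below evaluates that approximation.

```rust
/// Birthday-bound approximation of the probability that at least two of
/// `n` independently generated UUID v4 values collide: p ≈ n^2 / (2 · 2^122).
/// Only meaningful while the result is much smaller than 1.
fn uuid_v4_collision_probability(n: f64) -> f64 {
    let space = 2f64.powi(122); // number of distinct random UUID v4 values
    n * n / (2.0 * space)
}
```

For a trillion requests (n = 10^12) it returns roughly 9.4 × 10^-14.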

Alternatives Considered

  1. Full OpenTelemetry with traceparent (W3C Trace Context): provides distributed trace/span IDs, baggage propagation, and exporter integration. Significantly heavier: requires the opentelemetry, opentelemetry-otlp, and tracing-opentelemetry crates plus a collector endpoint. The right long-term direction, but request-ID is the pragmatic first step that delivers immediate value. The two are not mutually exclusive: x-request-id can coexist with traceparent.

  2. Custom middleware without tower-http: the logic is ~30 lines, but tower-http's SetRequestIdLayer and PropagateRequestIdLayer already handle the edge cases (preserving an existing header, the type-safe RequestId newtype, integration with the tower ecosystem). No reason to reimplement.

  3. Sequential integer IDs — simpler than UUIDs but not safe across multiple gateway instances in a Kubernetes deployment. UUIDs are globally unique without coordination.

  4. Always overwrite client-supplied IDs: this would break client correlation. SetRequestIdLayer only generates an ID when the header is absent, so client IDs are preserved, which is the expected behavior for proxies and API gateways.

  5. x-trace-id or x-correlation-id header name — less widely adopted than x-request-id. The sandbox inference proxy already uses x-request-id in its passthrough test, so consistency favors this name.
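Alternative 3's failure mode is easy to reproduce: two independent per-instance counters (for example, two gateway replicas) hand out identical IDs. SequentialIds below is a hypothetical stand-in for that rejected design, not code from the server.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-instance ID source for the rejected sequential-integer design.
struct SequentialIds {
    next: AtomicU64,
}

impl SequentialIds {
    fn new() -> Self {
        Self { next: AtomicU64::new(1) }
    }

    /// Returns the current value and advances the counter; unique only within
    /// this one instance.
    fn next_id(&self) -> u64 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}
```

Each replica's first request gets ID 1, so a log aggregator merging both streams cannot tell them apart; random UUIDs avoid this without a shared counter.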

Agent Investigation

Explored crates/openshell-server/src/ and the workspace configuration. Key findings:

  • No request ID exists anywhere in the server. Grep for request_id, x-request-id, trace-id, correlation-id across all server source files returns zero matches (outside sandbox inference proxy tests).
  • TraceLayer is applied at multiplex.rs:66-81 to both the gRPC and HTTP inner services via ServiceBuilder. The span currently records only method and path (make_request_span at line 236). Adding a request_id field is a ~3-line change.
  • tower-http v0.6.8 is the workspace version (Cargo.toml line 28). Only cors and trace features are enabled. The request-id feature adds SetRequestIdLayer, PropagateRequestIdLayer, and the MakeRequestId trait — no new transitive dependencies.
  • uuid v1.10 with v4 feature is already a workspace dependency (Cargo.toml line 101), used throughout the server for sandbox IDs, policy IDs, auth nonces, and session tokens.
  • The sandbox inference proxy already passes x-request-id through — there's a test at crates/openshell-sandbox/src/l7/inference.rs:508 that verifies the header is forwarded. This means the gateway generating the ID would flow through to inference backends automatically.
  • The health listener has no middleware (lib.rs:195 uses health_router().into_make_service() directly). This is intentional — it's unauthenticated and needs no request ID.
  • MultiplexedService implements hyper::service::Service, not tower::Service, confirming that the middleware must wrap the inner services (gRPC and HTTP), not the multiplexer — same constraint identified in feat: add request-level HTTP/gRPC tracing to openshell-server #892.
  • No OpenTelemetry dependencies exist in the workspace. The tracing subscriber (tracing_bus.rs:56) uses registry() + fmt::layer() + a custom SandboxLogLayer. Request-ID middleware does not require OpenTelemetry.
  • #909 (feat(server): metrics instrumentation) proposes a Tower middleware at the same location (multiplex.rs). The request-ID layer should be composed before (outside) the metrics layer so metrics can optionally use the request ID as a trace exemplar in the future.
