
Field level usage metrics with errors #8062

Open
jdolle wants to merge 67 commits into main from usage-field-errors

jdolle (Collaborator) commented May 22, 2026

Background

Part of the subgraph visibility initiative. This PR adds coordinate-level resolution count tracking and tracks errors for each coordinate by error code.

The change breaks down into four components:

  1. ClickHouse table schema, which defines how the data is stored in our database for each of the time periods shown in our UI.
  2. A new gateway plugin that uses subgraph calls to extract field and error counts.
  3. Changes to the usage data pipeline so that it accepts and processes the additional fields in the v2 usage data.
  4. UI changes that render this new data on the Explorer and Coordinate Insights pages.

Description

The tables and materialized views live in ClickHouse. They support two query patterns: one centered only on coordinates, and one that is specific to an operation (hash).

These match our two usage patterns: the first is already used on the Explorer page, and the second backs a proposed feature that shows success/error results per field within an operation. In both cases, we want to support filtering by other dimensions, such as the error code.
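
To make the two query patterns concrete, here is a minimal in-memory sketch. It is not the ClickHouse implementation: the row shape and the names `CoordinateErrorRow`, `ErrorFilter`, `errorsByCoordinate` and `errorsByCoordinateForOperation` are hypothetical. The coordinate-centric pattern sums across all operations, while the operation-specific pattern first restricts rows to one hash; both accept an optional error-code filter.

```typescript
// Hypothetical row shape; the real ClickHouse columns may differ.
interface CoordinateErrorRow {
  coordinate: string; // schema coordinate, e.g. "Query.user"
  hash: string; // operation hash
  errorCode: string; // e.g. "FORBIDDEN"
  count: number;
}

interface ErrorFilter {
  errorCodes?: ReadonlySet<string>; // undefined means "all error codes"
}

function matchesFilter(row: CoordinateErrorRow, filter: ErrorFilter): boolean {
  return filter.errorCodes === undefined || filter.errorCodes.has(row.errorCode);
}

// Pattern 1: coordinate-centric, summed across every operation (Explorer-style).
function errorsByCoordinate(
  rows: readonly CoordinateErrorRow[],
  filter: ErrorFilter = {},
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    if (!matchesFilter(row, filter)) continue;
    totals.set(row.coordinate, (totals.get(row.coordinate) ?? 0) + row.count);
  }
  return totals;
}

// Pattern 2: operation-specific, restricted to a single operation hash.
function errorsByCoordinateForOperation(
  rows: readonly CoordinateErrorRow[],
  hash: string,
  filter: ErrorFilter = {},
): Map<string, number> {
  return errorsByCoordinate(
    rows.filter((row) => row.hash === hash),
    filter,
  );
}
```

In ClickHouse the same split would typically be two tables with different sorting keys, one without the hash and one with it, so that each pattern reads only the rows it needs.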

The Hive client has been adjusted to record subgraph calls between the start of a request and the point at which its data is sent to be batched. Field counts are added to the existing operations data, and errors are submitted in a new structure.
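
The per-request bookkeeping can be sketched roughly as follows. This is an assumption-laden illustration, not the plugin's code: `CoordinateUsageCollector`, `recordSubgraphCall` and the `coordinateForPath` callback (which maps a response path to a schema coordinate, something a real gateway would derive from its query plan) are hypothetical. Only the `path` and `extensions.code` fields follow the standard GraphQL error shape; falling back to `"UNKNOWN"` for errors without a string code is also an assumption.

```typescript
// Minimal subset of a GraphQL error as returned in a subgraph response.
interface SubgraphError {
  message: string;
  path?: ReadonlyArray<string | number>;
  extensions?: { code?: unknown };
}

interface CoordinateErrorCount {
  coordinate: string;
  code: string;
  count: number;
}

// Hypothetical: maps a response path (e.g. ["user", "email"]) to a coordinate.
type PathToCoordinate = (path: ReadonlyArray<string | number>) => string | undefined;

// Hypothetical per-operation accumulator; the PR's real report format may differ.
class CoordinateUsageCollector {
  readonly fieldCounts = new Map<string, number>();
  private readonly errorCounts = new Map<string, Map<string, number>>();
  private readonly coordinateForPath: PathToCoordinate;

  constructor(coordinateForPath: PathToCoordinate) {
    this.coordinateForPath = coordinateForPath;
  }

  // Called once per subgraph call, with the coordinates that call resolved.
  recordSubgraphCall(resolvedCoordinates: readonly string[], errors: readonly SubgraphError[] = []): void {
    for (const coordinate of resolvedCoordinates) {
      this.fieldCounts.set(coordinate, (this.fieldCounts.get(coordinate) ?? 0) + 1);
    }
    for (const error of errors) {
      // Errors without a path cannot be tied to a coordinate in this sketch.
      if (!error.path) continue;
      const coordinate = this.coordinateForPath(error.path);
      if (coordinate === undefined) continue;
      const rawCode = error.extensions?.code;
      const code = typeof rawCode === "string" ? rawCode : "UNKNOWN";
      let byCode = this.errorCounts.get(coordinate);
      if (!byCode) {
        byCode = new Map<string, number>();
        this.errorCounts.set(coordinate, byCode);
      }
      byCode.set(code, (byCode.get(code) ?? 0) + 1);
    }
  }

  // Flattens the nested map into the separate error structure sent with the report.
  errors(): CoordinateErrorCount[] {
    const result: CoordinateErrorCount[] = [];
    this.errorCounts.forEach((byCode, coordinate) => {
      byCode.forEach((count, code) => result.push({ coordinate, code, count }));
    });
    return result;
  }
}
```

Keeping field counts and error counts apart mirrors the description: counts can ride along with the existing operation entry, while errors need their own (coordinate, code, count) records.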

Rather than having a separate materialized view read from the source table for every time period, I've introduced cascading updates, where each period is populated from the next finer one. This can increase insert time, but these materialized views are inexpensive (no joins or costly functions), so I do not anticipate an issue. Even so, it may be worth enabling async inserts in the future if they are not enabled already.
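
The cascade works because summed counts can be re-aggregated and every coarser bucket boundary is also a finer bucket boundary. The in-memory sketch below shows that idea only; it is not ClickHouse code, and `CountBucket`, `rollUp` and `cascade` are hypothetical names. It assumes UTC buckets and treats each level as a complete batch, whereas in ClickHouse each materialized view sees only newly inserted blocks and the table engine merges partial sums later.

```typescript
interface CountBucket {
  coordinate: string;
  errorCode: string;
  startMs: number; // bucket start, Unix epoch milliseconds (UTC)
  count: number;
}

const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;
const DAY_MS = 24 * HOUR_MS;

// Truncates each row's start to the period and sums counts per (coordinate, code, bucket).
function rollUp(rows: readonly CountBucket[], periodMs: number): CountBucket[] {
  const merged = new Map<string, CountBucket>();
  for (const row of rows) {
    const startMs = Math.floor(row.startMs / periodMs) * periodMs;
    const key = `${row.coordinate}|${row.errorCode}|${startMs}`;
    const existing = merged.get(key);
    if (existing) {
      existing.count += row.count;
    } else {
      merged.set(key, { coordinate: row.coordinate, errorCode: row.errorCode, startMs, count: row.count });
    }
  }
  return Array.from(merged.values());
}

// Cascade: each level reads from the previous level instead of the raw source.
function cascade(source: readonly CountBucket[]) {
  const minutely = rollUp(source, MINUTE_MS);
  const hourly = rollUp(minutely, HOUR_MS);
  const daily = rollUp(hourly, DAY_MS);
  return { minutely, hourly, daily };
}
```

Rolling the hourly level up to days yields the same totals as rolling the source up directly, which is why the coarser views can safely read from the finer tables rather than from the source.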

[Three screenshots, captured June 8 and June 11, 2026]

Gateway Benchmark Comparison

To check that gateway performance is not impacted too heavily, I ran a benchmark locally in constant mode with 10 CPUs and 50 virtual users (VUs). Here are the results:

No Usage Reporting

     ✓ response code was 200
     ✓ no graphql errors
     ✓ valid response structure

     checks.........................: 100.00% ✓ 1266159     ✗ 0
     data_received..................: 37 GB   615 MB/s
     data_sent......................: 491 MB  8.1 MB/s
     http_req_blocked...............: avg=1.8µs   min=0s     med=1µs    max=7.46ms   p(90)=2µs    p(95)=2µs    p(99.9)=51.84µs
     http_req_connecting............: avg=520ns   min=0s     med=0s     max=7.44ms   p(90)=0s     p(95)=0s     p(99.9)=0s
     http_req_duration..............: avg=6.98ms  min=2.07ms med=6.46ms max=150.65ms p(90)=7.96ms p(95)=8.83ms p(99.9)=48.86ms
       { expected_response:true }...: avg=6.98ms  min=2.07ms med=6.46ms max=150.65ms p(90)=7.96ms p(95)=8.83ms p(99.9)=48.86ms
     http_req_failed................: 0.00%   ✓ 0           ✗ 422153
     http_req_receiving.............: avg=51.36µs min=9µs    med=24µs   max=39.03ms  p(90)=68µs   p(95)=147µs  p(99.9)=1.95ms
     http_req_sending...............: avg=6.75µs  min=1µs    med=3µs    max=29.54ms  p(90)=6µs    p(95)=16µs   p(99.9)=504.84µs
     http_req_tls_handshaking.......: avg=0s      min=0s     med=0s     max=0s       p(90)=0s     p(95)=0s     p(99.9)=0s
     http_req_waiting...............: avg=6.92ms  min=2.04ms med=6.4ms  max=150.61ms p(90)=7.87ms p(95)=8.73ms p(99.9)=48.76ms
     http_reqs......................: 422153  7006.314119/s
     iteration_duration.............: avg=7.1ms   min=2.39ms med=6.59ms max=150.81ms p(90)=8.1ms  p(95)=8.98ms p(99.9)=49.03ms
     iterations.....................: 422053  7004.654456/s
     success_rate...................: 100.00% ✓ 422053      ✗ 0
     vus............................: 50      min=50        max=50
     vus_max........................: 50      min=50        max=50

With Usage Reporting

     ✓ response code was 200
     ✓ no graphql errors
     ✓ valid response structure

     checks.........................: 100.00% ✓ 1162125     ✗ 0
     data_received..................: 34 GB   563 MB/s
     data_sent......................: 451 MB  7.5 MB/s
     http_req_blocked...............: avg=1.57µs  min=0s     med=1µs    max=3.7ms    p(90)=2µs    p(95)=2µs     p(99.9)=65µs
     http_req_connecting............: avg=179ns   min=0s     med=0s     max=2.22ms   p(90)=0s     p(95)=0s      p(99.9)=0s
     http_req_duration..............: avg=7.61ms  min=2.38ms med=6.96ms max=132.13ms p(90)=9.07ms p(95)=10.55ms p(99.9)=52.4ms
       { expected_response:true }...: avg=7.61ms  min=2.38ms med=6.96ms max=132.13ms p(90)=9.07ms p(95)=10.55ms p(99.9)=52.4ms
     http_req_failed................: 0.00%   ✓ 0           ✗ 387475
     http_req_receiving.............: avg=58.88µs min=9µs    med=25µs   max=24.17ms  p(90)=76.6µs p(95)=170µs   p(99.9)=2.32ms
     http_req_sending...............: avg=7.63µs  min=1µs    med=3µs    max=19.33ms  p(90)=6µs    p(95)=18µs    p(99.9)=619.05µs
     http_req_tls_handshaking.......: avg=0s      min=0s     med=0s     max=0s       p(90)=0s     p(95)=0s      p(99.9)=0s
     http_req_waiting...............: avg=7.55ms  min=2.34ms med=6.9ms  max=131.72ms p(90)=8.97ms p(95)=10.43ms p(99.9)=52.32ms
     http_reqs......................: 387475  6408.88252/s
     iteration_duration.............: avg=7.73ms  min=3.17ms med=7.07ms max=132.38ms p(90)=9.21ms p(95)=10.71ms p(99.9)=52.58ms
     iterations.....................: 387375  6407.228508/s
     success_rate...................: 100.00% ✓ 387375      ✗ 0
     vus............................: 50      min=50        max=50
     vus_max........................: 50      min=50        max=50

With Gateway Plugin Usage Reporting

     ✓ response code was 200
     ✓ no graphql errors
     ✓ valid response structure

     checks.........................: 100.00% ✓ 627147      ✗ 0     
     data_received..................: 18 GB   305 MB/s
     data_sent......................: 243 MB  4.0 MB/s
     http_req_blocked...............: avg=1.98µs  min=0s     med=1µs     max=15.3ms  p(90)=2µs     p(95)=2µs     p(99.9)=92.85µs
     http_req_connecting............: avg=460ns   min=0s     med=0s      max=2.88ms  p(90)=0s      p(95)=0s      p(99.9)=0s     
     http_req_duration..............: avg=14.21ms min=2.19ms med=13.58ms max=95.82ms p(90)=15.53ms p(95)=16.95ms p(99.9)=52.85ms
       { expected_response:true }...: avg=14.21ms min=2.19ms med=13.58ms max=95.82ms p(90)=15.53ms p(95)=16.95ms p(99.9)=52.85ms
     http_req_failed................: 0.00%   ✓ 0           ✗ 209149
     http_req_receiving.............: avg=64.52µs min=9µs    med=26µs    max=37.52ms p(90)=94µs    p(95)=202µs   p(99.9)=2.24ms 
     http_req_sending...............: avg=7.96µs  min=1µs    med=3µs     max=7.29ms  p(90)=7µs     p(95)=24µs    p(99.9)=560.7µs
     http_req_tls_handshaking.......: avg=0s      min=0s     med=0s      max=0s      p(90)=0s      p(95)=0s      p(99.9)=0s     
     http_req_waiting...............: avg=14.14ms min=2.15ms med=13.52ms max=95.78ms p(90)=15.4ms  p(95)=16.81ms p(99.9)=52.8ms 
     http_reqs......................: 209149  3469.366722/s
     iteration_duration.............: avg=14.35ms min=4.26ms med=13.7ms  max=96.71ms p(90)=15.68ms p(95)=17.12ms p(99.9)=53.2ms 
     iterations.....................: 209049  3467.707921/s
     success_rate...................: 100.00% ✓ 209049      ✗ 0     
     vus............................: 50      min=50        max=50  
     vus_max........................: 50      min=50        max=50  
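
The relative overhead can be read off the summary numbers above. The snippet below is a small helper for that arithmetic. It copies the `http_reqs` rate and the `http_req_duration` average and p(95) from each run, and compares each run against the no-reporting baseline. The names `BenchmarkRun`, `percentChange` and `compareToBaseline` are just for this illustration.

```typescript
interface BenchmarkRun {
  name: string;
  reqsPerSecond: number; // http_reqs rate
  avgDurationMs: number; // http_req_duration avg
  p95DurationMs: number; // http_req_duration p(95)
}

interface RunComparison {
  name: string;
  throughputChangePct: number;
  avgLatencyChangePct: number;
  p95LatencyChangePct: number;
}

// Values copied from the k6 summaries above; the first entry is the baseline.
const runs: BenchmarkRun[] = [
  { name: "No Usage Reporting", reqsPerSecond: 7006.314119, avgDurationMs: 6.98, p95DurationMs: 8.83 },
  { name: "With Usage Reporting", reqsPerSecond: 6408.88252, avgDurationMs: 7.61, p95DurationMs: 10.55 },
  { name: "With Gateway Plugin Usage Reporting", reqsPerSecond: 3469.366722, avgDurationMs: 14.21, p95DurationMs: 16.95 },
];

function percentChange(baseline: number, value: number): number {
  return ((value - baseline) / baseline) * 100;
}

function compareToBaseline(all: readonly BenchmarkRun[]): RunComparison[] {
  const baseline = all[0];
  if (baseline === undefined) return [];
  return all.slice(1).map((run) => ({
    name: run.name,
    throughputChangePct: percentChange(baseline.reqsPerSecond, run.reqsPerSecond),
    avgLatencyChangePct: percentChange(baseline.avgDurationMs, run.avgDurationMs),
    p95LatencyChangePct: percentChange(baseline.p95DurationMs, run.p95DurationMs),
  }));
}

const signed = (x: number): string => `${x >= 0 ? "+" : ""}${x.toFixed(1)}%`;

for (const c of compareToBaseline(runs)) {
  console.log(
    `${c.name}: throughput ${signed(c.throughputChangePct)}, ` +
      `avg latency ${signed(c.avgLatencyChangePct)}, p95 latency ${signed(c.p95LatencyChangePct)}`,
  );
}
```

With these numbers, the existing usage reporting lowers throughput by roughly 8.5%, and the gateway plugin variant lowers it by roughly 50.5% compared with no reporting.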

jdolle requested a review from n1ru4l on May 22, 2026 01:07
jdolle self-assigned this on May 22, 2026

gemini-code-assist (bot, Contributor) left a comment


Code Review

This migration adds tables and materialized views for tracking GraphQL coordinate errors. Feedback focuses on schema optimizations and consistency, specifically: removing LowCardinality from the short-lived source table, standardizing ZSTD(1) codecs for hash columns, applying LowCardinality to coordinate strings in aggregated tables, ensuring consistent UUID types for target columns, and adding missing database prefixes to materialized view names.
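
The reviewer's conventions could be captured in a small helper like the hypothetical sketch below, so that every table in the migration follows the same rules. It is not the contents of `018-usage-coordinate-errors.ts`. The table kinds, column list, `default` database name, and the choice to treat `error_code` the same way as `coordinate` are all assumptions. `UUID`, `LowCardinality(String)`, `CODEC(ZSTD(1))` and `DateTime('UTC')` are standard ClickHouse column types and codecs.

```typescript
// Hypothetical helpers; table and column names are illustrative only.
type TableKind = "source" | "aggregated";

interface ColumnDef {
  name: string;
  type: string;
}

// Assumed database name used to qualify unprefixed object names.
const DATABASE = "default";

function coordinateErrorColumns(kind: TableKind): ColumnDef[] {
  // LowCardinality only on the aggregated tables, not on the short-lived source table.
  // Applying the same rule to error_code is an assumption.
  const textType = kind === "aggregated" ? "LowCardinality(String)" : "String";
  return [
    { name: "target", type: "UUID" }, // same UUID type in every table
    { name: "hash", type: "String CODEC(ZSTD(1))" }, // one codec for all hash columns
    { name: "coordinate", type: textType },
    { name: "error_code", type: textType },
    { name: "timestamp", type: "DateTime('UTC')" },
    { name: "total", type: "UInt64" },
  ];
}

// Renders the column list for a CREATE TABLE statement.
function coordinateErrorColumnsSql(kind: TableKind): string {
  return coordinateErrorColumns(kind)
    .map((column) => `  ${column.name} ${column.type}`)
    .join(",\n");
}

// Ensures every object name, such as a materialized view, carries a database prefix.
function qualifiedName(name: string): string {
  return name.includes(".") ? name : `${DATABASE}.${name}`;
}

console.log(coordinateErrorColumnsSql("aggregated"));
console.log(qualifiedName("coordinates_errors_hourly_mv"));
```

Centralizing the column types means the source table and each aggregated table stay consistent, which is what most of the feedback asks for.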

Four comment threads on packages/migrations/src/clickhouse-actions/018-usage-coordinate-errors.ts (all outdated)
github-actions (bot, Contributor) commented May 22, 2026


🐋 This PR was built and pushed to the following Docker images:

Targets: build

Platforms: linux/amd64

Image Tag: 3b03aa0a1cfd7c59e95bf2849f17005ca7b6817e

github-actions (bot, Contributor) commented May 27, 2026


🚀 Snapshot Release (alpha)

The latest changes of this PR are available as alpha on npm (based on the declared changesets):

Package                                   Version
@graphql-hive/apollo                      0.48.2-alpha-20260612231313-3b03aa0a1cfd7c59e95bf2849f17005ca7b6817e
@graphql-hive/cli                         0.60.2-alpha-20260612231313-3b03aa0a1cfd7c59e95bf2849f17005ca7b6817e
@graphql-hive/core                        0.22.0-alpha-20260612231313-3b03aa0a1cfd7c59e95bf2849f17005ca7b6817e
@graphql-hive/envelop                     0.40.7-alpha-20260612231313-3b03aa0a1cfd7c59e95bf2849f17005ca7b6817e
@graphql-hive/gateway-plugin-console-sdk  0.1.0-alpha-20260612231313-3b03aa0a1cfd7c59e95bf2849f17005ca7b6817e
@graphql-hive/yoga                        0.48.2-alpha-20260612231313-3b03aa0a1cfd7c59e95bf2849f17005ca7b6817e
hive                                      11.4.0-alpha-20260612231313-3b03aa0a1cfd7c59e95bf2849f17005ca7b6817e

jdolle marked this pull request as draft on May 29, 2026 15:35
Comment thread packages/libraries/gateway-usage/package.json Outdated
jdolle and others added 20 commits May 29, 2026 15:52
…e-errors.ts

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…e-errors.ts

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…e-errors.ts

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…e-errors.ts

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
jdolle force-pushed the usage-field-errors branch from 1a2d10d to 7fd451f on May 29, 2026 23:04
Comment thread packages/services/api/src/modules/schema/resolvers/Target.ts Outdated
Comment thread packages/services/usage/src/usage-processor-2.ts

Labels: None yet

2 participants