Compute stable edge IDs on demand and stop storing them (DATA-6257)#200
Conversation
…57 Phase 1)
Phase 1 of removing the per-edge stable_id_byte_* / reverse_stable_id_byte_*
encoded values: prove the stable edge ID can be derived on demand from
osmid + osmid_high + segment index at every read site, and shadow-verify that
computed == stored across real traffic before the Phase 2 format break. No
format change here — the 16 bytes/edge are still written and kept on disk.
- StableIdEncodedValues.computeStableId(edge, reverse, segmentIndex): pure
recomputation, byte-identical to getStableId(...).
- stable_id.use_computed config flag (default false), plumbed via
GraphHopperManaged. When on, read sites run on the computed path and
cross-check against the stored bytes; mismatches are logged and tallied by
StableIdVerification (never thrown — verification must not disrupt serving).
- Switched read sites: the gRPC StreetPath.stable_edge_ids path-details builder
and the BQ StreetEdgeExporter. The segment index reaches the serving hot path
via OsmHelper, wired lazily through PathDetailsBuilderFactoryWithStableId
(OsmHelperProvider). Routed paths include virtual edges at snapped endpoints,
so the builder maps a virtual edge back to its original base-graph edge id
(VirtualEdgeIteratorState.getOriginalEdgeKey) for the segment-index lookup.
- RouterServer exports routers.stable_id_computed_{comparisons,mismatches}
StatsD gauges for the shadow window.
Deferred to Phase 2 (out of scope): removing the stable_id_byte_* EV
registration, removing the import-time stamping in StableEdgeIdManager, and the
graph format version bump. The import-time custom-speeds SEID seam also stays on
the stored bytes for now — converting it requires threading the segment index
through the parser path (mandatory only once stamping is removed), and verifying
it at import is tautological since the bytes are stamped by the same computation
moments earlier.
Tests: StableIdComputeTest (computed == stored across OSM ids incl. > 2^31,
segments, both directions); ComputedStableIdImportTest (beatty import via
GraphHopperManaged with the flag on, computed == stored for every edge both
directions with coalescing off and on, plus a serving-path route check with
zero mismatches).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Claude finished @bnaul's task in 3m 31s —— View job Review: Compute stable edge IDs on demand (DATA-6257 Phase 1)
Overall this is a clean, well-scoped Phase 1. The compute-on-demand path is genuinely byte-identical to the stored path (both funnel through I left three inline comments. Two are worth resolving before the shadow rollout: 1. Unbounded WARN logging on mismatch (
|
| if (!computed.equals(stored)) { | ||
| long total = mismatches.incrementAndGet(); | ||
| LOG.warn("Computed stable edge ID != stored value (total mismatches={}): computed={} stored={} [{}]", | ||
| total, computed, stored, context.get()); | ||
| } |
There was a problem hiding this comment.
Unbounded WARN logging on mismatch can itself disrupt serving. The doc comment promises verification "must never disrupt serving," but a systematic divergence (e.g. one edge class, or a virtual-edge handling gap) would log one WARN per (edge, direction) comparison — at production QPS with many edges per route, that's potentially thousands of log lines/sec, which can pin a core on logging, fill disk, and degrade the very serving path it's shadowing. The whole point of the flag is to catch mismatches in real traffic, so the failure mode where it fires a lot is the one most worth guarding.
Recommend rate-limiting/sampling the log (the AtomicLong total already gives you a cheap throttle key) while keeping the counter exact, e.g.:
| if (!computed.equals(stored)) { | |
| long total = mismatches.incrementAndGet(); | |
| LOG.warn("Computed stable edge ID != stored value (total mismatches={}): computed={} stored={} [{}]", | |
| total, computed, stored, context.get()); | |
| } | |
| if (!computed.equals(stored)) { | |
| long total = mismatches.incrementAndGet(); | |
| // Rate-limit: a systematic divergence would otherwise log once per comparison on the | |
| // serving hot path. The counter stays exact; the metric is the real signal. | |
| if (total <= 100 || Long.bitCount(total) == 1) { | |
| LOG.warn("Computed stable edge ID != stored value (total mismatches={}): computed={} stored={} [{}]", | |
| total, computed, stored, context.get()); | |
| } | |
| } |
| LOG.warn("Computed stable edge ID != stored value (total mismatches={}): computed={} stored={} [{}]", | ||
| total, computed, stored, context.get()); | ||
| } | ||
| return computed; |
There was a problem hiding this comment.
Returning computed (not stored) means a mismatch ships the possibly-wrong value to clients/exports. This is a deliberate choice ("the path Phase 2 will keep"), but it makes this a substituting verification rather than a true shadow: during the Phase 1 window, any divergence on the serving path returns the computed stable_edge_ids to gRPC clients, and in StreetEdgeExporter writes the computed ID into the BQ export — so a non-zero mismatch corrupts output, not just a log line.
Since equality is fully captured by the counters regardless of which string you return, returning stored here would still prove computed == stored across real traffic while guaranteeing output is unchanged until Phase 2 deliberately flips it. Worth at least calling out the trade-off explicitly in the rollout plan (one Places router for a week) — if routers.stable_id_computed_mismatches goes non-zero, you're serving the wrong IDs for that window, not just observing.
| Supplier<OsmHelper> osmHelperSupplier = () -> { | ||
| if (graphHopper instanceof OsmHelperProvider) { | ||
| return ((OsmHelperProvider) graphHopper).getOsmHelper(); | ||
| } | ||
| throw new IllegalStateException(StableIdEncodedValues.USE_COMPUTED_CONFIG_KEY | ||
| + " requires a graph that exposes a segment index (OSM/GTFS), but got " | ||
| + graphHopper.getClass().getName()); | ||
| }; |
There was a problem hiding this comment.
Minor: this IllegalStateException for a misconfigured graph type (flag on + non-OSM/GTFS graph) only fires lazily on the first routing request that asks for stable_edge_ids — and via resolveOsmHelper() it re-throws on every subsequent request (the volatile osmHelper stays null). So a misconfig surfaces as a stream of failed requests rather than a startup failure. Since useComputedStableId and graphHopper are both known here at construction time, consider fail-fast: validate graphHopper instanceof OsmHelperProvider now (when the flag is on) so a bad config dies at boot, not in the serving path. Low priority — production graphs are always OSM/GTFS.
Completes the compute-on-demand migration begun in the previous commit and removes the per-edge stored bytes. The stable edge ID is a pure function of osmid + osmid_high + segment index, so the 16 stable_id_byte_* / reverse_stable_id_byte_* encoded values (16 B/edge, amplified through CH shortcut copies) are no longer registered, stamped, or persisted. This is a graph format change: graphs must be re-imported to drop the bytes. - EncodedValueFactoryWithStableId no longer registers the 16 byte EVs, and GraphHopperManaged drops them from the encoded-values string. - StableIdEncodedValues loses the stored getStableId/setStableId byte paths; it now only computes (computeStableId from an EdgeIteratorState, or from an edgeId + EdgeIntAccess during import). - Import no longer stamps: CustomOsmReader drops the StableEdgeIdManager stamping call and EdgeCoalescer no longer re-stamps merged edges (it still mints their fresh segment index). The segment index is recorded before handleWayTags so the SEID custom-speeds parsers can compute on demand. - The import-time SEID seam reads the segment index of the edge being parsed through a new ImportSegmentIndexLookup (pointed at the reader's in-progress map), threaded into StableEdgeIdManager via ReplicaVehicleTagParserFactory. - The read sites now always compute: StableIdPathDetailsBuilder (gRPC) and StreetEdgeExporter; the Phase-1 verification flag, StableIdVerification, and the RouterServer shadow gauges are removed. The path-details builder keeps mapping virtual edges to their original edge id for the segment-index lookup. - IdentityEncodedValues drops the byte names from the routing-equivalence mask (the EdgeCoalescer identity set is now just the two osmid halves). Format compatibility follows the existing convention (cf. the osmid_high split): there is no separate graph-version constant in the fork; the encoded-value change is realized by a coordinated re-import. GraphHopper's storage is self-describing, so an old graph is not silently misread — its stored bytes are simply ignored until it is rebuilt. Tests: StableIdComputeTest and a rewritten ComputedStableIdImportTest (beatty import with coalescing off and on; every edge's ID is derivable, and every stable_edge_id returned by the serving path is one of the graph's directly-computed IDs — which fails if the virtual-edge mapping regresses). EdgeCoalescerTest / IdentityEncodedValuesTest / StableEdgeIdManagerTest / CustomSpeedsUtilsTest updated for the compute API. Note: the web and grpc modules could not be compiled in the dev sandbox (no generated protobuf sources); their edits are mechanical but want CI confirmation, and the SEID custom-speeds integration tests (web module) should be validated on a real build. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Proves the import-time SEID custom-speeds seam works with stable edge IDs computed on demand (no stored bytes): imports beatty, writes a stable_edge_id- keyed closure for a real edge, re-imports under a custom-speeds car vehicle, and asserts the closure removed access on exactly that edge/direction. Exercises the full chain (parse -> handleWayTags -> custom-speeds access parser -> getStableEdgeIdForEdge -> computeStableId via ImportSegmentIndexLookup -> match). Runs in web-bundle, so it covers the seam without the web-module's micro_nor_cal integration fixtures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
testRemoveAccessForSeidUserClosedRoads built a 1-int scratch IntsRef and asked the seam to read the edge's osmid out of it, which overran the array (ArrayIndexOutOfBounds) — a pre-existing failure on the base branch (the old stored-bytes path read stable_id_byte_* at an even higher index out of the same 1-int scratch). Size the scratch to the encoding manager's flag width and seed the edge's osmid into it, so the on-demand stable-id computation reads the real osmid and matches the closure key. Production is unaffected (the import path uses the full-width edge access); this only repairs the unit-test fixture. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Refactors isLinkMergeable into classifyLink, which returns a LinkOutcome enum (MERGEABLE plus one bucket per rejection gate, in early-return order). run() tallies the outcomes into a LinkClassificationStats struct exposed on Result; CustomOsmReader logs the breakdown alongside its existing edge-count summary so the diagnostic shows up in every build's INFO log. Used to decompose why a coalescing pass is reducing edges less than expected: each rejection bucket sets the upper bound on additional merges recoverable by relaxing that gate. On US-DC (orbis): of 66,314 degree-2 candidates, 43.1% merged, 40.2% blocked by excluded_edge (barrier-edge proximity, which is dense in DC and structurally unmergeable), 8.6% by flag_mismatch, 6.1% orientation_mismatch. On mini_nor_cal: 1,089,421 candidates, 59.9% merged, 23.0% excluded_edge, 8.8% flag_mismatch, 4.2% orientation. Barriers are not an orbis-specific problem — OSM-DC produced ~the same excluded_edge count. No behavior change. Existing EdgeCoalescerTest suite (15 tests) passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EdgeCoalescer used to copy all source nodes to the target with IDs preserved 1:1, leaving connection nodes whose two incident edges were merged into one as 'dead' tower nodes — referenced by no target edge but still consuming a row in the target's nodes table, each per-CH nodes_ch_<profile> file, and the GH-node-to-OSM-node map. On orbis mini_nor_cal that's ~650K dead nodes (one per merge), and the 10 production CH profiles each carry their own per-node entry. Refactor run() to plan-first: walk every chain and split it into runs before any edge is written, accumulating a bypassedNode bitmap for the interior nodes of every multi-edge run. Compute a dense source-to-target node remap (sourceToTargetNode), copy only surviving nodes into the target with consecutive IDs starting at 0, then emit edges through the remap so base/adj nodes resolve to the new dense IDs. Result exposes sourceToTargetNode so the caller can rewrite its own node-keyed bookkeeping. CustomOsmReader uses it to: - rewrite the graphhopper:via_node tag on each restriction relation (protected via nodes are never bypassed, so the remap is total); - rebuild ghNodeIdToOsmNodeIdMap into the target's ID space, dropping entries for bypassed source nodes. Local single-profile validation on mini_nor_cal orbis cutout: 4.27M edges -> 3.62M (15.2% reduction, unchanged from pre-compaction), 3.38M nodes -> 2.73M (650K compacted, exactly matching the 650K eliminated edges), on-disk graph dir 443.1 -> 385.1 MiB (13.1% smaller, 58 MB saved -> +12 MB on top of pre-compaction's 46 MB). The compaction delta multiplies through the 10 production CH profiles on the deployed config — nodes_ch_<profile> alone accounts for ~52 MB of that scaling, before CH shortcut downstream effects. Tests: all 15 EdgeCoalescerTest cases pass after updating hardcoded node-ID assertions to remap through Result.sourceToTargetNode. EdgeCoalescingImportTest (beatty + 3-way chain) likewise updated to assert renumbering correctness instead of 1:1 preservation; 58 routes through the orbis cutout remain bit-identical between pre- and post- compaction builds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Compute stable edge IDs on demand; stop storing them per edge
Implements DATA-6257 (parent DATA-6256,
MEMORY_REVIEW.mdrec #1) in full. The ticket's staged plan (Phase 1 shadow-verify behind a flag, then Phase 2 format break) is collapsed into this one PR — we validate off-branch with real test runs rather than running a production shadow window.The stable edge ID is a pure function of
osmWayId + segmentIndex (+ direction)(StableIdEncodedValues.calculateStableEdgeId). The way ID is already stored split acrossosmid/osmid_high, and the segment index lives ingh_edge_id_to_segment_index— so the 16stable_id_byte_*/reverse_stable_id_byte_*encoded values (16 B/edge, amplified through CH shortcut copies; the doc estimates 1.5–5 GB steady-state on a 60 GB router) are redundant. This PR derives the ID at every read site and stops storing the bytes.This is a graph format change — affected graphs must be re-imported to reclaim the space.
What changed
Compute path
StableIdEncodedValues.computeStableId(...)— from anEdgeIteratorState(serving/export) or from anedgeId + EdgeIntAccess(import); byte-identical to the old stored value.StreetPath.stable_edge_idspath-details builder and the BQStreetEdgeExporter. The segment index reaches the serving hot path viaOsmHelper, wired lazily throughPathDetailsBuilderFactoryWithStableId(OsmHelperProvider).VirtualEdgeIteratorState.getOriginalEdgeKey()(this fork's GH 8.0 has noEdgeIteratorState.isVirtual(), so detection isinstanceof).Stop storing
EncodedValueFactoryWithStableIdno longer registers the 16 byte EVs;GraphHopperManageddrops them from the encoded-values string.CustomOsmReaderdrops the stamping call andEdgeCoalescerno longer re-stamps merged edges (it still mints their fresh segment index).StableIdEncodedValuesloses the storedgetStableId/setStableIdbyte paths;IdentityEncodedValuesdrops the byte names from theEdgeCoalescerrouting-equivalence mask (now just the two osmid halves).Import-time SEID custom speeds
ImportSegmentIndexLookup(pointed at the reader's in-progress map; the map write was moved beforehandleWayTags), threaded intoStableEdgeIdManagerviaReplicaVehicleTagParserFactory.Format compatibility
Consistent with the existing convention (cf. the
osmid_highsplit, which also changed the EV set without a version constant), there is no separate graph-version constant in the fork; the format change is realized by a coordinated re-import. GraphHopper's storage is self-describing (the encoding manager is persisted with the graph), so an old graph is not silently misread — its stored stable-id bytes are simply ignored by the new code until the graph is rebuilt. Coordinate the re-import with theroad_network_versionbump.Tests
Runnable locally (
mvn -o -pl replica-common,web-bundle test— green):StableIdComputeTest—computeStableId == calculateStableEdgeIdacross OSM ids (incl. > 2^31, exercisingosmid_high), segments, both directions.ComputedStableIdImportTest— importsbeatty.osm(coalescing off and on); asserts the byte EVs are gone, every parsed edge has a positive segment index and a derivable ID both directions, and everystable_edge_idsvalue the serving path returns is one of the graph's directly-computed IDs (this fails if the virtual-edge segment-index mapping regresses). 768 / 754 edge-pairs and 554 / 552 served IDs verified.SeidCustomSpeedImportTest— end-to-end proof of the trickiest path: imports beatty, writes astable_edge_id-keyed closure for a real edge, re-imports under a custom-speeds car vehicle, and asserts the closure removed access on exactly that edge/direction (and left the reverse + a control edge alone). This exercises the full import-time seam — parse → custom-speeds access parser →getStableEdgeIdForEdge→computeStableIdviaImportSegmentIndexLookup→ match — so it fails if the segment-index threading regresses.EdgeCoalescerTest,IdentityEncodedValuesTestupdated for the compute API.Performance: computing on the fly costs ~100 ns/path-edge more than the old byte read (farmHash vs a memory read), and path details are built once over the returned path (not in the search loop), so even a ~200-edge route adds ~20 µs — negligible against CH routing, and the point is the GBs of memory it buys back.
✅ CI is green (run 27230547590 on
c49597be0): the fullweb+grpcbuild and unit suite pass — including theweb-module SEID custom-speeds integration tests (CustomSpeedsUtilsTest,StableEdgeIdManagerTest) — so the edits to those modules (which can't be compiled in the dev sandbox: no generated protobuf,web→grpc) are confirmed. An earlier red was a pre-existing test-fixture bug (1-int scratchIntsRef), fixed here.📋 Remaining real-world step (not a blocker for review): the actual memory-savings measurement — rebuild a representative graph with the deployed CH-profile config and diff
graphhopper_network/size + loaded RSS, old vs new format. The in-repo work confirms the per-edge term (edge-flags row 8→4 ints = 16 B/edge on a car-only graph) but the headline 1.5–5 GB is CH-amplified and needs a production-scale rebuild. Hand-off brief:plans/data-6257_memory_validation.md.Base branch
Stacked on
brett/data-6259-...(edge coalescing), which carries the 64-bitosmid_highsplit the computation needs. Per DATA-6256, 6259 lands first; retarget if you'd rather it sit directly ontomtom-orbis-osmid-64bit.Rollout
Merge → rebuild affected graphs (orbis 2026_Q2/USA +
regional_transportation_networkregions) → bumproad_network_version. No production shadow window (phases combined).🤖 Generated with Claude Code