Support 64-bit OSM way IDs by splitting osmid encoded value #194
So marking PR #194 ready-for-review kicks off an automatic review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
TomTom Orbis assigns OSM way IDs above Integer.MAX_VALUE (CA goes up to 7.7B), which OsmIdTagParser previously rejected because the osmid IntEncodedValue is only 31 bits. Split the ID across two encoded values (osmid low + osmid_high, 31 bits each = 62 usable bits) and reassemble it everywhere downstream: StableIdEncodedValues, the street-edge exporter, and a new OsmIdPathDetailsBuilder so the routing API still returns a full long for the osmid path detail rather than falling back to the generic IntDetails wiring.

Verified by rebuilding the patched JAR and importing the full TomTom Orbis 2026-05-14 California PBF (200M nodes, 34M ways); the import completes end-to-end in ~13 min and produces a 9.3M-node graph.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
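The split-and-reassemble arithmetic this commit describes can be sketched in a few lines of Java. This is a minimal, hypothetical sketch rather than the PR's code: the class name OsmIdSplit, its method names, and the choice of IllegalArgumentException are made up for illustration, and MAX_SUPPORTED_ID is assumed to be 2^62 - 1 (the largest value two unsigned 31-bit halves can hold). Only the 31-bit halves and the 62-bit ceiling come from the description above.

```java
// Hypothetical sketch: OsmIdSplit and its methods are illustrative names, not the
// PR's OsmIdTagParser / RouterConstants API. MAX_SUPPORTED_ID = 2^62 - 1 is assumed.
final class OsmIdSplit {
    static final int LOW_BITS = 31;                                  // mirrors OSM_ID_LOW_BITS
    static final long LOW_MASK = (1L << LOW_BITS) - 1;               // 0x7FFF_FFFF
    static final long MAX_SUPPORTED_ID = (1L << (2 * LOW_BITS)) - 1; // 2^62 - 1

    private OsmIdSplit() {}

    /** Rejects IDs that cannot be represented as two non-negative 31-bit halves. */
    static void checkSupported(long id) {
        if (id < 0 || id > MAX_SUPPORTED_ID) {
            throw new IllegalArgumentException("OSM way ID out of range: " + id);
        }
    }

    /** Low 31 bits, i.e. what would go into the existing "osmid" encoded value. */
    static int low(long id) {
        checkSupported(id);
        return (int) (id & LOW_MASK);
    }

    /** Next 31 bits, i.e. what would go into the new "osmid_high" encoded value. */
    static int high(long id) {
        checkSupported(id);
        return (int) (id >>> LOW_BITS);
    }

    /** Reassembles the full ID from the two stored halves. */
    static long join(int low, int high) {
        return ((long) high << LOW_BITS) | (low & LOW_MASK);
    }
}
```

For any ID below 2^31, high(id) is 0 and low(id) equals the ID, so join returns exactly what the old single-EV code stored; that is the arithmetic behind the "Safety for non-Orbis data" claim.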
Keeps this branch's workflow byte-identical to the default branch so the review passes claude-code-action's validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude finished @bnaul's task in 4m 37s.

PR Review

LGTM overall. The split-encoding approach is the right call here: it sticks to the public IntEncodedValue API, matches the pattern already used for the 8-byte SEID, and the bit math reassembles cleanly. I dug into the parts that looked most likely to bite and didn't find a correctness or thread-safety problem.

What I checked
- Bit math (…)
- SEID stability for legacy data: preserved.
- Encoded-value wiring order: correct.
- TagParser layering: correct.
- Concurrency: no new shared mutable state on the read path.
- Performance: negligible. One extra …
- Path-detail factory edge cases: handled.

Requesting …

Inline comments

Operational note
The encoded-values-string change shifts the edge-record layout, so existing graph snapshots won't load; the PR body already calls this out. Worth re-confirming that the builder and runner image tags ship together when this is rolled out.
  long wayId = (1L << 33) + 12345L;
  ReaderWay way = new ReaderWay(wayId);
  IntsRef relationFlags = new IntsRef(50);
…
- // Check that handleOsmIdWayTags correctly sets OSM ID for edge
+ // handleOsmIdWayTags should round-trip the full 64-bit way ID.
  customOSMParsers.handleOsmIdWayTags(edgeId, edgeIntAccess, way, relationFlags);
- assertEquals(osmEncValue.getInt(false, edgeId, edgeIntAccess), (int) way.getId());
+ assertEquals(wayId, OsmIdTagParser.readOsmWayId(
+         osmIdLowEnc.getInt(false, edgeId, edgeIntAccess),
+         osmIdHighEnc.getInt(false, edgeId, edgeIntAccess)));
…
- // Check that the OSM ID is still stored properly after handleWayTags
+ // handleWayTags should not clobber the previously-written ID (it skips OsmIdTagParser).
  customOSMParsers.handleWayTags(edgeId, edgeIntAccess, way, relationFlags);
- assertEquals(osmEncValue.getInt(false, edgeId, edgeIntAccess), (int) way.getId());
+ assertEquals(wayId, OsmIdTagParser.readOsmWayId(
+         osmIdLowEnc.getInt(false, edgeId, edgeIntAccess),
+         osmIdHighEnc.getInt(false, edgeId, edgeIntAccess)));
Nit: the test only exercises the round-trip happy path. The new rejection logic in OsmIdTagParser.handleWayTags (id < 0 and id > MAX_SUPPORTED_ID) isn't covered, and neither is the 2^31 boundary, which used to throw and now must succeed. A couple of cheap assertThrows and boundary cases would lock in the contract:
// Just-above-int32 boundary still round-trips (previously threw).
ReaderWay justOverInt = new ReaderWay(1L << 31);
customOSMParsers.handleOsmIdWayTags(edgeId, edgeIntAccess, justOverInt, relationFlags);
assertEquals(1L << 31, OsmIdTagParser.readOsmWayId(
        osmIdLowEnc.getInt(false, edgeId, edgeIntAccess),
        osmIdHighEnc.getInt(false, edgeId, edgeIntAccess)));

// Above the 62-bit max is rejected.
ReaderWay tooBig = new ReaderWay(1L << 62);
assertThrows(RuntimeException.class,
        () -> customOSMParsers.handleOsmIdWayTags(edgeId, edgeIntAccess, tooBig, relationFlags));

} else if (name.equals("osmid_high")) {
    return new IntEncodedValueImpl("osmid_high", 31, false);
} else if (name.startsWith("osmid")) {
    return new IntEncodedValueImpl("osmid", 31, false);
Minor robustness: this works today only because equals("osmid_high") is checked before startsWith("osmid"). Since "osmid_high".startsWith("osmid") is true, swapping the order would silently mis-create osmid_high as a value named "osmid". Tightening the second branch to equals("osmid") removes that footgun and reads more consistently with the new equals branch above:
Suggested change
   } else if (name.equals("osmid_high")) {
       return new IntEncodedValueImpl("osmid_high", 31, false);
-  } else if (name.startsWith("osmid")) {
+  } else if (name.equals("osmid")) {
       return new IntEncodedValueImpl("osmid", 31, false);
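The ordering hazard described in this review comment can be reproduced without GraphHopper. A minimal, hypothetical sketch: OsmIdEvDispatch and its two methods are illustrative names, and the returned strings stand in for `new IntEncodedValueImpl(name, 31, false)` so the example runs on its own.

```java
// Hypothetical sketch: OsmIdEvDispatch is not the fork's EncodedValueFactoryWithStableId;
// returned strings stand in for the name passed to IntEncodedValueImpl.
final class OsmIdEvDispatch {
    private OsmIdEvDispatch() {}

    /** Prefix dispatch with the two branches swapped: the footgun the review describes. */
    static String swappedPrefixDispatch(String name) {
        if (name.startsWith("osmid")) {
            return "osmid";          // "osmid_high" also matches here
        } else if (name.equals("osmid_high")) {
            return "osmid_high";     // never reached for "osmid_high"
        }
        throw new IllegalArgumentException("not an osmid encoded value: " + name);
    }

    /** Exact matching, as suggested: branch order no longer matters. */
    static String exactMatchDispatch(String name) {
        if (name.equals("osmid")) {  // deliberately checked first
            return "osmid";
        } else if (name.equals("osmid_high")) {
            return "osmid_high";
        }
        throw new IllegalArgumentException("not an osmid encoded value: " + name);
    }
}
```

One side effect worth checking before merging the suggestion: startsWith("osmid") also accepted any other name beginning with "osmid", and exact matching no longer does. In the real factory such names would fall through to whatever follows these branches; the sketch simply throws.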

Background
We started ingesting TomTom Orbis into BigQuery (vendor/tomtom/orbis_processing.py in the model repo). Orbis is TomTom's open-map product, shipped as a .osm.pbf extract with TomTom-specific tags layered on top of standard OSM tags. As a sanity check that the data is usable for routing, I tried building a GraphHopper graph from the California Orbis PBF using this fork's import-sandbox command, which is when this issue surfaced.

The build crashed deep in pass 2:

OsmIdTagParser rejects any way with id > Integer.MAX_VALUE, and the osmid IntEncodedValue is declared with 31 bits. Orbis assigns IDs from its own ID space, with way IDs running up to ~7.7B in the CA extract, about 3.6× the int32 ceiling (2^31 - 1 ≈ 2.15B). Standard OSM is still safely under int32, but it has been creeping up over the years too (~1.4B today), so the runway isn't infinite even without Orbis.

What this PR does
Splits the osmid value across two 31-bit IntEncodedValues, osmid (low 31 bits) and a new osmid_high (next 31 bits), for a total of 62 usable bits, and reassembles it into a long at every read site.

GraphHopper core only ships IntEncodedValue; there's no LongEncodedValue, and adding one would mean touching upstream-GH internals (new EV type, packed-layout changes, new getters/setters on EdgeIteratorState). The split-int approach uses only the existing public API, and it's the pattern this fork already uses for the 64-bit stable edge IDs (stable_id_byte_0..7).

Touch points:
- OsmIdTagParser: writes both halves; guard raised from 32-bit to 62-bit.
- RouterConstants: adds OSM_ID_HIGH_ENCODED_VALUE and OSM_ID_LOW_BITS (31).
- EncodedValueFactoryWithStableId: registers osmid_high alongside osmid.
- TagParserFactoryWithOsmId: wires both EVs into the single tag parser; uses NoOpTagParser for osmid_high so GH doesn't try to parse it independently.
- GraphHopperManaged: adds osmid_high to setEncodedValuesString.
- StableIdEncodedValues: new readOsmWayId(edge) helper; setStableId now reads the full long.
- StreetEdgeExporter: reads via the helper; the osmid column of the exported street_edges.csv is unchanged in format (it was always a long-formatted string).
- PathDetailsBuilderFactoryWithStableId + new OsmIdPathDetailsBuilder: explicit builder for the osmid path detail so the API returns the full Long instead of the generic IntDetails truncating to 31 bits.
- Tests: TestCustomOSMParsers exercises a way ID above 2^31 to confirm the round-trip; StableEdgeIdManagerTest switched from int to long via the new helper.

Safety for non-Orbis data
- For any way ID below 2^31, osmid_high is 0 and (((long) 0 << 31) | (id & 0x7FFF_FFFF)) == id, so existing Geofabrik-derived SEIDs are bit-identical. No regression in SEID-keyed downstream artifacts (custom speed files, trajectory speeds, road closures).
- … setInt per edge.

Operational caveat
Adding osmid_high to the encoded-values string changes the edge-record layout, so existing pre-built graph snapshots in GCS won't load against a runner built from this branch. The graphhopper_graph_builder inputs_hash should invalidate downstream artifacts naturally once a new runner image tag is rolled in, but the builder and runner image tags should ship together in the same model-repo change rather than independently.

API change worth noting
The osmid path detail in routing responses switches from Integer to Long. JSON-wise it's still a number; only clients that strongly type the field as int would break.

Testing
- mvn -pl replica-common,web-bundle clean compile test-compile: ✅ all clean.
- web test-compile is blocked by the separate grpc/protobuf generation prerequisite; running CI is the simplest way to exercise StableEdgeIdManagerTest.
- import-sandbox against the full TomTom Orbis 2026-05-14 California PBF (200M nodes, 34M ways, way IDs up to 7.7B) completes in ~13 min, producing a 9.27M-node / 11.57M-edge graph. Spinning up server mode on the result returns sane routes (LAX→SFO: 610 km / 7h10m; SF→Oakland: 18 km / 22 min), and the same routes built from the equivalent Geofabrik CA PBF come out close (610 km / 7h07m and 17 km / 20 min); i.e. on standard-OSM data the patched code is indistinguishable from the old code.

🤖 Generated with Claude Code