Fix: one resolver shared across the recursive walk #521
inputgraph's load() created a fresh buildcontext.Resolver per target, each with an empty gitMetaCache, so every target re-shelled to git for the same repo's metadata (rev-parse HEAD, describe --tags, refs, ...). Cost was ~195ms per target, linear in graph size.

Share a single resolver across the whole recursive walk (the same pattern the normal build path already uses) so gitMetaCache dedups git metadata per local path: roughly one git invocation per distinct path instead of one per target.

Measured on this repo (arm64):
+code (4 targets): 859ms -> 457ms (-47%)
+earthly (5 targets): 1089ms -> 456ms (-58%)

Hash output is unchanged: identical hash-log entry counts before/after, since the same inputs are hashed and only the git metadata computation is deduplicated. Also speeds up --auto-skip, which shares HashTarget.
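The deduplication described above can be illustrated with a minimal sketch. It is not the project's code: metaCache, gitMeta, and countLoads are hypothetical names, and the loader only counts calls instead of shelling out to git. It assumes the real gitMetaCache behaves like a compute-once-per-key cache.

```go
package main

import (
	"fmt"
	"sync"
)

// gitMeta is a hypothetical stand-in for the metadata a resolver derives
// from git; the real type is not shown in this PR.
type gitMeta struct {
	Hash string
}

// entry holds one cached value and guards its one-time computation.
type entry struct {
	once sync.Once
	meta gitMeta
}

// metaCache is a hypothetical, simplified analogue of gitMetaCache:
// it computes the value for a key at most once, even under concurrent use.
type metaCache struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func newMetaCache() *metaCache {
	return &metaCache{entries: make(map[string]*entry)}
}

// get returns the cached metadata for path, calling load only on first use.
func (c *metaCache) get(path string, load func(string) gitMeta) gitMeta {
	c.mu.Lock()
	e, ok := c.entries[path]
	if !ok {
		e = &entry{}
		c.entries[path] = e
	}
	c.mu.Unlock()
	e.once.Do(func() { e.meta = load(path) })
	return e.meta
}

// countLoads walks the targets sequentially and reports how many times the
// expensive loader ran. With shared=false every target gets a fresh cache,
// which mirrors the per-target resolver before the fix.
func countLoads(targets []string, shared bool) int {
	calls := 0
	load := func(path string) gitMeta {
		calls++ // stands in for shelling out to git
		return gitMeta{Hash: "fake-" + path}
	}
	cache := newMetaCache()
	for _, path := range targets {
		if !shared {
			cache = newMetaCache()
		}
		cache.get(path, load)
	}
	return calls
}

func main() {
	// Four targets that all live in the same local path.
	targets := []string{".", ".", ".", "."}
	fmt.Println("per-target caches:", countLoads(targets, false))
	fmt.Println("shared cache:", countLoads(targets, true))
}
```

With four targets in one local path, the per-target variant runs the loader four times and the shared variant runs it once, which is the shape of the saving the description reports.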
➖ Are we earthbuild yet? No change in "earthly" occurrences.
📈 Overall Progress
Keep up the great work migrating from Earthly to Earthbuild! 🚀
💡 Tips for finding more occurrences: run locally to see a detailed breakdown: ./.github/scripts/count-earthly.sh
Note that the goal is not to reach 0.
Code Review
This pull request optimizes the loading process by sharing a single buildcontext.Resolver across the recursive loader walk, which allows reusing the gitMetaCache and avoids redundant git invocations. The reviewer suggests further improving this by allowing an existing Resolver to be passed via HashOpt to share it across multiple HashTarget calls or with the main build path.
// One resolver shared across the whole recursive walk so its
// gitMetaCache dedups git metadata per local path. Creating it per
// target re-shells to git (~195ms/target, linear in graph size);
// shared, it's ~one git invocation per distinct local path.
resolver: buildcontext.NewResolver(nil, nil, opt.Console, "", "", "", 0, ""),
While sharing a single Resolver across the recursive walk is a great improvement, we can take this further. If we allow passing an existing Resolver via HashOpt (e.g., by adding a Resolver *buildcontext.Resolver field to HashOpt), we could reuse the same Resolver (and its gitMetaCache / parseCache) across multiple HashTarget calls or even share it with the main build path. This would allow the caller (which already maintains a Resolver for the normal build path) to pass it in, avoiding redundant git invocations and AST parsing entirely between the hashing and building phases.
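One way the reviewer's suggestion could look is sketched below. The types here are hypothetical stand-ins, not the real buildcontext.Resolver or HashOpt, and resolverFor is an invented helper. The only assumption taken from the comment is an optional Resolver field on the options struct, with nil meaning "create one for this call".

```go
package main

import "fmt"

// Resolver is a hypothetical stand-in for buildcontext.Resolver.
type Resolver struct {
	id string
}

// HashOpt is a hypothetical, trimmed-down options struct. The Resolver
// field is the addition proposed in the review; nil keeps today's
// behaviour of building a resolver inside the hashing call.
type HashOpt struct {
	Resolver *Resolver
}

// resolverFor returns the caller's resolver when one is supplied and
// otherwise falls back to a fresh one.
func resolverFor(opt HashOpt) *Resolver {
	if opt.Resolver != nil {
		return opt.Resolver
	}
	return &Resolver{id: "fresh"}
}

func main() {
	shared := &Resolver{id: "build-path"}
	a := resolverFor(HashOpt{Resolver: shared})
	b := resolverFor(HashOpt{Resolver: shared})
	c := resolverFor(HashOpt{})
	// a and b are the same instance (caches would be reused); c is not.
	fmt.Println(a == b, a == c)
}
```

Keeping the field optional means existing callers that pass no resolver are unaffected, while a caller that already owns one can hand it in and reuse its caches across calls.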
Problem
inputgraph's load() creates a fresh buildcontext.Resolver per target, each with an empty gitMetaCache. So every target in a hash walk re-shells out to git for the same repo's metadata (rev-parse HEAD, --short, tree hash, --abbrev-ref, describe --tags, refs, timestamps).

Profiling shows resolver.Resolve is 87–100% of HashTarget wall time, at ~195 ms per target, linear in graph size. File hashing and AST hashing are negligible by comparison.

Fix
Share a single Resolver across the whole recursive walk (the same pattern the normal build path already uses) so gitMetaCache (a concurrency-safe SyncCache) dedups git metadata per local path. Result: roughly one git invocation per distinct local path instead of one per target.

Measurements (this repo, arm64)
+code (4 targets): 859 ms -> 457 ms (-47%)
+earthly (5 targets): 1089 ms -> 456 ms (-58%)

Hash output is unchanged: identical hash-log entry counts before/after, since the same inputs are hashed and only the redundant git-metadata computation is removed. This also speeds up --auto-skip, which shares the same HashTarget path.

Notes