Fix: one resolver shared across the recursive walk #521

Draft
gilescope wants to merge 1 commit into main from giles-share-resolver

Conversation

@gilescope

Problem

inputgraph's load() creates a fresh buildcontext.Resolver per target, each with an empty gitMetaCache. So every target in a hash walk re-shells out to git for the same repo's metadata (rev-parse HEAD, --short, tree hash, --abbrev-ref, describe --tags, refs, timestamps).

Profiling shows resolver.Resolve is 87–100% of HashTarget wall time, at ~195 ms per target, linear in graph size. File hashing and AST hashing are negligible by comparison.

Fix

Share a single Resolver across the whole recursive walk — the same pattern the normal build path already uses — so gitMetaCache (a concurrency-safe SyncCache) dedups git metadata per local path. Result: roughly one git invocation per distinct local path instead of one per target.
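
To make the effect of sharing concrete, here is a minimal sketch of the dedup pattern. It does not use the real buildcontext.Resolver or SyncCache: metaCache, entry, and countGitCalls are hypothetical names, and the "git call" is simulated with a counter.

```go
package main

import (
	"fmt"
	"sync"
)

// metaCache is a hypothetical stand-in for the resolver's gitMetaCache:
// a concurrency-safe map that computes each key's value at most once.
type metaCache struct {
	mu      sync.Mutex
	entries map[string]*entry
}

type entry struct {
	once sync.Once
	val  string
}

func newMetaCache() *metaCache {
	return &metaCache{entries: make(map[string]*entry)}
}

// Do returns the cached value for key, calling compute only on first use.
func (c *metaCache) Do(key string, compute func() string) string {
	c.mu.Lock()
	e, ok := c.entries[key]
	if !ok {
		e = &entry{}
		c.entries[key] = e
	}
	c.mu.Unlock()
	e.once.Do(func() { e.val = compute() })
	return e.val
}

// countGitCalls walks one local path per target and reports how many
// simulated git invocations happen. With shared=false a fresh cache is
// created for every target, mirroring the behaviour before this change.
func countGitCalls(targetPaths []string, shared bool) int {
	calls := 0
	cache := newMetaCache()
	for _, p := range targetPaths {
		if !shared {
			cache = newMetaCache()
		}
		cache.Do(p, func() string {
			calls++ // stands in for shelling out to git
			return "meta:" + p
		})
	}
	return calls
}

func main() {
	// Four targets that live in two distinct local paths.
	paths := []string{".", ".", ".", "./sub"}
	fmt.Println(countGitCalls(paths, false)) // prints 4: one call per target
	fmt.Println(countGitCalls(paths, true))  // prints 2: one call per distinct path
}
```

The per-entry sync.Once is one way to guarantee a single computation per key under concurrency; the real SyncCache may be implemented differently.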

Measurements (this repo, arm64)

| Target | Target count | Before | After | Δ |
|---|---|---|---|---|
| +code | 4 | 859 ms | 457 ms | −47% |
| +earthly | 5 | 1089 ms | 456 ms | −58% |

Hash output is unchanged — identical hash-log entry counts before/after, since the same inputs are hashed and only the redundant git-metadata computation is removed. This also speeds up --auto-skip, which shares the same HashTarget path.

Notes

  • Residual cost is now ~195 ms × number of distinct local paths; parallelising those independent resolves is a possible follow-up but out of scope here.
  • No new test: the change is a pure dedup with byte-identical output. A regression guard (counting resolver creations / git-metadata calls via an injected resolver) would be a reasonable follow-up.
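
As a rough illustration of the parallelisation follow-up mentioned above (not part of this PR), the distinct local paths could be resolved concurrently. This is a sketch under assumptions: resolveAll and the resolve callback are hypothetical names, and the callback stands in for whatever per-path git metadata lookup the resolver performs.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// resolveAll resolves each distinct path exactly once, running the
// lookups concurrently, and returns a map of path to metadata.
func resolveAll(paths []string, resolve func(string) string) map[string]string {
	// Dedup first so each local path is resolved only once.
	distinct := make(map[string]struct{})
	for _, p := range paths {
		distinct[p] = struct{}{}
	}

	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	out := make(map[string]string, len(distinct))
	for p := range distinct {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			v := resolve(p) // each path is independent of the others
			mu.Lock()
			out[p] = v
			mu.Unlock()
		}(p)
	}
	wg.Wait()
	return out
}

func main() {
	paths := []string{".", ".", "./sub", "."}
	meta := resolveAll(paths, func(p string) string { return "meta:" + p })

	// Sort keys so the printed order is stable.
	keys := make([]string, 0, len(meta))
	for k := range meta {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, meta[k])
	}
}
```

With this shape, wall time would be bounded by the slowest single resolve rather than the sum, assuming the underlying git calls do not contend with each other.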

inputgraph's load() created a fresh buildcontext.Resolver per target,
each with an empty gitMetaCache, so every target re-shelled to git for
the same repo's metadata (rev-parse HEAD, describe --tags, refs, ...).
Cost was ~195ms per target, linear in graph size.

Share a single resolver across the whole recursive walk (the same
pattern the normal build path already uses) so gitMetaCache dedups git
metadata per local path — roughly one git invocation per distinct path
instead of one per target.

Measured on this repo (arm64):
  +code    (4 targets):  859ms -> 457ms  (-47%)
  +earthly (5 targets): 1089ms -> 456ms  (-58%)

Hash output is unchanged: identical hash-log entry counts before/after,
since the same inputs are hashed and only the git metadata computation
is deduplicated. Also speeds up --auto-skip, which shares HashTarget.
@gilescope added the ai-assisted label (Authored with AI assistance) on Jun 4, 2026
@github-actions

github-actions Bot commented Jun 4, 2026

➖ Are we earthbuild yet?

No change in "earthly" occurrences

📈 Overall Progress

| Branch | Total Count |
|---|---|
| main | 5317 |
| This PR | 5317 |
| Difference | +0 |

Keep up the great work migrating from Earthly to Earthbuild! 🚀

💡 Tips for finding more occurrences

Run locally to see detailed breakdown:

./.github/scripts/count-earthly.sh

Note that the goal is not to reach 0.
At least some occurrences of earthly are expected to remain in the source code, for backwards compatibility with config files and language constructs.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request optimizes the loading process by sharing a single buildcontext.Resolver across the recursive loader walk, which allows reusing the gitMetaCache and avoids redundant git invocations. The reviewer suggests further improving this by allowing an existing Resolver to be passed via HashOpt to share it across multiple HashTarget calls or with the main build path.

Comment thread inputgraph/loader.go
Comment on lines +78 to +82
// One resolver shared across the whole recursive walk so its
// gitMetaCache dedups git metadata per local path. Creating it per
// target re-shells to git (~195ms/target, linear in graph size);
// shared, it's ~one git invocation per distinct local path.
resolver: buildcontext.NewResolver(nil, nil, opt.Console, "", "", "", 0, ""),

medium

While sharing a single Resolver across the recursive walk is a great improvement, we can take this further. If we allow passing an existing Resolver via HashOpt (e.g., by adding a Resolver *buildcontext.Resolver field to HashOpt), we could reuse the same Resolver (and its gitMetaCache / parseCache) across multiple HashTarget calls or even share it with the main build path. This would allow the caller (which already maintains a Resolver for the normal build path) to pass it in, avoiding redundant git invocations and AST parsing entirely between the hashing and building phases.
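
The suggestion above could look roughly like the following sketch. The types here are simplified stand-ins, not the real inputgraph or buildcontext definitions: the Resolver struct, newResolver, and resolverFor are hypothetical, and the Resolver field on HashOpt is the proposed addition rather than something that exists today.

```go
package main

import "fmt"

// Resolver is a hypothetical stand-in for buildcontext.Resolver.
type Resolver struct{ id int }

// created counts how many resolvers have been constructed.
var created int

func newResolver() *Resolver {
	created++
	return &Resolver{id: created}
}

// HashOpt mirrors the idea of the real options struct. The Resolver
// field is the proposed (assumed) addition.
type HashOpt struct {
	Resolver *Resolver // optional; nil means "create one for this call"
}

// resolverFor returns the caller-supplied resolver when present and
// falls back to a fresh one otherwise, keeping current behaviour as
// the default.
func resolverFor(opt HashOpt) *Resolver {
	if opt.Resolver != nil {
		return opt.Resolver
	}
	return newResolver()
}

func main() {
	shared := newResolver()
	a := resolverFor(HashOpt{Resolver: shared})
	b := resolverFor(HashOpt{Resolver: shared})
	c := resolverFor(HashOpt{})
	fmt.Println(a == b, a == c, created) // prints: true false 2
}
```

Keeping the field optional means existing callers are unaffected, while a caller that already holds a resolver for the build path can pass it in and reuse its caches.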
