fix: use merge_all() to avoid O(n)-depth runfiles NestedSet nesting #86

Merged: keith merged 1 commit into keith:main from honnix:honnix/fix-runfiles-merge-depth (Jun 11, 2026)

@honnix honnix commented Jun 11, 2026

Contributor

Summary

  • Replace sequential runfiles.merge() calls in loops with a collected list of runfiles and a single merge_all() call, in both multirun.bzl and command.bzl
  • This changes the NestedSet depth from O(n) to O(1), preventing StackOverflowError in large multirun targets

Problem

When a multirun target has many commands (e.g. 1000+), the current code calls runfiles.merge() in a loop:

for command in ctx.attr.commands:
    runfiles = runfiles.merge(default_info.default_runfiles)

Each merge() wraps the previous result as a transitive child, creating a linear chain of NestedSets with depth proportional to the number of commands. When Bazel later fingerprints this structure via the recursive NestedSetFingerprintCache.addToFingerprint, it overflows the JVM stack.
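
To make the depth growth concrete, here is a minimal sketch in Python. It is a toy model, not Bazel's NestedSet implementation: `ToySet`, `depth`, and `chain_merge` are hypothetical names, and the only behaviour assumed is the one described above, namely that each merge() nests the previous result one level deeper.

```python
class ToySet:
    """Toy stand-in for a nested set node; not Bazel's NestedSet."""

    def __init__(self, direct=(), transitive=()):
        self.direct = tuple(direct)
        self.transitive = tuple(transitive)

    def merge(self, other):
        # Assumed behaviour, per the description above: the previous
        # result becomes a transitive child of a brand-new node.
        return ToySet(transitive=(self, other))


def depth(node):
    """Measure nesting depth iteratively, so the measurement cannot overflow."""
    deepest = 0
    pending = [(node, 1)]
    while pending:
        current, level = pending.pop()
        deepest = max(deepest, level)
        pending.extend((child, level + 1) for child in current.transitive)
    return deepest


def chain_merge(count):
    """Merge `count` leaf sets one at a time, as the loop above does."""
    result = ToySet()
    for index in range(count):
        result = result.merge(ToySet(direct=[f"file_{index}"]))
    return result


print(depth(chain_merge(10)))    # 11
print(depth(chain_merge(1000)))  # 1001
```

In this model the depth is always the number of merges plus one, so a recursive walk over the structure needs a stack frame per command.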

We hit this as a java.lang.StackOverflowError (1022 recursive frames deep) when upgrading to Bazel 9.1.1 in a monorepo with ~1000 deploy targets aggregated via a single multirun target. The crash is in Bazel's internal NestedSet fingerprinting:

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.StackOverflowError
	at c.g.d.b.l.collect.nestedset.DigestMap.readDigest(DigestMap.java:69)
	at c.g.d.b.l.collect.nestedset.NestedSetFingerprintCache.addToFingerprint(NestedSetFingerprintCache.java:106)
	at c.g.d.b.l.collect.nestedset.NestedSetFingerprintCache.addToFingerprint(NestedSetFingerprintCache.java:109)
	... (1022 frames)

Fix

Collect all runfiles objects into a flat list, then call merge_all() once. This produces a shallow NestedSet tree regardless of the number of commands.

merge_all() has been available since Bazel 5.x.
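
The effect of the fix on nesting depth can be sketched with a toy model in Python. This is not Bazel's implementation: `ToySet`, `depth`, and `flat_merge` are hypothetical names, and the sketch assumes only that merge_all() attaches every collected input to one new node.

```python
class ToySet:
    """Toy stand-in for a nested set node; not Bazel's NestedSet."""

    def __init__(self, direct=(), transitive=()):
        self.direct = tuple(direct)
        self.transitive = tuple(transitive)

    def merge_all(self, others):
        # Assumed behaviour: a single new node holds this set and
        # every collected input as transitive children.
        return ToySet(transitive=(self, *others))


def depth(node):
    """Recursive walk on purpose; safe here because the tree stays shallow."""
    return 1 + max((depth(child) for child in node.transitive), default=0)


def flat_merge(count):
    """Collect `count` leaf sets into a list, then merge them in one call."""
    collected = [ToySet(direct=[f"file_{index}"]) for index in range(count)]
    return ToySet().merge_all(collected)


print(depth(flat_merge(10)))    # 2
print(depth(flat_merge(1000)))  # 2
```

The depth stays at 2 no matter how many inputs are collected, which is why a recursive traversal no longer scales its stack usage with the number of commands.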

Test plan

  • Existing tests continue to pass
  • Verified against a monorepo with 1000+ commands that previously hit StackOverflowError

🤖 Generated with Claude Code

Replace sequential runfiles.merge() calls in loops with collecting
into a list and a single merge_all() call. This changes the NestedSet
depth from O(n) to O(1), preventing StackOverflowError in large
multirun targets.

When a multirun target has many commands (e.g. 1000+), each merge()
wraps the previous result as a transitive child, creating a linear
chain. When Bazel fingerprints this via the recursive
NestedSetFingerprintCache.addToFingerprint, it overflows the JVM
stack.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@keith keith left a comment

Owner

that's a lot of runfiles! thanks!

@keith keith enabled auto-merge (squash) June 11, 2026 16:48
@keith keith merged commit 8c7843c into keith:main Jun 11, 2026
3 checks passed