Problem
The spec overlay system operates on section ranges determined by Spec.Visit, a single-pass line walker that treats %if/%endif conditionals as opaque content. This causes a family of bugs at section boundaries involving conditionals:
| # | Problem | Impact |
|---|---------|--------|
| 1 | spec-append-lines places lines inside straddling conditionals | Appended content becomes conditional when it shouldn't be |
| 2 | collectSectionRanges needs post-hoc balanceRange | Complexity / fragility |
| 3 | spec-insert-tag needs skipPastConditional workaround | Complexity / fragility |
| 4 | %define/%global inside removed sections silently dropped | Build failures downstream (#203) |
| 5 | %else branches with section headers create phantom sections | Potential incorrect operations |
| 6 | Continuation lines (\) misinterpreted as structural elements | Phantom section boundaries, false-positive errors |
| 7 | Section content spilling past wrapper %endif invisible to section-scoped operations | Section-scoped overlays miss post-wrapper content |
These patterns are common in Fedora specs (13 specs with %else branches containing sections, 22+ specs with continuation-line issues, 529 specs with macro-generated sections). As Azure Linux imports more complex packages, these issues surface more often.
Related issues: #144, #193, #203
Proposed solution
Replace the flat Visit-based line walker with a structural tree parser (Option B from RFC 001). The tree model makes conditional-boundary bugs structurally impossible rather than patching around them.
How it works
A two-pass parser builds a block tree from raw spec lines:
- Pass 1: Collects %if/%endif pairs and section header positions, skipping %if/%endif inside %define/%global macro continuation bodies (those are macro text, not structural conditionals).
- Pass 2: Builds nested block tree, classifying conditionals as wrappers (contain section headers) vs content blocks (fully within a section).
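To make the Pass 1 rule concrete, here is a minimal sketch of stack-based %if/%endif pairing that ignores lines inside a %define/%global continuation body. It is not the code from this PR: `condPair`, `ifKeywords`, and `collectPairs` are hypothetical names, and section header collection is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// condPair holds the 0-based line indexes of a matched %if ... %endif pair.
// (Hypothetical type for illustration; not the PR's actual data structure.)
type condPair struct{ Open, Close int }

// ifKeywords lists the RPM directives that open a conditional block.
var ifKeywords = map[string]bool{
	"%if": true, "%ifarch": true, "%ifnarch": true, "%ifos": true, "%ifnos": true,
}

// collectPairs matches %if/%endif lines with a stack and skips every line
// that belongs to a multi-line %define/%global body (lines chained by a
// trailing backslash), because those lines are macro text.
func collectPairs(lines []string) ([]condPair, error) {
	var pairs []condPair
	var stack []int
	inMacroBody := false
	for i, raw := range lines {
		trimmed := strings.TrimSpace(raw)
		continues := strings.HasSuffix(trimmed, `\`)
		if inMacroBody {
			// The body ends on the first line without a trailing backslash.
			inMacroBody = continues
			continue
		}
		fields := strings.Fields(trimmed)
		if len(fields) == 0 {
			continue
		}
		first := fields[0]
		switch {
		case first == "%define" || first == "%global":
			inMacroBody = continues
		case ifKeywords[first]:
			stack = append(stack, i)
		case first == "%endif":
			if len(stack) == 0 {
				return nil, fmt.Errorf("line %d: %%endif without %%if", i+1)
			}
			open := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			pairs = append(pairs, condPair{Open: open, Close: i})
		}
	}
	if len(stack) > 0 {
		return nil, fmt.Errorf("line %d: unterminated %%if", stack[len(stack)-1]+1)
	}
	return pairs, nil
}

func main() {
	lines := []string{
		`%global run_tests \`,
		`%if 1 \`,
		`make check \`,
		`%endif`,
		`%if 0%{?fedora}`,
		`%package devel`,
		`%endif`,
	}
	pairs, err := collectPairs(lines)
	fmt.Println(pairs, err) // [{4 6}] <nil>
}
```

Only the structural conditional on lines 4-6 is reported; the %if/%endif inside the %global body never reaches the stack.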
Operations work on the tree via specTree / sectionHandle types, then serialize back to lines. The tree is an internal implementation detail — the public Spec.* API is unchanged.
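A rough sketch of the tree idea follows: nest lines under their enclosing conditional, mark a conditional as a wrapper when it contains a section header, and serialize back to the same lines. The `block` type, `parse`, `serialize`, and `roundTrips` are hypothetical stand-ins; the real specTree / sectionHandle types are not shown in this description and will differ. %else/%elif lines are kept as plain children here, and the section header list is deliberately partial.

```go
package main

import (
	"fmt"
	"strings"
)

// block is one tree node: a plain line, or a conditional with children.
type block struct {
	Line     string   // raw text, or the %if header of a conditional
	IsCond   bool     // true for conditional nodes
	Wrapper  bool     // conditional that contains a section header
	Children []*block // nested lines and conditionals
	Close    string   // raw %endif line, valid when Closed is true
	Closed   bool
}

// sectionHeaders is a partial, illustrative list of section directives.
var sectionHeaders = map[string]bool{
	"%package": true, "%description": true, "%prep": true, "%build": true,
	"%install": true, "%check": true, "%files": true, "%changelog": true,
}

func firstField(line string) string {
	f := strings.Fields(line)
	if len(f) == 0 {
		return ""
	}
	return f[0]
}

// parse nests each line under the innermost open conditional.
func parse(lines []string) *block {
	root := &block{}
	stack := []*block{root}
	for _, l := range lines {
		top := stack[len(stack)-1]
		first := firstField(l)
		switch {
		case strings.HasPrefix(first, "%if"):
			n := &block{Line: l, IsCond: true}
			top.Children = append(top.Children, n)
			stack = append(stack, n)
		case first == "%endif" && len(stack) > 1:
			top.Close, top.Closed = l, true
			stack = stack[:len(stack)-1]
		default:
			top.Children = append(top.Children, &block{Line: l})
			if sectionHeaders[first] {
				// Every enclosing conditional now wraps a section header.
				for _, b := range stack[1:] {
					b.Wrapper = true
				}
			}
		}
	}
	return root
}

// serialize flattens the tree back into lines in original order.
func serialize(b *block, out []string) []string {
	for _, c := range b.Children {
		out = append(out, c.Line)
		if c.IsCond {
			out = serialize(c, out)
			if c.Closed {
				out = append(out, c.Close)
			}
		}
	}
	return out
}

// roundTrips reports whether parse followed by serialize is the identity.
func roundTrips(lines []string) bool {
	out := serialize(parse(lines), nil)
	if len(out) != len(lines) {
		return false
	}
	for i := range lines {
		if out[i] != lines[i] {
			return false
		}
	}
	return true
}

func main() {
	lines := []string{
		"Name: gdb",
		"%if 0%{!?scl:1}",
		"%package headless",
		"Requires: binutils",
		"%endif",
		"Recommends: default-yama-scope",
		"%build",
		"%if %{with tests}",
		"make check",
		"%endif",
	}
	root := parse(lines)
	fmt.Println(roundTrips(lines))        // true
	fmt.Println(root.Children[1].Wrapper) // true: contains %package
	fmt.Println(root.Children[4].Wrapper) // false: content block inside %build
}
```

Because a wrapper is its own node, an append to a section would add a sibling node rather than splice into the wrapper's children, which is the property the boundary fixes rely on.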
What it fixes
- Boundary bugs (1-3): Structurally impossible. Sections and conditionals are separate tree nodes, so AppendLinesToSection can't land inside a wrapper.
- Macro hoisting (4): RemoveSections walks removed subtrees for %define/%global definitions, scans survivors for references, and hoists referenced macros to the root level automatically.
- Continuation awareness (6): collectConditionalPairs skips %if/%endif inside %define/%global continuation bodies. findSectionHeaderLines skips all continuation lines.
- SearchAndReplace coverage: A dedicated searchReplaceBlock walker visits ALL line types (text, macros, conditional headers, %endif) so patterns targeting %define/%global or %if lines work correctly.
- Visit removal: Visit, VisitTags, and VisitTagsPackage are replaced by tree-based operations + a new GetTag read-only accessor. The entire Visit infrastructure (parseState, Context, VisitTarget, etc.) is deleted.
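The macro hoisting rule can be sketched roughly as below. This is an approximation over flat line slices, not the RemoveSections code: `hoistReferenced` and `defRe` are hypothetical names, only single-line definitions are handled, and the reference pattern covers just the common `%name`, `%{name}`, `%{?name}`, and `%{!?name:...}` forms.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// defRe captures the macro name from a single-line %define/%global.
var defRe = regexp.MustCompile(`^\s*%(?:define|global)\s+([A-Za-z_][A-Za-z0-9_]*)`)

// hoistReferenced returns the definition lines found in removed whose macro
// is still referenced somewhere in survivors. Unreferenced definitions are
// dropped together with their section.
func hoistReferenced(removed, survivors []string) []string {
	var hoisted []string
	body := strings.Join(survivors, "\n")
	for _, line := range removed {
		m := defRe.FindStringSubmatch(line)
		if m == nil {
			continue // not a macro definition
		}
		// Match %name, %{name}, %{?name}, %{!?name...} but not %name_suffix.
		ref := regexp.MustCompile(`%\{?[!?]*` + regexp.QuoteMeta(m[1]) + `\b`)
		if ref.MatchString(body) {
			hoisted = append(hoisted, line)
		}
	}
	return hoisted
}

func main() {
	removed := []string{
		"%package devel",
		"%global devel_ver 1.2",
		"%define unused 1",
		"Requires: foo",
	}
	survivors := []string{
		"Name: x",
		"%files",
		"/usr/lib/libx-%{devel_ver}.so",
	}
	fmt.Println(hoistReferenced(removed, survivors)) // [%global devel_ver 1.2]
}
```

The positive and negative cases in `main` mirror the two regression tests listed under test coverage: a referenced macro survives, an unreferenced one does not.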
Validation
Full render against the Azure Linux 4.0 corpus (~7,400 components):
| Metric | Baseline | Tree branch | Delta |
|--------|----------|-------------|-------|
| Wall time | 24m 11s | 22m 59s | -5.0% |
| CPU (user) | 303m 34s | 277m 07s | -8.7% |
| Render failures | 0 | 0 | 0 |
Branch tvuong/4.0/feat/azldev-parsetree shows the rendered spec output with the tree-based azldev.
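For anyone checking the Delta column, the percentages are the relative change from baseline, (tree - baseline) / baseline. A small sketch that reproduces them from the reported timings (`secs` and `deltaPercent` are helper names made up for this example):

```go
package main

import "fmt"

// secs converts a "Xm Ys" timing into seconds.
func secs(minutes, seconds int) float64 {
	return float64(minutes*60 + seconds)
}

// deltaPercent returns the relative change from baseline to candidate,
// in percent. Negative values mean the candidate is faster.
func deltaPercent(baseline, candidate float64) float64 {
	return (candidate - baseline) / baseline * 100
}

func main() {
	wall := deltaPercent(secs(24, 11), secs(22, 59))
	cpu := deltaPercent(secs(303, 34), secs(277, 7))
	fmt.Printf("wall: %.1f%%\n", wall) // wall: -5.0%
	fmt.Printf("cpu:  %.1f%%\n", cpu)  // cpu:  -8.7%
}
```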
Known limitations
Documented in the overlays user guide:
- Straddling sections: When a section header is inside a %if wrapper but its content continues past %endif, section-scoped overlays (spec-remove-tag, etc.) cannot find post-wrapper content. Workaround: use spec-search-replace with anchored regex. Affects 1 spec (gdb).
%if 0%{!?scl:1}
%package headless
Requires: binutils
%endif
# Still %package headless in RPM's view, but the tree parser
# can't associate this with "headless" — the section header
# is structurally inside the wrapper.
Recommends: default-yama-scope
The tree parser cannot determine which section owns post-%endif content without evaluating the %if condition — the same content could belong to %package headless (when true) or to whatever section precedes the wrapper (when false).
- Macro-generated sections: Sections created by macros like %ghc_lib_subpackage or %fontpkg are invisible to the static parser. Use spec-search-replace for these.
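To illustrate why the straddling-section workaround calls for an anchored regex, the sketch below removes the post-wrapper Recommends line from the gdb-style example without touching a comment that mentions the same text. It assumes Go's regexp syntax and a hypothetical `dropRecommends` helper; the regex dialect and overlay configuration that spec-search-replace accepts are not shown here and may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchored matches only a complete "Recommends: default-yama-scope" line.
// (?m) makes ^ match at the start of every line, so text that merely
// mentions the tag mid-line or in a comment is left alone.
var anchored = regexp.MustCompile(`(?m)^Recommends:[ \t]+default-yama-scope[ \t]*\n`)

// dropRecommends deletes the matching line, including its newline.
func dropRecommends(spec string) string {
	return anchored.ReplaceAllString(spec, "")
}

func main() {
	spec := "%if 0%{!?scl:1}\n" +
		"%package headless\n" +
		"Requires: binutils\n" +
		"%endif\n" +
		"# Recommends: default-yama-scope is optional\n" +
		"Recommends: default-yama-scope\n" +
		"%description headless\n"
	fmt.Print(dropRecommends(spec))
}
```

An unanchored pattern such as `default-yama-scope` would also hit the comment line, which is the kind of collateral edit the anchoring avoids.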
Test coverage
- 13 curated fixtures distilling real-world Fedora/Azure Linux patterns (continuation macros, elif chains, straddling wrappers, nested wrappers, script-section tag-shaped lines, subpackage macro hoisting)
- Tier 1: Lossless round-trip (byte-for-byte parse/serialize identity) over every fixture
- Tier 3: Per-pattern edit-operation regression tests (tag insertion, HasSection through wrappers, append through conditionals, script-section safety, section-scoped SearchAndReplace)
- Synthetic stress: 64 deterministic round-trip seeds + 32 AddTag seeds composing primitives randomly
- Macro hoisting: Positive (referenced macro hoisted) and negative (unreferenced macro dropped) regression tests