Structural tree parser for conditional-aware spec editing #214

@trungams

Problem

The spec overlay system operates on section ranges determined by Spec.Visit, a single-pass line walker that treats %if/%endif conditionals as opaque content. This causes a family of bugs at section boundaries involving conditionals:

| # | Problem | Impact |
|---|---------|--------|
| 1 | spec-append-lines places lines inside straddling conditionals | Appended content becomes conditional when it shouldn't be |
| 2 | collectSectionRanges needs post-hoc balanceRange | Complexity / fragility |
| 3 | spec-insert-tag needs skipPastConditional workaround | Complexity / fragility |
| 4 | %define/%global inside removed sections silently dropped | Build failures downstream (#203) |
| 5 | %else branches with section headers create phantom sections | Potential incorrect operations |
| 6 | Continuation lines (\) misinterpreted as structural elements | Phantom section boundaries, false-positive errors |
| 7 | Section content spilling past wrapper %endif invisible to section-scoped operations | Section-scoped overlays miss post-wrapper content |

These patterns are common in Fedora specs (13 specs with %else branches containing sections, 22+ specs with continuation-line issues, 529 specs with macro-generated sections). As Azure Linux imports more complex packages, these issues surface more and more often.

Related issues: #144, #193, #203

Proposed solution

Replace the flat Visit-based line walker with a structural tree parser (Option B from RFC 001). With the tree model, conditional-boundary bugs become structurally impossible and no longer need to be patched around.

How it works

A two-pass parser builds a block tree from raw spec lines:

  • Pass 1: Collects %if/%endif pairs and section header positions, skipping %if/%endif inside %define/%global macro continuation bodies (those are macro text, not structural conditionals).
  • Pass 2: Builds nested block tree, classifying conditionals as wrappers (contain section headers) vs content blocks (fully within a section).
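As a rough illustration of Pass 1, the sketch below scans raw lines once, pairing %if/%endif lines with a stack and recording section header positions. It is not the azldev implementation: `condPair`, `scanStructure`, and the short `sectionKeywords` list are hypothetical names chosen for this example, and only a subset of RPM section headers is recognized.

```go
package main

import (
	"fmt"
	"strings"
)

// condPair records the line indices of a matched %if ... %endif pair.
// Hypothetical type; the real parser's representation is not shown in this issue.
type condPair struct{ Open, Close int }

// sectionKeywords is an illustrative subset of RPM section headers.
var sectionKeywords = []string{"%package", "%description", "%prep", "%build",
	"%install", "%check", "%files", "%changelog"}

func firstWord(line string) string {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

func isSectionHeader(line string) bool {
	w := firstWord(line)
	for _, k := range sectionKeywords {
		if w == k {
			return true
		}
	}
	return false
}

// scanStructure collects conditional pairs and section header line indices.
// Lines in the continuation body of a %define/%global are skipped entirely;
// section headers are additionally ignored on any continuation line.
func scanStructure(lines []string) (pairs []condPair, headers []int) {
	var stack []int
	inCont, inMacro := false, false
	for i, line := range lines {
		isBody := inCont // previous line ended with a backslash
		if !isBody {
			w := firstWord(line)
			inMacro = w == "%define" || w == "%global"
		}
		inCont = strings.HasSuffix(strings.TrimRight(line, " \t"), "\\")
		if isBody && inMacro {
			continue // macro text, not structure
		}
		w := firstWord(line)
		switch {
		case strings.HasPrefix(w, "%if"): // %if, %ifarch, %ifnarch, ...
			stack = append(stack, i)
		case w == "%endif":
			if n := len(stack); n > 0 {
				pairs = append(pairs, condPair{stack[n-1], i})
				stack = stack[:n-1]
			}
		case !isBody && isSectionHeader(line):
			headers = append(headers, i)
		}
	}
	return pairs, headers
}

func main() {
	lines := []string{
		"Name: demo",
		"%global setup() \\",
		"%if 1 \\",
		"echo hi \\",
		"%endif",
		"%if 0%{?with_doc}",
		"%package doc",
		"Summary: Docs",
		"%endif",
		"%build",
	}
	pairs, headers := scanStructure(lines)
	fmt.Println(pairs, headers) // prints "[{5 8}] [6 9]"
}
```

The %if/%endif on lines 2 and 4 sit inside the %global continuation body, so they are not reported as a pair; only the wrapper around %package doc is.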

Operations work on the tree via specTree / sectionHandle types, then serialize back to lines. The tree is an internal implementation detail — the public Spec.* API is unchanged.

What it fixes

  • Boundary bugs (1-3): Structurally impossible — sections and conditionals are separate tree nodes, so AppendLinesToSection can't land inside a wrapper.
  • Macro hoisting (4): RemoveSections walks removed subtrees for %define/%global definitions, scans survivors for references, and hoists referenced macros to the root level automatically.
  • Continuation awareness (6): collectConditionalPairs skips %if/%endif inside %define/%global continuation bodies. findSectionHeaderLines skips all continuation lines.
  • SearchAndReplace coverage: A dedicated searchReplaceBlock walker visits ALL line types (text, macros, conditional headers, %endif) so patterns targeting %define/%global or %if lines work correctly.
  • Visit removal: Visit, VisitTags, and VisitTagsPackage are replaced by tree-based operations + a new GetTag read-only accessor. The entire Visit infrastructure (parseState, Context, VisitTarget, etc.) is deleted.
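The macro hoisting step (4) can be sketched roughly as follows. This is a simplified, line-based approximation under stated assumptions: `hoistMacros` and `referenced` are hypothetical names, multi-line (continued) definitions are not handled, and the regular expressions only cover the `%name`, `%{name}`, `%{?name}` and `%{!?name...}` reference forms.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// defRe matches a single-line %define or %global and captures the macro name.
var defRe = regexp.MustCompile(`^\s*%(?:define|global)\s+([A-Za-z_][A-Za-z0-9_]*)`)

// referenced reports whether any line uses the macro called name.
func referenced(name string, lines []string) bool {
	ref := regexp.MustCompile(`%\{?[?!]*` + regexp.QuoteMeta(name) + `\b`)
	for _, l := range lines {
		if ref.MatchString(l) {
			return true
		}
	}
	return false
}

// hoistMacros returns the definition lines found in removed whose macros are
// still referenced by survivors. Where the caller re-inserts them (for
// example at the root level) is outside the scope of this sketch.
func hoistMacros(removed, survivors []string) []string {
	var hoisted []string
	for _, line := range removed {
		m := defRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		if referenced(m[1], survivors) {
			hoisted = append(hoisted, line)
		}
	}
	return hoisted
}

func main() {
	removed := []string{
		"%package devel",
		"%global devel_ver 2.1",
		"%define unused 1",
		"Summary: Dev files",
	}
	survivors := []string{
		"Name: demo",
		"%build",
		"make VERSION=%{devel_ver}",
	}
	fmt.Println(strings.Join(hoistMacros(removed, survivors), "\n"))
	// prints "%global devel_ver 2.1"
}
```

The unreferenced `%define unused 1` is dropped along with its section, which mirrors the positive and negative cases listed under test coverage.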

Validation

Full render against the Azure Linux 4.0 corpus (~7,400 components):

| Metric | Baseline | Tree branch | Delta |
|--------|----------|-------------|-------|
| Wall time | 24m 11s | 22m 59s | -5.0% |
| CPU (user) | 303m 34s | 277m 07s | -8.7% |
| Render failures | 0 | 0 | |

Branch tvuong/4.0/feat/azldev-parsetree shows the rendered spec output with the tree-based azldev.

Known limitations

Documented in the overlays user guide:

  1. Straddling sections: When a section header is inside a %if wrapper but its content continues past %endif, section-scoped overlays (spec-remove-tag, etc.) cannot find post-wrapper content. Workaround: use spec-search-replace with anchored regex. Affects 1 spec (gdb).

    %if 0%{!?scl:1}
    %package headless
    Requires: binutils
    %endif
    # Still %package headless in RPM's view, but the tree parser
    # can't associate this with "headless" — the section header
    # is structurally inside the wrapper.
    Recommends: default-yama-scope

    The tree parser cannot determine which section owns post-%endif content without evaluating the %if condition — the same content could belong to %package headless (when true) or to whatever section precedes the wrapper (when false).

  2. Macro-generated sections: Sections created by macros like %ghc_lib_subpackage or %fontpkg are invisible to the static parser. Use spec-search-replace for these.
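For both limitations the suggested workaround is an anchored regex. The sketch below shows what whole-line anchoring buys you, using Go's standard `regexp` package directly; it does not reproduce the spec-search-replace overlay syntax, and `replaceWholeLine` plus the `Suggests:` replacement are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceWholeLine replaces every line that matches pattern in full and
// returns the new text together with the number of lines replaced.
// (?m) makes ^ and $ match at line boundaries instead of only at the
// start and end of the whole text.
func replaceWholeLine(spec, pattern, replacement string) (string, int) {
	re := regexp.MustCompile(`(?m)^(?:` + pattern + `)$`)
	n := len(re.FindAllStringIndex(spec, -1))
	return re.ReplaceAllString(spec, replacement), n
}

func main() {
	spec := "%if 0%{!?scl:1}\n" +
		"%package headless\n" +
		"Requires: binutils\n" +
		"%endif\n" +
		"Recommends: default-yama-scope\n"
	out, n := replaceWholeLine(spec,
		`Recommends: default-yama-scope`, // pattern
		"Suggests: default-yama-scope")   // illustrative replacement
	fmt.Printf("%d replacement(s)\n%s", n, out)
}
```

Returning the match count lets a caller treat zero matches as an error, which is useful when the target line lives outside any section the parser can see.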

Test coverage

  • 13 curated fixtures distilling real-world Fedora/Azure Linux patterns (continuation macros, elif chains, straddling wrappers, nested wrappers, script-section tag-shaped lines, subpackage macro hoisting)
  • Tier 1: Lossless round-trip (byte-for-byte parse/serialize identity) over every fixture
  • Tier 3: Per-pattern edit-operation regression tests (tag insertion, HasSection through wrappers, append through conditionals, script-section safety, section-scoped SearchAndReplace)
  • Synthetic stress: 64 deterministic round-trip seeds + 32 AddTag seeds composing primitives randomly
  • Macro hoisting: Positive (referenced macro hoisted) and negative (unreferenced macro dropped) regression tests
