
Note greediness of PEP 723 reference parser #1960

@SnoopJ

Issue Description

While preparing a PR for PEP 723 support in pip, I noticed that the reference parser defined by the PEP and listed in the PyPA docs will collate multiple adjacent /// TYPE blocks as a single match, even when separated by a comment line (the spec refers to it as a "content line"). This greedy collation is surprising and makes distinguishing error cases a little complicated, so I think it merits a warning in the docs if it is not possible to update the specification itself.

I believe this quirk is caused by the last + in the reference regex (the quantifier on the content-line group) being greedy: it matches all the way to the last closing /// instead of stopping at the first available one. In my limited experimentation, replacing this quantifier with +? resolves the issue, producing the expected number of matches.
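
A minimal sketch of that comparison, run against two blocks separated by a bare # line. The names GREEDY, LAZY, and script are illustrative and not part of the PEP, and the lazy variant has only been checked against this one input, so treat it as an experiment rather than a vetted replacement.

```python
import re

# Reference pattern from PEP 723: the final "+" on the content group is greedy.
GREEDY = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"

# Same pattern with only that quantifier changed to the lazy "+?".
LAZY = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+?)^# ///$"

# Two script blocks separated by a single content line ("#").
script = """\
# /// script
# data (1)
# ///
#
# /// script
# data (2)
# ///
"""

greedy_matches = list(re.finditer(GREEDY, script))
lazy_matches = list(re.finditer(LAZY, script))

print(len(greedy_matches))  # both blocks collated into one match
print(len(lazy_matches))    # each block matched separately
for m in lazy_matches:
    print(repr(m.group("content")))
```

With the lazy quantifier, each match's content group stops at the first closing # /// line, so the interior delimiter never ends up inside the captured content.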

This shouldn't slip through anybody's code unnoticed, as the collation will produce invalid TOML (the interior /// is invalid syntax), but it is a surprising enough edge case that I thought to report it here.
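
A sketch of how the collation surfaces downstream. It assumes the comment-stripping step from the PEP's reference implementation, reproduced here as a hypothetical helper named extract_content, and a sample script whose metadata values are made up for illustration. tomllib is only in the standard library from Python 3.11, so the parse step is skipped on older interpreters.

```python
import re

try:
    import tomllib  # standard library from Python 3.11 onward
except ImportError:
    tomllib = None

REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"

# Illustrative input: two script blocks separated by a bare "#" content line.
script = """\
# /// script
# requires-python = ">=3.11"
# ///
#
# /// script
# dependencies = []
# ///
"""


def extract_content(match):
    """Strip the leading "# " (or bare "#") from each captured content line."""
    return "".join(
        line[2:] if line.startswith("# ") else line[1:]
        for line in match.group("content").splitlines(keepends=True)
    )


matches = list(re.finditer(REGEX, script))
content = extract_content(matches[0])
print(content)  # the interior "///" and "/// script" lines are part of the content

toml_error = None
if tomllib is not None:
    try:
        tomllib.loads(content)
    except tomllib.TOMLDecodeError as exc:
        toml_error = exc
        print(f"TOML parse failed: {exc}")
```

The parse failure is what keeps the collation from going unnoticed, but the error points at a TOML syntax problem rather than at the presence of two blocks, which is part of why the cases are awkward to tell apart.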

Code to reproduce:
import re

script_A = """
# /// script
# data (1)
# ///
#
# /// script
# data (2)
# ///
"""

script_B = """
# /// script
# data (1)
# ///

# /// script
# data (2)
# ///
"""

# These lines adapted from PEP 723's reference parser:
# https://peps.python.org/pep-0723/#reference-implementation

REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
name = "script"
matches_A = list(
    filter(lambda m: m.group("type") == name, re.finditer(REGEX, script_A))
)
matches_B = list(
    filter(lambda m: m.group("type") == name, re.finditer(REGEX, script_B))
)

# output:
# 1
# 2
print(len(matches_A))
print(len(matches_B))

Code of Conduct

  • I am aware that participants in this repository must follow the PSF Code of Conduct.
