Direct reader universal line endings #640

Merged: Hirogen merged 4 commits into Development from direct-reader-universal-line-endings on Jun 25, 2026

Hirogen (Collaborator) commented Jun 25, 2026

Handle all line endings exactly in Direct stream reader

PositionAwareStreamReaderDirect now detects the actual terminator per line instead of guessing one constant for the whole file.
It scans with IndexOfAny(SearchValues.Create("\r\n")) and classifies each hit as \n, \r\n, or a bare \r (classic-Mac), including the \r-at-block-boundary straddle.
Byte Position advances by the content bytes plus the real terminator's bytes, so it stays exact on files that interleave line endings.
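
To make the per-line classification concrete, here is a minimal sketch of the idea, not the code from this PR. It assumes the whole text is already in memory, so the block-boundary straddle is not modelled, and the names TerminatorScanSketch and SplitWithPositions are hypothetical. It uses the same IndexOfAny(SearchValues) call and advances a byte position by the encoded size of the content plus the terminator actually found.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration; not the PositionAwareStreamReaderDirect implementation.
public static class TerminatorScanSketch
{
    // Either character can start a line terminator.
    private static readonly SearchValues<char> LineBreakChars = SearchValues.Create("\r\n");

    // Returns each line together with the byte offset at which it starts.
    public static List<(string Line, long Start)> SplitWithPositions(string text, Encoding encoding)
    {
        var result = new List<(string Line, long Start)>();
        ReadOnlySpan<char> rest = text;
        long position = 0;

        while (!rest.IsEmpty)
        {
            int hit = rest.IndexOfAny(LineBreakChars);
            if (hit < 0)
            {
                // Final line without a terminator.
                result.Add((rest.ToString(), position));
                break;
            }

            // "\r\n" is one two-char terminator; a lone '\r' or '\n' is one char.
            bool isCrLf = rest[hit] == '\r' && hit + 1 < rest.Length && rest[hit + 1] == '\n';
            int terminatorChars = isCrLf ? 2 : 1;

            result.Add((rest[..hit].ToString(), position));

            // Advance by the encoded bytes of the content plus the real terminator.
            position += encoding.GetByteCount(rest[..(hit + terminatorChars)]);
            rest = rest[(hit + terminatorChars)..];
        }

        return result;
    }

    public static void Main()
    {
        // Prints "0: a", "2: bb", "6: ccc", "10: d".
        foreach (var (line, start) in SplitWithPositions("a\nbb\r\nccc\rd", Encoding.UTF8))
            Console.WriteLine($"{start}: {line}");
    }
}
```

In a block-based reader, the extra case is a \r that is the last character of a block: whether it is a bare \r or the first half of \r\n can only be decided after the next block has been read.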

This folds the two capabilities that previously only System/Legacy had into the default reader:

  • bare \r no longer renders a classic-Mac file as one giant line
  • Position no longer drifts on mixed \n/\r\n files, keeping seeks into flushed buffers correct
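
The drift in the second bullet can be shown with plain arithmetic. The sketch below is an illustration only: the sample text, the class name PositionDriftSketch, and the assumed guess of a 1-byte terminator are all hypothetical. It compares the true byte offset of the third line with the offset predicted from one constant terminator length.

```csharp
using System;
using System.Text;

// Hypothetical illustration of position drift on a mixed \n / \r\n file.
public static class PositionDriftSketch
{
    // Line 1 ends in LF, line 2 in CRLF, line 3 has no terminator.
    public const string Text = "one\ntwo\r\nthree";

    public static (int Exact, int Guessed) ThirdLineOffsets()
    {
        // Exact: count the bytes of each line including its real terminator.
        int exact = Encoding.UTF8.GetByteCount("one\n") + Encoding.UTF8.GetByteCount("two\r\n");

        // Guessed: assume every terminator is 1 byte (ASCII content, so chars == bytes).
        const int guessedTerminatorLength = 1;
        int guessed = ("one".Length + guessedTerminatorLength)
                    + ("two".Length + guessedTerminatorLength);

        return (exact, guessed);
    }

    public static void Main()
    {
        byte[] bytes = Encoding.UTF8.GetBytes(Text);
        var (exact, guessed) = ThirdLineOffsets();

        // Prints "exact=9 -> 0x74" ('t' of "three").
        Console.WriteLine($"exact={exact} -> 0x{bytes[exact]:X2}");
        // Prints "guessed=8 -> 0x0A" (the '\n' of the CRLF, one byte early).
        Console.WriteLine($"guessed={guessed} -> 0x{bytes[guessed]:X2}");
    }
}
```

The error grows by one byte for every line whose terminator differs from the guess, so seeks computed from the constant land further from the intended line the deeper they go into the file.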

The guessed _newLineSequenceLength field and GuessNewLineSequenceLength (with its seek-reset-reread) are replaced by a lazy EnsureInitialized that fills the first block from the current stream position, plus a ResetReader override that resets scan state on a mid-stream seek. The override also fixes a latent bug where a stale block was scanned after a Position change.
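
The lazy-fill and reset pattern can be sketched as follows. This is a simplified stand-in, not the reader from this PR: ReaderBaseSketch, LazyBlockReaderSketch, Seek, and ReadByte are hypothetical names, and the real base class and its members are not shown in this description. The point is only that the first block is filled on first read from wherever the stream currently is, and that a seek discards the buffered block.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for the real base class; only the reset hook is modelled.
public abstract class ReaderBaseSketch
{
    protected Stream Stream { get; }

    protected ReaderBaseSketch(Stream stream) => Stream = stream;

    // Moving the stream lets the subclass drop whatever it has buffered.
    public void Seek(long position)
    {
        Stream.Seek(position, SeekOrigin.Begin);
        ResetReader();
    }

    protected virtual void ResetReader() { }
}

public sealed class LazyBlockReaderSketch : ReaderBaseSketch
{
    private readonly byte[] _block = new byte[32 * 1024];
    private int _blockLength;
    private int _scanIndex;
    private bool _initialized;

    public LazyBlockReaderSketch(Stream stream) : base(stream) { }

    // Fills the first block lazily, from the current stream position.
    private void EnsureInitialized()
    {
        if (_initialized) return;
        FillBlock();
        _initialized = true;
    }

    private void FillBlock()
    {
        _blockLength = Stream.Read(_block, 0, _block.Length);
        _scanIndex = 0;
    }

    // Without this, a read after Seek would keep scanning the stale block.
    protected override void ResetReader()
    {
        _initialized = false;
        _blockLength = 0;
        _scanIndex = 0;
    }

    // Returns the next byte, or -1 at end of stream.
    public int ReadByte()
    {
        EnsureInitialized();
        if (_scanIndex >= _blockLength)
        {
            FillBlock();
            if (_blockLength == 0) return -1;
        }
        return _block[_scanIndex++];
    }
}

public static class LazyBlockReaderDemo
{
    public static void Main()
    {
        var reader = new LazyBlockReaderSketch(new MemoryStream(Encoding.ASCII.GetBytes("abcdef")));
        Console.WriteLine((char)reader.ReadByte()); // prints "a"
        reader.Seek(4);
        Console.WriteLine((char)reader.ReadByte()); // prints "e"
    }
}
```

If ResetReader were left empty here, the second read would return 'b' from the block buffered before the seek, which is the kind of stale-block scan the override guards against.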

Tests: 11 new TDD cases cover bare \r, mixed endings, repeated terminators, trailing \r and \r\n, \n\r, multibyte exact positions, and \r / \r\n landing exactly on the 32 KB block boundary. The full reader suite is green (97 tests), and the Direct throughput benchmark shows no regression.
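
For the block-boundary cases, the test input has to place the terminator at a precise byte offset. The sketch below shows one way to build such input; it is not taken from the PR's test suite. BoundaryInputSketch and Build are hypothetical names, and it assumes ASCII content and a block size of exactly 32 * 1024 bytes.

```csharp
using System;
using System.Text;

// Hypothetical helper for building boundary-straddling test input.
public static class BoundaryInputSketch
{
    public const int BlockSize = 32 * 1024;

    // Pads the first line so the terminator's first byte is the last byte of block 0.
    // With "\r\n" the '\n' becomes the first byte of block 1; with "\r" the reader
    // has to look into block 1 before it can classify the '\r' as bare.
    public static byte[] Build(string terminator)
    {
        string firstLine = new string('x', BlockSize - 1);
        return Encoding.ASCII.GetBytes(firstLine + terminator + "tail");
    }

    public static void Main()
    {
        byte[] crlf = Build("\r\n");
        int cr = Array.IndexOf(crlf, (byte)'\r');
        int lf = Array.IndexOf(crlf, (byte)'\n');

        // Prints "CR at 32767, LF at 32768, second line starts at 32769".
        Console.WriteLine($"CR at {cr}, LF at {lf}, second line starts at {lf + 1}");
    }
}
```

A reader under test would be expected to return two lines for both variants, with the second line starting at byte 32769 for \r\n and at byte 32768 for a bare \r.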

Adds byte-exact manual/GUI fixtures (LF, CRLF, CR, Mixed) under TestData, with a regenerator and a README. The fixtures are pinned as binary in .gitattributes so core.autocrlf cannot corrupt their terminators.
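
A regenerator for such fixtures can be quite small. The following is a sketch under stated assumptions, not the regenerator added by this PR: the file names, the sample content, and the class name FixtureRegeneratorSketch are hypothetical. It writes raw bytes so that no newline translation can happen on the way to disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical fixture regenerator; file names and content are made up.
public static class FixtureRegeneratorSketch
{
    public static IReadOnlyDictionary<string, string> Fixtures { get; } =
        new Dictionary<string, string>
        {
            ["lf.log"] = "line 1\nline 2\nline 3\n",
            ["crlf.log"] = "line 1\r\nline 2\r\nline 3\r\n",
            ["cr.log"] = "line 1\rline 2\rline 3\r",
            ["mixed.log"] = "line 1\nline 2\r\nline 3\r",
        };

    // Writes every fixture as exact bytes into the given directory.
    public static void Regenerate(string directory)
    {
        Directory.CreateDirectory(directory);
        foreach (var (name, content) in Fixtures)
        {
            File.WriteAllBytes(Path.Combine(directory, name), Encoding.ASCII.GetBytes(content));
        }
    }

    public static void Main()
    {
        string directory = Path.Combine(Path.GetTempPath(), "line-ending-fixtures");
        Regenerate(directory);
        Console.WriteLine($"Wrote {Fixtures.Count} fixtures to {directory}");
    }
}
```

Git's built-in binary attribute disables text conversion for matching paths, which is why pinning the fixtures that way keeps their terminators intact on checkout.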

BRUNER Patrick and others added 4 commits June 25, 2026 10:37
Hirogen merged commit c80bc54 into Development on Jun 25, 2026
1 check passed
Hirogen deleted the direct-reader-universal-line-endings branch on June 25, 2026 13:25