Reduce per-cell work in SheetParser (~13% speedup)#74
Open
HoangMinhBK wants to merge 1 commit into
Open
Conversation
Two hot-path optimizations in sheet_parser.rb:
1. Precompute column index at 'c' element start instead of recomputing it
via regex + column_letter_to_number on every 'v'/'t' end element.
Also replaces the inject({}) Hash allocation per cell with a single-pass
each over the attrs array.
2. Accumulate raw SAX string chunks in characters() and call Loader.cast
once in end_element_namespace, rather than casting each chunk and
concatenating already-cast values. This is both faster and more correct
(avoids partial-value casting for numeric/date types split across chunks).
Benchmarked at ~13% faster on a 10k-row x 12-column string workload
(0.495s -> 0.429s on M-series Mac). All 82 existing tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two small changes to the hot path in
SheetParserthat together shave ~12-16% off parse time. All 82 existing tests pass.Precompute column index at the
'c'elementcell_idxwas doing a regex scan +column_letter_to_numberon everyend_elementcall for'v'/'t'. We already have the cell ref when the'c'starts, so just stash it as@cell_idxthen. Also dropped theinject({})hash build per cell in favour of a singleeachpass.Accumulate raw chunks, cast once
characterswas callingLoader.caston each SAX chunk and concatenating the results. That's slightly wrong if a value ever gets split mid-string (e.g. a number like42.5arriving as"42"+".5"would cast to two separate values before being joined). Easier to just collect the raw chunks and cast once inend_element.Benchmark
Script (run from repo root with
bundle exec ruby -I test bench.rb):Results on M-series Mac (string-heavy workload, real time):