fix(transcript): word spacing, empty boxes & non-monotonic timestamps by preston176 · Pull Request #6 · preston176/openscript

preston176 · 2026-06-09T07:55:48Z

Fixes the two transcript-editor issues from the screenshot.

1. Run-together words + empty boxes (morePDFlost...)
Whisper emits each token with a leading space. The worker stored the tokens raw and joined segment text with "", while the editor renders each word as an adjacent inline <button> with no separator, so words ran together. Whitespace-only chunks became empty clickable boxes.

2. Inaccurate click-to-seek / non-monotonic timestamps (0:49 → 0:40 → 0:47)
Word timestamps from return_timestamps: "word" over chunked/stride long-form audio jump backwards at chunk boundaries.
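A minimal sketch of how the backward jumps described above can be spotted in a word list. The `TimedWord` shape and the `findBackwardJumps` helper are hypothetical names used for illustration; they are not part of this PR.

```typescript
// Hypothetical word shape; the worker's real types are not shown in this PR.
interface TimedWord {
  text: string;
  start: number; // seconds
  end: number; // seconds
}

// Returns the indices of words that start before the previous word ended.
function findBackwardJumps(words: TimedWord[]): number[] {
  const jumps: number[] = [];
  let prevEnd = -Infinity;
  words.forEach((word, index) => {
    if (word.start < prevEnd) {
      jumps.push(index);
    }
    prevEnd = word.end;
  });
  return jumps;
}

// Illustrative data mirroring the 0:49 -> 0:40 -> 0:47 pattern from the report.
const sample: TimedWord[] = [
  { text: "more", start: 47, end: 49 },
  { text: "PDF", start: 40, end: 41 },
  { text: "lost", start: 47, end: 48 },
];

const backwardJumps = findBackwardJumps(sample);
```

Here only the second word is flagged, because it starts at 40 s after the previous word ended at 49 s.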

Fixes

Extracted groupWordsIntoSegments into a pure, tested module transcription/group-words.ts: trims each token, drops empties, clamps each word start >= prev.end (monotonic), space-joins segment text.
Transcript panel: renders a space separator between words, trims display text (robust for legacy docs), skips whitespace-only tokens.
Added group-words.test.ts — 9 cases (trim, empties, null timestamps, monotonic clamp, gap preservation, sentence/gap/max-word segmentation).
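The token cleanup listed above (trim, drop empties, monotonic clamp, space-join) can be sketched roughly as follows. This is not the code in transcription/group-words.ts: `RawWord`, `Word`, and `normalizeWords` are hypothetical names, and the fallback for null timestamps (reuse the previous word's end) is an assumption.

```typescript
// Hypothetical shapes; the real module's types are not shown in this PR.
interface RawWord {
  text: string;
  start: number | null;
  end: number | null;
}

interface Word {
  text: string;
  start: number;
  end: number;
}

function normalizeWords(raw: RawWord[]): Word[] {
  const out: Word[] = [];
  let prevEnd = 0;
  for (const token of raw) {
    // Whisper tokens carry a leading space; trim it off.
    const text = token.text.trim();
    // Whitespace-only tokens would render as empty boxes; drop them.
    if (text.length === 0) continue;
    // Clamp so that start >= previous end. A start later than the previous
    // end is left alone, which preserves genuine gaps.
    // Assumption: a null timestamp falls back to the previous word's end.
    const start = Math.max(token.start ?? prevEnd, prevEnd);
    const end = Math.max(token.end ?? start, start);
    out.push({ text, start, end });
    prevEnd = end;
  }
  return out;
}

// Illustrative input: leading spaces, one whitespace-only token,
// one backward jump, and one token without timestamps.
const rawTokens: RawWord[] = [
  { text: " more", start: 49.0, end: 49.4 },
  { text: " ", start: 49.4, end: 49.4 },
  { text: " PDF", start: 40.0, end: 40.5 },
  { text: " lost", start: null, end: null },
];

const words = normalizeWords(rawTokens);
const segmentText = words.map((w) => w.text).join(" ");
```

Clamping makes the sequence monotonic but does not recover the true time of a word that jumped backwards, which is consistent with the scope note's point that absolute precision is still bounded by the model.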

Verification

tsc --noEmit clean; 38 tests pass (29 existing + 9 new).

Scope note

Spacing + empty boxes are fixed for existing transcripts on reload (render layer). The improved monotonic timestamps apply to newly generated transcripts (existing ones have their times baked in — re-generate to pick them up). Absolute word-timing precision is still bounded by the model; Tiny is roughest — Small/Medium give tighter word boundaries.

The transcript rendered run-together ("morePDFlost...") with stray empty boxes, and clicking a word seeked to the wrong place (timestamps were even non-monotonic, e.g. 0:49 -> 0:40).

Root cause was in the Whisper post-processing + rendering:
- worker stored raw tokens (Whisper emits each with a leading space) and joined segment text with "", so the editor's per-word <button> spans ran together; whitespace-only chunks became empty clickable boxes
- word timestamps from chunked/stride long-form audio jump backwards at chunk boundaries, scrambling click-to-seek

Fixes:
- extract groupWordsIntoSegments into a pure, tested module (transcription/group-words.ts): trim each token, drop empties, and clamp each word to start >= previous word's end (monotonic), space-join segments
- transcript panel: render a space separator between words, trim display text (robust for legacy docs), and skip whitespace-only tokens
- add group-words.test.ts (9 cases: trim, empties, null ts, monotonic clamp, gap preservation, segmentation)

Spacing + empty boxes are fixed for existing transcripts on reload (render layer); improved timestamps apply to newly generated transcripts.

vercel · 2026-06-09T07:55:49Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
openscript	Ready	Preview, Comment	Jun 9, 2026 7:59am
openscript-app	Ready	Preview, Comment	Jun 9, 2026 7:59am

preston176 merged commit d743d6e into main Jun 9, 2026
5 of 7 checks passed

vercel Bot deployed to Preview – openscript-app June 9, 2026 07:57 View deployment

vercel Bot deployed to Preview – openscript June 9, 2026 07:59 View deployment

preston176 merged 1 commit into main from fix/transcript-spacing-timestamps
