fix(transcript): word spacing, empty boxes & non-monotonic timestamps #6

Merged
preston176 merged 1 commit into main from fix/transcript-spacing-timestamps
Jun 9, 2026

@preston176 (Owner)

Fixes the two transcript-editor issues from the screenshot.

1. Run-together words + empty boxes (morePDFlost...)
Whisper emits each token with a leading space. The worker stored the tokens raw and joined segment text with "", while the editor renders each word as an adjacent inline <button> with no separator, so the words ran together. Whitespace-only chunks became empty clickable boxes.

2. Inaccurate click-to-seek / non-monotonic timestamps (0:49 → 0:40 → 0:47)
Word timestamps from return_timestamps: "word" over chunked/stride long-form audio jump backwards at chunk boundaries.

Fixes

  • Extracted groupWordsIntoSegments into a pure, tested module transcription/group-words.ts: trims each token, drops empties, clamps each word start >= prev.end (monotonic), space-joins segment text.
  • Transcript panel: renders a space separator between words, trims display text (robust for legacy docs), skips whitespace-only tokens.
  • Added group-words.test.ts — 9 cases (trim, empties, null timestamps, monotonic clamp, gap preservation, sentence/gap/max-word segmentation).
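The normalization steps listed above (trim, drop empties, monotonic clamp, space-join) can be sketched roughly as follows. This is a minimal illustration, not the code in transcription/group-words.ts: the `RawWord` and `CleanWord` shapes and the `normalizeWords` and `joinSegmentText` names are hypothetical, and the sentence/gap/max-word segmentation is left out.

```typescript
// Hypothetical shapes; the real types in transcription/group-words.ts may differ.
interface RawWord {
  text: string;
  start: number | null; // seconds; null when no timestamp is available
  end: number | null;
}

interface CleanWord {
  text: string;
  start: number | null;
  end: number | null;
}

// Trim tokens, drop empties, and clamp timestamps so they never move backwards.
function normalizeWords(raw: RawWord[]): CleanWord[] {
  const out: CleanWord[] = [];
  let prevEnd = 0; // end time of the last timestamped word kept so far
  for (const w of raw) {
    const text = w.text.trim();
    if (text === "") continue; // whitespace-only chunk: would render as an empty box
    let start = w.start;
    let end = w.end;
    if (start !== null) {
      start = Math.max(start, prevEnd); // clamp: start >= previous word's end
      end = end === null ? start : Math.max(end, start); // keep end >= start
      prevEnd = end;
    }
    // Words without a start timestamp pass through with their times unchanged.
    out.push({ text, start, end });
  }
  return out;
}

// Space-join so the segment text reads as normal prose.
function joinSegmentText(words: CleanWord[]): string {
  return words.map((w) => w.text).join(" ");
}
```

Because the clamp uses `Math.max`, a word that starts later than the previous word's end keeps its own start, so genuine gaps are preserved. A word that jumps backwards collapses to a zero-length span at the previous end instead of pointing at an earlier position.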

Verification

  • tsc --noEmit clean; 38 tests pass (29 existing + 9 new).

Scope note

Spacing + empty boxes are fixed for existing transcripts on reload (render layer). The improved monotonic timestamps apply to newly generated transcripts (existing ones have their times baked in — re-generate to pick them up). Absolute word-timing precision is still bounded by the model; Tiny is roughest — Small/Medium give tighter word boundaries.
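As a rough sketch of why the render-layer fix also covers existing transcripts: if the panel derives its display items from the stored words at render time, legacy tokens with stray whitespace are cleaned up without touching the stored document. The `StoredWord` and `DisplayItem` shapes and the `toDisplayItems` function below are hypothetical names for illustration, not the panel's actual code.

```typescript
// Hypothetical stored shape for a word in a legacy transcript document.
interface StoredWord {
  text: string;
  start: number | null; // seek target in seconds, if any
}

// What the panel would render: a clickable word or a plain space between words.
type DisplayItem =
  | { kind: "word"; index: number; label: string; start: number | null }
  | { kind: "space" };

function toDisplayItems(words: StoredWord[]): DisplayItem[] {
  const items: DisplayItem[] = [];
  words.forEach((w, index) => {
    const label = w.text.trim(); // legacy docs may still carry leading spaces
    if (label === "") return; // skip whitespace-only tokens: no empty boxes
    if (items.length > 0) items.push({ kind: "space" }); // separator between words
    items.push({ kind: "word", index, label, start: w.start });
  });
  return items;
}
```

Keeping the original `index` on each word item lets a click handler map back to the stored word, so the seek target is still whatever timestamp was saved with the transcript.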

The transcript rendered run-together ("morePDFlost...") with stray empty
boxes, and clicking a word seeked to the wrong place (timestamps were even
non-monotonic, e.g. 0:49 -> 0:40).

Root cause was in the Whisper post-processing + rendering:
- worker stored raw tokens (Whisper emits each with a leading space) and
  joined segment text with "", so the editor's per-word <button> spans ran
  together; whitespace-only chunks became empty clickable boxes
- word timestamps from chunked/stride long-form audio jump backwards at chunk
  boundaries, scrambling click-to-seek

Fixes:
- extract groupWordsIntoSegments into a pure, tested module
  (transcription/group-words.ts): trim each token, drop empties, and clamp
  each word to start >= previous word's end (monotonic), space-join segments
- transcript panel: render a space separator between words, trim display text
  (robust for legacy docs), and skip whitespace-only tokens
- add group-words.test.ts (9 cases: trim, empties, null ts, monotonic clamp,
  gap preservation, segmentation)

Spacing + empty boxes are fixed for existing transcripts on reload (render
layer); improved timestamps apply to newly generated transcripts.

vercel Bot commented Jun 9, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| openscript | Ready | Preview, Comment | Jun 9, 2026 7:59am |
| openscript-app | Ready | Preview, Comment | Jun 9, 2026 7:59am |

@preston176 preston176 merged commit d743d6e into main Jun 9, 2026
5 of 7 checks passed