TalkThrough

Live captions and translation for any audio playing on your Mac. Captures system audio with ScreenCaptureKit, transcribes it on-device using macOS 26's SpeechTranscriber, and translates each line via DeepL — all rendered in a floating overlay window that stays on top of every Space.

Useful for watching foreign-language video or sitting in a meeting held in a language you don't speak.

Requirements

  • macOS 26.0 or newer (uses SpeechTranscriber, introduced in the macOS 26 Speech framework).
  • Xcode 26 or newer to build.
  • A free DeepL API key — sign up at https://www.deepl.com/pro-api.
  • An Apple Developer account (free tier is fine) for code signing during development.

Setup

git clone https://github.com/greg7gkb/TalkThrough.git
cd TalkThrough
cp Config/Secrets.example.xcconfig Config/Secrets.xcconfig
# Edit Config/Secrets.xcconfig and paste your DeepL API key
open TalkThrough.xcodeproj

Build and run from Xcode. The first launch will prompt for:

  • Speech Recognition — required to use SpeechTranscriber.
  • Screen & System Audio Recording — required for ScreenCaptureKit to capture audio playing through other apps.

The first transcription session also downloads the speech model for your locale (a few seconds to a minute, one-time).

Configuration

The target translation language is hard-coded to Spanish ("ES") in ContentView.swift for now. Change it to any DeepL target language code ("EN", "FR", "DE", "JA", etc.) to translate into something else. A picker is on the roadmap; see PLAN.md.
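
To show where the target language code ends up, here is a minimal sketch of a DeepL translate request in Swift. It is not the project's TranslationService: the names `TranslateBody` and `makeTranslateRequest` are hypothetical, the JSON body encoding is an assumption, and the `api-free.deepl.com` host assumes a free-tier key (paid keys use `api.deepl.com`).

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Hypothetical request body; DeepL expects `text` as an array and a `target_lang` code.
struct TranslateBody: Codable {
    let text: [String]
    let targetLang: String

    enum CodingKeys: String, CodingKey {
        case text
        case targetLang = "target_lang"
    }
}

/// Builds (but does not send) a POST request for DeepL's /v2/translate endpoint.
/// Assumption: a free-tier key, hence the api-free host.
func makeTranslateRequest(text: String, targetLang: String, apiKey: String) throws -> URLRequest {
    let url = URL(string: "https://api-free.deepl.com/v2/translate")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    // Header-based auth: the key travels in the Authorization header, not in the body.
    request.setValue("DeepL-Auth-Key \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        TranslateBody(text: [text], targetLang: targetLang)
    )
    return request
}
```

Swapping `"ES"` for another code when calling `makeTranslateRequest(text:targetLang:apiKey:)` is the only change needed to target a different language in this sketch.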

How it works

ScreenCaptureKit ──audio buffers──▶ SpeechTranscriber ──text──▶ ViewModel
                                    (macOS 26 Speech                │
                                     framework)                     ▼
                                                                 DeepL
                                                                    │
                                                                    ▼
                                                            floating overlay
  • AudioCaptureEngine opens an SCStream on the main display, captures audio only (video is throttled to 1 fps at 2×2 pixels to keep ScreenCaptureKit happy), and forwards AVAudioPCMBuffers.
  • SpeechEngine wraps SpeechAnalyzer + SpeechTranscriber with the progressiveTranscription preset. It resamples the audio to whatever format the transcriber wants, runs the results stream, and emits a running transcript by tracking results keyed by audio time range.
  • LiveTranslateViewModel drives both engines and debounces translation calls: partials wait 500ms in case more text arrives, but a max-wait clamp forces a fire after 1s so continuous source audio doesn't starve the translator.
  • TranslationService is a thin DeepL REST client using header-based auth (Authorization: DeepL-Auth-Key).
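
The "running transcript keyed by audio time range" idea from the SpeechEngine bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `Segment` and `RunningTranscript` are hypothetical stand-ins that use plain seconds and strings instead of the Speech framework's own result types, and keying on the range's start time is one possible choice.

```swift
import Foundation

/// Hypothetical stand-in for one transcriber result covering a span of audio.
struct Segment {
    let start: TimeInterval   // start of the audio range, in seconds
    let end: TimeInterval     // end of the audio range, in seconds
    let text: String
    let isFinal: Bool
}

/// Keeps the latest result per audio range and renders them in time order.
struct RunningTranscript {
    private var segmentsByStart: [TimeInterval: Segment] = [:]

    init() {}

    /// A newer result for the same range start replaces the older one,
    /// so a revised partial (or the final result) overwrites earlier guesses.
    mutating func apply(_ segment: Segment) {
        segmentsByStart[segment.start] = segment
    }

    /// The transcript so far: all stored segments, ordered by audio time.
    var text: String {
        segmentsByStart.values
            .sorted { $0.start < $1.start }
            .map(\.text)
            .joined(separator: " ")
    }
}
```

Because each update replaces the entry for its range rather than appending, the rendered text never shows the same stretch of audio twice.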

See PLAN.md for the prior phases, current state, and what's next.
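
The debounce-with-clamp timing that LiveTranslateViewModel is described as using can be illustrated with a small, timer-free sketch. `DebounceClamp` is a hypothetical name, and the structure is an assumption. Only the 500 ms quiet interval and the 1 s maximum wait come from the description above.

```swift
import Foundation

/// Hypothetical helper: decides when a pending translation should fire.
/// Times are plain seconds so the logic can be exercised without real timers.
struct DebounceClamp {
    let quietInterval: TimeInterval
    let maxWait: TimeInterval
    private(set) var firstPendingAt: TimeInterval?
    private(set) var lastUpdateAt: TimeInterval?

    init(quietInterval: TimeInterval = 0.5, maxWait: TimeInterval = 1.0) {
        self.quietInterval = quietInterval
        self.maxWait = maxWait
    }

    /// Record that new partial text arrived at `now`.
    mutating func noteUpdate(at now: TimeInterval) {
        if firstPendingAt == nil { firstPendingAt = now }
        lastUpdateAt = now
    }

    /// Fire after a quiet gap, but never later than maxWait after the first
    /// pending update, so continuous speech cannot postpone translation forever.
    var deadline: TimeInterval? {
        guard let first = firstPendingAt, let last = lastUpdateAt else { return nil }
        return min(last + quietInterval, first + maxWait)
    }

    /// Returns true (and resets the pending state) once the deadline has passed.
    mutating func fireIfDue(at now: TimeInterval) -> Bool {
        guard let due = deadline, now >= due else { return false }
        firstPendingAt = nil
        lastUpdateAt = nil
        return true
    }
}
```

With updates arriving at 0 s, 0.25 s, and 0.75 s, the quiet-gap deadline alone would slide to 1.25 s, but the clamp caps it at 1.0 s.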

Privacy and data

  • Audio capture is local; raw audio buffers are passed to Apple's SpeechTranscriber, which transcribes on-device on macOS 26+, and then discarded.
  • Transcribed text is sent to DeepL over HTTPS for translation. Apart from the one-time speech model download noted under Setup, there are no other network calls.
  • Your API key lives only in Config/Secrets.xcconfig, which is gitignored.

License

MIT — see LICENSE.
