Live captions and translation for any audio playing on your Mac. Captures system audio with ScreenCaptureKit, transcribes it on-device using macOS 26's SpeechTranscriber, and translates each line via DeepL — all rendered in a floating overlay window that stays on top of every Space.
Useful for watching foreign-language video or sitting in a meeting held in a language you don't speak.
- macOS 26.0 or newer (uses SpeechTranscriber, introduced in the macOS 26 Speech framework).
- Xcode 26 or newer to build.
- A free DeepL API key — sign up at https://www.deepl.com/pro-api.
- An Apple Developer account (free tier is fine) for code signing during development.
git clone https://github.com/greg7gkb/TalkThrough.git
cd TalkThrough
cp Config/Secrets.example.xcconfig Config/Secrets.xcconfig
# Edit Config/Secrets.xcconfig and paste your DeepL API key
open TalkThrough.xcodeproj

Build and run from Xcode. The first launch will prompt for:
- Speech Recognition: required to use SpeechTranscriber.
- Screen & System Audio Recording: required for ScreenCaptureKit to capture audio playing through other apps.
The first transcription session also downloads the speech model for your locale (a few seconds to a minute, one-time).
The target translation language is hard-coded to Spanish ("ES") in ContentView.swift for now. Change it to any DeepL target language code — "EN", "FR", "DE", "JA", etc. — to translate into something else. A picker is on the roadmap; see PLAN.md.
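To show where the target language code ends up, here is a minimal sketch of a DeepL translate request in Swift. It is not the project's TranslationService: the function name `makeDeepLRequest` and its parameters are hypothetical, and the free-tier host `api-free.deepl.com` is an assumption based on the free API key mentioned above (paid keys use `api.deepl.com`).

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Hypothetical helper: builds a POST request for DeepL's /v2/translate endpoint.
/// `targetLang` is a DeepL target language code such as "ES", "FR", or "JA".
func makeDeepLRequest(text: String, targetLang: String, apiKey: String) throws -> URLRequest {
    // Assumption: a free-tier key, which is served from the api-free host.
    let url = URL(string: "https://api-free.deepl.com/v2/translate")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    // Header-based auth: "Authorization: DeepL-Auth-Key <key>".
    request.setValue("DeepL-Auth-Key \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // DeepL expects an array of texts plus the target language code.
    let body: [String: Any] = ["text": [text], "target_lang": targetLang]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    return request
}
```

Calling `try makeDeepLRequest(text: "Hello", targetLang: "FR", apiKey: key)` yields a request that can be handed to `URLSession`. Switching languages only changes the `target_lang` field.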
ScreenCaptureKit ──audio buffers──▶ SpeechTranscriber ──text──▶ ViewModel
                                    (macOS 26 Speech                │
                                     framework)                     ▼
                                                                  DeepL
                                                                    │
                                                                    ▼
                                                            floating overlay
- AudioCaptureEngine opens an SCStream on the main display, captures audio only (video is throttled to 1 fps at 2×2 to keep ScreenCaptureKit happy), and forwards AVAudioPCMBuffers.
- SpeechEngine wraps SpeechAnalyzer + SpeechTranscriber with the progressiveTranscription preset. It resamples the audio to whatever format the transcriber wants, runs the results stream, and emits a running transcript by tracking results keyed by audio time range.
- LiveTranslateViewModel drives both engines and debounces translation calls: partials wait 500 ms in case more text arrives, but a max-wait clamp forces a fire after 1 s so continuous source audio doesn't starve the translator.
- TranslationService is a thin DeepL REST client using header-based auth (Authorization: DeepL-Auth-Key).
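The debounce with a max-wait clamp can be sketched as a small deadline calculator. This is an illustration under stated assumptions, not the code in LiveTranslateViewModel: the type `ClampedDebouncer` and its methods are hypothetical, and it only computes when a translation should fire. Scheduling (a timer or a `Task`) is left to the caller.

```swift
import Foundation

/// Hypothetical sketch of a debounce with a max-wait clamp.
/// Times are plain seconds so the logic stays deterministic.
struct ClampedDebouncer {
    let quietInterval: TimeInterval  // wait this long after the latest partial
    let maxWait: TimeInterval        // never wait longer than this after the first pending partial
    private(set) var firstPendingAt: TimeInterval? = nil
    private(set) var deadline: TimeInterval? = nil

    init(quietInterval: TimeInterval = 0.5, maxWait: TimeInterval = 1.0) {
        self.quietInterval = quietInterval
        self.maxWait = maxWait
    }

    /// Records a new partial at `now` and returns the time at which to fire.
    mutating func noteInput(at now: TimeInterval) -> TimeInterval {
        let first = firstPendingAt ?? now
        firstPendingAt = first
        // Each partial pushes the deadline out, but never past first + maxWait.
        let next = min(now + quietInterval, first + maxWait)
        deadline = next
        return next
    }

    /// Returns true (and resets) once the deadline has been reached.
    mutating func fireIfDue(at now: TimeInterval) -> Bool {
        guard let due = deadline, now >= due else { return false }
        firstPendingAt = nil
        deadline = nil
        return true
    }
}
```

With partials arriving every 0.25 s, the quiet interval alone would never elapse. The clamp caps the deadline at 1.0 s after the first pending partial, so the translator still fires during continuous speech.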
See PLAN.md for the prior phases, current state, and what's next.
- Audio capture is local; raw audio buffers are sent to Apple's SpeechTranscriber (which transcribes on-device on macOS 26+) and then discarded.
- Transcribed text is sent to DeepL over HTTPS for translation. No other network calls.
- Your API key lives only in Config/Secrets.xcconfig, which is gitignored.
MIT — see LICENSE.