
niranjanakella/Orator


Orator TTS Engine

Built with: Kokoro TTS, Python, ONNX Runtime, NumPy, eSpeak NG, macOS

High-quality neural text-to-speech with 50+ voices across 9 languages


UPDATE: The Python package can now be installed with pip install -e .

✨ Features

  • 🌍 9 Languages: American English, British English, Spanish, French, Hindi, Italian, Portuguese, Japanese, Chinese
  • 🎭 55 Voices: Male and female voices with unique personalities
  • Lightning Fast: GPU-accelerated inference with streaming audio
  • 🎯 macOS Hotkey: Double-tap Option key ⌥ for instant TTS anywhere
  • 🔊 High Quality: Natural-sounding neural speech synthesis
  • 🚀 Easy Setup: Install with pip or the UV package manager
  • 📱 System-Wide: Works with any macOS application

🗣️ Available Voices & Languages


🇺🇸 American English (a)

Female Voices:

  • af (default)
  • af_alloy
  • af_aoede
  • af_bella
  • af_heart
  • af_jessica
  • af_kore
  • af_nicole
  • af_nova
  • af_river
  • af_sarah
  • af_sky

Male Voices:

  • am_adam
  • am_echo
  • am_eric
  • am_fenrir
  • am_liam
  • am_michael
  • am_onyx
  • am_puck
  • am_santa

🇬🇧 British English (b)

Female Voices:

  • bf_alice
  • bf_emma
  • bf_isabella
  • bf_lily

Male Voices:

  • bm_daniel
  • bm_fable
  • bm_george
  • bm_lewis

🇪🇸 Spanish (e)

Female Voices:

  • ef_dora

Male Voices:

  • em_alex
  • em_santa

🇫🇷 French (f)

Female Voices:

  • ff_siwis

🇮🇳 Hindi (h)

Female Voices:

  • hf_alpha
  • hf_beta

Male Voices:

  • hm_omega
  • hm_psi

🇮🇹 Italian (i)

Female Voices:

  • if_sara

Male Voices:

  • im_nicola

🇯🇵 Japanese (j)

Female Voices:

  • jf_alpha
  • jf_gongitsune
  • jf_nezumi
  • jf_tebukuro

Male Voices:

  • jm_kumo

🇧🇷 Portuguese (p)

Female Voices:

  • pf_dora

Male Voices:

  • pm_alex
  • pm_santa

🇨🇳 Chinese (z)

Female Voices:

  • zf_xiaobei
  • zf_xiaoni
  • zf_xiaoxiao
  • zf_xiaoyi

Male Voices:

  • zm_yunjian
  • zm_yunxi
  • zm_yunxia
  • zm_yunyang

🚀 Quick Start

Why UV?

UV is a fast, pip-compatible alternative for setting up this project:

  • 10-100x faster than pip
  • 🔒 Reliable: fast, deterministic dependency resolution
  • 🎯 Zero configuration - works out of the box
  • 🔄 Familiar: uv pip mirrors common pip commands
  • 🌟 Widely adopted across the Python ecosystem

Installation

Option A: Install as editable package (Recommended)

  1. Clone and install the package:
    # Clone the repo
    git clone https://github.com/niranjanakella/Orator.git
    cd Orator
    
    # Create virtual environment
    python3 -m venv .venv
    source .venv/bin/activate  # On macOS/Linux
    
    # Install in editable mode
    pip install -e .

Option B: Using UV (Fast alternative)

  1. Install UV if you don't have it (this assumes Python is already installed on your system):
    pip install uv
  2. Clone and set up the project:
    # Clone the repo
    git clone https://github.com/niranjanakella/Orator.git
    cd Orator
    
    # Create virtual environment and install dependencies
    uv venv --python=3.11
    source .venv/bin/activate  # On macOS/Linux
    uv pip install -r requirements.txt

Additional Setup (Required for both options)

  1. Install espeak-ng (required for phonemization):

    # macOS
    brew install espeak-ng
    
    # Verify installation
    espeak-ng --version
    
    # Example output:
    # eSpeak NG text-to-speech: 1.51  Data at: /opt/homebrew/Cellar/espeak-ng/1.51/share/espeak-ng-data
  2. Download model and voices (if not included):

    pip install -U "huggingface_hub[cli]"  # or: uv pip install -U "huggingface_hub[cli]"
    
    # Download model
    huggingface-cli download onnx-community/Kokoro-82M-v1.0-ONNX --include "onnx/model.onnx" --local-dir ./kokoro_model_onnx/
    
    # Download voices
    huggingface-cli download onnx-community/Kokoro-82M-v1.0-ONNX --include "voices/*" --local-dir ./kokoro_model_onnx/voices
  3. Language packs:

  • The English spaCy pipeline (en_core_web_sm) is installed by default through requirements.txt. For other languages, install the matching small spaCy pipeline, for example: python -m spacy download es_core_news_sm (Spanish).
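Once these steps are done, a short script can confirm that everything is in place. This is a hypothetical helper, not a script shipped with Orator: it assumes the ./kokoro_model_onnx layout produced by the download commands and only checks that the tools and files exist, not that they load correctly.

```python
# Hypothetical pre-flight check for the setup steps (not part of the Orator repo).
from __future__ import annotations

import shutil
import subprocess
from importlib.util import find_spec
from pathlib import Path

# Assumed location, taken from the --local-dir used in the download commands.
MODEL_DIR = Path("./kokoro_model_onnx")


def espeak_version() -> str | None:
    """Return the first line of `espeak-ng --version`, or None if espeak-ng is not on PATH."""
    exe = shutil.which("espeak-ng")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""


def find_model_files(model_dir: Path) -> tuple[list[Path], list[Path]]:
    """Find ONNX models and voice files. Searches recursively because
    huggingface-cli keeps the repository's folder layout under --local-dir."""
    if not model_dir.is_dir():
        return [], []
    onnx_files = sorted(model_dir.rglob("*.onnx"))
    voices_dir = model_dir / "voices"
    voice_files = sorted(p for p in voices_dir.rglob("*") if p.is_file()) if voices_dir.is_dir() else []
    return onnx_files, voice_files


def main() -> None:
    print("espeak-ng:", espeak_version() or "NOT FOUND (brew install espeak-ng)")
    onnx_files, voice_files = find_model_files(MODEL_DIR)
    print("ONNX model(s):", [str(p) for p in onnx_files] or "none found")
    print("voice files found:", len(voice_files))
    # find_spec only locates the package; it does not load the spaCy pipeline.
    print("spaCy en_core_web_sm:", "installed" if find_spec("en_core_web_sm") else "missing")


if __name__ == "__main__":
    main()
```

Run it from the repository root inside the virtual environment; a missing item points back to the step that needs repeating.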

🎯 Usage

1. macOS Hotkey Application

Grant Accessibility Permissions First:

  1. Open System Settings → Privacy & Security (on macOS Monterey and earlier: System Preferences → Security & Privacy → Privacy)
  2. Select "Accessibility"
  3. Authenticate when prompted (on older macOS, click the lock icon and enter your password)
  4. Add your terminal application (Terminal.app, iTerm2, etc.)
  5. Ensure it's checked/enabled

Run the hotkey application:

# Make sure you are inside the virtual environment
python3 macos_tts_hotkey.py

How to use:

  • Select any text in any macOS application
  • Double-tap the Option key (⌥) quickly to start TTS
  • Press the Escape key to stop playback at any time
  • Listen to the text being read aloud!
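The double-tap trigger comes down to a timing check between two key presses. This is a sketch, not Orator's implementation: the DoubleTapDetector class and the 0.4-second window are assumptions, and wiring it to real key events (for example through a keyboard-listener library such as pynput) is left out.

```python
# Sketch of double-tap detection; the class name and 0.4 s window are assumptions.
from __future__ import annotations

import time
from typing import Callable, Optional


class DoubleTapDetector:
    def __init__(self, window: float = 0.4, clock: Callable[[], float] = time.monotonic):
        self.window = window      # max seconds allowed between the two taps
        self.clock = clock        # injectable clock keeps the logic testable
        self._last_tap: Optional[float] = None

    def tap(self) -> bool:
        """Register one Option-key tap; return True when it completes a double tap."""
        now = self.clock()
        if self._last_tap is not None and now - self._last_tap <= self.window:
            self._last_tap = None  # reset so a third tap starts a new pair
            return True
        self._last_tap = now
        return False
```

A key listener would call tap() on each Option release and, when it returns True, grab the selected text and start speech. Resetting after a match keeps a rapid triple tap from firing twice.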

⚙️ Configuration

Hotkey Application Config

Edit config_hotkey.json:

{
    "model_path": "kokoro-v1_0.pth",
    "voices_dir": "voices",
    "voice": "af_bella",
    "speed": 1.0,
    "device": "auto"
}
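A minimal sketch of how these settings could be loaded and sanity-checked before the hotkey app starts. load_config and the set of accepted device values ("auto", "cpu", "cuda") are assumptions; the defaults simply mirror the example above.

```python
# Hypothetical loader for config_hotkey.json with defaults and basic validation.
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

DEFAULTS: dict[str, Any] = {
    "model_path": "kokoro-v1_0.pth",
    "voices_dir": "voices",
    "voice": "af_bella",
    "speed": 1.0,
    "device": "auto",
}
ALLOWED_DEVICES = {"auto", "cpu", "cuda"}  # assumption, not taken from Orator's code


def load_config(path: str | Path = "config_hotkey.json") -> dict[str, Any]:
    cfg = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        cfg.update(json.loads(p.read_text(encoding="utf-8")))  # file values override defaults
    speed = cfg["speed"]
    if isinstance(speed, bool) or not isinstance(speed, (int, float)) or speed <= 0:
        raise ValueError(f"speed must be a positive number, got {speed!r}")
    if cfg["device"] not in ALLOWED_DEVICES:
        raise ValueError(f"device must be one of {sorted(ALLOWED_DEVICES)}, got {cfg['device']!r}")
    return cfg
```

Validating up front turns a typo such as "device": "gpu" into a clear error at startup instead of a failure deep inside model loading.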

Voice Selection

Choose voices by language prefix:

  • af_* / am_* - American English
  • bf_* / bm_* - British English
  • ef_* / em_* - Spanish
  • ff_* - French
  • hf_* / hm_* - Hindi
  • if_* / im_* - Italian
  • jf_* / jm_* - Japanese
  • pf_* / pm_* - Portuguese
  • zf_* / zm_* - Chinese
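In each voice ID the first letter is the language code and the second is the gender (f or m). The helper below decodes that scheme; parse_voice is a hypothetical name, and the table is copied from the voice list above.

```python
# Sketch: decode a voice ID into (language, gender) using the prefix scheme above.
from __future__ import annotations

LANGUAGES = {
    "a": "American English", "b": "British English", "e": "Spanish",
    "f": "French", "h": "Hindi", "i": "Italian", "j": "Japanese",
    "p": "Portuguese", "z": "Chinese",
}
GENDERS = {"f": "female", "m": "male"}


def parse_voice(voice: str) -> tuple[str, str]:
    """'af_bella' -> ('American English', 'female'); the bare default 'af' also works."""
    prefix = voice.split("_", 1)[0]
    if len(prefix) != 2 or prefix[0] not in LANGUAGES or prefix[1] not in GENDERS:
        raise ValueError(f"unrecognized voice ID: {voice!r}")
    return LANGUAGES[prefix[0]], GENDERS[prefix[1]]
```

For example, parse_voice("jm_kumo") returns ("Japanese", "male"), which makes it easy to check that a chosen voice matches the language of the text being read.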

🛠️ Troubleshooting

Common Issues

"Failed to start keyboard monitoring"

  • Grant Accessibility permissions in System Settings → Privacy & Security (System Preferences on older macOS)
  • Restart the application after granting permissions

"espeak-ng not found"

# Install espeak-ng
brew install espeak-ng

# Verify installation
which espeak-ng

"Model file not found"

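  • Ensure the ONNX model file (kokoro-v1_0.onnx, or onnx/model.onnx if you used the download command above) is inside the kokoro_model_onnx directory
  • Check that model_path in config_hotkey.json points to that file, and check file permissions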

"CUDA out of memory"

# Use CPU instead: set the device in config_hotkey.json
"device": "cpu"

# Or reduce batch size for long texts
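The "device" setting ultimately decides which ONNX Runtime execution providers are used. The mapping below is an assumption that follows the "automatic CUDA detection, falls back to CPU" behavior described under Performance Optimization; choose_providers is a hypothetical helper, while the provider names are ONNX Runtime's own.

```python
# Sketch: translate the config's "device" value into an ONNX Runtime provider list.
# Assumed behavior: "auto" prefers CUDA when this onnxruntime build offers it, else CPU.
from __future__ import annotations

CPU = "CPUExecutionProvider"
CUDA = "CUDAExecutionProvider"


def choose_providers(device: str, available: list[str]) -> list[str]:
    """Return providers in priority order for onnxruntime.InferenceSession."""
    if device == "cpu":
        return [CPU]
    if device in ("auto", "cuda"):
        if CUDA in available:
            return [CUDA, CPU]  # keep CPU as a fallback for unsupported ops
        if device == "cuda":
            raise RuntimeError("device is 'cuda' but CUDAExecutionProvider is unavailable")
        return [CPU]
    raise ValueError(f"unknown device: {device!r}")
```

With onnxruntime installed, the result would be passed as providers=choose_providers(cfg["device"], onnxruntime.get_available_providers()) when creating an onnxruntime.InferenceSession. The standard macOS onnxruntime build has no CUDA provider, so "auto" resolves to the CPU there.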

"Voice file not found"

  • Ensure voice files are in the voices/ directory
  • Check that the voice name matches exactly (case-sensitive)
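Because voice names are matched exactly, a small lookup helper can turn a miss into a useful hint. This sketch assumes voice files are named after the voice ID (for example af_bella.bin or af_bella.pt) somewhere under the voices directory; resolve_voice is a hypothetical name.

```python
# Sketch: exact, case-sensitive voice lookup with "did you mean" hints.
from __future__ import annotations

import difflib
from pathlib import Path


def resolve_voice(voice: str, voices_dir: str | Path) -> Path:
    """Return the file whose name (minus extension) equals `voice` exactly."""
    root = Path(voices_dir)
    files = {p.stem: p for p in root.rglob("*") if p.is_file()} if root.is_dir() else {}
    if voice in files:
        return files[voice]
    # No exact match: compare case-insensitively to suggest near misses like "AF_Bella".
    by_lower = {name.lower(): name for name in files}
    hints = difflib.get_close_matches(voice.lower(), list(by_lower), n=3, cutoff=0.6)
    suggestions = [by_lower[h] for h in hints]
    raise FileNotFoundError(f"voice {voice!r} not found in {root}; did you mean {suggestions}?")
```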

"Stop functionality not working"

  • Ensure the application has focus or accessibility permissions
  • Try pressing the Escape key while TTS audio is actively playing
  • Check terminal logs for any error messages
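One common way to implement stop-on-Escape, sketched here as an assumption rather than Orator's actual code: the key handler sets a threading.Event, and the playback loop checks it between audio chunks. play_until_stopped is a hypothetical helper, and play_chunk stands in for whatever audio-output call is used.

```python
# Sketch: cooperative stop for chunked playback using a shared threading.Event.
from __future__ import annotations

import threading
from typing import Callable, Iterable


def play_until_stopped(chunks: Iterable[bytes], play_chunk: Callable[[bytes], None],
                       stop_event: threading.Event) -> int:
    """Play chunks in order until stop_event is set; return how many were played."""
    played = 0
    for chunk in chunks:
        if stop_event.is_set():  # set by the Escape-key handler on another thread
            break
        play_chunk(chunk)
        played += 1
    return played
```

Because the flag is only checked between chunks, a stop request takes effect once the current chunk finishes; shorter chunks give a faster stop.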

Performance Optimization

  • GPU Usage: Automatic CUDA detection, falls back to CPU
  • Memory Management: Automatic cleanup after generation
  • Streaming: Use generate_audio_stream() for long texts
  • Caching: Voice packs are cached after first load
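Streaming works by synthesizing text in pieces so playback can begin before the whole passage is processed. The splitter below is an illustrative sketch, not the logic inside generate_audio_stream(): it breaks on ".", "!" and "?" followed by whitespace, does not handle abbreviations or CJK punctuation, and yields a single sentence longer than max_chars unchanged.

```python
# Sketch: group sentences into chunks of at most max_chars for streaming synthesis.
from __future__ import annotations

import re
from typing import Iterator

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # split after ., ! or ? plus whitespace


def chunk_text(text: str, max_chars: int = 300) -> Iterator[str]:
    buf = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        if buf and len(buf) + 1 + len(sentence) > max_chars:
            yield buf          # current chunk is full: hand it to the synthesizer
            buf = sentence
        else:
            buf = f"{buf} {sentence}" if buf else sentence
    if buf:
        yield buf
```

For example, list(chunk_text("One. Two! Three? Four.", max_chars=9)) gives ["One. Two!", "Three?", "Four."]. Smaller chunks lower the time to first audio at the cost of more synthesis calls.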

🤝 Contributing

We welcome contributions! Please feel free to:

  • Report bugs and issues
  • Suggest new features
  • Submit pull requests
  • Add new voice packs
  • Improve documentation

🗺️ Roadmap

  • Streaming audio chunks for long-form text (controlled low latency)
  • Speed controls for the audio stream
  • LLM-driven agentic AI capabilities
  • Native macOS application/interface for UI-driven audio controls
  • UI controls for swapping voices

🤝 Get Involved

Want to help shape the future of Orator? Here's how:

  • 🐛 Report Issues - Help us identify bugs and improvements
  • 💡 Suggest Features - Share your ideas for new functionality
  • 🔧 Contribute Code - Submit PRs for features or fixes
  • 🎨 Design UI/UX - Help design the native app interface
  • 📝 Write Documentation - Improve guides and tutorials
  • 🗣️ Add Voices - Contribute new voice packs and languages

🙏 Acknowledgments

  • Built on the amazing Kokoro TTS model
  • Powered by ONNX and modern neural architectures
  • Inspired by the need for accessible, high-quality TTS




Made with ❤️ for the open-source community


About

Orator: an open-source TTS engine built for hotkey-driven text-to-speech on Apple macOS
