tambourine-voice

Your personal voice interface for any app. Speak naturally and your words appear wherever your cursor is, with fully customizable AI voice dictation. Open-source alternative to Wispr Flow.

Tambourine is a personal voice interface: speak naturally and your words appear wherever your cursor is. Powered by customizable AI voice dictation, it provides a universal voice-to-text layer for emails, messages, documents, code editors, and terminals. AI formatting cleans up speech, adds punctuation, and applies a personal dictionary, so you can capture ideas quickly and type at the speed of thought. Tambourine offers full control and transparency, with customizable AI providers, formatting, and extensions, and supports dual-mode recording, real-time speech-to-text, LLM text formatting, context-aware formatting, customizable prompts, and more.

README:

Tambourine

Your personal voice interface for any app. Speak naturally and your words appear wherever your cursor is, powered by customizable AI voice dictation.

Open-source alternative to Wispr Flow, Superwhisper, and Willow.

🚀 Hosted Service Coming Soon! Join the waitlist to use Tambourine without running the server yourself.

Screenshots: Home, Settings, and dictating into Windows Notepad.

Why?

Your voice, any app. Tambourine gives you a universal voice-to-text interface that works everywhere: emails, messages, documents, code editors, terminals. Press a hotkey, speak, and your words are typed at your cursor. No copy-pasting, no app switching, no limitations.

Speak at the speed of thought. Typing averages 40-50 wpm, but speaking averages 130-160 wpm. Capture ideas before they slip away, and give your hands a break from the keyboard.

AI that understands you. Unlike raw transcription, Tambourine uses AI to format your speech into clean text: removing filler words, adding punctuation, and applying your personal dictionary for technical terms and proper nouns.

Why not native dictation? Built-in dictation is not personalized. Tambourine adapts to your speaking and writing style, and lets you maintain a personal dictionary for uncommon terms.

Why not proprietary tools? Unlike Wispr Flow or Superwhisper, this project gives you full control and transparency.

Fully customizable. This is your voice interface, built your way:

  • Choose your AI providers - Pick your STT (Cartesia, Deepgram, AssemblyAI, Speechmatics, Azure, AWS, Google, Groq, OpenAI, Nemotron) and LLM (Cerebras, OpenAI, Anthropic, Gemini, Groq, OpenRouter), run fully local with Whisper and Ollama, or add more from Pipecat's supported services
  • Customize the formatting - Modify prompts, add custom rules, build your personal dictionary
  • Extend freely - Built on Pipecat's modular pipeline, fully open-source

Platform Support

Platform   Compatibility
Windows    ✅
macOS      ✅
Linux      ⚠️
Android    ❌
iOS        ❌

Features

  • Dual-Mode Recording
    • Hold-to-record: Ctrl+Alt+` - Hold to record, release to stop
    • Toggle mode: Ctrl+Alt+Space - Press to start, press again to stop
  • Real-time Speech-to-Text - Fast transcription with configurable STT providers
  • LLM Text Formatting - Removes filler words, adds punctuation using configurable LLM
  • Context-Aware Formatting - Automatically detect which application is focused and tailor formatting accordingly. Email clients get proper salutations and sign-offs, messaging apps get casual formatting, code editors get syntax-aware output with proper casing and punctuation.
  • Customizable Prompts - Edit formatting rules, enable advanced features, add personal dictionary
  • In-App Provider Selection - Switch STT and LLM providers without restarting
  • Automatic Typing - Input text directly at focused position
  • Recording Overlay - Floating visual indicator
  • Transcription History - View and copy previous dictations
  • Paste Last Transcription - Re-type previous dictation with Ctrl+Alt+.
  • Auto-Mute Audio - Automatically mute system audio while dictating (Windows/macOS)
  • Misc. - System tray integration, microphone selection, sound feedback, hotkey configuration

Planned Features

  • Voice-Driven Text Modification - Highlight existing text and describe how to modify it. Select a paragraph and say "make this more formal" or "fix the grammar" to transform text in place.
  • Voice Shortcuts - Create custom triggers that expand to full formatted text. Say "insert meeting link" to paste your scheduling URL, or "sign off" for your email signature.
  • Auto-Learning Dictionary - Automatically learn new words, names, and terminology from your usage patterns rather than requiring manual dictionary entries.
  • Observability and Evaluation - Integrate tooling from Pipecat and other voice agent frameworks to track transcription quality, latency metrics, and formatting accuracy. Use insights to continuously optimize your personal dictation workflow.
  • Hosted Service - Optional cloud-hosted backend so you can use Tambourine without running the Python server locally.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Tauri App (app/)                       │
│  - Global hotkeys (Ctrl+Alt+Space, Ctrl+Alt+`)              │
│  - Rust backend for keyboard and audio controls             │
│  - React frontend with SmallWebRTC client                   │
│  - System tray with show/hide toggle                        │
└─────────────────────────────┬───────────────────────────────┘
                              │
                          API :8765
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                  Python Server (server/)                    │
│  - Pipecat SmallWebRTC for audio streaming                  │
│  - STT providers (Cartesia, Deepgram, Groq, and more)       │
│  - LLM formatting (Cerebras, OpenAI, Anthropic, and more)   │
│  - Runtime config via WebRTC data channel (RTVI protocol)   │
│  - Returns cleaned text to app                              │
└─────────────────────────────────────────────────────────────┘

Prerequisites

  • Rust
  • Node.js
  • pnpm
  • Python 3.13+
  • uv (Python package manager)

Linux Dependencies

sudo apt-get install libwebkit2gtk-4.1-dev build-essential curl wget file \
  libxdo-dev libssl-dev libayatana-appindicator3-dev librsvg2-dev libgtk-3-dev

Permissions

Microphone Access

When you first use Tambourine, your operating system will prompt you to grant microphone access. Accept this permission to enable voice dictation.

macOS Accessibility Permissions

On macOS, Tambourine needs accessibility permissions to type text at your cursor position.

  • Running the built app: Grant accessibility access to "Tambourine"
  • Running in development: Grant accessibility access to the application you run the code from:
    • If running from VS Code: Add "Visual Studio Code"
    • If running from Terminal: Add "Terminal" (or your terminal app like iTerm2)

Quick Start

⚠️ Build in Progress: This project is under active development. Core features work well, but expect breaking changes to the code, architecture, and configuration as the project evolves.

1. Get API Keys

Choose your providers (at least one STT and one LLM required):

Note: The following are examples of providers with generous free tiers. Tambourine supports many more providers with paid API keys; see server/.env.example for the full list.

Provider   Type   Free Tier                                   Sign Up
Cartesia   STT    3 hours/month                               cartesia.ai
Cerebras   LLM    10K tokens/day                              cloud.cerebras.ai
Gemini     LLM    1,500 requests/day (1M tokens/min burst)    aistudio.google.com
Groq       Both   Model-specific (100K-500K tokens/day)       console.groq.com

For fully local deployment:

  • Set OLLAMA_BASE_URL=http://localhost:11434 in .env
  • Set WHISPER_ENABLED=true for local STT
  • Optional: set WHISPER_DEVICE (cpu or cuda), WHISPER_MODEL (for example tiny, base, small, medium, large), and WHISPER_COMPUTE_TYPE (for example int8, float16)
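
Putting those together, a minimal .env for a fully local setup might look like this (the values are illustrative, not recommendations; server/.env.example is the authoritative reference):

# Fully local: Whisper for STT, Ollama for LLM formatting (illustrative values)
OLLAMA_BASE_URL=http://localhost:11434
WHISPER_ENABLED=true
WHISPER_DEVICE=cpu
WHISPER_MODEL=small
WHISPER_COMPUTE_TYPE=int8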

2. Set Up the Server

cd server

# Copy environment template and add your API keys
cp .env.example .env

# Install dependencies
uv sync

# Start the server
uv run python main.py

3. Set Up the App

cd app

# Install dependencies
pnpm install

# Start development mode
pnpm dev

4. Usage

  1. Start the server first (uv run python main.py)
  2. Start the app (pnpm dev)
  3. Use either shortcut:
    • Toggle: Press Ctrl+Alt+Space to start, press again to stop
    • Hold: Hold Ctrl+Alt+` while speaking, release to stop
  4. Your cleaned text is typed at your cursor

Server Commands

cd server

# Start server (default: 127.0.0.1:8765)
uv run python main.py

# Start with custom host/port
uv run python main.py --host 0.0.0.0 --port 9000

# Enable verbose logging
uv run python main.py --verbose

Docker Deployment

Run the server in Docker instead of installing the Python dependencies locally. The server requires host networking because WebRTC/RTP uses randomly assigned UDP ports.

To use GPU acceleration for a locally hosted Whisper model, set up GPU access for your container daemon:

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-yum-or-dnf https://podman-desktop.io/docs/podman/gpu

cd server

# Copy environment template and add your API keys
cp .env.example .env

# Build and start the container
docker compose up --build -d

# View logs
docker compose logs -f

# Stop the container
docker compose down

# Update to latest code
docker compose down && docker compose up --build -d

The .env file is read at runtime (not baked into the image), so your API keys stay secure.

Docker Networking Troubleshooting

If the container shows as running and logs print Tambourine Server Ready!, but the client still cannot connect (or http://127.0.0.1:8765/health fails from your host), verify that host networking is actually enabled/supported by your Docker runtime.

This project uses network_mode: "host" in server/docker-compose.yml for WebRTC/RTP reliability. If host networking is disabled in your Docker setup, the container can appear healthy while still being unreachable from the app.
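
A quick sanity check from the host (default address assumed):

# Prints "ok" if the server is reachable from the host, "unreachable" otherwise
curl -sf http://127.0.0.1:8765/health && echo ok || echo unreachable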

If you see CDI device injection failed: unresolvable CDI devices nvidia.com/gpu=all, your runtime is likely trying to use the Podman GPU stanza with Docker. In server/docker-compose.yml, keep the GPU block that matches your runtime and disable the other one.
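
For orientation, the two styles of GPU stanza generally look like the following in Compose files. This is a generic sketch of the usual syntax, not necessarily the exact contents of server/docker-compose.yml:

# Docker with the NVIDIA Container Toolkit (generic sketch)
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

# Podman with CDI (generic sketch)
devices:
  - nvidia.com/gpu=all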

App Commands

cd app

# Development
pnpm check         # Run all checks (lint + typecheck + knip + test + cargo)
pnpm dev           # Start Tauri app in dev mode

# Production Build
pnpm build         # Build for current platform

API Reference

The server exposes HTTP endpoints on port 8765 (default). Sample endpoints:

  • GET /health - Health check for container orchestration
  • GET /api/providers - List available STT and LLM providers

See server/main.py and server/api/config_api.py for all endpoints. All endpoints are rate-limited.
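
For example, with the server running on the default address:

# Health check
curl http://127.0.0.1:8765/health

# List available STT and LLM providers
curl http://127.0.0.1:8765/api/providers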

Configuration

Server Configuration (.env)

Copy .env.example to .env and add API keys for at least one STT and one LLM provider. See the example file for all supported providers including Deepgram, Cartesia, OpenAI, Anthropic, Cerebras, Groq, AWS, and more. Additional Pipecat-supported providers can be added easily.

You can optionally configure Silero VAD parameters via environment variables (see server/.env.example for VAD_CONFIDENCE, VAD_START_SECS, VAD_STOP_SECS, and VAD_MIN_VOLUME).
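
As a sketch, the tuning might look like this in .env (the variable names come from server/.env.example; the values here are placeholders, not suggested defaults):

# Silero VAD tuning (placeholder values)
VAD_CONFIDENCE=0.7
VAD_START_SECS=0.2
VAD_STOP_SECS=0.8
VAD_MIN_VOLUME=0.6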

App Configuration

The app connects to http://127.0.0.1:8765 by default via WebRTC. Settings are persisted locally and include:

  • Providers - Select active STT and LLM providers from available options
  • Audio - Microphone selection, sound feedback, auto-mute during recording
  • Hotkeys - Customize toggle and hold-to-record shortcuts
  • LLM Formatting Prompt - Three customizable sections:
    • Core Formatting Rules - Filler word removal, punctuation, capitalization
    • Advanced Features - Backtrack corrections ("scratch that"), list formatting
    • Personal Dictionary - Custom words

Data Management

Tambourine supports exporting and importing your configuration data, making it easy to back up settings, share configurations, or try community examples.

Export Data

Go to Settings > Data Management and click the export button. Select a folder and Tambourine exports 5 files:

File                             Description
tambourine-settings.json         App settings (hotkeys, providers, audio preferences)
tambourine-history.json          Transcription history entries
tambourine-prompt-main.md        Core formatting rules
tambourine-prompt-advanced.md    Advanced features (backtrack corrections, list formatting)
tambourine-prompt-dictionary.md  Personal dictionary for custom terminology

Import Data

Click the import button in Settings > Data Management and select one or more files (.json or .md). Tambourine auto-detects file types from their content.

For history imports, you can choose a merge strategy:

  • Merge (skip duplicates) - Add new entries, skip existing ones
  • Merge (keep all) - Append all imported entries
  • Replace - Delete existing history and use imported entries

Using Examples

The examples/ folder contains ready-to-use prompt configurations for different use cases.

To use an example:

  1. Open Settings > Data Management
  2. Click the import button
  3. Navigate to examples/<example-name>/
  4. Select all three .md files
  5. Click Open

Your prompts will be updated immediately. You can further customize them in Settings > LLM Formatting Prompt.

Tech Stack

  • Desktop App: Rust, Tauri
  • Frontend: TypeScript, React, Vite
  • UI: Mantine, Tailwind CSS
  • State Management: Zustand, TanStack Query, XState
  • Backend: Python, FastAPI
  • Voice Pipeline: Pipecat
  • Communications: WebRTC
  • Validation: Zod, Pydantic
  • Code Quality: Biome, Ruff, Ty, Clippy

Acknowledgments

Built with Tauri for the cross-platform desktop app and Pipecat for the modular voice AI pipeline.

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Support

If you find Tambourine useful, here are ways to support the project:

  • Star the repo - It helps others discover the project and motivates development
  • Report issues - Found a bug or have a feature request? Open an issue
  • Join Discord - Connect with the community for help and discussions in our Discord server
  • Contribute - Check out CONTRIBUTING.md for guidelines on how to contribute

License

AGPL-3.0
