izwi

Local-first speech AI engine for transcription, TTS, and voice workflows.

Stars: 132

Izwi is a local-first audio inference engine for text-to-speech (TTS), automatic speech recognition (ASR), and voice AI workflows. It operates on your machine without relying on cloud services or API keys, ensuring data privacy. Izwi offers core capabilities such as real-time voice conversations with AI, generating natural speech from text, converting audio to text accurately, identifying multiple speakers, voice cloning, creating custom voices, word-level audio-text alignment, and text-based AI conversations. The server provides OpenAI-compatible API routes under `/v1`.

README:

Izwi

Local-first audio inference engine for TTS, ASR, and voice AI workflows.

Website | Documentation | Releases | Getting Started


Overview

Izwi is a privacy-focused audio AI platform that runs entirely on your machine. No cloud services, no API keys, no data leaving your device.

Core capabilities:

  • Voice Mode — Real-time voice conversations with AI
  • Text-to-Speech — Generate natural speech from text
  • Speech Recognition — Convert audio to text with high accuracy
  • Speaker Diarization — Identify and separate multiple speakers
  • Voice Cloning — Clone any voice from a short audio sample
  • Voice Design — Create custom voices from text descriptions
  • Forced Alignment — Word-level audio-text alignment
  • Chat — Text-based AI conversations

The server exposes OpenAI-compatible API routes under /v1.
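
Because the routes follow the OpenAI layout, a client can in principle reach the server with plain HTTP. The sketch below is a minimal illustration under stated assumptions, not Izwi's documented API. It assumes the server listens on http://localhost:8080 (the address used in Quick Start), that a speech route exists at /v1/audio/speech as in OpenAI's API, and that it accepts `model` and `input` JSON fields. The helper names `build_speech_request` and `synthesize` are hypothetical, and the server may require further fields (OpenAI's own API also expects `voice`). Check the documentation for the routes and fields that are actually supported.

```python
import json
import urllib.request

# Assumption: default address from Quick Start plus the /v1 prefix.
BASE_URL = "http://localhost:8080/v1"


def build_speech_request(
    text: str, model: str = "Qwen3-TTS-12Hz-0.6B-Base"
) -> urllib.request.Request:
    """Build (but do not send) a POST request shaped like OpenAI's audio/speech call."""
    # Assumption: field names follow OpenAI's API; Izwi may differ.
    payload = {"model": model, "input": text}
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "hello.wav") -> None:
    """Send the request and save the response body (needs a running server)."""
    with urllib.request.urlopen(build_speech_request(text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

With a server running and a TTS model pulled, `synthesize("Hello from Izwi!")` would write whatever audio bytes the server returns to hello.wav. The container format of those bytes is also an assumption here.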


Quick Install

macOS

Download the latest .dmg from GitHub Releases:

  1. Open the .dmg file
  2. Drag Izwi.app to Applications
  3. Launch Izwi

Linux

wget https://github.com/agentem-ai/izwi/releases/latest/download/izwi_amd64.deb
sudo dpkg -i izwi_amd64.deb

Windows

Download and run the installer from GitHub Releases.

Full installation guides: macOS, Linux, Windows, From Source


Quick Start

1. Start the server

izwi serve

Open http://localhost:8080 in your browser.

2. Download a model

izwi pull Qwen3-TTS-12Hz-0.6B-Base

3. Generate speech

izwi tts "Hello from Izwi!" --output hello.wav

4. Transcribe audio

izwi pull Qwen3-ASR-0.6B
izwi transcribe audio.wav

Long-form ASR is handled automatically: Izwi splits long recordings into chunks, stitches the overlapping transcripts together, and returns the full transcript rather than only the first model window.

Optional tuning knobs:

IZWI_ASR_CHUNK_TARGET_SECS=24
IZWI_ASR_CHUNK_MAX_SECS=30
IZWI_ASR_CHUNK_OVERLAP_SECS=3
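
To give a feel for how the three settings might interact, here is a rough sketch of an overlapping-window planner. It is not Izwi's actual algorithm, which this README does not describe. It assumes the settings are environment variables measured in seconds, that a window is cut at the target length unless the remaining audio fits inside the maximum length, and that each new window starts one overlap earlier than the previous one ended. The names `plan_chunks` and `settings_from_env` are hypothetical. A real implementation would likely also look for pauses when choosing cut points.

```python
import os
from typing import List, Tuple


def settings_from_env() -> Tuple[float, float, float]:
    """Read the three knobs, falling back to the values shown above."""
    target = float(os.environ.get("IZWI_ASR_CHUNK_TARGET_SECS", "24"))
    max_len = float(os.environ.get("IZWI_ASR_CHUNK_MAX_SECS", "30"))
    overlap = float(os.environ.get("IZWI_ASR_CHUNK_OVERLAP_SECS", "3"))
    return target, max_len, overlap


def plan_chunks(
    duration: float, target: float = 24.0, max_len: float = 30.0, overlap: float = 3.0
) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds that cover the whole recording."""
    if not (0 <= overlap < target <= max_len):
        raise ValueError("expected 0 <= overlap < target <= max_len")
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration:
        remaining = duration - start
        # Take the whole tail if it fits in one maximum-sized window,
        # otherwise cut at the target length.
        length = remaining if remaining <= max_len else target
        end = start + length
        chunks.append((start, end))
        if end >= duration:
            break
        # Step back by the overlap so neighbouring windows share audio,
        # which gives the stitching step something to align on.
        start = end - overlap
    return chunks
```

For a 60-second recording with the values above, this planner yields three windows: 0 to 24, 21 to 45, and 42 to 60 seconds.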

Supported Models

Category      Models
TTS           Qwen3-TTS (0.6B, 1.7B), LFM2-Audio
ASR           Qwen3-ASR (0.6B, 1.7B), Parakeet TDT
Diarization   Sortformer 4-speaker
Chat          Qwen3 (0.6B, 1.7B), Gemma 3 (1B, 4B)
Alignment     Qwen3-ForcedAligner

Run izwi list to see all available models.

Full model documentation: Models Guide


Documentation

Resource          Link
Getting Started   izwiai.com/docs/getting-started
Installation      izwiai.com/docs/installation
Features          izwiai.com/docs/features
CLI Reference     izwiai.com/docs/cli
Models            izwiai.com/docs/models
Troubleshooting   izwiai.com/docs/troubleshooting

License

Apache 2.0

Acknowledgments
