
ito
Ito, smart dictation in every application
Stars: 128

Ito is an intelligent voice assistant that provides seamless voice dictation to any application on your computer. It works in any app, offers global keyboard shortcuts, real-time transcription, and instant text insertion. It is smart and adaptive with features like custom dictionary, context awareness, multi-language support, and intelligent punctuation. Users can customize trigger keys, audio preferences, and privacy controls. It also offers data management features like a notes system, interaction history, cloud sync, and export capabilities. Ito is built as a modern Electron application with a multi-process architecture and utilizes technologies like React, TypeScript, Rust, gRPC, and AWS CDK.
README:

Ito is an intelligent voice assistant that brings seamless voice dictation to any application on your computer. Simply hold down your trigger key, speak naturally, and watch your words appear instantly in any text field.
- Works in any app: Emails, documents, chat applications, web browsers, code editors
- Global keyboard shortcuts: Customizable trigger keys that work system-wide
- Real-time transcription: High-accuracy speech-to-text powered by advanced AI models
- Instant text insertion: Automatically types transcribed text into the focused text field
- Custom dictionary: Add technical terms, names, and specialized vocabulary
- Context awareness: Learns from your usage patterns to improve accuracy
- Multi-language support: Transcribe in multiple languages
- Intelligent punctuation: Automatically adds appropriate punctuation
- Flexible shortcuts: Configure any key combination as your trigger
- Audio preferences: Choose your preferred microphone
- Privacy controls: Local processing options and data control settings
- Seamless integration: Works with any application
- Notes system: Automatically save transcriptions for later reference
- Interaction history: Track your dictation sessions and improve over time
- Cloud sync: Keep your settings and data synchronized across devices
- Export capabilities: Export your notes and interaction data
- macOS 10.15+ or Windows 10+
- Node.js 20+ and Bun (for development)
- Rust toolchain (for building native components)
- Microphone access and Accessibility permissions
- Download the latest release from heyito.ai or the GitHub releases page
- Install the application:
  - macOS: Open the .dmg file and drag Ito to Applications
  - Windows: Run the .exe installer and follow the setup wizard
- Grant permissions when prompted:
  - Microphone access: Required for voice input
  - Accessibility access: Required for global keyboard shortcuts and text insertion
- Set up authentication:
  - Sign in with Google, Apple, or GitHub through Auth0, or create a local account
  - Complete the guided onboarding process
- Configure your trigger key: Choose a comfortable keyboard shortcut (default: Fn + Space)
- Test your microphone: Ensure clear audio input during the setup process
- Try it out: Hold your trigger key and speak into any text field
- Customize settings: Adjust voice sensitivity, shortcuts, and preferences
Important: Ito requires a local transcription server for voice processing. See server/README.md for detailed server setup instructions.
# Clone the repository
git clone https://github.com/heyito/ito.git
cd ito
# Install dependencies
bun install
# Set up environment variables
cp .env.example .env
# Build native components (Rust binaries)
./build-binaries.sh
# Set up and start the server (required for transcription)
cd server
cp .env.example .env # Edit with your API keys
bun install
bun run local-db-up # Start PostgreSQL database
bun run db:migrate # Run database migrations
bun run dev # Start development server
cd ..
# Start the Electron app (in a new terminal)
bun run dev
- Rust: Install via rustup.rs
  - Windows users: See Windows-specific instructions below for GNU toolchain setup
  - macOS/Linux users: Default installation is sufficient
- Xcode Command Line Tools (macOS): xcode-select --install
Windows required setup:
- Install Docker Desktop: Download from docker.com and ensure it's running
- Install Rust with the GNU toolchain (for native component development):
  # Install rustup (Rust installer)
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  # Install the GNU toolchain (required for native components)
  rustup toolchain install stable-x86_64-pc-windows-gnu
  rustup target add x86_64-pc-windows-gnu
- Install 7-Zip: winget install 7zip.7zip
- Download and install GCC & MinGW-w64 and add them to your PATH: https://winlibs.com/
- Restart your terminal to pick up PATH changes
Note: Windows builds use Docker for cross-compilation to ensure consistent builds. The Docker container handles the Windows build environment automatically.
ito/
├── app/                        # Electron renderer (React frontend)
│   ├── components/             # React components
│   ├── store/                  # Zustand state management
│   └── styles/                 # TailwindCSS styles
├── lib/                        # Shared library code
│   ├── main/                   # Electron main process
│   ├── preload/                # Preload scripts & IPC
│   └── media/                  # Audio/keyboard native interfaces
├── native/                     # Native components (Rust/Swift)
│   ├── audio-recorder/         # Audio capture (Rust)
│   ├── global-key-listener/    # Keyboard events (Rust)
│   ├── text-writer/            # Text insertion (Rust)
│   └── active-application/     # Get the active application for context (Rust)
├── server/                     # gRPC transcription server
│   ├── src/                    # Server implementation
│   └── infra/                  # AWS infrastructure (CDK)
└── resources/                  # Build resources & assets
# Development
bun run dev # Start with hot reload
bun run dev:rust # Build Rust components and start dev
# Building Native Components
bun run build:rust # Build for current platform
bun run build:rust:mac # Build for macOS (with universal binary)
bun run build:rust:win # Build for Windows
# Building Application
bun run build:mac # Build for macOS
bun run build:win # Build for Windows
./build-app.sh mac # Build macOS using build script
./build-app.sh windows # Build Windows using build script (requires Docker)
bun run build:unpack # Build unpacked for testing
# Code Quality
bun run lint # Run ESLint
bun run format # Run Prettier
bun run lint:fix # Fix linting issues
Ito is built as a modern Electron application with a sophisticated multi-process architecture:
- Main Process: Handles system integration, permissions, and native component coordination
- Renderer Process: React-based UI with real-time audio visualization
- Preload Scripts: Secure IPC bridge between main and renderer processes
- Native Components: High-performance Rust binaries for audio capture and keyboard handling
Frontend:
- Electron - Cross-platform desktop framework
- React 19 - Modern UI library with concurrent features
- TypeScript - Type-safe development
- TailwindCSS - Utility-first styling
- Zustand - Lightweight state management
- Framer Motion - Smooth animations
Backend:
- Node.js - Runtime environment
- gRPC - High-performance RPC for transcription services
- SQLite - Local data storage
- Protocol Buffers - Efficient data serialization
Native Components:
- Rust - System-level audio recording and keyboard event handling
- Swift - macOS-specific text manipulation and accessibility features
- cpal - Cross-platform audio library
- enigo - Cross-platform input simulation
Infrastructure:
- AWS CDK - Infrastructure as code
- Docker - Containerized deployments
- Auth0 - Authentication and user management
graph TD
A[User Holds Trigger Key] --> B[Global Key Listener]
B --> C[Main Process]
C --> D[Audio Recorder Service]
D --> E[gRPC Transcription Service]
E --> F[AI Transcription Model]
F --> G[Transcribed Text]
G --> H[Text Writer Service]
H --> I[Active Text Field]
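The flow in the diagram can be sketched as a small hold-to-dictate state machine. This is an illustrative sketch only: `DictationSession`, `Transcriber`, `TextWriter`, and every method name below are hypothetical and are not taken from the Ito codebase. The real app routes these steps through Rust binaries and a gRPC service, which are stood in for here by plain interfaces.

```typescript
// Hypothetical types for illustration; not Ito's actual API.
type AudioChunk = Float32Array;

interface Transcriber {
  // Stands in for the gRPC transcription service.
  transcribe(chunks: AudioChunk[]): Promise<string>;
}

interface TextWriter {
  // Stands in for the native text-writer component.
  write(text: string): void;
}

type SessionState = "idle" | "recording" | "transcribing";

class DictationSession {
  private state: SessionState = "idle";
  private chunks: AudioChunk[] = [];
  private readonly transcriber: Transcriber;
  private readonly writer: TextWriter;

  constructor(transcriber: Transcriber, writer: TextWriter) {
    this.transcriber = transcriber;
    this.writer = writer;
  }

  // Trigger key pressed: start buffering audio from scratch.
  onTriggerDown(): void {
    if (this.state !== "idle") return;
    this.chunks = [];
    this.state = "recording";
  }

  // Audio recorder delivered a chunk; it is dropped unless we are recording.
  onAudio(chunk: AudioChunk): void {
    if (this.state === "recording") this.chunks.push(chunk);
  }

  // Trigger key released: transcribe the buffer, then insert non-empty text.
  async onTriggerUp(): Promise<void> {
    if (this.state !== "recording") return;
    this.state = "transcribing";
    try {
      const text = await this.transcriber.transcribe(this.chunks);
      if (text.length > 0) this.writer.write(text);
    } finally {
      // Return to idle even if transcription fails.
      this.state = "idle";
    }
  }

  currentState(): SessionState {
    return this.state;
  }
}
```

Keeping the transcriber and writer behind interfaces means the sequencing can be exercised with in-memory fakes, without a microphone or a running server.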
Customize your trigger keys in Settings > Keyboard:
- Single key: Space, Fn, etc.
- Key combinations: Cmd + Space, Ctrl + Shift + V, etc.
- Complex shortcuts: Fn + Cmd + Space for advanced workflows
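One way to reason about shortcut strings like the ones above is to normalize them into a sorted key list and compare that with the keys currently held. The sketch below assumes the `Key + Key` notation shown in this section. The function names `parseShortcut` and `matchesShortcut` are hypothetical, and this is not how Ito's Rust key listener is implemented.

```typescript
// Normalize "Fn + Cmd + Space" into a sorted, lowercase, de-duplicated key list.
// Hypothetical helper for illustration only.
function parseShortcut(shortcut: string): string[] {
  const keys = shortcut
    .split("+")
    .map((k) => k.trim().toLowerCase())
    .filter((k) => k.length > 0);
  return Array.from(new Set(keys)).sort();
}

// True only when the held keys are exactly the keys of the shortcut,
// regardless of the order in which they were pressed.
function matchesShortcut(shortcut: string, pressed: string[]): boolean {
  const wanted = parseShortcut(shortcut);
  const held = Array.from(
    new Set(pressed.map((k) => k.trim().toLowerCase())),
  ).sort();
  return wanted.length === held.length && wanted.every((k, i) => k === held[i]);
}
```

For example, `matchesShortcut("Fn + Space", ["space", "fn"])` is true, while holding an extra key such as Shift makes it false. Exact matching is a design choice in this sketch so that `Cmd + Space` does not also fire for `Fn + Cmd + Space`.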
Fine-tune audio capture in Settings > Audio:
- Microphone selection: Choose from available input devices
- Sensitivity adjustment: Optimize for your voice and environment
- Noise reduction: Filter background noise automatically
- Audio feedback: Enable/disable sound effects
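As a rough illustration of what a sensitivity setting can control, the sketch below computes the root-mean-square (RMS) level of an audio buffer and compares it with a threshold derived from a 0 to 1 sensitivity value. The mapping from sensitivity to threshold, the constants, and the function names are all assumptions made for this example. Ito's actual audio handling may work differently.

```typescript
// RMS level of a buffer of samples in the range [-1, 1]; 0 for an empty buffer.
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical gate: higher sensitivity lowers the level a buffer must reach
// to be treated as speech. The 0.1 and 0.005 constants are assumed values.
function isAboveThreshold(samples: Float32Array, sensitivity: number): boolean {
  const clamped = Math.min(1, Math.max(0, sensitivity));
  const threshold = 0.1 * (1 - clamped) + 0.005;
  return rms(samples) >= threshold;
}
```

With these assumed constants, a quiet buffer at an amplitude of about 0.05 is rejected at sensitivity 0 and accepted at sensitivity 1.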
Control your data in Settings > General:
- Local processing: Keep voice data on your device
- Cloud sync: Synchronize settings across devices
- Analytics: Share anonymous usage data (optional)
- Data export: Download your notes and interaction history
- Local-enabled: Voice processing can be done entirely on your device or using our cloud
- Encrypted transmission: All network communication uses TLS encryption
- Minimal data collection: Only essential data is processed and stored
- User control: Full control and transparency over data retention and deletion
Ito requires specific system permissions to function:
- Microphone Access: To capture your voice for transcription
- Accessibility Access: To detect keyboard shortcuts and insert text
- Network Access: For cloud features and updates (optional)
This project is open source under the GNU General Public License. You can:
- Audit the source code for security and privacy
- Contribute improvements and bug fixes
- Fork and customize for your specific needs
- Report security issues through responsible disclosure
We welcome contributions! Whether you're fixing bugs, adding features, or improving documentation, your help makes Ito better for everyone.
- Fork the repository and clone your fork
- Create a feature branch from dev
- Make your changes with clear commit messages
- Test thoroughly across supported platforms
- Submit a pull request with a detailed description
- Code Style: Use Prettier and ESLint configurations
- Type Safety: Maintain strong TypeScript typing
- Testing: Add tests for new features
- Documentation: Update docs for API changes
- Performance: Consider impact on time between recording and text insertion
- Accuracy improvements: Better transcription algorithms
- Language support: Additional language models
- UI/UX enhancements: Better user experience
- Platform support: Windows stability testing, Linux compatibility
- Documentation: Tutorials, guides, and examples
This project is licensed under the GNU General Public License - see the LICENSE file for details.
Ito is built with and inspired by amazing open source projects:
- Electron React App by @guasam - The foundational template that provided our modern Electron + React architecture
- Electron - Cross-platform desktop apps with web technologies
- React - Modern UI development
- Rust - Systems programming language for native components
- gRPC - High-performance RPC framework
- TailwindCSS - Utility-first CSS framework
- Community: GitHub Discussions
- Issues: GitHub Issues
- Website: heyito.ai
Similar Open Source Tools

opcode
opcode is a powerful desktop application built with Tauri 2 that serves as a command center for interacting with Claude Code. It offers a visual GUI for managing Claude Code sessions, creating custom agents, tracking usage, and more. Users can navigate projects, create specialized AI agents, monitor usage analytics, manage MCP servers, create session checkpoints, edit CLAUDE.md files, and more. The tool bridges the gap between command-line tools and visual experiences, making AI-assisted development more intuitive and productive.

ComfyUI_Yvann-Nodes
ComfyUI_Yvann-Nodes is a pack of custom nodes that enable audio reactivity within ComfyUI, allowing users to create AI-driven animations that sync with music. Users can generate audio reactive AI videos, control AI generation styles, content, and composition with any audio input. The tool is simple to use by dropping workflows in ComfyUI and specifying audio and visual inputs. It is flexible and works with existing ComfyUI AI tech and nodes like IPAdapter, AnimateDiff, and ControlNet. Users can pick workflows for Images → Video or Video → Video, download the corresponding .json file, drop it into ComfyUI, install missing custom nodes, set inputs, and generate audio-reactive animations.

llm-apps-java-spring-ai
The 'LLM Applications with Java and Spring AI' repository provides samples demonstrating how to build Java applications powered by Generative AI and Large Language Models (LLMs) using Spring AI. It includes projects for question answering, chat completion models, prompts, templates, multimodality, output converters, embedding models, document ETL pipeline, function calling, image models, and audio models. The repository also lists prerequisites such as Java 21, Docker/Podman, Mistral AI API Key, OpenAI API Key, and Ollama. Users can explore various use cases and projects to leverage LLMs for text generation, vector transformation, document processing, and more.

J.A.R.V.I.S.2.0
J.A.R.V.I.S. 2.0 is an AI-powered assistant designed for voice commands, capable of tasks like providing weather reports, summarizing news, sending emails, and more. It features voice activation, speech recognition, AI responses, and handles multiple tasks including email sending, weather reports, news reading, image generation, database functions, phone call automation, AI-based task execution, website & application automation, and knowledge-based interactions. The assistant also includes timeout handling, automatic input processing, and the ability to call multiple functions simultaneously. It requires Python 3.9 or later and specific API keys for weather, news, email, and AI access. The tool integrates Gemini AI for function execution and Ollama as a fallback mechanism. It utilizes a RAG-based knowledge system and ADB integration for phone automation. Future enhancements include deeper mobile integration, advanced AI-driven automation, improved NLP-based command execution, and multi-modal interactions.

LLM-Navigation
LLM-Navigation is a repository dedicated to documenting learning records related to large models, including basic knowledge, prompt engineering, building effective agents, model expansion capabilities, security measures against prompt injection, and applications in various fields such as AI agent control, browser automation, financial analysis, 3D modeling, and tool navigation using MCP servers. The repository aims to organize and collect information for personal learning and self-improvement through AI exploration.

general_framework
General Framework is a cross-platform library designed to help create apps with a unified codebase using Flutter. It offers features such as cross-platform support, standardized style code, a CLI for easier usage, API integration for bot development, customizable extensions for faster development, and user-friendly information. The library is intended to streamline the app, server, bot, and userbot creation process by providing a comprehensive set of tools and functionalities.

kcores-llm-arena
KCORES LLM Arena is a large model evaluation tool that focuses on real-world scenarios, using human scoring and benchmark testing to assess performance. It aims to provide an unbiased evaluation of large models in real-world applications. The tool includes programming ability tests and specific benchmarks like Mandelbrot Set, Mars Mission, Solar System, and Ball Bouncing Inside Spinning Heptagon. It supports various programming languages and emphasizes performance optimization, rendering, animations, physics simulations, and creative implementations.

LabelQuick
LabelQuick_V2.0 is a fast image annotation tool designed and developed by the AI Horizon team. This version has been optimized and improved based on the previous version. It provides an intuitive interface and powerful annotation and segmentation functions to efficiently complete dataset annotation work. The tool supports video object tracking annotation, quick annotation by clicking, and various video operations. It introduces the SAM2 model for accurate and efficient object detection in video frames, reducing manual intervention and improving annotation quality. The tool is designed for Windows systems and requires a minimum of 6GB of memory.

presenton
Presenton is an open-source AI presentation generator and API that allows users to create professional presentations locally on their devices. It offers complete control over the presentation workflow, including custom templates, AI template generation, flexible generation options, and export capabilities. Users can use their own API keys for various models, integrate with Ollama for local model running, and connect to OpenAI-compatible endpoints. The tool supports multiple providers for text and image generation, runs locally without cloud dependencies, and can be deployed as a Docker container with GPU support.

Fay
Fay is an open-source digital human framework that offers different versions for various purposes. The full sales edition is suitable for online and offline salespersons. The full assistant edition serves as a human-machine interactive digital assistant that can also control devices upon command. The agent edition is designed to be an autonomous agent capable of making decisions and contacting its owner. The framework provides updates and improvements across its different versions, including features like emotion analysis integration, model optimizations, and compatibility enhancements. Users can access detailed documentation for each version through the provided links.

ComfyUI-Ollama-Describer
ComfyUI-Ollama-Describer is an extension for ComfyUI that enables the use of LLM models provided by Ollama, such as Gemma, Llava (multimodal), Llama2, Llama3, or Mistral. It requires the Ollama library for interacting with large-scale language models, supporting GPUs using CUDA and AMD GPUs on Windows, Linux, and Mac. The extension allows users to run Ollama through Docker and utilize NVIDIA GPUs for faster processing. It provides nodes for image description, text description, image captioning, and text transformation, with various customizable parameters for model selection, API communication, response generation, and model memory management.

pocketpal-ai
PocketPal AI is a versatile virtual assistant tool designed to streamline daily tasks and enhance productivity. It leverages artificial intelligence technology to provide personalized assistance in managing schedules, organizing information, setting reminders, and more. With its intuitive interface and smart features, PocketPal AI aims to simplify users' lives by automating routine activities and offering proactive suggestions for optimal time management and task prioritization.

llamafarm
LlamaFarm is a comprehensive AI framework that empowers users to build powerful AI applications locally, with full control over costs and deployment options. It provides modular components for RAG systems, vector databases, model management, prompt engineering, and fine-tuning. Users can create differentiated AI products without needing extensive ML expertise, using simple CLI commands and YAML configs. The framework supports local-first development, production-ready components, strategy-based configuration, and deployment anywhere from laptops to the cloud.

DeepBattler
DeepBattler is a tool designed for Hearthstone Battlegrounds players, providing real-time strategic advice and insights to improve gameplay experience. It integrates with the Hearthstone Deck Tracker plugin and offers voice-assisted guidance. The tool is powered by a large language model (LLM) and can match the strength of top players on EU servers. Users can set up the tool by adding dependencies, configuring the plugin path, and launching the LLM agent. DeepBattler is licensed for personal, educational, and non-commercial use, with guidelines on non-commercial distribution and acknowledgment of external contributions.

gemini-cli
Gemini CLI is an open-source AI agent that provides lightweight access to Gemini, offering powerful capabilities like code understanding, generation, automation, integration, and advanced features. It is designed for developers who prefer working in the command line and offers extensibility through MCP support. The tool integrates directly into GitHub workflows and offers various authentication options for individual developers, enterprise teams, and production workloads. With features like code querying, editing, app generation, debugging, and GitHub integration, Gemini CLI aims to streamline development workflows and enhance productivity.
For similar tasks

Mindolph
Mindolph is an open source personal knowledge management software for all desktop platforms. It allows users to create and manage their own files in separate workspaces with saving in their local storage, organize their files as a tree in their workspaces, and have multiple tabs for opening files instead of a single file window. Mindolph supports Mind Map, Markdown, PlantUML, CSV sheet, and plain text file formats. It also has features such as quickly navigating to files and searching text in files under a specific folder, editing mind maps easily and quickly with key shortcuts, supporting themes and providing some pre-defined themes, importing from other mind map formats, and exporting to other file formats.

hoarder-app
Hoarder is a self-hostable bookmark manager with a focus on privacy and customization. It features automatic link previews, full-text search, AI-based tagging, and a variety of import and export options. Hoarder is designed to be easy to use and extensible, with a plugin system that allows users to add their own features and integrations.

rocketnotes
Rocketnotes is a web-based Markdown note taking app with LLM-powered text completion, chat and semantic search. It utilizes a 100% serverless RAG pipeline build with langchain, sentence-transformers, faiss and OpenAI or Anthropic API.

nextlint
Nextlint is a rich text editor (WYSIWYG) written in Svelte, using MeltUI headless UI and tailwindcss CSS framework. It is built on top of tiptap editor (headless editor) and prosemirror. Nextlint is easy to use, develop, and maintain. It has a prompt engine that helps to integrate with any AI API and enhance the writing experience. Dark/Light theme is supported and customizable.

reor
Reor is an AI-powered desktop note-taking app that automatically links related notes, answers questions on your notes, and provides semantic search. Everything is stored locally and you can edit your notes with an Obsidian-like markdown editor. The hypothesis of the project is that AI tools for thought should run models locally by default. Reor stands on the shoulders of the giants Ollama, Transformers.js & LanceDB to enable both LLMs and embedding models to run locally. Connecting to OpenAI or OpenAI-compatible APIs like Oobabooga is also supported.

obsidian-companion
Companion is an Obsidian plugin that adds an AI-powered autocomplete feature to your note-taking and personal knowledge management platform. With Companion, you can write notes more quickly and easily by receiving suggestions for completing words, phrases, and even entire sentences based on the context of your writing. The autocomplete feature uses OpenAI's state-of-the-art GPT-3 and GPT-3.5, including ChatGPT, and locally hosted Ollama models, among others, to generate smart suggestions that are tailored to your specific writing style and preferences. Support for more models is planned, too.

uxie
Uxie is a PDF reader app designed to revolutionize the learning experience. It offers features such as annotation, note-taking, collaboration tools, integration with LLM for enhanced learning, and flashcard generation with LLM feedback. Built using Nextjs, tRPC, Zod, TypeScript, Tailwind CSS, React Query, React Hook Form, Supabase, Prisma, and various other tools. Users can take notes, summarize PDFs, chat and collaborate with others, create custom blocks in the editor, and use AI-powered text autocompletion. The tool allows users to craft simple flashcards, test knowledge, answer questions, and receive instant feedback through AI evaluation.
For similar jobs

ChatFAQ
ChatFAQ is an open-source comprehensive platform for creating a wide variety of chatbots: generic ones, business-trained, or even capable of redirecting requests to human operators. It includes a specialized NLP/NLG engine based on a RAG architecture and customized chat widgets, ensuring a tailored experience for users and avoiding vendor lock-in.

anything-llm
AnythingLLM is a full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as references during chatting. This application allows you to pick and choose which LLM or Vector Database you want to use as well as supporting multi-user management and permissions.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

mikupad
mikupad is a lightweight and efficient language model front-end powered by ReactJS, all packed into a single HTML file. Inspired by the likes of NovelAI, it provides a simple yet powerful interface for generating text with the help of various backends.

glide
Glide is a cloud-native LLM gateway that provides a unified REST API for accessing various large language models (LLMs) from different providers. It handles LLMOps tasks such as model failover, caching, key management, and more, making it easy to integrate LLMs into applications. Glide supports popular LLM providers like OpenAI, Anthropic, Azure OpenAI, AWS Bedrock (Titan), Cohere, Google Gemini, OctoML, and Ollama. It offers high availability, performance, and observability, and provides SDKs for Python and NodeJS to simplify integration.

onnxruntime-genai
ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built in logits processing like repetition penalties, and allows for easy custom scoring.

firecrawl
Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown. It crawls all accessible subpages and provides clean markdown for each, without requiring a sitemap. The API is easy to use and can be self-hosted. It also integrates with Langchain and Llama Index. The Python SDK makes it easy to crawl and scrape websites in Python code.