ito

Ito, smart dictation in every application

Stars: 208

Ito is an intelligent voice assistant that provides voice dictation in any application on your computer. It offers global keyboard shortcuts, real-time transcription, and instant text insertion. It adapts to the user through a custom dictionary, context awareness, multi-language support, and intelligent punctuation. Users can customize trigger keys, audio preferences, and privacy controls. Data management features include a notes system, interaction history, cloud sync, and export. Ito is built as an Electron application with a multi-process architecture and uses React, TypeScript, Rust, gRPC, and AWS CDK.

README:

Ito

Smart dictation. Everywhere you want.

Ito is an intelligent voice assistant that brings seamless voice dictation to any application on your computer. Simply hold down your trigger key, speak naturally, and watch your words appear instantly in any text field.

✨ Features

🎙️ Universal Voice Dictation

Works in any app: Emails, documents, chat applications, web browsers, code editors
Global keyboard shortcuts: Customizable trigger keys that work system-wide
Real-time transcription: High-accuracy speech-to-text powered by advanced AI models
Instant text insertion: Automatically types transcribed text into the focused text field

🧠 Smart & Adaptive

Custom dictionary: Add technical terms, names, and specialized vocabulary
Context awareness: Learns from your usage patterns to improve accuracy
Multi-language support: Transcribe in multiple languages
Intelligent punctuation: Automatically adds appropriate punctuation
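
The README does not describe how the custom dictionary is applied. One plausible approach is to post-process the transcript and swap misheard phrases for the user's preferred spelling. The sketch below illustrates that idea only; `DictionaryEntry` and `applyCustomDictionary` are hypothetical names, not part of Ito's API.

```typescript
// Hypothetical sketch: these names are illustrative, not taken from the Ito codebase.
type DictionaryEntry = { heard: string; replacement: string };

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each misheard phrase with the preferred spelling,
// matching whole words case-insensitively.
function applyCustomDictionary(
  transcript: string,
  entries: DictionaryEntry[],
): string {
  return entries.reduce((text, entry) => {
    const pattern = new RegExp(`\\b${escapeRegExp(entry.heard)}\\b`, "gi");
    // A replacer function keeps "$" in the replacement from being interpreted.
    return text.replace(pattern, () => entry.replacement);
  }, transcript);
}

const correctedTranscript = applyCustomDictionary(
  "please open the grpc docs for eato",
  [
    { heard: "grpc", replacement: "gRPC" },
    { heard: "eato", replacement: "Ito" },
  ],
);
console.log(correctedTranscript); // prints "please open the gRPC docs for Ito"
```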

⚙️ Powerful Customization

Flexible shortcuts: Configure any key combination as your trigger
Audio preferences: Choose your preferred microphone
Privacy controls: Local processing options and data control settings
Seamless integration: Works with any application

💾 Data Management

Notes system: Automatically save transcriptions for later reference
Interaction history: Track your dictation sessions and improve over time
Cloud sync: Keep your settings and data synchronized across devices
Export capabilities: Export your notes and interaction data

🚀 Quick Start

Prerequisites

macOS 10.15+ or Windows 10+
Node.js 20+ and Bun (for development)
Rust toolchain (for building native components)
Microphone access and Accessibility permissions

Installation

Download the latest release from heyito.ai or the GitHub releases page
Install the application:
- macOS: Open the .dmg file and drag Ito to Applications
- Windows: Run the .exe installer and follow the setup wizard
Grant permissions when prompted:
- Microphone access: Required for voice input
- Accessibility access: Required for global keyboard shortcuts and text insertion
Set up authentication:
- Sign in with Google, Apple, or GitHub through Auth0, or create a local account
- Complete the guided onboarding process

First Use

Configure your trigger key: Choose a comfortable keyboard shortcut (default: Fn + Space)
Test your microphone: Ensure clear audio input during the setup process
Try it out: Hold your trigger key and speak into any text field
Customize settings: Adjust voice sensitivity, shortcuts, and preferences
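
The hold-to-dictate behavior described above can be pictured as a small state machine: holding the trigger starts recording, releasing it hands the audio off for transcription, and the app returns to idle once the text has been inserted. This is a minimal sketch under that assumption; the state and event names are hypothetical and not taken from the Ito source.

```typescript
// Hypothetical states and events for a push-to-talk flow.
type DictationState = "idle" | "recording" | "transcribing";
type DictationEvent = "triggerDown" | "triggerUp" | "textInserted";

// Allowed transitions; any event not listed for a state is ignored.
const transitions: Record<
  DictationState,
  Partial<Record<DictationEvent, DictationState>>
> = {
  idle: { triggerDown: "recording" },
  recording: { triggerUp: "transcribing" },
  transcribing: { textInserted: "idle" },
};

function nextState(state: DictationState, event: DictationEvent): DictationState {
  return transitions[state][event] ?? state;
}

// A repeated triggerDown (for example from key auto-repeat) leaves the state unchanged.
const incomingEvents: DictationEvent[] = [
  "triggerDown",
  "triggerDown",
  "triggerUp",
  "textInserted",
];
let currentState: DictationState = "idle";
const stateTrace: DictationState[] = [];
incomingEvents.forEach((dictationEvent) => {
  currentState = nextState(currentState, dictationEvent);
  stateTrace.push(currentState);
});
console.log(stateTrace.join(" -> ")); // prints "recording -> recording -> transcribing -> idle"
```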

🛠️ Development

Building from Source

Important: Ito requires a local transcription server for voice processing. See server/README.md for detailed server setup instructions.

# Clone the repository
git clone https://github.com/heyito/ito.git
cd ito

# Install dependencies
bun install

# Set up environment variables
cp .env.example .env

# Build native components (Rust binaries)
./build-binaries.sh

# Set up and start the server (required for transcription)
cd server
cp .env.example .env  # Edit with your API keys
bun install
bun run local-db-up   # Start PostgreSQL database
bun run db:migrate    # Run database migrations
bun run dev           # Start development server
cd ..

# Start the Electron app (in a new terminal)
bun run dev

Build Requirements

All Platforms

Rust: Install via rustup.rs
- Windows users: See Windows-specific instructions below for GNU toolchain setup
- macOS/Linux users: Default installation is sufficient

macOS

Xcode Command Line Tools: xcode-select --install

Windows

Required Setup:

This setup uses Git Bash for shell operations. Download it as part of Git for Windows.

Install Docker Desktop: Download from docker.com and ensure it's running
Install Rust (with GNU target)

Download and run the official Rust installer for Windows.
This installs rustup and the MSVC toolchain by default.

Add the GNU target (needed for our native components):

rustup toolchain install stable-x86_64-pc-windows-gnu
rustup target add x86_64-pc-windows-gnu

Install 7-Zip

winget install 7zip.7zip

Install GCC & MinGW-w64 via MSYS2

Install MSYS2.

Open the MSYS2 MinGW x64 shell (from the Start Menu).

Update and install the toolchain:

pacman -Syu       # run twice if asked to restart
pacman -S --needed mingw-w64-x86_64-toolchain

Verify the tools exist:

ls /mingw64/bin/gcc.exe /mingw64/bin/dlltool.exe

Use the MinGW tools when building (Git Bash)

You normally develop and build in Git Bash. Before building, prepend the MinGW path:

export PATH="/c/msys64/mingw64/bin:$PATH"
export DLLTOOL="/c/msys64/mingw64/bin/dlltool.exe"
export CC_x86_64_pc_windows_gnu="/c/msys64/mingw64/bin/x86_64-w64-mingw32-gcc.exe"
export AR_x86_64_pc_windows_gnu="/c/msys64/mingw64/bin/ar.exe"
export CARGO_TARGET_X86_64_PC_WINDOWS_GNU_LINKER="/c/msys64/mingw64/bin/x86_64-w64-mingw32-gcc.exe"

Check you’re picking up the right ones:

which gcc       # -> /c/msys64/mingw64/bin/gcc.exe
which dlltool   # -> /c/msys64/mingw64/bin/dlltool.exe

⚠️ Do not add C:\msys64\ucrt64\bin to PATH. That’s the wrong runtime and will break linking.
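
The two PATH rules above (MinGW64 present, UCRT64 absent) can also be checked programmatically. The helper below is a hypothetical sketch, not part of the repository. It assumes the default MSYS2 location used in the export lines above and a PATH in Git Bash form, colon-separated with `/c/...` paths, as printed by `echo $PATH`.

```typescript
// Hypothetical helper, not part of the Ito repository.
// Returns a list of problems found in a Git Bash style PATH string.
function checkMingwPath(pathValue: string): string[] {
  const entries = pathValue
    .split(":")
    // Normalize case and drop trailing slashes before comparing.
    .map((entry) => entry.trim().toLowerCase().replace(/\/+$/, ""));
  const problems: string[] = [];
  if (entries.indexOf("/c/msys64/mingw64/bin") === -1) {
    problems.push("/c/msys64/mingw64/bin is not on PATH");
  }
  if (entries.indexOf("/c/msys64/ucrt64/bin") !== -1) {
    problems.push("/c/msys64/ucrt64/bin is on PATH and should be removed");
  }
  return problems;
}

console.log(checkMingwPath("/c/msys64/mingw64/bin:/usr/bin:/bin")); // prints []
console.log(checkMingwPath("/c/msys64/ucrt64/bin:/usr/bin")); // reports both problems
```

This check does not replace `which gcc` and `which dlltool`, which confirm what the shell actually resolves.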

💡 To avoid running these exports every session, add the lines above to your Git Bash ~/.bashrc file. They will be applied automatically whenever you open a new Git Bash window.

Restart Git Bash if you update MSYS2

Whenever you update MSYS2 packages with pacman -Syu, restart Git Bash so the changes take effect.

Note: Windows builds use Docker for cross-compilation to ensure consistent builds. The Docker container handles the Windows build environment automatically.

Project Structure

ito/
├── app/                    # Electron renderer (React frontend)
│   ├── components/         # React components
│   ├── store/             # Zustand state management
│   └── styles/            # TailwindCSS styles
├── lib/                   # Shared library code
│   ├── main/              # Electron main process
│   ├── preload/           # Preload scripts & IPC
│   └── media/             # Audio/keyboard native interfaces
├── native/                # Native components (Rust/Swift)
│   ├── audio-recorder/    # Audio capture (Rust)
│   ├── global-key-listener/ # Keyboard events (Rust)
│   ├── text-writer/       # Text insertion (Rust)
│   └── active-application/ # Get the active application for context (Rust)
├── server/                # gRPC transcription server
│   ├── src/               # Server implementation
│   └── infra/             # AWS infrastructure (CDK)
└── resources/             # Build resources & assets

Available Scripts

# Development
bun run dev                 # Start with hot reload
bun run dev:rust           # Build Rust components and start dev

# Building Native Components
bun run build:rust         # Build for current platform
bun run build:rust:mac     # Build for macOS (with universal binary)
bun run build:rust:win     # Build for Windows

# Building Application
bun run build:mac          # Build for macOS
bun run build:win          # Build for Windows
./build-app.sh mac          # Build macOS using build script
./build-app.sh windows      # Build Windows using build script (requires Docker)

# Code Quality
bun run lint               # Run ESLint
bun run format             # Run Prettier
bun run lint:fix           # Fix linting issues

🏗️ Architecture

Client Architecture

Ito is built as an Electron application with a multi-process architecture:

Main Process: Handles system integration, permissions, and native component coordination
Renderer Process: React-based UI with real-time audio visualization
Preload Scripts: Secure IPC bridge between main and renderer processes
Native Components: Rust binaries for audio capture, global keyboard events, text insertion, and active-application detection

Technology Stack

Frontend:

Electron - Cross-platform desktop framework
React 19 - Modern UI library with concurrent features
TypeScript - Type-safe development
TailwindCSS - Utility-first styling
Zustand - Lightweight state management
Framer Motion - Smooth animations

Backend:

Node.js - Runtime environment
gRPC - High-performance RPC for transcription services
SQLite - Local data storage
Protocol Buffers - Efficient data serialization

Native Components:

Rust - System-level audio recording and keyboard event handling
Swift - macOS-specific text manipulation and accessibility features
cpal - Cross-platform audio library
enigo - Cross-platform input simulation

Infrastructure:

AWS CDK - Infrastructure as code
Docker - Containerized deployments
Auth0 - Authentication and user management

Communication Flow

graph TD
    A[User Holds Trigger Key] --> B[Global Key Listener]
    B --> C[Main Process]
    C --> D[Audio Recorder Service]
    D --> E[gRPC Transcription Service]
    E --> F[AI Transcription Model]
    F --> G[Transcribed Text]
    G --> H[Text Writer Service]
    H --> I[Active Text Field]
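
The flow in the diagram can be sketched as a pipeline of injected stages. This is an illustration only: the `DictationStages` interface and its method names are hypothetical, the stages are synchronous stubs so the example stays self-contained, and the real services (audio capture, gRPC calls) would presumably be asynchronous.

```typescript
// Hypothetical stage interfaces; the names are illustrative, not Ito's API.
interface DictationStages {
  recordAudio: () => Uint8Array; // stands in for the Audio Recorder Service
  transcribe: (audio: Uint8Array) => string; // stands in for the gRPC Transcription Service
  writeText: (text: string) => void; // stands in for the Text Writer Service
}

// Runs one dictation pass: capture audio, transcribe it, insert the text.
function runDictation(stages: DictationStages): string {
  const audio = stages.recordAudio();
  const text = stages.transcribe(audio);
  // Skip insertion when nothing intelligible was transcribed.
  if (text.trim().length > 0) {
    stages.writeText(text);
  }
  return text;
}

// Stubbed stages so the flow can be exercised without hardware or a server.
const typedTexts: string[] = [];
const dictationResult = runDictation({
  recordAudio: () => new Uint8Array([1, 2, 3]),
  transcribe: (audio) => `heard ${audio.length} bytes`,
  writeText: (text) => {
    typedTexts.push(text);
  },
});
console.log(dictationResult); // prints "heard 3 bytes"
```

Injecting the stages keeps each one replaceable, which makes the flow testable without a microphone or a running transcription server.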

🔧 Configuration

Keyboard Shortcuts

Customize your trigger keys in Settings > Keyboard:

Single key: Space, Fn, etc.
Key combinations: Cmd + Space, Ctrl + Shift + V, etc.
Complex shortcuts: Fn + Cmd + Space for advanced workflows
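
As a rough illustration of how shortcut strings like the ones above could be interpreted, the sketch below parses a combination into normalized key names and checks it against the set of keys currently held. The function names and the lower-case key naming are assumptions, not Ito's actual implementation. The sketch does not handle a shortcut that uses the `+` key itself.

```typescript
// Hypothetical helpers; key names are normalized to lower case.
// Parse "Fn + Cmd + Space" into a sorted list of unique key names.
function parseShortcut(shortcut: string): string[] {
  const keys = shortcut
    .split("+")
    .map((key) => key.trim().toLowerCase())
    .filter((key) => key.length > 0);
  return Array.from(new Set(keys)).sort();
}

// True when exactly the keys in the shortcut are currently held down.
function isTriggered(shortcut: string, pressedKeys: Set<string>): boolean {
  const required = parseShortcut(shortcut);
  return (
    required.length === pressedKeys.size &&
    required.every((key) => pressedKeys.has(key))
  );
}

console.log(parseShortcut("Fn + Cmd + Space")); // prints [ 'cmd', 'fn', 'space' ]
console.log(isTriggered("Ctrl + Shift + V", new Set(["ctrl", "shift", "v"]))); // prints true
console.log(isTriggered("Ctrl + Shift + V", new Set(["ctrl", "v"]))); // prints false
```

Requiring an exact match means that holding extra keys does not fire the trigger. A looser subset check is an equally valid design choice.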

Audio Settings

Fine-tune audio capture in Settings > Audio:

Microphone selection: Choose from available input devices
Sensitivity adjustment: Optimize for your voice and environment
Noise reduction: Filter background noise automatically
Audio feedback: Enable/disable sound effects
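
The README does not say how sensitivity is implemented. A common way to model such a setting is a loudness threshold on the root-mean-square (RMS) level of each audio buffer, sketched below. The function names and the threshold value are assumptions made for illustration.

```typescript
// Root-mean-square level of a buffer of samples in the range [-1, 1].
function rmsLevel(samples: readonly number[]): number {
  if (samples.length === 0) {
    return 0;
  }
  const sumOfSquares = samples.reduce((sum, sample) => sum + sample * sample, 0);
  return Math.sqrt(sumOfSquares / samples.length);
}

// Hypothetical gate: a lower threshold corresponds to a higher sensitivity.
function isLoudEnough(samples: readonly number[], threshold: number): boolean {
  return rmsLevel(samples) >= threshold;
}

console.log(rmsLevel([0.5, -0.5, 0.5, -0.5])); // prints 0.5
console.log(isLoudEnough([0.01, -0.01], 0.1)); // prints false
```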

Privacy & Data

Control your data in Settings > General:

Local processing: Keep voice data on your device
Cloud sync: Synchronize settings across devices
Analytics: Share anonymous usage data (optional)
Data export: Download your notes and interaction history

🔒 Privacy & Security

Data Handling

Local processing option: Voice can be processed entirely on your device or in our cloud
Encrypted transmission: All network communication uses TLS encryption
Minimal data collection: Only essential data is processed and stored
User control: Full control and transparency over data retention and deletion

Permissions

Ito requires specific system permissions to function:

Microphone Access: To capture your voice for transcription
Accessibility Access: To detect keyboard shortcuts and insert text
Network Access: For cloud features and updates (optional)

Open Source

This project is open source under the GNU General Public License. You can:

Audit the source code for security and privacy
Contribute improvements and bug fixes
Fork and customize for your specific needs
Report security issues through responsible disclosure

🤝 Contributing

We welcome contributions! Whether you're fixing bugs, adding features, or improving documentation, your help makes Ito better for everyone.

Getting Started

Fork the repository and clone your fork
Create a feature branch from dev
Make your changes with clear commit messages
Test thoroughly across supported platforms
Submit a pull request with a detailed description

Development Guidelines

Code Style: Use Prettier and ESLint configurations
Type Safety: Maintain strong TypeScript typing
Testing: Add tests for new features
Documentation: Update docs for API changes
Performance: Consider impact on time between recording and text insertion

Areas for Contribution

Accuracy improvements: Better transcription algorithms
Language support: Additional language models
UI/UX enhancements: Better user experience
Platform support: Windows stability testing, Linux compatibility
Documentation: Tutorials, guides, and examples

📄 License

This project is licensed under the GNU General Public License - see the LICENSE file for details.

🙏 Acknowledgments

Ito is built with and inspired by amazing open source projects:

Electron React App by @guasam - The foundational template that provided our modern Electron + React architecture
Electron - Cross-platform desktop apps with web technologies
React - Modern UI development
Rust - Systems programming language for native components
gRPC - High-performance RPC framework
TailwindCSS - Utility-first CSS framework

📞 Support

Community: GitHub Discussions
Issues: GitHub Issues
Website: heyito.ai

For Tasks:

write emails, take notes, compose documents, chat messaging, code editing

For Jobs:

transcriptionist, content writer, software developer, translator, journalist

Alternative AI tools for ito

Similar Open Source Tools

pluely

Pluely is a versatile and user-friendly tool for managing tasks and projects. It provides a simple interface for creating, organizing, and tracking tasks, making it easy to stay on top of your work. With features like task prioritization, due date reminders, and collaboration options, Pluely helps individuals and teams streamline their workflow and boost productivity. Whether you're a student juggling assignments, a professional managing multiple projects, or a team coordinating tasks, Pluely is the perfect solution to keep you organized and efficient.

GitHub stars: 687

ToolNeuron

ToolNeuron is a secure, offline AI ecosystem for Android devices that allows users to run private AI models and dynamic plugins fully offline, with hardware-grade encryption ensuring maximum privacy. It enables users to have an offline-first experience, add capabilities without app updates through pluggable tools, and ensures security by design with strict plugin validation and sandboxing.

GitHub stars: 58

AGiXT

AGiXT is a dynamic Artificial Intelligence Automation Platform engineered to orchestrate efficient AI instruction management and task execution across a multitude of providers. Our solution infuses adaptive memory handling with a broad spectrum of commands to enhance AI's understanding and responsiveness, leading to improved task completion. The platform's smart features, like Smart Instruct and Smart Chat, seamlessly integrate web search, planning strategies, and conversation continuity, transforming the interaction between users and AI. By leveraging a powerful plugin system that includes web browsing and command execution, AGiXT stands as a versatile bridge between AI models and users. With an expanding roster of AI providers, code evaluation capabilities, comprehensive chain management, and platform interoperability, AGiXT is consistently evolving to drive a multitude of applications, affirming its place at the forefront of AI technology.

GitHub stars: 3.1k

ComfyUI_Yvann-Nodes

ComfyUI_Yvann-Nodes is a pack of custom nodes that enable audio reactivity within ComfyUI, allowing users to create AI-driven animations that sync with music. Users can generate audio reactive AI videos, control AI generation styles, content, and composition with any audio input. The tool is simple to use by dropping workflows in ComfyUI and specifying audio and visual inputs. It is flexible and works with existing ComfyUI AI tech and nodes like IPAdapter, AnimateDiff, and ControlNet. Users can pick workflows for Images → Video or Video → Video, download the corresponding .json file, drop it into ComfyUI, install missing custom nodes, set inputs, and generate audio-reactive animations.

GitHub stars: 340

opcode

opcode is a powerful desktop application built with Tauri 2 that serves as a command center for interacting with Claude Code. It offers a visual GUI for managing Claude Code sessions, creating custom agents, tracking usage, and more. Users can navigate projects, create specialized AI agents, monitor usage analytics, manage MCP servers, create session checkpoints, edit CLAUDE.md files, and more. The tool bridges the gap between command-line tools and visual experiences, making AI-assisted development more intuitive and productive.

GitHub stars: 15.8k

J.A.R.V.I.S.2.0

J.A.R.V.I.S. 2.0 is an AI-powered assistant designed for voice commands, capable of tasks like providing weather reports, summarizing news, sending emails, and more. It features voice activation, speech recognition, AI responses, and handles multiple tasks including email sending, weather reports, news reading, image generation, database functions, phone call automation, AI-based task execution, website & application automation, and knowledge-based interactions. The assistant also includes timeout handling, automatic input processing, and the ability to call multiple functions simultaneously. It requires Python 3.9 or later and specific API keys for weather, news, email, and AI access. The tool integrates Gemini AI for function execution and Ollama as a fallback mechanism. It utilizes a RAG-based knowledge system and ADB integration for phone automation. Future enhancements include deeper mobile integration, advanced AI-driven automation, improved NLP-based command execution, and multi-modal interactions.

GitHub stars: 212

mcp-memory-service

The MCP Memory Service is a universal memory service designed for AI assistants, providing semantic memory search and persistent storage. It works with various AI applications and offers fast local search using SQLite-vec and global distribution through Cloudflare. The service supports intelligent memory management, universal compatibility with AI tools, flexible storage options, and is production-ready with cross-platform support and secure connections. Users can store and recall memories, search by tags, check system health, and configure the service for Claude Desktop integration and environment variables.

GitHub stars: 724

general_framework

General Framework is a cross-platform library designed to help create apps with a unified codebase using Flutter. It offers features such as cross-platform support, standardized style code, a CLI for easier usage, API integration for bot development, customizable extensions for faster development, and user-friendly information. The library is intended to streamline the app, server, bot, and userbot creation process by providing a comprehensive set of tools and functionalities.

GitHub stars: 99

ToolUniverse

ToolUniverse is a collection of 211 biomedical tools designed for Agentic AI, providing access to biomedical knowledge for solving therapeutic reasoning tasks. The tools cover various aspects of drugs and diseases, linked to trusted sources like US FDA-approved drugs since 1939, Open Targets, and Monarch Initiative.

GitHub stars: 218

AionUi

AionUi is a user interface library for building modern and responsive web applications. It provides a set of customizable components and styles to create visually appealing user interfaces. With AionUi, developers can easily design and implement interactive web interfaces that are both functional and aesthetically pleasing. The library is built using the latest web technologies and follows best practices for performance and accessibility. Whether you are working on a personal project or a professional application, AionUi can help you streamline the UI development process and deliver a seamless user experience.

GitHub stars: 2.2k

oneclick-subtitles-generator

A comprehensive web application for auto-subtitling videos and audio, translating SRT files, generating AI narration with voice cloning, creating background images, and rendering professional subtitled videos. Designed for content creators, educators, and general users who need high-quality subtitle generation and video production capabilities.

GitHub stars: 136

llm-apps-java-spring-ai

The 'LLM Applications with Java and Spring AI' repository provides samples demonstrating how to build Java applications powered by Generative AI and Large Language Models (LLMs) using Spring AI. It includes projects for question answering, chat completion models, prompts, templates, multimodality, output converters, embedding models, document ETL pipeline, function calling, image models, and audio models. The repository also lists prerequisites such as Java 21, Docker/Podman, Mistral AI API Key, OpenAI API Key, and Ollama. Users can explore various use cases and projects to leverage LLMs for text generation, vector transformation, document processing, and more.

GitHub stars: 634

evi-run

evi-run is a powerful, production-ready multi-agent AI system built on Python using the OpenAI Agents SDK. It offers instant deployment, ultimate flexibility, built-in analytics, Telegram integration, and scalable architecture. The system features memory management, knowledge integration, task scheduling, multi-agent orchestration, custom agent creation, deep research, web intelligence, document processing, image generation, DEX analytics, and Solana token swap. It supports flexible usage modes like private, free, and pay mode, with upcoming features including NSFW mode, task scheduler, and automatic limit orders. The technology stack includes Python 3.11, OpenAI Agents SDK, Telegram Bot API, PostgreSQL, Redis, and Docker & Docker Compose for deployment.

GitHub stars: 74

persistent-ai-memory

Persistent AI Memory System is a comprehensive tool that offers persistent, searchable storage for AI assistants. It includes features like conversation tracking, MCP tool call logging, and intelligent scheduling. The system supports multiple databases, provides enhanced memory management, and offers various tools for memory operations, schedule management, and system health checks. It also integrates with various platforms like LM Studio, VS Code, Koboldcpp, Ollama, and more. The system is designed to be modular, platform-agnostic, and scalable, allowing users to handle large conversation histories efficiently.

GitHub stars: 138

Zettelgarden

Zettelgarden is a human-centric, open-source personal knowledge management system that helps users develop and maintain their understanding of the world. It focuses on creating and connecting atomic notes, thoughtful AI integration, and scalability from personal notes to company knowledge bases. The project is actively evolving, with features subject to change based on community feedback and development priorities.

GitHub stars: 152

For similar tasks

Mindolph

Mindolph is an open source personal knowledge management software for all desktop platforms. It allows users to create and manage their own files in separate workspaces with saving in their local storage, organize their files as a tree in their workspaces, and have multiple tabs for opening files instead of a single file window. Mindolph supports Mind Map, Markdown, PlantUML, CSV sheet, and plain text file formats. It also has features such as quickly navigating to files and searching text in files under a specific folder, editing mind maps easily and quickly with key shortcuts, supporting themes and providing some pre-defined themes, importing from other mind map formats, and exporting to other file formats.

GitHub stars: 167

hoarder-app

Hoarder is a self-hostable bookmark manager with a focus on privacy and customization. It features automatic link previews, full-text search, AI-based tagging, and a variety of import and export options. Hoarder is designed to be easy to use and extensible, with a plugin system that allows users to add their own features and integrations.

GitHub stars: 1.1k

rocketnotes

Rocketnotes is a web-based Markdown note-taking app with LLM-powered text completion, chat, and semantic search. It uses a 100% serverless RAG pipeline built with langchain, sentence-transformers, faiss, and the OpenAI or Anthropic API.

GitHub stars: 1.3k

nextlint

Nextlint is a rich text editor (WYSIWYG) written in Svelte, using MeltUI headless UI and tailwindcss CSS framework. It is built on top of tiptap editor (headless editor) and prosemirror. Nextlint is easy to use, develop, and maintain. It has a prompt engine that helps to integrate with any AI API and enhance the writing experience. Dark/Light theme is supported and customizable.

GitHub stars: 145

reor

Reor is an AI-powered desktop note-taking app that automatically links related notes, answers questions on your notes, and provides semantic search. Everything is stored locally and you can edit your notes with an Obsidian-like markdown editor. The hypothesis of the project is that AI tools for thought should run models locally by default. Reor stands on the shoulders of the giants Ollama, Transformers.js & LanceDB to enable both LLMs and embedding models to run locally. Connecting to OpenAI or OpenAI-compatible APIs like Oobabooga is also supported.

GitHub stars: 7.8k

obsidian-companion

Companion is an Obsidian plugin that adds an AI-powered autocomplete feature to your note-taking and personal knowledge management platform. With Companion, you can write notes more quickly and easily by receiving suggestions for completing words, phrases, and even entire sentences based on the context of your writing. The autocomplete feature uses OpenAI's state-of-the-art GPT-3 and GPT-3.5, including ChatGPT, and locally hosted Ollama models, among others, to generate smart suggestions that are tailored to your specific writing style and preferences. Support for more models is planned, too.

GitHub stars: 154

uxie

Uxie is a PDF reader app designed to revolutionize the learning experience. It offers features such as annotation, note-taking, collaboration tools, integration with LLM for enhanced learning, and flashcard generation with LLM feedback. Built using Nextjs, tRPC, Zod, TypeScript, Tailwind CSS, React Query, React Hook Form, Supabase, Prisma, and various other tools. Users can take notes, summarize PDFs, chat and collaborate with others, create custom blocks in the editor, and use AI-powered text autocompletion. The tool allows users to craft simple flashcards, test knowledge, answer questions, and receive instant feedback through AI evaluation.

GitHub stars: 131

For similar jobs

ChatFAQ

ChatFAQ is an open-source comprehensive platform for creating a wide variety of chatbots: generic ones, business-trained, or even capable of redirecting requests to human operators. It includes a specialized NLP/NLG engine based on a RAG architecture and customized chat widgets, ensuring a tailored experience for users and avoiding vendor lock-in.

GitHub stars: 142

anything-llm

AnythingLLM is a full-stack application that enables you to turn any document, resource, or piece of content into context that any LLM can use as references during chatting. This application allows you to pick and choose which LLM or Vector Database you want to use as well as supporting multi-user management and permissions.

GitHub stars: 49.2k

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

GitHub stars: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

GitHub stars: 668

mikupad

mikupad is a lightweight and efficient language model front-end powered by ReactJS, all packed into a single HTML file. Inspired by the likes of NovelAI, it provides a simple yet powerful interface for generating text with the help of various backends.

GitHub stars: 300

glide

Glide is a cloud-native LLM gateway that provides a unified REST API for accessing various large language models (LLMs) from different providers. It handles LLMOps tasks such as model failover, caching, key management, and more, making it easy to integrate LLMs into applications. Glide supports popular LLM providers like OpenAI, Anthropic, Azure OpenAI, AWS Bedrock (Titan), Cohere, Google Gemini, OctoML, and Ollama. It offers high availability, performance, and observability, and provides SDKs for Python and NodeJS to simplify integration.

GitHub stars: 110

onnxruntime-genai

ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built in logits processing like repetition penalties, and allows for easy custom scoring.

GitHub stars: 831

firecrawl

Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown. It crawls all accessible subpages and provides clean markdown for each, without requiring a sitemap. The API is easy to use and can be self-hosted. It also integrates with Langchain and Llama Index. The Python SDK makes it easy to crawl and scrape websites in Python code.

GitHub stars: 34.1k