vocotype-cli
VocoType 是一款运行在本地端侧的隐私安全语音输入工具,通过快捷键即可将语音实时转换为文字并自动输入到当前应用。支持 AI 优化文本、自定义替换词典、录音视频转文字等功能,让语音输入更高效、更准确。
Stars: 159
VocoType is a free desktop voice input method designed for professionals who value privacy and efficiency. All recognition is done locally, ensuring offline operation and no data upload. The CLI open-source version of the VocoType core engine on GitHub is mainly targeted at developers.
README:
VocoType 是一款专为注重隐私和效率的专业人士打造的、完全免费的桌面端语音输入法。所有识别均在本地完成,无惧断网,不上传任何数据。
这个 GitHub 项目是 VocoType 核心引擎的 CLI (命令行) 开源版本,主要面向开发者。
开箱即用,功能更完整,无需任何技术背景。
VocoType 是一款智能语音输入工具,通过快捷键即可将语音实时转换为文字并自动输入到当前应用。支持MCP语音转文字、 AI 优化文本、自定义替换词典等功能,让语音输入更高效、更准确。
| OS | Download |
|---|---|
| Windows | |
| macOS |
|
| 特性 | ✅ VocoType | 传统云端输入法 | 操作系统自带 |
|---|---|---|---|
| 隐私安全 | 本地离线,绝不上传 | ❌ 数据需上传云端 | |
| 网络依赖 | 完全无需联网 | ❌ 必须联网使用 | ❌ 强依赖网络 |
| 响应速度 | 0.1 秒级 | 慢,受网速影响 | 慢,受网速影响 |
| 定制化能力 | 强大的自定义词表 | 弱或无 | 基本没有 |
- 完整的图形用户界面:开箱即用,所有操作清晰直观。
- 系统级全局输入:在任何软件、任何文本框内都能直接语音输入。
- 自定义词典:支持添加 20 个常用术语、人名,提升识别准确率。
- 100% 离线运行:绝对的隐私和数据安全。
- 旗舰级识别引擎:精准识别中英混合内容。
- AI 智能优化:支持选择多种 AI 模型,通过可定制的 Prompt 模板自动修正语音转录中的错别字、同音字和自我修正,智能识别口语中的修正指令(如"不对"、"改成"等),让输出文本更准确流畅。
(对于有更高需求的专业用户,应用内提供了升级到 Pro 版的选项,以解锁无限词典等高级功能。)
无论是文字工作者、律师、学者、游戏玩家,还是日常办公,VocoType 都能成为您值得信赖的效率伙伴。
| 用户 | 场景 |
|---|---|
| 作家与创作者 | 撰写文章、小说,整理会议纪要,让思绪通过语音即时转化为文字,心无旁骛,专注于创作本身。 |
| 法律 & 医疗人士 | 处理高度敏感的客户信息或病历时,100%离线确保数据安全。自定义词表更能轻松驾驭行业术语。 |
| 学生与学者 | 快速记录课堂笔记、整理访谈录音、撰写学术论文。告别繁琐的打字,将更多精力投入到思考与研究之中。 |
| 开发者 & 程序员 | 无论是与 AI 结对编程,还是撰写技术文档,都能精准识别 function、Kubernetes pod 等专业术语。 |
| 游戏玩家 | 在激烈的游戏对战中,通过语音快速打字与队友交流,无需停下操作,保持游戏节奏,提升团队协作效率。 |
所有 VocoType 版本共享同一个强大的核心引擎。
- 🛡️ 100% 离线,隐私无忧:所有语音识别在您的电脑本地完成。
- ⚡️ 旗舰级识别引擎:中英混合输入同样精准,告别反复修改。
- ⚙️ 高度可定制:独创的替换词表功能,让人名、地名、行业术语一次就对。
- 💻 轻量化设计:仅需 700MB 内存,纯 CPU 推理,无需昂贵显卡。
- 🚀 0.1 秒级响应:感受所言即所得的畅快,让您的灵感不再因等待而中断。
请注意: 此版本面向有一定技术背景的开发者。如果您不熟悉命令行,我们强烈建议您访问官网,下载简单易用的 VocoType 免费桌面版。
- Python 3.12
- 我们强烈建议使用
uv或venv创建虚拟环境。
# 1. 克隆仓库
git clone https://github.com/233stone/vocotype-cli.git
cd vocotype-cli
# 2. (推荐) 创建并激活虚拟环境
pip install uv
uv venv --python 3.12
source .venv/bin/activate # macOS/Linux
# 或者 .\.venv\Scripts\activate (Windows)
# 3. 安装依赖
uv pip install -r requirements.txt
# 4. 运行
python main.py
# 保存数据集运行
python main.py --save-dataset模型下载:首次运行时,程序会自动下载约 500MB 的模型文件,请确保网络连接稳定。
Q: 我的数据安全吗?
A: 100%安全。所有语音识别均在本地离线完成,您的音频数据不会上传到任何服务器。
- Bug 与建议:请优先使用 GitHub Issues。
- 关注我们获取最新动态:https://vocotype.com
VocoType 的诞生离不开以下优秀的开源项目:
感谢这些开源社区的无私贡献!
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for vocotype-cli
Similar Open Source Tools
vocotype-cli
VocoType is a free desktop voice input method designed for professionals who value privacy and efficiency. All recognition is done locally, ensuring offline operation and no data upload. The CLI open-source version of the VocoType core engine on GitHub is mainly targeted at developers.
Saber-Translator
Saber-Translator is your exclusive AI comic translation tool, designed to effortlessly eliminate language barriers and enjoy the original comic fun. It offers features like translating comic images/PDFs, intelligent bubble detection and text recognition, powerful AI translation engine with multiple service providers, highly customizable translation effects, real-time preview and convenient operations, efficient image management and download, model recording and recommendation, and support for language learning with dual prompt word outputs.
Snap-Solver
Snap-Solver is a revolutionary AI tool for online exam solving, designed for students, test-takers, and self-learners. With just a keystroke, it automatically captures any question on the screen, analyzes it using AI, and provides detailed answers. Whether it's complex math formulas, physics problems, coding issues, or challenges from other disciplines, Snap-Solver offers clear, accurate, and structured solutions to help you better understand and master the subject matter.
llm-action
This repository provides a comprehensive guide to large language models (LLMs), covering various aspects such as training, fine-tuning, compression, and applications. It includes detailed tutorials, code examples, and explanations of key concepts and techniques. The repository is maintained by Liguo Dong, an AI researcher and engineer with expertise in LLM research and development.
chatless
Chatless is a modern AI chat desktop application built on Tauri and Next.js. It supports multiple AI providers, can connect to local Ollama models, supports document parsing and knowledge base functions. All data is stored locally to protect user privacy. The application is lightweight, simple, starts quickly, and consumes minimal resources.
ai_quant_trade
The ai_quant_trade repository is a comprehensive platform for stock AI trading, offering learning, simulation, and live trading capabilities. It includes features such as factor mining, traditional strategies, machine learning, deep learning, reinforcement learning, graph networks, and high-frequency trading. The repository provides tools for monitoring stocks, stock recommendations, and deployment tools for live trading. It also features new functionalities like sentiment analysis using StructBERT, reinforcement learning for multi-stock trading with a 53% annual return, automatic factor mining with 5000 factors, customized stock monitoring software, and local deep reinforcement learning strategies.
Operit
Operit AI is a fully functional AI assistant application for mobile devices, running independently on Android devices with powerful tool invocation capabilities. It offers over 40 built-in tools for file system operations, HTTP requests, system operations, UI automation, and media processing. The app combines these tools with rich plugins to enable a wide range of tasks, from simple to complex, providing a comprehensive experience of a smartphone AI assistant.
ChatGPT-Next-Web-Pro
ChatGPT-Next-Web-Pro is a tool that provides an enhanced version of ChatGPT-Next-Web with additional features and functionalities. It offers complete ChatGPT-Next-Web functionality, file uploading and storage capabilities, drawing and video support, multi-modal support, reverse model support, knowledge base integration, translation, customizations, and more. The tool can be deployed with or without a backend, allowing users to interact with AI models, manage accounts, create models, manage API keys, handle orders, manage memberships, and more. It supports various cloud services like Aliyun OSS, Tencent COS, and Minio for file storage, and integrates with external APIs like Azure, Google Gemini Pro, and Luma. The tool also provides options for customizing website titles, subtitles, icons, and plugin buttons, and offers features like voice input, file uploading, real-time token count display, and more.
vscode-antigravity-cockpit
VS Code extension for monitoring Google Antigravity AI model quotas. It provides a webview dashboard, QuickPick mode, quota grouping, automatic grouping, renaming, card view, drag-and-drop sorting, status bar monitoring, threshold notifications, and privacy mode. Users can monitor quota status, remaining percentage, countdown, reset time, progress bar, and model capabilities. The extension supports local and authorized quota monitoring, multiple account authorization, and model wake-up scheduling. It also offers settings customization, user profile display, notifications, and group functionalities. Users can install the extension from the Open VSX Marketplace or via VSIX file. The source code can be built using Node.js and npm. The project is open-source under the MIT license.
AutoGLM-GUI
AutoGLM-GUI is an AI-driven Android automation productivity tool that supports scheduled tasks, remote deployment, and 24/7 AI assistance. It features core functionalities such as deploying to servers, scheduling tasks, and creating an AI automation assistant. The tool enhances productivity by automating repetitive tasks, managing multiple devices, and providing a layered agent mode for complex task planning and execution. It also supports real-time screen preview, direct device control, and zero-configuration deployment. Users can easily download the tool for Windows, macOS, and Linux systems, and can also install it via Python package. The tool is suitable for various use cases such as server automation, batch device management, development testing, and personal productivity enhancement.
aio-hub
AIO Hub is a cross-platform AI hub built on Tauri + Vue 3 + TypeScript, aiming to provide developers and creators with precise LLM control experience and efficient toolchain. It features a chat function designed for complex tasks and deep exploration, a unified context pipeline for controlling every token sent to the model, interactive AI buttons, dual-view management for non-linear conversation mapping, open ecosystem compatibility with various AI models, and a rich text renderer for LLM output. The tool also includes features for media workstation, developer productivity, system and asset management, regex applier, collaboration enhancement between developers and AI, and more.
bailing
Bailing is an open-source voice assistant designed for natural conversations with users. It combines Automatic Speech Recognition (ASR), Voice Activity Detection (VAD), Large Language Model (LLM), and Text-to-Speech (TTS) technologies to provide a high-quality voice interaction experience similar to GPT-4o. Bailing aims to achieve GPT-4o-like conversation effects without the need for GPU, making it suitable for various edge devices and low-resource environments. The project features efficient open-source models, modular design allowing for module replacement and upgrades, support for memory function, tool integration for information retrieval and task execution via voice commands, and efficient task management with progress tracking and reminders.
torra-community
Torra Community Edition is a modern AI workflow and intelligent agent visualization editor based on Nuxt 4. It offers a lightweight but production-ready architecture with frontend VueFlow + Tailwind v4 + shadcn/ui, backend FeathersJS, and built-in LangChain.js runtime. It supports multiple databases (SQLite/MySQL/MongoDB) and local ↔ cloud hot switching. The tool covers various tasks such as visual workflow editing, modern UI, native integration of LangChain.js, pluggable storage options, full-stack TypeScript implementation, and more. It is designed for enterprises looking for an easy-to-deploy and scalable solution for AI workflows.
push-2-talk
PushToTalk is a high-performance desktop voice input tool with large language model (LLM) capabilities. It supports two working modes: dictation mode and AI assistant mode. The tool offers features like real-time transcription, LLM intelligent post-processing, custom hotkeys, multiple ASR engines support, visual feedback, audio feedback, history records, system tray support, automatic updates, multiple configuration management, personal glossary, automatic glossary learning, LLM configuration center, theme switching, mute during recording, VAD silence detection, AGC automatic gain, multi-screen support, and more.
MaiMBot
MaiMBot is an intelligent QQ group chat bot based on a large language model. It is developed using the nonebot2 framework, utilizes LLM for conversation abilities, MongoDB for data persistence, and NapCat for QQ protocol support. The bot features keyword-triggered proactive responses, dynamic prompt construction, support for images and message forwarding, typo generation, multiple replies, emotion-based emoji responses, daily schedule generation, user relationship management, knowledge base, and group impressions. Work-in-progress features include personality, group atmosphere, image handling, humor, meme functions, and Minecraft interactions. The tool is in active development with plans for GIF compatibility, mini-program link parsing, bug fixes, documentation improvements, and logic enhancements for emoji sending.
For similar tasks
vocotype-cli
VocoType is a free desktop voice input method designed for professionals who value privacy and efficiency. All recognition is done locally, ensuring offline operation and no data upload. The CLI open-source version of the VocoType core engine on GitHub is mainly targeted at developers.
Dictate
Dictate is an easy-to-use keyboard tool that utilizes OpenAI Whisper for transcription and dictation. It offers accurate results in multiple languages with punctuation and custom AI rewording using GPT-4 Omni. The tool is designed to simplify the process of transcribing spoken words into text, making it convenient for users to dictate their thoughts and notes effortlessly.
noScribe
noScribe is an AI-based software designed for automated audio transcription, specifically tailored for transcribing interviews for qualitative social research or journalistic purposes. It is a free and open-source tool that runs locally on the user's computer, ensuring data privacy. The software can differentiate between speakers and supports transcription in 99 languages. It includes a user-friendly editor for reviewing and correcting transcripts. Developed by Kai Dröge, a PhD in sociology with a background in computer science, noScribe aims to streamline the transcription process and enhance the efficiency of qualitative analysis.
AivisSpeech-Engine
AivisSpeech-Engine is a powerful open-source tool for speech recognition and synthesis. It provides state-of-the-art algorithms for converting speech to text and text to speech. The tool is designed to be user-friendly and customizable, allowing developers to easily integrate speech capabilities into their applications. With AivisSpeech-Engine, users can transcribe audio recordings, create voice-controlled interfaces, and generate natural-sounding speech output. Whether you are building a virtual assistant, developing a speech-to-text application, or experimenting with voice technology, AivisSpeech-Engine offers a comprehensive solution for all your speech processing needs.
NotelyVoice
Notely Voice is a free, modern, cross-platform AI voice transcription and note-taking application. It offers powerful Whisper AI Voice to Text capabilities, making it ideal for students, professionals, doctors, researchers, and anyone in need of hands-free note-taking. The app features rich text editing, simple search, smart filtering, organization with folders and tags, advanced speech-to-text, offline capability, seamless integration, audio recording, theming, cross-platform support, and sharing functionality. It includes memory-efficient audio processing, chunking configuration, and utilizes OpenAI Whisper for speech recognition technology. Built with Kotlin, Compose Multiplatform, Coroutines, Android Architecture, ViewModel, Koin, Material 3, Whisper AI, and Native Compose Navigation, Notely follows Android Architecture principles with distinct layers for UI, presentation, domain, and data.
sorrentum
Sorrentum is an open-source project that aims to combine open-source development, startups, and brilliant students to build machine learning, AI, and Web3 / DeFi protocols geared towards finance and economics. The project provides opportunities for internships, research assistantships, and development grants, as well as the chance to work on cutting-edge problems, learn about startups, write academic papers, and get internships and full-time positions at companies working on Sorrentum applications.
ai_paper
The AI Paper tool is a powerful platform for generating various types of academic papers, including graduation papers, course papers, journal papers, title papers, opening reports, task books, AIGC reduction, and custom literature. It supports unlimited revisions, AI 4.0 technology, AIGC rate reduction, custom references, outlines, feeding of materials, various types of charts and tables, multiple languages, and anonymous mode. The tool offers a wide range of paper formats and supports tasks like writing papers, assignments, internship reports, job planning, and long-form writing.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
sourcegraph
Sourcegraph is a code search and navigation tool that helps developers read, write, and fix code in large, complex codebases. It provides features such as code search across all repositories and branches, code intelligence for navigation and refactoring, and the ability to fix and refactor code across multiple repositories at once.
open-webui
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs. For more information, be sure to check out our Open WebUI Documentation.
ray
Ray is a unified framework for scaling AI and Python applications. It consists of a core distributed runtime and a set of AI libraries for simplifying ML compute, including Data, Train, Tune, RLlib, and Serve. Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing ecosystem of community integrations. With Ray, you can seamlessly scale the same code from a laptop to a cluster, making it easy to meet the compute-intensive demands of modern ML workloads.
litgpt
LitGPT is a command-line tool designed to easily finetune, pretrain, evaluate, and deploy 20+ LLMs **on your own data**. It features highly-optimized training recipes for the world's most powerful open-source large-language-models (LLMs).
khoj
Khoj is an open-source, personal AI assistant that extends your capabilities by creating always-available AI agents. You can share your notes and documents to extend your digital brain, and your AI agents have access to the internet, allowing you to incorporate real-time information. Khoj is accessible on Desktop, Emacs, Obsidian, Web, and Whatsapp, and you can share PDF, markdown, org-mode, notion files, and GitHub repositories. You'll get fast, accurate semantic search on top of your docs, and your agents can create deeply personal images and understand your speech. Khoj is self-hostable and always will be.
chronon
Chronon is a platform that simplifies and improves ML workflows by providing a central place to define features, ensuring point-in-time correctness for backfills, simplifying orchestration for batch and streaming pipelines, offering easy endpoints for feature fetching, and guaranteeing and measuring consistency. It offers benefits over other approaches by enabling the use of a broad set of data for training, handling large aggregations and other computationally intensive transformations, and abstracting away the infrastructure complexity of data plumbing.
rag-experiment-accelerator
The RAG Experiment Accelerator is a versatile tool that helps you conduct experiments and evaluations using Azure AI Search and RAG pattern. It offers a rich set of features, including experiment setup, integration with Azure AI Search, Azure Machine Learning, MLFlow, and Azure OpenAI, multiple document chunking strategies, query generation, multiple search types, sub-querying, re-ranking, metrics and evaluation, report generation, and multi-lingual support. The tool is designed to make it easier and faster to run experiments and evaluations of search queries and quality of response from OpenAI, and is useful for researchers, data scientists, and developers who want to test the performance of different search and OpenAI related hyperparameters, compare the effectiveness of various search strategies, fine-tune and optimize parameters, find the best combination of hyperparameters, and generate detailed reports and visualizations from experiment results.