
chatgpt-subtitle-translator
Efficient translation tool based on ChatGPT or any OpenAI compatible LLM chat completion API
Stars: 347

This tool utilizes the OpenAI ChatGPT API to translate text, with a focus on line-based translation, particularly for SRT subtitles. It optimizes token usage by removing SRT overhead and grouping text into batches, allowing for arbitrary length translations without excessive token consumption while maintaining a one-to-one match between line input and output.
README:
ChatGPT has also demonstrated its capabilities as a robust translator, capable of handling not just common languages, but also unconventional forms of writing like emojis and word scrambling. However, it may not always produce a deterministic output or adhere to line-to-line correlation, potentially disrupting the timing of subtitles, even when instructed to follow precise instructions and with the model temperature
parameter set to 0
.
This utility uses the OpenAI ChatGPT API to translate text, with a specific focus on line-based translation, especially for SRT subtitles. The translator optimizes token usage by removing SRT overhead and grouping text into batches, resulting in arbitrary length translations without excessive token consumption while ensuring a one-to-one match between line input and output.
Web Interface: https://cerlancism.github.io/chatgpt-subtitle-translator
- Web User Interface (Web UI) and Command Line Interface (CLI)
-
New: Supports Structured Output: for more concise results, available in the Web UI and in CLI with
--experimental-structured-mode
-
New: Supports Prompt Caching: by including the full context of translated data, the system instruction and translation context are packaged to work well with prompt caching, enabled with
--experimental-use-full-context
(CLI only) - Supports any OpenAI API compatible providers such as running Ollama locally
- Line-based batching: avoids token limit per request, reduces overhead token wastage, and maintains translation context to a certain extent
- Checks with the free OpenAI Moderation tool: prevents token wastage if the model is highly likely to refuse to translate
- Streaming process output
- Request per minute (RPM) rate limits
- Progress resumption (CLI only)
Reference: https://github.com/openai/openai-quickstart-node#setup
- Node.js version
>= 16.13.0
required. This README assumesbash
shell environment - Clone this repository and
git clone https://github.com/Cerlancism/chatgpt-subtitle-translator
- Navigate into the directory
cd chatgpt-subtitle-translator
- Install the requirements
npm install
- Give executable permission
chmod +x cli/translator.mjs
- Copy
.example.env
to.env
cp .env.example .env
- Add your API key to the newly created
.env
file- (Optional) Set rate limits: https://platform.openai.com/docs/guides/rate-limits/overview
cli/translator.mjs --help
Usage: translator [options]
Translation tool based on ChatGPT API
Options:
-
--from <language>
Source language (default: "") -
--to <language>
Target language (default: "English") -
-i, --input <file>
Input source text with the content of this file, in.srt
format or plain text -
-o, --output <file>
Output file name, defaults to be based on input file name -
-p, --plain-text <text>
Input source text with this plain text argument -
-s, --system-instruction <instruction>
Override the prompt system instruction templateTranslate ${from} to ${to}
with this plain text, ignoring--from
and--to
options -
--initial-prompts <prompts>
Initial prompts for the translation in JSON (default:"[]"
) -
--no-use-moderator
Don't use the OpenAI API Moderation endpoint -
--moderation-model
(default:"omni-moderation-latest"
) https://platform.openai.com/docs/models/moderation -
--no-prefix-number
Don't prefix lines with numerical indices -
--no-line-matching
Don't enforce one to one line quantity input output matching -
-l, --history-prompt-length <length>
Length of prompt history to retain for next request batch (default: 10) -
-b, --batch-sizes <sizes>
Batch sizes of increasing order for translation prompt slices in JSON Array (default:"[10,100]"
)The number of lines to include in each translation prompt, provided that they are estimated to be within the token limit.
In case of mismatched output line quantities, this number will be decreased step by step according to the values in the array, ultimately reaching one.Larger batch sizes generally lead to more efficient token utilization and potentially better contextual translation.
However, mismatched output line quantities or exceeding the token limit will cause token wastage, requiring resubmission of the batch with a smaller batch size. -
--experimental-structured-mode [mode]
Enable structured response. (default:array
, choicesarray
)-
--experimental-structured-mode array
Structures the input and output into a plain array format. This option is more concise compared to base mode, though it uses slightly more tokens per batch.
-
-
--experimental-use-full-context
Include the full context of translated data to work well with prompt caching.The translated lines per user and assistant message pairs are sliced as defined by
--history-prompt-length
(by default--history-prompt-length 10
), it is recommended to set this to the largest batch size (by default--batch-sizes "[10,100]"
):--history-prompt-length 100
.Enabling this may risk running into the model's context window limit, typically
128K
, but should be sufficient for most cases. -
--log-level <level>
Log level (default:debug
, choices:trace
,debug
,info
,warn
,error
,silent
) -
--silent
Same as--log-level silent
-
--quiet
Same as--log-level silent
Additional Options for GPT:
-
-m, --model <model>
(default:"gpt-4o-mini"
) https://platform.openai.com/docs/api-reference/chat/create -
--stream
Stream progress output to terminal https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream -
-t, --temperature <temperature>
Sampling temperature to use, should set a low value below0.3
to be more deterministic for translation (default:1
) https://platform.openai.com/docs/api-reference/chat/create#chat-create-temperature -
--top_p <top_p>
Nucleus sampling parameter, top_p probability mass https://platform.openai.com/docs/api-reference/chat/create#chat-create-top_p -
--presence_penalty <presence_penalty>
Penalty for new tokens based on their presence in the text so far https://platform.openai.com/docs/api-reference/chat/create#chat-create-presence_penalty -
--frequency_penalty <frequency_penalty
Penalty for new tokens based on their frequency in the text so far https://platform.openai.com/docs/api-reference/chat/create#chat-create-frequency_penalty -
--logit_bias <logit_bias>
Modify the likelihood of specified tokens appearing in the completion https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias
cli/translator.mjs --plain-text "δ½ ε₯½"
Standard Output
Hello.
cli/translator.mjs --stream --to "Emojis" --temperature 0 --plain-text "$(curl 'https://api.chucknorris.io/jokes/0ECUwLDTTYSaeFCq6YMa5A' | jq .value)"
Input Argument
Chuck Norris can walk with the animals, talk with the animals; grunt and squeak and squawk with the animals... and the animals, without fail, always say 'yessir Mr. Norris'.
Standard Output
π¨βπ¦°πͺπΆββοΈπ¦πππ
ππππππ¦ππ’ππΏοΈππΏοΈβοΈπ³π¬π²ππ€΅π¨βπ¦°π=ππππ¦ππ¦π¦π¦§π¦π
π¦π¦π¦ππ¦ππππ¦=ππ€΅.
cli/translator.mjs --stream --system-instruction "Scramble characters of words while only keeping the start and end letter" --no-prefix-number --no-line-matching --temperature 0 --plain-text "Chuck Norris can walk with the animals, talk with the animals;"
Standard Output
Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;
cli/translator.mjs --stream --system-instruction "Unscramble characters back to English" --no-prefix-number --no-line-matching --temperature 0 --plain-text "Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;"
Standard Output
Chuck Norris can walk with the animals, talk with the animals;
cli/translator.mjs --stream --temperature 0 --input test/data/test_cn.txt
Input file: test/data/test_cn.txt
δ½ ε₯½γ
ζζοΌ
Standard Output
Hello.
Goodbye!
cli/translator.mjs --stream --temperature 0 --input test/data/test_ja_small.srt
Input file: test/data/test_ja_small.srt
1
00:00:00,000 --> 00:00:02,000
γγ―γγγγγγΎγγ
2
00:00:02,000 --> 00:00:05,000
γε
ζ°γ§γγοΌ
3
00:00:05,000 --> 00:00:07,000
γ―γγε
ζ°γ§γγ
4
00:00:08,000 --> 00:00:12,000
δ»ζ₯γ―倩ζ°γγγγ§γγγ
5
00:00:12,000 --> 00:00:16,000
γ―γγγ¨γ¦γγγ倩ζ°γ§γγ
Output file: test/data/test_ja_small.srt.out_English.srt
1
00:00:00,000 --> 00:00:02,000
Good morning.
2
00:00:02,000 --> 00:00:05,000
How are you?
3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.
4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?
5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather.
System Instruction
Tokens: 4
Translate to English
Input | Prompt | Transform | Output |
---|---|---|---|
Tokens: |
Tokens: |
Tokens: |
Tokens: |
1
00:00:00,000 --> 00:00:02,000
γγ―γγγγγγΎγγ
2
00:00:02,000 --> 00:00:05,000
γε
ζ°γ§γγοΌ
3
00:00:05,000 --> 00:00:07,000
γ―γγε
ζ°γ§γγ
4
00:00:08,000 --> 00:00:12,000
δ»ζ₯γ―倩ζ°γγγγ§γγγ
5
00:00:12,000 --> 00:00:16,000
γ―γγγ¨γ¦γγγ倩ζ°γ§γγ |
|
|
1
00:00:00,000 --> 00:00:02,000
Good morning.
2
00:00:02,000 --> 00:00:05,000
How are you?
3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.
4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?
5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather. |
Lines | SRT Text Format | No Batching | ChatGPT Subtitle Translator |
---|---|---|---|
5 | 280 | 469 | 133 |
10 | 511 | 834 | 171 |
50 | 2,518 | 7,944 | 818 |
100 | 5,011 | 15,263 | 1,611 |
500 | 25,400 | 79,297 | 9,025 |
1000 | 52,988 | 184,596 | 20,593 |
Test data can be found in the test/data directory. Token count also roughly includes message payload structure and prompt token overheads.
SRT Text Format: Full SRT text format, including timestamps for input/output.
No Batching: SRT formatting and timestamp stripping, but one line per prompt with system instruction overhead, including up to 10
historical entries for context per prompt.
ChatGPT Subtitle Translator: SRT formatting and timestamp stripping, with line batching of 100
, including up to 10
historical entries for context per batch.
This analysis assumes perfect input/output quantity matching. In reality, this depends on model and subtitle quality. Typically, buffer an additional 20%~30% token usage for retries, refer to the --batch-sizes
CLI option.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for chatgpt-subtitle-translator
Similar Open Source Tools

chatgpt-subtitle-translator
This tool utilizes the OpenAI ChatGPT API to translate text, with a focus on line-based translation, particularly for SRT subtitles. It optimizes token usage by removing SRT overhead and grouping text into batches, allowing for arbitrary length translations without excessive token consumption while maintaining a one-to-one match between line input and output.

nano-graphrag
nano-GraphRAG is a simple, easy-to-hack implementation of GraphRAG that provides a smaller, faster, and cleaner version of the official implementation. It is about 800 lines of code, small yet scalable, asynchronous, and fully typed. The tool supports incremental insert, async methods, and various parameters for customization. Users can replace storage components and LLM functions as needed. It also allows for embedding function replacement and comes with pre-defined prompts for entity extraction and community reports. However, some features like covariates and global search implementation differ from the original GraphRAG. Future versions aim to address issues related to data source ID, community description truncation, and add new components.

airbase
Airbase is a Maven project management tool that provides a parent pom structure and conventions for defining new projects. It includes guidelines for project pom structure, deployment to Maven Central, project build and checkers, well-known dependencies, and other properties. Airbase helps in enforcing build configurations, organizing project pom files, and running various checkers to catch problems early in the build process. It also offers default properties that can be overridden in the project pom.

datadreamer
DataDreamer is an advanced toolkit designed to facilitate the development of edge AI models by enabling synthetic data generation, knowledge extraction from pre-trained models, and creation of efficient and potent models. It eliminates the need for extensive datasets by generating synthetic datasets, leverages latent knowledge from pre-trained models, and focuses on creating compact models suitable for integration into any device and performance for specialized tasks. The toolkit offers features like prompt generation, image generation, dataset annotation, and tools for training small-scale neural networks for edge deployment. It provides hardware requirements, usage instructions, available models, and limitations to consider while using the library.

gcop
GCOP (Git Copilot) is an AI-powered Git assistant that automates commit message generation, enhances Git workflow, and offers 20+ smart commands. It provides intelligent commit crafting, customizable commit templates, smart learning capabilities, and a seamless developer experience. Users can generate AI commit messages, add all changes with AI-generated messages, undo commits while keeping changes staged, and push changes to the current branch. GCOP offers configuration options for AI models and provides detailed documentation, contribution guidelines, and a changelog. The tool is designed to make version control easier and more efficient for developers.

olah
Olah is a self-hosted lightweight Huggingface mirror service that implements mirroring feature for Huggingface resources at file block level, enhancing download speeds and saving bandwidth. It offers cache control policies and allows administrators to configure accessible repositories. Users can install Olah with pip or from source, set up the mirror site, and download models and datasets using huggingface-cli. Olah provides additional configurations through a configuration file for basic setup and accessibility restrictions. Future work includes implementing an administrator and user system, OOS backend support, and mirror update schedule task. Olah is released under the MIT License.

ChatDBG
ChatDBG is an AI-based debugging assistant for C/C++/Python/Rust code that integrates large language models into a standard debugger (`pdb`, `lldb`, `gdb`, and `windbg`) to help debug your code. With ChatDBG, you can engage in a dialog with your debugger, asking open-ended questions about your program, like `why is x null?`. ChatDBG will _take the wheel_ and steer the debugger to answer your queries. ChatDBG can provide error diagnoses and suggest fixes. As far as we are aware, ChatDBG is the _first_ debugger to automatically perform root cause analysis and to provide suggested fixes.

detoxify
Detoxify is a library that provides trained models and code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification. It includes models like 'original', 'unbiased', and 'multilingual' trained on different datasets to detect toxicity and minimize bias. The library aims to help in stopping harmful content online by interpreting visual content in context. Users can fine-tune the models on carefully constructed datasets for research purposes or to aid content moderators in flagging out harmful content quicker. The library is built to be user-friendly and straightforward to use.

chatgpt-cli
ChatGPT CLI provides a powerful command-line interface for seamless interaction with ChatGPT models via OpenAI and Azure. It features streaming capabilities, extensive configuration options, and supports various modes like streaming, query, and interactive mode. Users can manage thread-based context, sliding window history, and provide custom context from any source. The CLI also offers model and thread listing, advanced configuration options, and supports GPT-4, GPT-3.5-turbo, and Perplexity's models. Installation is available via Homebrew or direct download, and users can configure settings through default values, a config.yaml file, or environment variables.

python-tgpt
Python-tgpt is a Python package that enables seamless interaction with over 45 free LLM providers without requiring an API key. It also provides image generation capabilities. The name _python-tgpt_ draws inspiration from its parent project tgpt, which operates on Golang. Through this Python adaptation, users can effortlessly engage with a number of free LLMs available, fostering a smoother AI interaction experience.

cheating-based-prompt-engine
This is a vulnerability mining engine purely based on GPT, requiring no prior knowledge base, no fine-tuning, yet its effectiveness can overwhelmingly surpass most of the current related research. The core idea revolves around being task-driven, not question-driven, driven by prompts, not by code, and focused on prompt design, not model design. The essence is encapsulated in one word: deception. It is a type of code understanding logic vulnerability mining that fully stimulates the capabilities of GPT, suitable for real actual projects.

ChatSim
ChatSim is a tool designed for editable scene simulation for autonomous driving via LLM-Agent collaboration. It provides functionalities for setting up the environment, installing necessary dependencies like McNeRF and Inpainting tools, and preparing data for simulation. Users can train models, simulate scenes, and track trajectories for smoother and more realistic results. The tool integrates with Blender software and offers options for training McNeRF models and McLight's skydome estimation network. It also includes a trajectory tracking module for improved trajectory tracking. ChatSim aims to facilitate the simulation of autonomous driving scenarios with collaborative LLM-Agents.

orbiton
Orbiton is a text editor and simple IDE designed with minimal annoyance in mind, not highly configurable to help users stay focused, and supports rapid edit-format-compile cycles. It is suitable for writing git commit messages, editing README.md and TODO.md files, writing Markdown and exporting to HTML or PDF, learning programming languages, editing files within larger projects, solving Advent of Code tasks, and providing a distraction-free environment for writing. The tool offers unique features like smart cursor movement, paste and copy shortcuts, portal for copying lines across files, code building and formatting shortcuts, and more.

trickPrompt-engine
This repository contains a vulnerability mining engine based on GPT technology. The engine is designed to identify logic vulnerabilities in code by utilizing task-driven prompts. It does not require prior knowledge or fine-tuning and focuses on prompt design rather than model design. The tool is effective in real-world projects and should not be used for academic vulnerability testing. It supports scanning projects in various languages, with current support for Solidity. The engine is configured through prompts and environment settings, enabling users to scan for vulnerabilities in their codebase. Future updates aim to optimize code structure, add more language support, and enhance usability through command line mode. The tool has received a significant audit bounty of $50,000+ as of May 2024.

QA-Pilot
QA-Pilot is an interactive chat project that leverages online/local LLM for rapid understanding and navigation of GitHub code repository. It allows users to chat with GitHub public repositories using a git clone approach, store chat history, configure settings easily, manage multiple chat sessions, and quickly locate sessions with a search function. The tool integrates with `codegraph` to view Python files and supports various LLM models such as ollama, openai, mistralai, and localai. The project is continuously updated with new features and improvements, such as converting from `flask` to `fastapi`, adding `localai` API support, and upgrading dependencies like `langchain` and `Streamlit` to enhance performance.

k8sgpt
K8sGPT is a tool for scanning your Kubernetes clusters, diagnosing, and triaging issues in simple English. It has SRE experience codified into its analyzers and helps to pull out the most relevant information to enrich it with AI.
For similar tasks

chatgpt-subtitle-translator
This tool utilizes the OpenAI ChatGPT API to translate text, with a focus on line-based translation, particularly for SRT subtitles. It optimizes token usage by removing SRT overhead and grouping text into batches, allowing for arbitrary length translations without excessive token consumption while maintaining a one-to-one match between line input and output.

gpt-subtrans
GPT-Subtrans is an open-source subtitle translator that utilizes large language models (LLMs) as translation services. It supports translation between any language pairs that the language model supports. Note that GPT-Subtrans requires an active internet connection, as subtitles are sent to the provider's servers for translation, and their privacy policy applies.

TeroSubtitler
Tero Subtitler is an open source, cross-platform, and free subtitle editing software with a user-friendly interface. It offers fully fledged editing with SMPTE and MEDIA modes, support for various subtitle formats, multi-level undo/redo, search and replace, auto-backup, source and transcription modes, translation memory, audiovisual preview, timeline with waveform visualizer, manipulation tools, formatting options, quality control features, translation and transcription capabilities, validation tools, automation for correcting errors, and more. It also includes features like exporting subtitles to MP3, importing/exporting Blu-ray SUP format, generating blank video, generating video with hardcoded subtitles, video dubbing, and more. The tool utilizes powerful multimedia playback engines like mpv, advanced audio/video manipulation tools like FFmpeg, tools for automatic transcription like whisper.cpp/Faster-Whisper, auto-translation API like Google Translate, and ElevenLabs TTS for video dubbing.

AiNiee
AiNiee is a tool focused on AI translation, capable of automatically translating RPG SLG games, Epub TXT novels, Srt Lrc subtitles, and more. It provides features for configuring AI platforms, proxies, and translation settings. Users can utilize this tool for translating game scripts, novels, and subtitles efficiently. The tool supports multiple AI platforms and offers tutorials for beginners. It also includes functionalities for extracting and translating game text, with options for customizing translation projects and managing translation tasks effectively.

video2blog
video2blog is an open-source project aimed at converting videos into textual notes. The tool follows a process of extracting video information using yt-dlp, downloading the video, downloading subtitles if available, translating subtitles if not in Chinese, generating Chinese subtitles using whisper if no subtitles exist, converting subtitles to articles using gemini, and manually inserting images from the video into the article. The tool provides a solution for creating blog content from video resources, enhancing accessibility and content creation efficiency.

auto-subs
Auto-subs is a tool designed to automatically transcribe editing timelines using OpenAI Whisper and Stable-TS for extreme accuracy. It generates subtitles in a custom style, is completely free, and runs locally within Davinci Resolve. It works on Mac, Linux, and Windows, supporting both Free and Studio versions of Resolve. Users can jump to positions on the timeline using the Subtitle Navigator and translate from any language to English. The tool provides a user-friendly interface for creating and customizing subtitles for video content.

Srt-AI-Voice-Assistant
Srt-AI-Voice-Assistant is a convenient tool that generates audio from uploaded .srt subtitle files by calling APIs such as Bert-VITS2 (HiyoriUI), GPT-SoVITS, and Microsoft TTS (online). The code is currently not perfect, and feedback on bugs or suggestions can be provided at https://github.com/YYuX-1145/Srt-AI-Voice-Assistant/issues. Recent updates include adding custom API functionality with a focus on security, support for Microsoft online TTS (requires key configuration), error handling improvements, automatic project path detection, compatibility with API-v1 for limited functionality, and significant feature updates supporting card synthesis.

VideoCaptioner
VideoCaptioner is a video subtitle processing assistant based on a large language model (LLM), supporting speech recognition, subtitle segmentation, optimization, translation, and full-process handling. It is user-friendly and does not require high configuration, supporting both network calls and local offline (GPU-enabled) speech recognition. It utilizes a large language model for intelligent subtitle segmentation, correction, and translation, providing stunning subtitles for videos. The tool offers features such as accurate subtitle generation without GPU, intelligent segmentation and sentence splitting based on LLM, AI subtitle optimization and translation, batch video subtitle synthesis, intuitive subtitle editing interface with real-time preview and quick editing, and low model token consumption with built-in basic LLM model for easy use.
For similar jobs

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

daily-poetry-image
Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.

exif-photo-blog
EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.

SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.

Twitter-Insight-LLM
This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).

AISuperDomain
Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.

ChatGPT-On-CS
This project is an intelligent dialogue customer service tool based on a large model, which supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, Zhihu, etc. You can choose GPT3.5/GPT4.0/ Lazy Treasure Box (more platforms will be supported in the future), which can process text, voice and pictures, and access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized based on their own knowledge base.

obs-localvocal
LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.