chatgpt-subtitle-translator
Efficient translation tool based on ChatGPT or any OpenAI compatible LLM chat completion API
Stars: 370
This tool utilizes the OpenAI ChatGPT API to translate text, with a focus on line-based translation, particularly for SRT subtitles. It optimizes token usage by removing SRT overhead and grouping text into batches, allowing for arbitrary length translations without excessive token consumption while maintaining a one-to-one match between line input and output.
README:
ChatGPT has demonstrated its capabilities as a robust translator, handling not just common languages but also unconventional forms of writing such as emojis and word scrambling. However, it does not always produce deterministic output or preserve line-to-line correlation, which can disrupt subtitle timing, even when given precise instructions and with the model `temperature` parameter set to `0`.
This utility uses the OpenAI ChatGPT API to translate text, with a specific focus on line-based translation, especially for SRT subtitles. The translator optimizes token usage by removing SRT overhead and grouping text into batches, resulting in arbitrary length translations without excessive token consumption while ensuring a one-to-one match between line input and output.
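As a rough illustration of "removing SRT overhead" while keeping a one-to-one line match, the sketch below splits an SRT file into cues, extracts only the text for translation, and re-attaches the translated lines to the original indices and timestamps. It is a minimal sketch, not the project's implementation: `parseSrt` and `rebuildSrt` are hypothetical helper names, and the `translated` array stands in for a model response.

```javascript
// Minimal sketch, not the project's actual code.
// parseSrt and rebuildSrt are hypothetical helper names.

/** Split SRT text into cues of the form { index, time, text }. */
function parseSrt(srt) {
  return srt
    .trim()
    .split(/\r?\n\r?\n+/) // cues are separated by blank lines
    .map((block) => {
      const [index, time, ...textLines] = block.split(/\r?\n/);
      // Multi-line cue text is joined so that each cue is exactly one line.
      return { index, time, text: textLines.join(" ") };
    });
}

/** Re-attach translated lines to the original indices and timestamps. */
function rebuildSrt(cues, translatedLines) {
  if (cues.length !== translatedLines.length) {
    throw new Error(
      `Line mismatch: expected ${cues.length}, got ${translatedLines.length}`
    );
  }
  return (
    cues
      .map((cue, i) => `${cue.index}\n${cue.time}\n${translatedLines[i]}`)
      .join("\n\n") + "\n"
  );
}

const srt = `1
00:00:00,000 --> 00:00:02,000
おはようございます。

2
00:00:02,000 --> 00:00:05,000
お元気ですか?
`;

const cues = parseSrt(srt);
const sourceLines = cues.map((cue) => cue.text); // only this text would be sent
const translated = ["Good morning.", "How are you?"]; // stand-in for a model reply
console.log(sourceLines);
console.log(rebuildSrt(cues, translated));
```

Because the timestamps never leave the local process, a translation can only disturb subtitle timing if the number of returned lines differs from the number sent, which is what the length check guards against.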
Web Interface: https://cerlancism.github.io/chatgpt-subtitle-translator
- Web User Interface (Web UI) and Command Line Interface (CLI)
- New: Supports Structured Output: for more concise results, available in the Web UI and in CLI with `--experimental-structured-mode`
- New: Supports Prompt Caching: by including the full context of translated data, the system instruction and translation context are packaged to work well with prompt caching, enabled with `--experimental-use-full-context` (CLI only)
- Supports any OpenAI API compatible providers such as running Ollama locally
- Line-based batching: avoids token limit per request, reduces overhead token wastage, and maintains translation context to a certain extent
- Checks with the free OpenAI Moderation tool: prevents token wastage if the model is highly likely to refuse to translate
- Streaming process output
- Request per minute (RPM) rate limits
- Progress resumption (CLI only)
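To make the requests-per-minute (RPM) feature concrete, here is a small sketch of how a client could space out requests for a given RPM limit. The helper names `minIntervalMs` and `waitTimeMs` are hypothetical and are not taken from the project's source; the tool's own rate limiter may work differently.

```javascript
// Hypothetical helpers illustrating RPM pacing; not the project's code.

/** Minimum spacing in milliseconds between requests for a given RPM limit. */
function minIntervalMs(rpm) {
  if (!(rpm > 0)) throw new RangeError("rpm must be a positive number");
  return Math.ceil(60000 / rpm);
}

/** How long to wait before the next request may be sent. */
function waitTimeMs(lastRequestAt, now, rpm) {
  const elapsed = now - lastRequestAt;
  return Math.max(0, minIntervalMs(rpm) - elapsed);
}

console.log(minIntervalMs(60)); // 1000
console.log(waitTimeMs(0, 400, 60)); // 600
console.log(waitTimeMs(0, 1500, 60)); // 0
```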
Reference: https://github.com/openai/openai-quickstart-node#setup
- Node.js version `>= 16.13.0` required. This README assumes a `bash` shell environment
- Clone this repository: `git clone https://github.com/Cerlancism/chatgpt-subtitle-translator`
- Navigate into the directory: `cd chatgpt-subtitle-translator`
- Install the requirements: `npm install`
- Give executable permission: `chmod +x cli/translator.mjs`
- Copy `.env.example` to `.env`: `cp .env.example .env`
- Add your API key to the newly created `.env` file
- (Optional) Set rate limits: https://platform.openai.com/docs/guides/rate-limits/overview
`cli/translator.mjs --help`
Usage: translator [options]
Translation tool based on ChatGPT API
Options:
- `--from <language>`
  Source language (default: `""`)
- `--to <language>`
  Target language (default: `"English"`)
- `-i, --input <file>`
  Input source text with the content of this file, in `.srt` format or plain text
- `-o, --output <file>`
  Output file name, defaults to be based on input file name
- `-p, --plain-text <text>`
  Input source text with this plain text argument
- `-s, --system-instruction <instruction>`
  Override the prompt system instruction template `Translate ${from} to ${to}` with this plain text, ignoring `--from` and `--to` options
- `--initial-prompts <prompts>`
  Initial prompts for the translation in JSON (default: `"[]"`)
- `--no-use-moderator`
  Don't use the OpenAI API Moderation endpoint
- `--moderation-model`
  (default: `"omni-moderation-latest"`) https://platform.openai.com/docs/models/moderation
- `--no-prefix-number`
  Don't prefix lines with numerical indices
- `--no-line-matching`
  Don't enforce one to one line quantity input output matching
- `-l, --history-prompt-length <length>`
  Length of prompt history to retain for next request batch (default: 10)
- `-b, --batch-sizes <sizes>`
  Batch sizes of increasing order for translation prompt slices in JSON Array (default: `"[10,100]"`)
  The number of lines to include in each translation prompt, provided that they are estimated to be within the token limit. In case of mismatched output line quantities, this number will be decreased step by step according to the values in the array, ultimately reaching one.
  Larger batch sizes generally lead to more efficient token utilization and potentially better contextual translation. However, mismatched output line quantities or exceeding the token limit will cause token wastage, requiring resubmission of the batch with a smaller batch size.
- `--experimental-structured-mode [mode]`
  Enable structured response. (default: `array`, choices: `array`)
  - `--experimental-structured-mode array`
    Structures the input and output into a plain array format. This option is more concise compared to base mode, though it uses slightly more tokens per batch.
- `--experimental-use-full-context`
  Include the full context of translated data to work well with prompt caching.
  The translated lines per user and assistant message pairs are sliced as defined by `--history-prompt-length` (by default `--history-prompt-length 10`); it is recommended to set this to the largest batch size (by default `--batch-sizes "[10,100]"`): `--history-prompt-length 100`.
  Enabling this may risk running into the model's context window limit, typically `128K`, but should be sufficient for most cases.
- `--log-level <level>`
  Log level (default: `debug`, choices: `trace`, `debug`, `info`, `warn`, `error`, `silent`)
- `--silent`
  Same as `--log-level silent`
- `--quiet`
  Same as `--log-level silent`
Additional Options for GPT:
- `-m, --model <model>`
  (default: `"gpt-4o-mini"`) https://platform.openai.com/docs/api-reference/chat/create
- `--stream`
  Stream progress output to terminal https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream
- `-t, --temperature <temperature>`
  Sampling temperature to use, should set a low value below `0.3` to be more deterministic for translation (default: `1`) https://platform.openai.com/docs/api-reference/chat/create#chat-create-temperature
- `--top_p <top_p>`
  Nucleus sampling parameter, top_p probability mass https://platform.openai.com/docs/api-reference/chat/create#chat-create-top_p
- `--presence_penalty <presence_penalty>`
  Penalty for new tokens based on their presence in the text so far https://platform.openai.com/docs/api-reference/chat/create#chat-create-presence_penalty
- `--frequency_penalty <frequency_penalty>`
  Penalty for new tokens based on their frequency in the text so far https://platform.openai.com/docs/api-reference/chat/create#chat-create-frequency_penalty
- `--logit_bias <logit_bias>`
  Modify the likelihood of specified tokens appearing in the completion https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias
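The `--batch-sizes` fallback described above can be pictured with the following sketch: try the largest batch size first and, when the number of returned lines does not match the number sent, retry the same position with the next smaller size, ending at one line per request. This is an assumption-laden illustration rather than the tool's code: `fallbackSizes`, `translateAll`, and the synchronous `translateBatch` callback are hypothetical, and a real client would call the API asynchronously.

```javascript
// Sketch only: translateBatch is a hypothetical stand-in for a chat completion
// call, kept synchronous here so the control flow is easy to follow.

/** Sizes to try, largest first, always ending at 1. */
function fallbackSizes(batchSizes) {
  const sizes = [...batchSizes].sort((a, b) => b - a);
  if (sizes[sizes.length - 1] !== 1) sizes.push(1);
  return sizes;
}

function translateAll(lines, batchSizes, translateBatch) {
  const sizes = fallbackSizes(batchSizes);
  const output = [];
  let position = 0;
  while (position < lines.length) {
    let accepted = null;
    let lastTried = 0;
    for (const size of sizes) {
      const batch = lines.slice(position, position + size);
      if (batch.length === lastTried) continue; // same slice as the last attempt
      lastTried = batch.length;
      const result = translateBatch(batch);
      if (result.length === batch.length) {
        accepted = result; // one-to-one line match, keep it
        break;
      }
      // Mismatch: the tokens spent on this attempt are wasted; try a smaller size.
    }
    if (accepted === null) {
      throw new Error(`Line ${position + 1} could not be matched`);
    }
    output.push(...accepted);
    position += accepted.length;
  }
  return output;
}

// Fake model that drops a line whenever it receives more than two lines.
const flakyModel = (batch) =>
  (batch.length > 2 ? batch.slice(1) : batch).map((line) => line.toUpperCase());

console.log(translateAll(["a", "b", "c", "d", "e"], [2, 4], flakyModel));
// [ 'A', 'B', 'C', 'D', 'E' ]
```

Every mismatched attempt in the inner loop corresponds to the token wastage the option description warns about, which is why the default sizes trade a large first attempt against a cheap fallback.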
`cli/translator.mjs --plain-text "你好"`
Standard Output
Hello.
`cli/translator.mjs --stream --to "Emojis" --temperature 0 --plain-text "$(curl 'https://api.chucknorris.io/jokes/0ECUwLDTTYSaeFCq6YMa5A' | jq .value)"`
Input Argument
Chuck Norris can walk with the animals, talk with the animals; grunt and squeak and squawk with the animals... and the animals, without fail, always say 'yessir Mr. Norris'.
Standard Output
(three lines of emoji output)
`cli/translator.mjs --stream --system-instruction "Scramble characters of words while only keeping the start and end letter" --no-prefix-number --no-line-matching --temperature 0 --plain-text "Chuck Norris can walk with the animals, talk with the animals;"`
Standard Output
Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;
`cli/translator.mjs --stream --system-instruction "Unscramble characters back to English" --no-prefix-number --no-line-matching --temperature 0 --plain-text "Cuhck Nroris can wakl wtih the aiamnls, talk wtih the aiamnls;"`
Standard Output
Chuck Norris can walk with the animals, talk with the animals;
`cli/translator.mjs --stream --temperature 0 --input test/data/test_cn.txt`
Input file: test/data/test_cn.txt
你好。
拜拜!
Standard Output
Hello.
Goodbye!
`cli/translator.mjs --stream --temperature 0 --input test/data/test_ja_small.srt`
Input file: test/data/test_ja_small.srt
1
00:00:00,000 --> 00:00:02,000
おはようございます。

2
00:00:02,000 --> 00:00:05,000
お元気ですか?

3
00:00:05,000 --> 00:00:07,000
はい、元気です。

4
00:00:08,000 --> 00:00:12,000
今日は天気がいいですね。

5
00:00:12,000 --> 00:00:16,000
はい、とてもいい天気です。

Output file: test/data/test_ja_small.srt.out_English.srt
1
00:00:00,000 --> 00:00:02,000
Good morning.

2
00:00:02,000 --> 00:00:05,000
How are you?

3
00:00:05,000 --> 00:00:07,000
Yes, I'm doing well.

4
00:00:08,000 --> 00:00:12,000
The weather is nice today, isn't it?

5
00:00:12,000 --> 00:00:16,000
Yes, it's very nice weather.

System Instruction
Tokens: 4
Translate to English
| Input | Prompt | Transform | Output |
|---|---|---|---|
| Tokens: | Tokens: | Tokens: | Tokens: |
| 1<br>00:00:00,000 --> 00:00:02,000<br>おはようございます。<br><br>2<br>00:00:02,000 --> 00:00:05,000<br>お元気ですか?<br><br>3<br>00:00:05,000 --> 00:00:07,000<br>はい、元気です。<br><br>4<br>00:00:08,000 --> 00:00:12,000<br>今日は天気がいいですね。<br><br>5<br>00:00:12,000 --> 00:00:16,000<br>はい、とてもいい天気です。 | | | 1<br>00:00:00,000 --> 00:00:02,000<br>Good morning.<br><br>2<br>00:00:02,000 --> 00:00:05,000<br>How are you?<br><br>3<br>00:00:05,000 --> 00:00:07,000<br>Yes, I'm doing well.<br><br>4<br>00:00:08,000 --> 00:00:12,000<br>The weather is nice today, isn't it?<br><br>5<br>00:00:12,000 --> 00:00:16,000<br>Yes, it's very nice weather. |
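The Prompt and Transform stages sit between the SRT input and output: the text lines are sent with numerical prefixes (unless `--no-prefix-number` is given), and the reply is checked for a matching line count before being merged back. The sketch below shows one way this could look. The exact `1. ` prefix format and the helper names `toPrompt` and `fromResponse` are assumptions for illustration, not the tool's documented wire format.

```javascript
// Illustrative sketch; the "N. " prefix format is an assumption.

/** Build the prompt body: one numbered line per subtitle text. */
function toPrompt(lines) {
  return lines.map((line, i) => `${i + 1}. ${line}`).join("\n");
}

/** Strip numeric prefixes from a reply and check the line count. */
function fromResponse(response, expectedCount) {
  const lines = response
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => line.replace(/^\d+\.\s*/, ""));
  return { lines, matched: lines.length === expectedCount };
}

const source = ["おはようございます。", "お元気ですか?"];
console.log(toPrompt(source));

// Stand-in for a model reply to the prompt above.
const reply = "1. Good morning.\n2. How are you?";
console.log(fromResponse(reply, source.length));
```

Numbering each line gives the model an explicit anchor per subtitle, which makes merged or dropped lines easier to detect than with unnumbered text.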
| Lines | SRT Text Format | No Batching | ChatGPT Subtitle Translator |
|---|---|---|---|
| 5 | 280 | 469 | 133 |
| 10 | 511 | 834 | 171 |
| 50 | 2,518 | 7,944 | 818 |
| 100 | 5,011 | 15,263 | 1,611 |
| 500 | 25,400 | 79,297 | 9,025 |
| 1000 | 52,988 | 184,596 | 20,593 |
Test data can be found in the test/data directory. Token count also roughly includes message payload structure and prompt token overheads.
SRT Text Format: Full SRT text format, including timestamps for input/output.
No Batching: SRT formatting and timestamps stripped, but one line per prompt with system instruction overhead, including up to 10 historical entries for context per prompt.
ChatGPT Subtitle Translator: SRT formatting and timestamps stripped, with line batching of 100, including up to 10 historical entries for context per batch.
This analysis assumes perfect input/output quantity matching. In reality, this depends on model and subtitle quality. Typically, budget an additional 20% to 30% token usage for retries; see the `--batch-sizes` CLI option.
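The figures in the table can be turned into a quick budget estimate. The sketch below recomputes the relative saving against the full SRT format and applies the 20% to 30% retry buffer mentioned above. The token counts are copied from the table; the helper names `savingPercent` and `budget` are hypothetical.

```javascript
// Token counts copied from the table above; helper names are illustrative.
const rows = [
  { lines: 5, srt: 280, noBatching: 469, tool: 133 },
  { lines: 10, srt: 511, noBatching: 834, tool: 171 },
  { lines: 50, srt: 2518, noBatching: 7944, tool: 818 },
  { lines: 100, srt: 5011, noBatching: 15263, tool: 1611 },
  { lines: 500, srt: 25400, noBatching: 79297, tool: 9025 },
  { lines: 1000, srt: 52988, noBatching: 184596, tool: 20593 },
];

/** Percentage of tokens saved relative to a baseline. */
function savingPercent(baseline, actual) {
  return (1 - actual / baseline) * 100;
}

/** Token budget including a retry buffer given in whole percent. */
function budget(tokens, bufferPercent) {
  return Math.ceil((tokens * (100 + bufferPercent)) / 100);
}

for (const row of rows) {
  const saved = savingPercent(row.srt, row.tool).toFixed(1);
  console.log(
    `${row.lines} lines: ${saved}% fewer tokens than full SRT, ` +
      `budget ${budget(row.tool, 20)} to ${budget(row.tool, 30)} tokens`
  );
}
```

For the 1000-line case this gives a budget of 24,712 to 26,771 tokens, still well below the 52,988 tokens of the full SRT format.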