manga-translator
手机端的即时自动漫画翻译软件,由LLM驱动。Instant automatic manga translation app for mobile devices, powered by LLM.
Stars: 56
Manga Translator is a tool designed to help users translate manga pages from Japanese to English. It utilizes optical character recognition (OCR) technology to extract text from images and provides a user-friendly interface for translating and editing the text. The tool supports various manga formats and allows users to customize the translation process by adjusting settings such as language preferences and text alignment. With Manga Translator, users can easily translate manga pages for personal use or sharing with others.
README:
面向安卓的漫画翻译 App:本地气泡检测与 OCR,结合 OpenAI 兼容接口完成翻译,并在原图上覆盖显示可拖动的翻译气泡。 使用教程:https://github.com/jedzqer/manga-translator/blob/main/Tutorial/简中教程.md
| 原图 | 翻译结果 | 嵌入效果 |
|---|---|---|
![]() |
![]() |
![]() |
- 日译中,英译中
- 漫画库管理:新建文件夹、批量导入图片、EhViewer 导入
- 翻译流程:气泡检测 + OCR + LLM 翻译,支持标准模式与全文速译
- 阅读体验:翻译覆盖层、翻译气泡位置可拖动、阅读进度自动保存
- 译名表与缓存:按文件夹维护 glossary.json,自动累积固定译名
- 更新与日志:启动检查更新,翻译期间前台服务与日志查看
- 在漫画库中新建文件夹并导入图片
- 确保图片文件名顺序与阅读顺序一致(例如 1.jpg, 2.jpg)
- 在设置页填写 API 地址、API Key、模型名称(OpenAI 兼容)
- 回到漫画库,选择文件夹并点击“翻译文件夹”
- 翻译完成后点击“开始阅读”,在阅读页可拖动气泡位置
全文速译建议:页数较多时分批上传翻译,或在设置中提高 API 超时。
- 翻译失败或结果为空:确认 API 地址以
/v1结尾,模型名与供应商一致,且网络可达 - 翻译顺序错乱:请先对图片按阅读顺序重命名
- 怎么获取AI:具体获取方法可以去搜索一下
可以进QQ群提问交流:1080302768
- 漫画库存储:
/Android/data/<package>/files/manga_library/ - 每张图片生成同名
*.json翻译结果,OCR 缓存为*.ocr.json - 译名表:每个文件夹维护
glossary.json - 阅读进度、全文速译开关等存储在 SharedPreferences
- JDK 17.0.17+
- Kotlin 2.0.0+
- Gradle 8.11.1+
- Android SDK: platform 35, build-tools 35.0.0
./gradlew :app:assembleDebug
./gradlew :app:assembleRelease将以下模型文件放入 assets/:
-
comic-speech-bubble-detector.onnx(气泡检测) -
encoder_model.onnx、decoder_model.onnx(OCR)
模型下载链接:
- 气泡检测模型:https://huggingface.co/ogkalu/comic-speech-bubble-detector-yolov8m
- OCR 模型:https://huggingface.co/l0wgear/manga-ocr-2025-onnx
提示词与 OCR 配置位于 assets/,名称需与代码保持一致。
需同时修改:
app/src/main/java/com/manga/translate/VersionInfo.ktapp/build.gradle.ktsupdate.json
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for manga-translator
Similar Open Source Tools
manga-translator
Manga Translator is a tool designed to help users translate manga pages from Japanese to English. It utilizes optical character recognition (OCR) technology to extract text from images and provides a user-friendly interface for translating and editing the text. The tool supports various manga formats and allows users to customize the translation process by adjusting settings such as language preferences and text alignment. With Manga Translator, users can easily translate manga pages for personal use or sharing with others.
hujiang_dictionary
Hujiang Dictionary is a tool that provides translation services between Japanese, Chinese, and English. It supports various translation modes such as Japanese to Chinese, Chinese to Japanese, English to Japanese, and more. The tool utilizes cloud services like Telegram, Lambda, and Cloudflare Workers for different deployment options. Users can interact with the tool via a command-line interface (CLI) to perform translations and access online resources like weblio and Google Translate. Additionally, the tool offers a Telegram bot for users to access translation services conveniently. The tool also supports setting up and managing databases for storing translation data.
PaddleOCR
PaddleOCR is an easy-to-use and scalable OCR toolkit based on PaddlePaddle. It provides a series of text detection and recognition models, supporting multiple languages and various scenarios. With PaddleOCR, users can perform accurate and efficient text extraction from images and videos, making it suitable for tasks such as document scanning, text recognition, and information extraction.
subtitler
Subtitles by fframes is a free, local, on-device AI video transcription tool with a user-friendly GUI. It allows users to transcribe video content, edit transcribed cues, style the subtitles, and render them directly onto the video. The tool provides a convenient way to create accurate subtitles for videos without the need for an internet connection.
Rodel.Agent
Rodel Agent is a Windows desktop application that integrates chat, text-to-image, text-to-speech, and machine translation services, providing users with a comprehensive desktop AI experience. The application supports mainstream AI services and aims to enhance user interaction through various AI functionalities.
llamabot
LlamaBot is a Pythonic bot interface to Large Language Models (LLMs), providing an easy way to experiment with LLMs in Jupyter notebooks and build Python apps utilizing LLMs. It supports all models available in LiteLLM. Users can access LLMs either through local models with Ollama or by using API providers like OpenAI and Mistral. LlamaBot offers different bot interfaces like SimpleBot, ChatBot, QueryBot, and ImageBot for various tasks such as rephrasing text, maintaining chat history, querying documents, and generating images. The tool also includes CLI demos showcasing its capabilities and supports contributions for new features and bug reports from the community.
recognizer
Recognizer is a Python library for speech recognition. It provides a simple interface to transcribe speech from audio files or live audio input. The library supports multiple speech recognition engines, including Google Speech Recognition, Sphinx, and Wit.ai. Recognizer is easy to use and can be integrated into various applications to enable voice commands, transcription, and speech-to-text functionality.
prompt-generator-comfyui
Custom AI prompt generator node for ComfyUI. With this node, you can use text generation models to generate prompts. Before using, text generation model has to be trained with prompt dataset.
amazon-sagemaker-generativeai
Repository for training and deploying Generative AI models, including text-text, text-to-image generation, prompt engineering playground and chain of thought examples using SageMaker Studio. The tool provides a platform for users to experiment with generative AI techniques, enabling them to create text and image outputs based on input data. It offers a range of functionalities for training and deploying models, as well as exploring different generative AI applications.
OpenStableDiffusion
OpenStableDiffusion is a straightforward Android application designed to create stable diffusion images using AI technology. The app is user-friendly and allows users to generate high-quality diffusion images effortlessly. It leverages AI algorithms to enhance the image creation process, providing users with a seamless experience. OpenStableDiffusion is suitable for individuals looking to generate visually appealing diffusion images without the need for advanced technical skills. The app's intuitive interface and efficient AI capabilities make it a valuable tool for creating stunning images with ease.
push-2-talk
PushToTalk is a high-performance desktop voice input tool with large language model (LLM) capabilities. It supports two working modes: dictation mode and AI assistant mode. The tool offers features like real-time transcription, LLM intelligent post-processing, custom hotkeys, multiple ASR engines support, visual feedback, audio feedback, history records, system tray support, automatic updates, multiple configuration management, personal glossary, automatic glossary learning, LLM configuration center, theme switching, mute during recording, VAD silence detection, AGC automatic gain, multi-screen support, and more.
LLM-Workshop
This repository contains a collection of resources for learning about and using Large Language Models (LLMs). The resources include tutorials, code examples, and links to additional resources. LLMs are a type of artificial intelligence that can understand and generate human-like text. They have a wide range of potential applications, including natural language processing, machine translation, and chatbot development.
mdream
Mdream is a lightweight and user-friendly markdown editor designed for developers and writers. It provides a simple and intuitive interface for creating and editing markdown files with real-time preview. The tool offers syntax highlighting, markdown formatting options, and the ability to export files in various formats. Mdream aims to streamline the writing process and enhance productivity for individuals working with markdown documents.
Adobe-Illustrator-And-Generative-AI-2024
Adobe Illustrator And Generative AI 2024 is a repository offering Adobe Illustrator CC for free as part of a creative toolkit. It provides legal and free access to the 2024 edition of Adobe Illustrator, a standard tool for designing vector graphics and digital illustrations. The repository includes information on installation, setup, and the main functions of Adobe Illustrator, such as creating digital illustrations, logo design, infographics, print design, publication design, web element design, and user interface design. It also lists the technical requirements, language options, license details, and the latest update date.
nvim-aider
Nvim-aider is a plugin for Neovim that provides additional functionality and key mappings to enhance the user's editing experience. It offers features such as code navigation, quick access to commonly used commands, and improved text manipulation tools. With Nvim-aider, users can streamline their workflow and increase productivity while working with Neovim.
obsidian-NotEMD
Obsidian-NotEMD is a plugin for the Obsidian note-taking app that allows users to export notes in various formats without converting them to EMD. It simplifies the process of sharing and collaborating on notes by providing seamless export options. With Obsidian-NotEMD, users can easily export their notes to PDF, HTML, Markdown, and other formats directly from Obsidian, saving time and effort. This plugin enhances the functionality of Obsidian by streamlining the export process and making it more convenient for users to work with their notes across different platforms and applications.
For similar tasks
ai-no-jimaku-gumi
AI no jimaku gumi is a command-line utility designed to assist in video translation. It supports translating subtitles using AI models and provides options for different translation and subtitle sources. Users can easily set up the tool by following the installation steps and use it to translate videos to different languages with customizable settings. The tool currently supports DeepL and llm translation backends and SRT subtitle export. It aims to simplify the process of adding subtitles to videos by leveraging AI technology.
manga-translator
Manga Translator is a tool designed to help users translate manga pages from Japanese to English. It utilizes optical character recognition (OCR) technology to extract text from images and provides a user-friendly interface for translating and editing the text. The tool supports various manga formats and allows users to customize the translation process by adjusting settings such as language preferences and text alignment. With Manga Translator, users can easily translate manga pages for personal use or sharing with others.
airunner
AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.
tb1
A Telegram bot for accessing Google Gemini, MS Bing, etc. The bot responds to the keywords 'bot' and 'google' to provide information. It can handle voice messages, text files, images, and links. It can generate images based on descriptions, extract text from images, and summarize content. The bot can interact with various AI models and perform tasks like voice control, text-to-speech, and text recognition. It supports long texts, large responses, and file transfers. Users can interact with the bot using voice commands and text. The bot can be customized for different AI providers and has features for both users and administrators.
kazam
Kazam 2.0 is a versatile tool for screen recording, broadcasting, capturing, and optical character recognition (OCR). It allows users to capture screen content, broadcast live over the internet, extract text from captured content, record audio, and use a web camera for recording. The tool supports full screen, window, and area modes, and offers features like keyboard shortcuts, live broadcasting with Twitch and YouTube, and tips for recording quality. Users can install Kazam on Ubuntu and use it for various recording and broadcasting needs.
Edit-Banana
Edit Banana is a universal content re-editor that allows users to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction while preserving original diagram details and logical relationships. The platform offers advanced segmentation, fixed multi-round VLM scanning, high-quality OCR, user system with credits, multi-user concurrency, and a web interface. Users can upload images or PDFs to get editable DrawIO (XML) or PPTX files in seconds. The project structure includes components for segmentation, text extraction, frontend, models, and scripts, with detailed installation and setup instructions provided. The tool is open-source under the Apache License 2.0, allowing commercial use and secondary development.
For similar jobs
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
daily-poetry-image
Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.
exif-photo-blog
EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.
SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.
Twitter-Insight-LLM
This project enables you to fetch liked tweets from Twitter (using Selenium), save it to JSON and Excel files, and perform initial data analysis and image captions. This is part of the initial steps for a larger personal project involving Large Language Models (LLMs).
AISuperDomain
Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.
ChatGPT-On-CS
This project is an intelligent dialogue customer service tool based on a large model, which supports access to platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, Zhihu, etc. You can choose GPT3.5/GPT4.0/ Lazy Treasure Box (more platforms will be supported in the future), which can process text, voice and pictures, and access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized based on their own knowledge base.
obs-localvocal
LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.


