VideoCaptioner

🎬 卡卡字幕助手 | VideoCaptioner - 基于 LLM 的智能字幕助手 - 视频字幕生成、断句、校正、字幕翻译全流程处理！- A powered tool for easy and efficient video subtitling.

Stars: 4908

Visit

VideoCaptioner is a video subtitle processing assistant based on a large language model (LLM), supporting speech recognition, subtitle segmentation, optimization, translation, and full-process handling. It is user-friendly and does not require high configuration, supporting both network calls and local offline (GPU-enabled) speech recognition. It utilizes a large language model for intelligent subtitle segmentation, correction, and translation, providing stunning subtitles for videos. The tool offers features such as accurate subtitle generation without GPU, intelligent segmentation and sentence splitting based on LLM, AI subtitle optimization and translation, batch video subtitle synthesis, intuitive subtitle editing interface with real-time preview and quick editing, and low model token consumption with built-in basic LLM model for easy use.

README:

卡卡字幕助手

VideoCaptioner

一款基于大语言模型(LLM)的视频字幕处理助手，支持语音识别、字幕断句、优化、翻译全流程处理

简体中文 / 正體中文 / English / 日本語

📖 项目介绍

卡卡字幕助手（VideoCaptioner）操作简单且无需高配置，支持网络调用和本地离线（支持调用GPU）两种方式进行语音识别，利用可用通过大语言模型进行字幕智能断句、校正、翻译，字幕视频全流程一键处理！为视频配上效果惊艳的字幕。

最新版本已经支持 VAD 、人声分离、字级时间戳批量字幕等实用功能

🎯 无需GPU即可使用强大的语音识别引擎，生成精准字幕
✂️ 基于 LLM 的智能分割与断句，字幕阅读更自然流畅
🔄 AI字幕多线程优化与翻译，调整字幕格式、表达更地道专业
🎬 支持批量视频字幕合成，提升处理效率
📝 直观的字幕编辑查看界面，支持实时预览和快捷编辑
🤖 消耗模型 Token 少，且内置基础 LLM 模型，保证开箱即用

📸 界面预览

🧪 测试

全流程处理一个14分钟1080P的 B站英文 TED 视频，调用本地 Whisper 模型进行语音识别，使用 gpt-4o-mini 模型优化和翻译为中文，总共消耗时间约 4 分钟。

近后台计算，模型优化和翻译消耗费用不足￥0.01（以OpenAI官方价格为计算）

具体字幕和视频合成的效果的测试结果图片，请参考 TED视频测试

🚀 快速开始

Windows 用户

软件较为轻量，打包大小不足 60M,已集成所有必要环境，下载后可直接运行。

从 Release 页面下载最新版本的可执行程序。或者：蓝奏盘下载
打开安装包进行安装
LLM API 配置，（用于字幕断句、校正），可使用 ✨本项目的中转站
翻译配置，选择是否启用翻译，翻译服务（默认使用微软翻译，质量一般，推荐使用大模型翻译）
语音识别配置（默认使用B接口，中英以外的语言请使用本地转录）
拖拽视频文件到软件窗口，即可全自动处理

提示：每一个步骤均支持单独处理，均支持文件拖拽。软件具体模型选择和参数配置说明，请查看下文。

MacOS 用户

由于本人缺少 Mac，所以没法测试和打包，暂无法提供 MacOS 的可执行程序。

Mac 用户请自行使用下载源码和安装 python 依赖运行。（本地 Whisper 功能暂不支持 MacOS）

安装 ffmpeg 和 Aria2 下载工具

brew install ffmpeg
brew install aria2
brew install python@3.**

克隆项目

git clone https://github.com/WEIFENG2333/VideoCaptioner.git
cd VideoCaptioner

安装依赖

python3.** -m venv venv
source venv/bin/activate
pip install -r requirements.txt

运行程序

python main.py

Docker 部署（beta）

目前本项目streamlit应用因为项目重构过，Docker不可以使用。欢迎各位PR贡献新代码。

1. 克隆项目

git clone https://github.com/WEIFENG2333/VideoCaptioner.git
cd VideoCaptioner

2. 构建镜像

docker build -t video-captioner .

3. 运行容器

使用自定义API配置运行：

docker run -d \
  -p 8501:8501 \
  -v $(pwd)/temp:/app/temp \
  -e OPENAI_BASE_URL="你的API地址" \
  -e OPENAI_API_KEY="你的API密钥" \
  --name video-captioner \
  video-captioner

4. 访问应用

打开浏览器访问：http://localhost:8501

注意事项

容器内已预装ffmpeg等必要依赖
如需使用其他模型，请通过环境变量配置

⚙️ 基本配置

1. LLM API 配置说明

LLM 大模型是用来字幕段句、字幕优化、以及字幕翻译（如果选择了LLM 大模型翻译）。

配置项	说明
SiliconCloud	SiliconCloud 官网配置方法请参考配置文档该并发较低，建议把线程设置为5以下。
DeepSeek	DeepSeek 官网，建议使用 `deepseek-v3` 模型，官方网站最近服务好像并不太稳定。
Ollama本地	Ollama 官网
内置公益模型	内置基础大语言模型（`gpt-4o-mini`）(公益服务不稳定，强烈建议请使用自己的模型API)
OpenAI兼容接口	如果有其他服务商的API，可直接在软件中填写。base_url 和api_key

注：如果用的 API 服务商不支持高并发，请在软件设置中将“线程数”调低，避免请求错误。

如果希望高并发⚡️，或者希望在在软件内使用使用 OpenAI 或者 Claude 等优质大模型进行字幕校正和翻译。

可使用本项目的✨LLM API中转站✨： https://api.videocaptioner.cn

其支持高并发，性价比极高，且有国内外大量模型可挑选。

注册获取key之后，设置中按照下面配置：

BaseURL: https://api.videocaptioner.cn/v1

API-key: 个人中心-API 令牌页面自行获取。

💡 模型选择建议 (本人在各质量层级中精选出的高性价比模型)：

高质量之选： claude-3-5-sonnet-20241022 (耗费比例：3)
较高质量之选： gemini-2.0-flash、deepseek-chat (耗费比例：1)
中质量之选： gpt-4o-mini、gemini-1.5-flash (耗费比例：0.15)

本站支持超高并发，软件中线程数直接拉满即可~ 处理速度非常快~

更详细的API配置教程：中转站配置配置

2. 翻译配置

配置项	说明
LLM 大模型翻译	🌟 翻译质量最好的选择。使用 AI 大模型进行翻译,能更好理解上下文,翻译更自然。需要在设置中配置 LLM API(比如 OpenAI、DeepSeek 等)
DeepLx 翻译	翻译较可靠。基于 DeepL 翻译, 需要要配置自己的后端接口。
微软翻译	使用微软的翻译服务, 速度非常快
谷歌翻译	谷歌的翻译服务,速度快,但需要能访问谷歌的网络环境

推荐使用 LLM 大模型翻译 ，翻译质量最好。

3. 语音识别接口说明

接口名称	支持语言	运行方式	说明
B接口	仅支持中文、英文	在线	免费、速度较快
J接口	仅支持中文、英文	在线	免费、速度较快
WhisperCpp	中文、日语、韩语、英文等 99 种语言，外语效果较好	本地	（实际使用不稳定）需要下载转录模型中文建议medium以上模型英文等使用较小模型即可达到不错效果。
fasterWhisper 👍	中文、英文等多99种语言，外语效果优秀，时间轴更准确	本地	（🌟极力推荐🌟）需要下载程序和转录模型支持CUDA,速度更快，转录准确。超级准确的时间戳字幕。建议优先使用

4. 本地 Whisper 语音识别模型

Whisper 版本有 WhisperCpp 和 fasterWhisper（推荐）两种，后者效果更好，都需要自行在软件内下载模型。

模型	磁盘空间	内存占用	说明
Tiny	75 MiB	~273 MB	转录很一般，仅用于测试
Small	466 MiB	~852 MB	英文识别效果已经不错
Medium	1.5 GiB	~2.1 GB	中文识别建议至少使用此版本
Large-v2 👍	2.9 GiB	~3.9 GB	效果好，配置允许情况推荐使用
Large-v3	2.9 GiB	~3.9 GB	社区反馈可能会出现幻觉/字幕重复问题

推荐模型: Large-v2 稳定且质量较好。

注：以上模型国内网络可直接在软件内下载。

5. 文稿匹配

在"字幕优化与翻译"页面，包含"文稿匹配"选项，支持以下一种或者多种内容，辅助校正字幕和翻译:

类型	说明	填写示例
术语表	专业术语、人名、特定词语的修正对照表	机器学习->Machine Learning 马斯克->Elon Musk 打call -> 应援图灵斑图公交车悖论
原字幕文稿	视频的原有文稿或相关内容	完整的演讲稿、课程讲义等
修正要求	内容相关的具体修正要求	统一人称代词、规范专业术语等填写内容相关的要求即可，示例参考

如果需要文稿进行字幕优化辅助，全流程处理时，先填写文稿信息，再进行开始任务处理
注意: 使用上下文参数量不高的小型LLM模型时，建议控制文稿内容在1千字内，如果使用上下文较大的模型，则可以适当增加文稿内容。

无特殊需求，一般不填写。

6. Cookie 配置说明

如果使用URL下载功能时，如果遇到以下情况:

下载视频网站需要登录信息才可以下载；
只能下载较低分辨率的视频；
网络条件较差时需要验证；

请参考 Cookie 配置说明获取Cookie信息，并将cookies.txt文件放置到软件安装目录的 AppData 目录下，即可正常下载高质量视频。

💡 软件流程介绍

程序简单的处理流程如下:

语音识别转录 -> 字幕断句(可选) -> 字幕优化翻译(可选) -> 字幕视频合成

✨ 软件主要功能

软件利用大语言模型(LLM)在理解上下文方面的优势，对语音识别生成的字幕进一步处理。有效修正错别字、统一专业术语，让字幕内容更加准确连贯，为用户带来出色的观看体验！

1. 多平台视频下载与处理

支持国内外主流视频平台（B站、Youtube、小红书、TikTok、X、西瓜视频、抖音等）
自动提取视频原有字幕处理

2. 专业的语音识别引擎

提供多种接口在线识别，效果媲美剪映（免费、高速）
支持本地Whisper模型（保护隐私、可离线）

3. 字幕智能纠错

自动优化专业术语、代码片段和数学公式格式
上下文进行断句优化，提升阅读体验
支持文稿提示，使用原有文稿或者相关提示优化字幕断句

4. 高质量字幕翻译

结合上下文的智能翻译，确保译文兼顾全文
通过Prompt指导大模型反思翻译，提升翻译质量
使用序列模糊匹配算法、保证时间轴完全一致

5. 字幕样式调整

丰富的字幕样式模板（科普风、新闻风、番剧风等等）
多种格式字幕视频（SRT、ASS、VTT、TXT）

针对小白用户，对一些软件内的选项说明：

1. 语音转录页面

VAD过滤：开启后，VAD（语音活动检测）将过滤无人声的语音片段，从而减少幻觉现象。建议保持默认开启状态。如果不懂，其他VAD选项建议直接保持默认即可。
音频分离：开启后，使用MDX-Net进行降噪处理，能够有效分离人声和背景音乐，从而提升音频质量。建议只在嘈杂的视频中开启。

2. 字幕优化与翻译页面

智能断句：开启后，全流程处理时生成字级时间戳，然后通过LLM大模型进行断句，从而在视频有更完美的观看体验。有按照句子断句和按照语义断句两种模式。可根据自己的需求配置。
字幕校正：开启后，会通过LLM大模型对字幕内容进行校正(如：英文单词大小写、标点符号、错别字、数学公式和代码的格式等)，提升字幕的质量。
反思翻译：开启后，会通过LLM大模型进行反思翻译，提升翻译的质量。相应的会增加请求的时间和消耗的Token。(选项在设置页-LLM大模型翻译-反思翻译中开启。)
文稿提示：填写后，这部分也将作为提示词发送给大模型，辅助字幕优化和翻译。

3. 字幕视频合成页面

视频合成：开启后，会根据合成字幕视频；关闭将跳过视频合成的流程。
软字幕：开启后，字幕不会烧录到视频中，处理速度极快。但是软字幕需要一些播放器（如PotPlayer）支持才可以进行显示播放。而且软字幕的样式不是软件内调整的字幕样式，而是播放器默认的白色样式。

安装软件的主要目录结构说明如下：

VideoCaptioner/
├── runtime/                    # 运行环境目录
├── resources/               # 软件资源文件目录（二进制程序、图标等,以及下载的faster-whisper程序）
├── work-dir/               # 工作目录，处理完成的视频和字幕文件保存在这里
├── AppData/                    # 应用数据目录
    ├── cache/              # 缓存目录，缓存转录、大模型请求的数据。
    ├── models/              # 存放 Whisper 模型文件
    ├── logs/               # 日志目录，记录软件运行状态
    ├── settings.json          # 存储用户设置
    └──  cookies.txt           # 视频平台的 cookie 信息（下载高清视频时需要）
└── VideoCaptioner.exe      # 主程序执行文件

📝 说明

字幕断句的质量对观看体验至关重要。软件能将逐字字幕智能重组为符合自然语言习惯的段落，并与视频画面完美同步。
在处理过程中，仅向大语言模型发送文本内容，不包含时间轴信息，这大大降低了处理开销。
在翻译环节，我们采用吴恩达提出的"翻译-反思-翻译"方法论。这种迭代优化的方式确保了翻译的准确性。
填入 YouTube 链接时进行处理时，会自动下载视频的字幕，从而省去转录步骤，极大地节省操作时间。

🤝 贡献指南

作者是一名大三学生，个人能力和项目都还有许多不足，项目也在不断完善中，如果在使用过程遇到的Bug，欢迎提交 Issue 和 Pull Request 帮助改进项目。

更新日志

2025.02.07

### Bug 修复与其他改进 - 修复谷歌翻译语言不正确的问题。 - 修部微软翻译不准确的问题。 - 修复运行设备不选择cuda时显示报 winError的错误 - 修复合成失败的问题 - 修复ass单语字幕没有内容的问题

2024.2.06

核心功能增强

完整重构代码架构，优化整体性能
字幕优化与翻译功能模块分离，提供更灵活的处理选项
新增批量处理功能：支持批量字幕、批量转录、批量字幕视频合成
全面优化 UI 界面与交互细节

AI 模型与翻译升级

扩展 LLM 支持：新增 SiliconCloud、DeepSeek、Ollama、Gemini、ChatGLM 等模型
集成多种翻译服务：DeepLx、Bing、Google、LLM
新增 faster-whisper-large-v3-turbo 模型支持
新增多种 VAD（语音活动检测）方法
支持自定义反思翻译开关
字幕断句支持语义/句子两种模式
字幕断句、优化、翻译提示词的优化
字幕、转录缓存机制的优化
优化中文字幕自动换行功能
新增竖屏字幕样式
改进字幕时间轴切换机制，消除闪烁问题

Bug 修复与其他改进

修复 Whisper API 无法使用问题
新增多种字幕视频格式支持
修复部分情况转录错误的问题
优化视频工作目录结构
新增日志查看功能
新增泰语、德语等语言的字幕优化
修复诸多Bug...

2024.12.07

新增 Faster-whisper 支持，音频转字幕质量更优
支持Vad语音断点检测，大大减少幻觉现象
支持人声音分离，分离视频背景噪音
支持关闭视频合成
新增字幕最大长度设置
新增字幕末尾标点去除设置
优化和翻译的提示词优化
优化LLM字幕断句错误的情况
修复音频转换格式不一致问题

2024.11.23

新增 Whisper-v3 模型支持，大幅提升语音识别准确率
优化字幕断句算法，提供更自然的阅读体验
修复检测模型可用性时的稳定性问题

2024.11.20

支持自定义调节字幕位置和样式
新增字幕优化和翻译过程的实时日志查看
修复使用 API 时的自动翻译问题
优化视频工作目录结构,提升文件管理效率

2024.11.17

支持双语/单语字幕灵活导出
新增文稿匹配提示对齐功能
修复字幕导入时的稳定性问题
修复非中文路径下载模型的兼容性问题

2024.11.13

新增 Whisper API 调用支持
支持导入 cookie.txt 下载各大视频平台资源
字幕文件名自动与视频保持一致
软件主页新增运行日志实时查看
统一和完善软件内部功能

💖 支持作者

如果觉得项目对你有帮助，可以给项目点个Star，这将是对我最大的鼓励和支持！

捐助支持

⭐ Star History

For Tasks:

Click tags to check more tools for each tasks

generate accurate subtitles optimize subtitles translate subtitles batch process video subtitles edit subtitles

For Jobs:

video editor content creator translator subtitler ai engineer

Alternative AI tools for VideoCaptioner

Similar Open Source Tools

VideoCaptioner

github

: 4.9k

WeClone

WeClone is a tool that fine-tunes large language models using WeChat chat records. It utilizes approximately 20,000 integrated and effective data points, resulting in somewhat satisfactory outcomes that are occasionally humorous. The tool's effectiveness largely depends on the quantity and quality of the chat data provided. It requires a minimum of 16GB of GPU memory for training using the default chatglm3-6b model with LoRA method. Users can also opt for other models and methods supported by LLAMA Factory, which consume less memory. The tool has specific hardware and software requirements, including Python, Torch, Transformers, Datasets, Accelerate, and other optional packages like CUDA and Deepspeed. The tool facilitates environment setup, data preparation, data preprocessing, model downloading, parameter configuration, model fine-tuning, and inference through a browser demo or API service. Additionally, it offers the ability to deploy a WeChat chatbot, although users should be cautious due to the risk of account suspension by WeChat.

github

: 368

Langchain-Chatchat

LangChain-Chatchat is an open-source, offline-deployable retrieval-enhanced generation (RAG) large model knowledge base project based on large language models such as ChatGLM and application frameworks such as Langchain. It aims to establish a knowledge base Q&A solution that is friendly to Chinese scenarios, supports open-source models, and can run offline.

github

: 34.4k

XianyuAutoAgent

Xianyu AutoAgent is an AI customer service robot system specifically designed for the Xianyu platform, providing 24/7 automated customer service, supporting multi-expert collaborative decision-making, intelligent bargaining, and context-aware conversations. The system includes intelligent conversation engine with features like context awareness and expert routing, business function matrix with modules like core engine, bargaining system, technical support, and operation monitoring. It requires Python 3.8+ and NodeJS 18+ for installation and operation. Users can customize prompts for different experts and contribute to the project through issues or pull requests.

github

: 973

ChatTTS-Forge

ChatTTS-Forge is a powerful text-to-speech generation tool that supports generating rich audio long texts using a SSML-like syntax and provides comprehensive API services, suitable for various scenarios. It offers features such as batch generation, support for generating super long texts, style prompt injection, full API services, user-friendly debugging GUI, OpenAI-style API, Google-style API, support for SSML-like syntax, speaker management, style management, independent refine API, text normalization optimized for ChatTTS, and automatic detection and processing of markdown format text. The tool can be experienced and deployed online through HuggingFace Spaces, launched with one click on Colab, deployed using containers, or locally deployed after cloning the project, preparing models, and installing necessary dependencies.

github

: 692

video-subtitle-remover

Video-subtitle-remover (VSR) is a software based on AI technology that removes hard subtitles from videos. It achieves the following functions: - Lossless resolution: Remove hard subtitles from videos, generate files with subtitles removed - Fill the region of removed subtitles using a powerful AI algorithm model (non-adjacent pixel filling and mosaic removal) - Support custom subtitle positions, only remove subtitles in defined positions (input position) - Support automatic removal of all text in the entire video (no input position required) - Support batch removal of watermark text from multiple images.

github

: 4.0k

XiaoXinAir14IML_2019_hackintosh

XiaoXinAir14IML_2019_hackintosh is a repository dedicated to enabling macOS installation on Lenovo XiaoXin Air-14 IML 2019 laptops. The repository provides detailed information on the hardware specifications, supported systems, BIOS versions, related models, installation methods, updates, patches, and recommended settings. It also includes tools and guides for BIOS modifications, enabling high-resolution display settings, Bluetooth synchronization between macOS and Windows 10, voltage adjustments for efficiency, and experimental support for YogaSMC. The repository offers solutions for various issues like sleep support, sound card emulation, and battery information. It acknowledges the contributions of developers and tools like OpenCore, itlwm, VoodooI2C, and ALCPlugFix.

github

: 140

Awesome-ChatTTS

Awesome-ChatTTS is an official recommended guide for ChatTTS beginners, compiling common questions and related resources. It provides a comprehensive overview of the project, including official introduction, quick experience options, popular branches, parameter explanations, voice seed details, installation guides, FAQs, and error troubleshooting. The repository also includes video tutorials, discussion community links, and project trends analysis. Users can explore various branches for different functionalities and enhancements related to ChatTTS.

github

: 594

TelegramForwarder

Telegram Forwarder is a message forwarding tool that allows you to forward messages from specified chats to other chats without the need for a bot to enter the corresponding channels/groups to listen. It can be used for information stream integration filtering, message reminders, content archiving, and more. The tool supports multiple sources forwarding, keyword filtering in whitelist and blacklist modes, regular expression matching, message content modification, AI processing using major vendors' AI interfaces, media file filtering, and synchronization with a universal forum blocking plugin to achieve three-end blocking.

github

: 193

agentica

Agentica is a human-centric framework for building large language model agents. It provides functionalities for planning, memory management, tool usage, and supports features like reflection, planning and execution, RAG, multi-agent, multi-role, and workflow. The tool allows users to quickly code and orchestrate agents, customize prompts, and make API calls to various services. It supports API calls to OpenAI, Azure, Deepseek, Moonshot, Claude, Ollama, and Together. Agentica aims to simplify the process of building AI agents by providing a user-friendly interface and a range of functionalities for agent development.

github

: 108

ChuanhuChatGPT

Chuanhu Chat is a user-friendly web graphical interface that provides various additional features for ChatGPT and other language models. It supports GPT-4, file-based question answering, local deployment of language models, online search, agent assistant, and fine-tuning. The tool offers a range of functionalities including auto-solving questions, online searching with network support, knowledge base for quick reading, local deployment of language models, GPT 3.5 fine-tuning, and custom model integration. It also features system prompts for effective role-playing, basic conversation capabilities with options to regenerate or delete dialogues, conversation history management with auto-saving and search functionalities, and a visually appealing user experience with themes, dark mode, LaTeX rendering, and PWA application support.

github

: 15.2k

xiaogpt

xiaogpt is a tool that allows you to play ChatGPT and other LLMs with Xiaomi AI Speaker. It supports ChatGPT, New Bing, ChatGLM, Gemini, Doubao, and Tongyi Qianwen. You can use it to ask questions, get answers, and have conversations with AI assistants. xiaogpt is easy to use and can be set up in a few minutes. It is a great way to experience the power of AI and have fun with your Xiaomi AI Speaker.

github

: 6.5k

build_MiniLLM_from_scratch

This repository aims to build a low-parameter LLM model through pretraining, fine-tuning, model rewarding, and reinforcement learning stages to create a chat model capable of simple conversation tasks. It features using the bert4torch training framework, seamless integration with transformers package for inference, optimized file reading during training to reduce memory usage, providing complete training logs for reproducibility, and the ability to customize robot attributes. The chat model supports multi-turn conversations. The trained model currently only supports basic chat functionality due to limitations in corpus size, model scale, SFT corpus size, and quality.

github

: 397

Speech-AI-Forge

Speech-AI-Forge is a project developed around TTS generation models, implementing an API Server and a WebUI based on Gradio. The project offers various ways to experience and deploy Speech-AI-Forge, including online experience on HuggingFace Spaces, one-click launch on Colab, container deployment with Docker, and local deployment. The WebUI features include TTS model functionality, speaker switch for changing voices, style control, long text support with automatic text segmentation, refiner for ChatTTS native text refinement, various tools for voice control and enhancement, support for multiple TTS models, SSML synthesis control, podcast creation tools, voice creation, voice testing, ASR tools, and post-processing tools. The API Server can be launched separately for higher API throughput. The project roadmap includes support for various TTS models, ASR models, voice clone models, and enhancer models. Model downloads can be manually initiated using provided scripts. The project aims to provide inference services and may include training-related functionalities in the future.

github

: 1.2k

HivisionIDPhotos

HivisionIDPhoto is a practical algorithm for intelligent ID photo creation. It utilizes a comprehensive model workflow to recognize, cut out, and generate ID photos for various user photo scenarios. The tool offers lightweight cutting, standard ID photo generation based on different size specifications, six-inch layout photo generation, beauty enhancement (waiting), and intelligent outfit swapping (waiting). It aims to solve emergency ID photo creation issues.

github

: 10.3k

Chinese-Mixtral-8x7B

Chinese-Mixtral-8x7B is an open-source project based on Mistral's Mixtral-8x7B model for incremental pre-training of Chinese vocabulary, aiming to advance research on MoE models in the Chinese natural language processing community. The expanded vocabulary significantly improves the model's encoding and decoding efficiency for Chinese, and the model is pre-trained incrementally on a large-scale open-source corpus, enabling it with powerful Chinese generation and comprehension capabilities. The project includes a large model with expanded Chinese vocabulary and incremental pre-training code.

github

: 635

For similar tasks

VideoCaptioner

github

: 4.9k

TeroSubtitler

Tero Subtitler is an open source, cross-platform, and free subtitle editing software with a user-friendly interface. It offers fully fledged editing with SMPTE and MEDIA modes, support for various subtitle formats, multi-level undo/redo, search and replace, auto-backup, source and transcription modes, translation memory, audiovisual preview, timeline with waveform visualizer, manipulation tools, formatting options, quality control features, translation and transcription capabilities, validation tools, automation for correcting errors, and more. It also includes features like exporting subtitles to MP3, importing/exporting Blu-ray SUP format, generating blank video, generating video with hardcoded subtitles, video dubbing, and more. The tool utilizes powerful multimedia playback engines like mpv, advanced audio/video manipulation tools like FFmpeg, tools for automatic transcription like whisper.cpp/Faster-Whisper, auto-translation API like Google Translate, and ElevenLabs TTS for video dubbing.

github

: 190

subtitler

Subtitles by fframes is a free, local, on-device AI video transcription tool with a user-friendly GUI. It allows users to transcribe video content, edit transcribed cues, style the subtitles, and render them directly onto the video. The tool provides a convenient way to create accurate subtitles for videos without the need for an internet connection.

github

: 92

gpt-subtrans

GPT-Subtrans is an open-source subtitle translator that utilizes large language models (LLMs) as translation services. It supports translation between any language pairs that the language model supports. Note that GPT-Subtrans requires an active internet connection, as subtitles are sent to the provider's servers for translation, and their privacy policy applies.

github

: 418

chatgpt-subtitle-translator

This tool utilizes the OpenAI ChatGPT API to translate text, with a focus on line-based translation, particularly for SRT subtitles. It optimizes token usage by removing SRT overhead and grouping text into batches, allowing for arbitrary length translations without excessive token consumption while maintaining a one-to-one match between line input and output.

github

: 295

AiNiee

AiNiee is a tool focused on AI translation, capable of automatically translating RPG SLG games, Epub TXT novels, Srt Lrc subtitles, and more. It provides features for configuring AI platforms, proxies, and translation settings. Users can utilize this tool for translating game scripts, novels, and subtitles efficiently. The tool supports multiple AI platforms and offers tutorials for beginners. It also includes functionalities for extracting and translating game text, with options for customizing translation projects and managing translation tasks effectively.

github

: 2.2k

video2blog

video2blog is an open-source project aimed at converting videos into textual notes. The tool follows a process of extracting video information using yt-dlp, downloading the video, downloading subtitles if available, translating subtitles if not in Chinese, generating Chinese subtitles using whisper if no subtitles exist, converting subtitles to articles using gemini, and manually inserting images from the video into the article. The tool provides a solution for creating blog content from video resources, enhancing accessibility and content creation efficiency.

github

: 58

auto-subs

Auto-subs is a tool designed to automatically transcribe editing timelines using OpenAI Whisper and Stable-TS for extreme accuracy. It generates subtitles in a custom style, is completely free, and runs locally within Davinci Resolve. It works on Mac, Linux, and Windows, supporting both Free and Studio versions of Resolve. Users can jump to positions on the timeline using the Subtitle Navigator and translate from any language to English. The tool provides a user-friendly interface for creating and customizing subtitles for video content.

github

: 799

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

github

: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

github

: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

github

: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github

: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

github

: 27.7k

BricksLLM

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

github

: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

github

: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

github

: 2.2k