video-subtitle-remover

AI-based removal of hard-coded subtitles and text-like watermarks from images and videos, producing subtitle-free and watermark-free files at the original resolution. No third-party API is required; everything runs locally.

Stars: 4046



README:


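Introduction

Video-subtitle-remover (VSR) is AI-based software that removes hard-coded subtitles from videos. Its main features:

Removes hard subtitles from videos without any loss of resolution and writes out the subtitle-free file
Fills the region where subtitle text was removed using a powerful AI model (non-adjacent pixel filling and mosaic removal)
Supports custom subtitle positions, removing subtitles only within the defined region (position passed in)
Supports automatic removal of all text across the entire video (no position passed in)
Supports selecting multiple images for batch removal of watermark text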

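Usage:

For usage questions, join the QQ discussion group: 806152575
Download the archive, extract it, and run it directly. If that does not work, follow the tutorial below to install from source in a conda environment.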

Downloads:

Windows GPU version v1.1.0:

Baidu Netdisk: vsr_windows_gpu_v1.1.0.zip (extraction code: vsr1)
Google Drive: vsr_windows_gpu_v1.1.0.zip

For users with NVIDIA graphics cards only (AMD cards are not supported).

Demo

GUI version:

Click to watch the demo video 👇

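Running from Source

Do not use this project without an NVIDIA graphics card. Minimum requirements:

GPU: GTX 1060 or better

CPU: must support the AVX instruction set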
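On Linux, one way to confirm the AVX requirement is to look for the `avx` flag in `/proc/cpuinfo`. The sketch below assumes a Linux x86 system and the kernel's usual `flags : ...` line format; the helpers `cpu_flags` and `has_avx` are hypothetical and not part of the project. On Windows, a system information tool reports the same instruction-set details.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from /proc/cpuinfo-style text (first CPU only)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux each CPU block has a line like "flags : fpu ... avx avx2 ..."
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text: str) -> bool:
    """True if the flag list contains the plain 'avx' feature."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    info = Path("/proc/cpuinfo")
    if info.exists():
        print("AVX supported:", has_avx(info.read_text()))
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```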

1. Download and install Miniconda

Windows: Miniconda3-py38_4.11.0-Windows-x86_64.exe
Linux: Miniconda3-py38_4.11.0-Linux-x86_64.sh

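2. Create and activate a virtual environment

(1) Change to the source directory:
```
cd <source directory>
```
For example, if the source code is in the tools folder on drive D and the source folder is named video-subtitle-remover-main, enter cd D:/tools/video-subtitle-remover-main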

(2) Create and activate the conda environment:
```
conda create -n videoEnv python=3.8
conda activate videoEnv
```

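3. Install dependencies

Make sure Python 3.8+ is installed, then create the project's virtual environment with conda and activate it (running in a virtual environment is recommended to avoid problems later).

Install CUDA and cuDNN
Linux users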

(1) Download CUDA 11.7
```
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
```
(2) Install CUDA 11.7
```
sudo sh cuda_11.7.0_515.43.04_linux.run
```
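1. Type accept

2. Select CUDA Toolkit 11.7 (select Driver only if no NVIDIA driver is installed yet; if you already have an NVIDIA driver, leave Driver unselected), then select Install and press Enter

3. Add the environment variables

Add the following to ~/.bashrc: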
```
# CUDA
export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```
Apply the change:
```
source ~/.bashrc
```
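The `${PATH:+:${PATH}}` form used above expands to a colon plus the old value only when the variable is already set and non-empty, so the new directory is prepended without leaving a trailing empty entry (an empty entry in `PATH` means the current directory). The Python sketch below mirrors that expansion to make the rule explicit, then checks whether the CUDA directories are active in an environment; `prepend_path` and `cuda_env_active` are hypothetical helpers, and the paths assume the default CUDA 11.7 install location.

```python
import os
from typing import Optional

CUDA_BIN = "/usr/local/cuda-11.7/bin"    # assumes the default install prefix
CUDA_LIB = "/usr/local/cuda-11.7/lib64"


def prepend_path(new_dir: str, current: Optional[str]) -> str:
    """Mirror the shell expansion NEW${VAR:+:${VAR}}."""
    # ${VAR:+word} yields word only if VAR is set and non-empty
    return f"{new_dir}:{current}" if current else new_dir


def cuda_env_active(environ=os.environ) -> dict:
    """Report whether the CUDA directories appear in PATH / LD_LIBRARY_PATH."""
    path = environ.get("PATH", "").split(":")
    libs = environ.get("LD_LIBRARY_PATH", "").split(":")
    return {"bin_on_path": CUDA_BIN in path, "lib_on_ld_path": CUDA_LIB in libs}


if __name__ == "__main__":
    print(cuda_env_active())
```

Run it in a new shell after `source ~/.bashrc`; both values should be `True`.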
(3) Download cuDNN 8.4.1

Mainland China mirror: cudnn-linux-x86_64-8.4.1.50_cuda11.6-archive.tar.xz (extraction code: 57mg)

International: cudnn-linux-x86_64-8.4.1.50_cuda11.6-archive.tar.xz

(4) Install cuDNN 8.4.1
```
 tar -xf cudnn-linux-x86_64-8.4.1.50_cuda11.6-archive.tar.xz
 mv cudnn-linux-x86_64-8.4.1.50_cuda11.6-archive cuda
 sudo cp ./cuda/include/* /usr/local/cuda-11.7/include/
 sudo cp ./cuda/lib/* /usr/local/cuda-11.7/lib64/
 sudo chmod a+r /usr/local/cuda-11.7/lib64/*
 sudo chmod a+r /usr/local/cuda-11.7/include/*
```
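To confirm that the headers landed in the CUDA directory, you can read the version macros from `cudnn_version.h`, where cuDNN 8 defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`. A minimal sketch, assuming the CUDA 11.7 path used above; `cudnn_version` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

HEADER = Path("/usr/local/cuda-11.7/include/cudnn_version.h")  # path used above


def cudnn_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from the #define lines of cudnn_version.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            return None  # macro missing: not a cuDNN 8 version header
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    if HEADER.exists():
        print("cuDNN version:", cudnn_version(HEADER.read_text()))
    else:
        print(f"{HEADER} not found; check the cp commands above")
```

For the archive above, the expected result is `(8, 4, 1)`.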
Windows users

(1) Download CUDA 11.7: cuda_11.7.0_516.01_windows.exe

(2) Install CUDA 11.7

(3) Download cuDNN 8.4.1

cudnn-windows-x86_64-8.4.1.50_cuda11.6-archive.zip

(4) Install cuDNN 8.4.1

Extract cuDNN, then copy the files from the bin, include, and lib directories of the extracted cuda folder into the matching directories under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\
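The copy step can also be scripted. The sketch below copies the `bin`, `include` and `lib` subdirectories of an extracted cuDNN folder into a CUDA install directory, merging with what is already there; `copy_cudnn` is a hypothetical helper, and writing under `C:\Program Files` normally requires an administrator prompt.

```python
import shutil
from pathlib import Path


def copy_cudnn(cudnn_dir, cuda_dir) -> int:
    """Copy bin/include/lib from an extracted cuDNN folder into a CUDA install.

    Returns the number of files copied. Existing files are overwritten.
    """
    cudnn_dir, cuda_dir = Path(cudnn_dir), Path(cuda_dir)
    copied = 0
    for sub in ("bin", "include", "lib"):
        src = cudnn_dir / sub
        if not src.is_dir():
            continue  # skip folders this archive does not contain
        copied += sum(1 for p in src.rglob("*") if p.is_file())
        # dirs_exist_ok (Python 3.8+) merges into the existing CUDA folders
        shutil.copytree(src, cuda_dir / sub, dirs_exist_ok=True)
    return copied
```

Example call, with a hypothetical extraction path: `copy_cudnn(r"C:\Downloads\cuda", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7")`.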

Install the GPU version of PaddlePaddle:

Windows:
```
python -m pip install paddlepaddle-gpu==2.4.2.post117 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
```
Linux:
```
python -m pip install paddlepaddle-gpu==2.4.2.post117 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
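To confirm the GPU build installed correctly, PaddlePaddle provides `paddle.utils.run_check()`, and `paddle.device.is_compiled_with_cuda()` reports whether the wheel was built with CUDA. The wrapper below is a hypothetical helper that returns the result as a dict and degrades gracefully when Paddle is not installed.

```python
def check_paddle(run_full_check: bool = False) -> dict:
    """Report whether PaddlePaddle is importable and built with CUDA."""
    try:
        import paddle
    except ImportError:
        return {"installed": False}
    info = {
        "installed": True,
        "version": paddle.__version__,
        "cuda_build": paddle.device.is_compiled_with_cuda(),
    }
    if run_full_check:
        # PaddlePaddle's built-in installation check; prints its own report
        paddle.utils.run_check()
    return info


if __name__ == "__main__":
    print(check_paddle(run_full_check=True))
```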

Install the GPU version of PyTorch:
```
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.8 -c pytorch -c nvidia
```
or use
```
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
```
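A quick way to confirm that PyTorch can see the GPU is to query `torch.cuda.is_available()`, `torch.version.cuda` and `torch.backends.cudnn.version()`. The function below is a hypothetical helper that gathers these values and returns `{"installed": False}` if torch is missing.

```python
def check_torch() -> dict:
    """Report the PyTorch build and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    available = torch.cuda.is_available()
    return {
        "installed": True,
        "version": torch.__version__,            # pip wheels may carry a +cuXXX suffix
        "cuda_build": torch.version.cuda,        # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(), # None if cuDNN is unavailable
        "cuda_available": available,
        "device": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    print(check_torch())
```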

Install the remaining dependencies:
```
pip install -r requirements.txt
```

4. Run the program

Run the graphical interface:
```
python gui.py
```
Run the command-line version (CLI):
```
python ./backend/main.py
```

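FAQ

Processing is slow

Changing these parameters in backend/config.py can greatly increase removal speed:
```
MODE = InpaintMode.STTN  # use the STTN algorithm
STTN_SKIP_DETECTION = True  # skip subtitle detection; this may miss subtitles that should be removed or alter frames that contain no subtitles
```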

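Poor removal quality on videos

Change the parameters in backend/config.py to try a different removal algorithm. The algorithms compare as follows:

InpaintMode.STTN: works well for live-action video and is fast; subtitle detection can be skipped

InpaintMode.LAMA: best for images and good for animated video; moderate speed; subtitle detection cannot be skipped

InpaintMode.PROPAINTER: uses a lot of GPU memory and is slow; works well for videos with very intense motion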
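The guidance above can be read as a small lookup from content type to algorithm. The sketch restates it in Python; the `InpaintMode` enum here is a stand-in written for illustration (the real one lives in the project's backend code and may be defined differently), and the content-type keys are hypothetical.

```python
from enum import Enum


class InpaintMode(Enum):
    """Illustrative stand-in for the project's InpaintMode; not the real definition."""
    STTN = "sttn"
    LAMA = "lama"
    PROPAINTER = "propainter"


# Which algorithm the FAQ recommends for each kind of input
RECOMMENDED = {
    "live_action_video": InpaintMode.STTN,
    "animated_video": InpaintMode.LAMA,
    "image": InpaintMode.LAMA,
    "high_motion_video": InpaintMode.PROPAINTER,
}

# Only STTN is described as able to skip subtitle detection
CAN_SKIP_DETECTION = {InpaintMode.STTN}


def suggest(content: str, skip_detection: bool = False) -> InpaintMode:
    """Return the recommended mode, rejecting combinations the FAQ rules out."""
    if content not in RECOMMENDED:
        raise ValueError(f"unknown content type: {content!r}")
    mode = RECOMMENDED[content]
    if skip_detection and mode not in CAN_SKIP_DETECTION:
        raise ValueError(f"{mode.name} cannot skip subtitle detection")
    return mode
```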

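Using the STTN algorithm
```
MODE = InpaintMode.STTN  # use the STTN algorithm
# Number of neighboring frames; larger values use more GPU memory but improve results
STTN_NEIGHBOR_STRIDE = 10
# Reference frame length; larger values use more GPU memory but improve results
STTN_REFERENCE_LENGTH = 10
# Maximum number of frames STTN processes at once; larger values are slower but give better results
# STTN_MAX_LOAD_NUM must be greater than both STTN_NEIGHBOR_STRIDE and STTN_REFERENCE_LENGTH
STTN_MAX_LOAD_NUM = 30
```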
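Because the STTN settings carry a constraint between three values, it can help to check a configuration before starting a long run. A minimal sketch; `check_sttn_settings` is a hypothetical helper that takes the three values from `backend/config.py` as arguments and checks only the stated rule.

```python
def check_sttn_settings(neighbor_stride: int, reference_length: int, max_load_num: int) -> list:
    """Return a list of problems; an empty list means the constraint holds."""
    problems = []
    for name, value in (("STTN_NEIGHBOR_STRIDE", neighbor_stride),
                        ("STTN_REFERENCE_LENGTH", reference_length)):
        # STTN_MAX_LOAD_NUM must be strictly greater than each of the other two
        if max_load_num <= value:
            problems.append(
                f"STTN_MAX_LOAD_NUM ({max_load_num}) must be greater than {name} ({value})"
            )
    return problems
```

With the values shown above, `check_sttn_settings(10, 10, 30)` returns an empty list.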

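Using the LAMA algorithm
```
MODE = InpaintMode.LAMA  # use the LAMA algorithm
LAMA_SUPER_FAST = False  # favor quality over speed
```
If you are not satisfied with the model's subtitle removal results, see the training method in the design folder, train a model with the code in backend/tools/train, and replace the old model with the trained one.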

CondaHTTPError

Copy the .condarc file from the project into your user directory (C:/Users/<your username>), overwriting it if the file already exists there.

See also: https://zhuanlan.zhihu.com/p/260034241
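The same copy can be done from Python, using the current user's home directory (on Windows, `C:/Users/<your username>`). A minimal sketch, assuming `project_dir` is the project root that contains `.condarc`; `install_condarc` is a hypothetical helper.

```python
import shutil
from pathlib import Path
from typing import Optional


def install_condarc(project_dir, home: Optional[Path] = None) -> Path:
    """Copy <project_dir>/.condarc into the home directory, overwriting any existing file."""
    src = Path(project_dir) / ".condarc"
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; pass the project root")
    dest = Path(home if home is not None else Path.home()) / ".condarc"
    shutil.copyfile(src, dest)  # replaces dest if it already exists
    return dest
```

Example call from the project root: `install_condarc(".")`.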

Error extracting the 7z archive

Solution: upgrade 7-Zip to the latest version

CUDA 11.7 does not work on an RTX 4090

Solution: use CUDA 11.8 instead
```
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu118
```
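torch and torchvision are released in matched pairs (torchvision 0.15.2 is built for torch 2.0.1, and torchvision 0.16.0 for torch 2.1.0), so pip cannot satisfy a command that pins a mismatched pair. The sketch below encodes only a few pairs relevant to this README, taken from PyTorch's published version table; the helper names and the partial table are illustrative.

```python
# Subset of PyTorch's torch -> torchvision release pairing
TORCHVISION_FOR_TORCH = {
    "2.0.0": "0.15.1",
    "2.0.1": "0.15.2",
    "2.1.0": "0.16.0",
    "2.1.1": "0.16.1",
    "2.1.2": "0.16.2",
}


def matching_torchvision(torch_version: str) -> str:
    """Return the torchvision release built against the given torch release."""
    base = torch_version.split("+")[0]  # drop local tags such as "+cu118"
    try:
        return TORCHVISION_FOR_TORCH[base]
    except KeyError:
        raise ValueError(f"torch {torch_version} is not in this partial table") from None


def is_compatible(torch_version: str, torchvision_version: str) -> bool:
    """True if the two versions form a released pair."""
    return matching_torchvision(torch_version) == torchvision_version.split("+")[0]
```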

Sponsors

Donor	Total donated	Sponsor tier
坤V	400.00 RMB	Gold sponsor
Jenkit	200.00 RMB	Gold sponsor
落花未逝	100.00 RMB	Gold sponsor
麦格	100.00 RMB	Gold sponsor
无痕	100.00 RMB	Gold sponsor
wr	100.00 RMB	Gold sponsor
陈	100.00 RMB	Gold sponsor
TalkLuv	50.00 RMB	Silver sponsor
陈凯	50.00 RMB	Silver sponsor
Tshuang	20.00 RMB	Silver sponsor
很奇异	15.00 RMB	Silver sponsor
郭鑫	12.00 RMB	Silver sponsor
生活不止眼前的苟且	10.00 RMB	Bronze sponsor
何斐	10.00 RMB	Bronze sponsor
老猫	8.80 RMB	Bronze sponsor
伍六七	7.77 RMB	Bronze sponsor
长缨在手	6.00 RMB	Bronze sponsor
无忌	6.00 RMB	Bronze sponsor
Stephen	2.00 RMB	Bronze sponsor
Leo	1.00 RMB	Bronze sponsor
