Whisper-WebUI
Whisper-WebUI is a Gradio-based browser interface for Whisper that serves as an easy subtitle generator. It supports generating subtitles from various sources such as local files, YouTube videos, and the microphone. The tool also offers speech-to-text and text-to-text translation, using Facebook NLLB models and the DeepL API, so users can translate subtitle files from other languages to English and vice versa. The project integrates faster-whisper for improved VRAM usage and transcription speed, and users can choose among Whisper models of different sizes and language coverage.

README:

Whisper-WebUI

A Gradio-based browser interface for Whisper. You can use it as an Easy Subtitle Generator!


Notebook

If you wish to try this on Colab, you can do it here!

Features

  • Generate subtitles from local files, YouTube videos, or the microphone
  • Speech-to-text and text-to-text translation using Facebook NLLB models and the DeepL API
  • faster-whisper integration for lower VRAM usage and faster transcription

Installation and Running

Prerequisites

To run this WebUI, you need git, Python 3.8 ~ 3.10, and FFmpeg.
If you're not using an Nvidia GPU, or are using a CUDA version other than 12.4, edit requirements.txt to match your environment.

Please install each of these prerequisites before proceeding.

After installing FFmpeg, make sure to add the FFmpeg/bin folder to your system PATH!
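
You can verify that each prerequisite is installed and on your PATH with the standard version checks (generic commands, nothing project-specific):

git --version
python --version
ffmpeg -version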

Automatic Installation

  1. Download the Whisper-WebUI.zip file corresponding to your OS from v1.0.0 and extract its contents.
  2. Run install.bat or install.sh to install dependencies. (This will create a venv directory and install dependencies there; a manual equivalent is sketched below.)
  3. Start the WebUI with start-webui.bat or start-webui.sh.
  4. To update the WebUI, run update.bat or update.sh.
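
If you prefer a manual setup, the install scripts roughly amount to creating a virtual environment and installing the pinned dependencies. A minimal sketch for Unix-like systems (the exact script contents may differ; check install.sh itself):

python -m venv venv
venv/bin/pip install -r requirements.txt   # use venv\Scripts\pip.exe on Windows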

You can also run the project with command line arguments if you prefer; see the wiki for a guide to the arguments.
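
For example, a launch might look like the following (app.py as the entry point and the --server_port flag are assumptions here, not confirmed by this README; the wiki has the authoritative argument list):

python app.py --server_port 7860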

  • Running with Docker

  1. Install and launch Docker-Desktop.

  2. Git clone the repository:

git clone https://github.com/jhj0517/Whisper-WebUI.git

  3. Build the image (the image is about 7 GB):

docker compose build

  4. Run the container:

docker compose up

  5. Connect to the WebUI with your browser at http://localhost:7860

If needed, update the docker-compose.yaml to match your environment.
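
For example, to let the container use an Nvidia GPU, Compose's standard device-reservation syntax looks like this (a sketch only; the service name whisper-webui is hypothetical and should match whatever the repository's docker-compose.yaml actually defines):

services:
  whisper-webui:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]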

VRAM Usage

This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.

According to faster-whisper, the efficiency of the optimized whisper model is as follows:

| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
|---|---|---|---|---|---|
| openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
| faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |
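
For context, this is roughly how the faster-whisper library itself is driven; the snippet below is a minimal sketch of faster-whisper's documented Python API, not of how this WebUI invokes it internally:

from faster_whisper import WhisperModel

# Load an optimized Whisper model on the GPU in fp16 (matching the table above).
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# transcribe() returns a lazy generator of segments plus metadata about the audio.
segments, info = model.transcribe("audio.mp3", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")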

If you want to use an implementation other than faster-whisper, pass the --whisper_type arg with the repository name.
Read the wiki for more info about CLI args.
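
For example, to fall back to the original implementation (assuming app.py is the entry point and that the flag takes the repository name, as stated above):

python app.py --whisper_type whisper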

Available models

This is Whisper's original VRAM usage table for models.

| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|---|---|---|---|---|---|
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
| base | 74 M | base.en | base | ~1 GB | ~16x |
| small | 244 M | small.en | small | ~2 GB | ~6x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1550 M | N/A | large | ~10 GB | 1x |

The .en models are for English only, and the cool thing is that you can use the Translate to English option with the "large" models!

TODO🗓

  • [x] Add DeepL API translation
  • [x] Add NLLB model translation
  • [x] Integrate with faster-whisper
  • [x] Integrate with insanely-fast-whisper
  • [x] Integrate with whisperX (only the speaker diarization part)
  • [x] Add background music separation pre-processing with UVR
  • [ ] Add FastAPI script
  • [ ] Support real-time transcription from the microphone
