oneclick-subtitles-generator

🎬 Auto-subtitle videos with AI transcription, translation, voice cloning & professional rendering

Stars: 133

A comprehensive web application for auto-subtitling videos and audio, translating SRT files, generating AI narration with voice cloning, creating background images, and rendering professional subtitled videos. Designed for content creators, educators, and general users who need high-quality subtitle generation and video production capabilities.

README:

One-Click Subtitles Generator

See the Vietnamese version here.

Installation Options Comparison

Choose the right version for your needs:

| Feature | OSG Lite | OSG Full | OSG Vercel |
|---|---|---|---|
| AI Subtitle Generation | ✅ Gemini AI transcription | ✅ Gemini AI transcription | ✅ Gemini AI transcription |
| Video Sources | ✅ YouTube, Douyin/TikTok, 1000+ platforms + Upload | ✅ YouTube, Douyin/TikTok, 1000+ platforms + Upload | Upload only |
| Subtitle Editor | ✅ Visual timeline, waveform, real-time preview | ✅ Visual timeline, waveform, real-time preview | ✅ Visual timeline, waveform, real-time preview |
| Translation | ✅ Multi-language with context awareness | ✅ Multi-language with context awareness | ✅ Multi-language with context awareness |
| Video Rendering | ✅ GPU-accelerated with Remotion | ✅ GPU-accelerated with Remotion | ❌ Not available |
| Background Generation | ✅ Gemini Native/Nano Banana | ✅ Gemini Native/Nano Banana | ✅ Gemini Native/Nano Banana |
| Basic TTS | ✅ Gemini Live API, Edge TTS, Google TTS | ✅ Gemini Live API, Edge TTS, Google TTS | ❌ Not available |
| Voice Cloning | ❌ Not included | ✅ F5-TTS, Chatterbox | ❌ Not available |
| Project Folder Size | ~2-3 GB | ~8-12 GB | N/A (hosted) |
| GPU Requirements | Any GPU for video rendering | GPU-accelerated voice cloning (CPU fallback available) | None (no rendering) |

💡 Recommendation:

  • Choose OSG Lite if you need fast subtitle generation and video rendering without voice cloning
  • Choose OSG Full if you need advanced voice cloning and narration capabilities

Quick Installation Guide

Installation on Windows

  • Go to Releases and download the latest OSG_installer_Windows.bat.

  • Open the downloaded .bat file and follow the instructions (the install is considerably larger if you include the voice cloning feature)

Installation on macOS and Ubuntu

  • Clone this repo and run the OSG_installer.sh file:

    git clone https://github.com/nganlinh4/oneclick-subtitles-generator.git
    cd oneclick-subtitles-generator
    chmod +x OSG_installer.sh
    ./OSG_installer.sh
  • Follow the on-screen instructions (the install is considerably larger if you include the voice cloning feature)

Update or Run Application

Windows

  • Open OSG_installer_Windows.bat and follow the instructions.

macOS and Ubuntu

  • Open Terminal and run the OSG_installer.sh file again:

    ./OSG_installer.sh
  • Your browser opens automatically at http://localhost:3030

Features

🎬 Video & Audio Processing

  • Multi-source support: Upload video/audio files, YouTube URLs, Douyin/TikTok links, or search YouTube by title
  • Format compatibility: Supports MP4, AVI, MOV, WebM, WMV, MP3, WAV, AAC, FLAC, and more
  • Quality scanning: Intelligent video quality detection with cookie-based authentication for premium content
  • Video compatibility checking: Automatic format conversion for Remotion compatibility

🤖 AI-Powered Subtitle Generation

  • Google Gemini AI: Uses latest Gemini 2.5 models (Flash, Pro) for accurate transcription
  • Multi-language support: Generate subtitles in multiple languages with high accuracy
  • Parallel processing: Handles long videos (15+ minutes) with intelligent segmentation
  • Custom prompts: Configurable transcription prompts for specialized content
  • Retry mechanisms: Smart retry with different models for failed segments
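
The retry behaviour described above can be pictured as trying an ordered list of models until one succeeds. The following is a minimal sketch under stated assumptions, not the application's actual code: `transcribe_with_fallback`, the `transcribe` callable, and the model identifiers passed in are all hypothetical names introduced for illustration.

```python
from typing import Callable, Optional, Sequence


def transcribe_with_fallback(
    segment: object,
    models: Sequence[str],
    transcribe: Callable[[object, str], str],
) -> str:
    """Try each model in order and return the first successful transcript.

    `transcribe` is a hypothetical callable (segment, model_name) -> text that
    raises an exception when a model fails on the segment.
    """
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return transcribe(segment, model)
        except Exception as exc:  # broad on purpose: any failure triggers the next model
            last_error = exc
    raise RuntimeError(f"all models failed for segment {segment!r}") from last_error
```

Listing the faster model first and the more capable one second keeps the common case cheap while still giving failed segments a second chance.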

✏️ Advanced Subtitle Editing

  • Visual timeline editor: Drag-and-drop timing adjustments with waveform visualization
  • Real-time preview: Live subtitle synchronization with video playback
  • Sticky timing: Batch adjust multiple subtitles simultaneously
  • Text editing: Direct text modification with undo/redo functionality
  • Merge & split: Combine adjacent subtitles or split long ones
  • Format support: Export to SRT, JSON, or custom formats
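
For readers unfamiliar with the SRT export target, the format is a numbered list of cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` line followed by the text. The sketch below shows one way to serialize cues to SRT; the `Cue` class and function names are illustrative assumptions, not identifiers from this project.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Cue:
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: Iterable[Cue]) -> str:
    """Serialize cues as SRT blocks separated by blank lines, numbered from 1."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        timing = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{number}\n{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"
```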

🗣️ AI Voice Narration

  • F5-TTS integration: State-of-the-art voice cloning technology
  • Chatterbox TTS: High-quality text-to-speech with voice conversion
  • Edge TTS & Google TTS: Multiple TTS engine options
  • Reference audio: Upload, record, or extract voice samples from videos
  • Multi-audio tracks: Combine original audio with AI-generated narration
  • Volume controls: Independent audio level management

🌍 Translation & Localization

  • Multi-language translation: Translate subtitles to any language while preserving timing
  • Custom formatting: Configurable output formats with brackets, delimiters, and chains
  • Batch processing: Translate multiple subtitle sets simultaneously
  • Context awareness: AI-powered translation with video context understanding
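
To make "preserving timing" and "custom formatting" concrete, here is a small sketch that pairs each original cue with its translation, wraps the translation in configurable brackets, and leaves the timestamps untouched. It assumes cues are `(start, end, text)` tuples and covers only brackets and delimiters; the function names and defaults are hypothetical, and the "chains" option mentioned above is not modelled.

```python
from typing import List, Sequence, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def format_translation(
    original: str,
    translated: str,
    open_bracket: str = "(",
    close_bracket: str = ")",
    delimiter: str = " ",
) -> str:
    """Combine original and translated text, e.g. 'Hello (Xin chào)'."""
    return f"{original}{delimiter}{open_bracket}{translated}{close_bracket}"


def apply_translations(cues: Sequence[Cue], translations: Sequence[str], **fmt: str) -> List[Cue]:
    """Return new cues with the same timing and bilingual text."""
    if len(cues) != len(translations):
        raise ValueError("need exactly one translation per cue")
    return [
        (start, end, format_translation(text, translated, **fmt))
        for (start, end, text), translated in zip(cues, translations)
    ]
```

Passing `delimiter="\n"` would place the translation on its own line instead of inline.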

🎨 Background Image Generation

  • AI-powered creation: Generate custom backgrounds using Gemini's image generation
  • Album art integration: Use existing artwork as reference for style consistency
  • Batch generation: Create multiple variations with unique prompts
  • Smart prompting: Automatic prompt generation based on lyrics and content

🎥 Professional Video Rendering

  • Remotion integration: GPU-accelerated video rendering with hardware optimization
  • Multi-resolution support: 360p to 8K output with automatic aspect ratio detection
  • Subtitle customization: Extensive styling options including fonts, colors, effects, and animations
  • Multi-audio support: Combine original video audio with AI narration tracks
  • Background integration: Use generated images or video backgrounds
  • Render queue: Batch processing with progress tracking
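
As an illustration of how a target resolution and a detected aspect ratio combine, the sketch below derives output dimensions from the source size and a target height such as 360 or 4320 (8K). This is an assumption about one reasonable approach, not the renderer's actual logic; `output_size` is a hypothetical helper, and rounding the width up to an even number reflects a common video encoder constraint.

```python
from typing import Tuple


def output_size(src_width: int, src_height: int, target_height: int) -> Tuple[int, int]:
    """Scale the source to target_height while keeping its aspect ratio."""
    if min(src_width, src_height, target_height) <= 0:
        raise ValueError("all dimensions must be positive")
    width = round(src_width * target_height / src_height)
    width += width % 2  # bump odd widths to the next even number
    return width, target_height
```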

How to Use

1. Select Your Content Source

  • File Upload: Drag & drop or browse for video/audio files
  • YouTube: Paste URL or search by title with thumbnail preview
  • Douyin/TikTok: Paste URL for automatic extraction
  • Other platforms: Use any supported video URL

2. Generate AI Subtitles

  • Choose your preferred Gemini model (2.5 Flash/Pro recommended)
  • Configure custom prompts for specialized content
  • Click "Generate timed subtitles" and monitor progress
  • Long videos are automatically processed in parallel segments
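
The parallel segment processing mentioned above can be sketched as two steps: cut the timeline into fixed-length windows, then hand each window to a worker pool. The helper names, the 300-second default, and the `transcribe_segment` callable are assumptions for illustration; the real segment duration is configurable in the app's settings.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Segment = Tuple[float, float]  # (start seconds, end seconds)


def plan_segments(duration_s: float, segment_s: float = 300.0) -> List[Segment]:
    """Split [0, duration_s] into consecutive windows of at most segment_s seconds."""
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration_s and segment_s must be positive")
    segments: List[Segment] = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


def transcribe_in_parallel(
    segments: List[Segment],
    transcribe_segment: Callable[[Segment], str],
    max_workers: int = 4,
) -> List[str]:
    """Run a hypothetical per-segment transcriber concurrently, keeping segment order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcribe_segment, segments))
```

Threads suit this case because each segment spends most of its time waiting on a network call; `pool.map` returns results in input order, so the transcripts can be concatenated directly.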

3. Edit & Refine Subtitles

  • Visual timeline: Drag timing handles with waveform visualization
  • Real-time preview: See changes instantly synchronized with video
  • Text editing: Click to edit subtitle content directly
  • Batch operations: Use sticky timing for multiple subtitle adjustments
  • Advanced tools: Merge, split, insert, or delete subtitle segments
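
The batch and merge operations above are simple transformations on a list of cues. The sketch below assumes that "sticky timing" means shifting a cue together with every cue after it; that reading, along with the function names, is an assumption rather than documented behaviour.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def shift_from(cues: List[Cue], index: int, delta: float) -> List[Cue]:
    """Shift cue `index` and all later cues by delta seconds, clamping at zero."""
    shifted = list(cues[:index])
    for start, end, text in cues[index:]:
        shifted.append((max(0.0, start + delta), max(0.0, end + delta), text))
    return shifted


def merge_adjacent(cues: List[Cue], index: int) -> List[Cue]:
    """Merge cue `index` with the cue that follows it."""
    first, second = cues[index], cues[index + 1]
    merged = (first[0], second[1], f"{first[2]} {second[2]}")
    return cues[:index] + [merged] + cues[index + 2:]
```

Both helpers return new lists rather than mutating their input, which makes undo/redo straightforward to layer on top.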

4. Translate Content (Optional)

  • Select target languages for translation
  • Configure output formatting (brackets, delimiters, chains)
  • Use context-aware AI translation with video understanding
  • Preserve original timing while adapting text
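
Translating an existing SRT file while preserving timing starts with separating timestamps from text. The following is a minimal parser sketch, assuming well-formed SRT without positioning data after the timestamps; `parse_time` and `parse_srt` are hypothetical helper names.

```python
import re
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def parse_time(stamp: str) -> float:
    """Convert an HH:MM:SS,mmm timestamp to seconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text: str) -> List[Cue]:
    """Parse SRT text into (start, end, text) tuples; blocks are separated by blank lines."""
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # skip stray fragments that lack a timing line
        start_raw, end_raw = lines[1].split("-->")
        cues.append((parse_time(start_raw), parse_time(end_raw), "\n".join(lines[2:])))
    return cues
```

Only the third element of each tuple needs to be sent for translation; the first two are written back unchanged.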

5. Generate AI Narration (Optional)

  • Set up reference audio: Upload, record, or extract from video
  • Choose TTS engine: F5-TTS (voice cloning), Chatterbox, Edge TTS, or Google TTS
  • Configure voice settings: Adjust speed, pitch, and style parameters
  • Generate narration: Create AI voice for original or translated subtitles

6. Create Background Images (Optional)

  • Upload album art or reference images
  • Generate AI-powered backgrounds based on content
  • Create multiple variations with unique prompts
  • Use generated images in video rendering

7. Render Professional Videos

  • Open video renderer: Access the integrated Remotion-based renderer
  • Customize subtitles: Extensive styling options (fonts, colors, effects, animations)
  • Configure audio: Balance original video audio with AI narration
  • Set output quality: Choose resolution from 360p to 8K
  • Render with GPU acceleration: Hardware-optimized processing for fast output

8. Export & Download

  • Subtitle files: SRT, JSON, or custom formats
  • Audio files: Generated narration in various formats
  • Background images: AI-generated artwork
  • Rendered videos: Professional subtitled videos with custom styling

Configuration

Access settings via the gear icon in the top-right corner:

  • API Keys: Gemini (required), YouTube (optional for search)
  • AI Models: Choose between Gemini 2.5 Flash, Pro, or experimental models
  • Languages: Interface available in English, Vietnamese, and Korean
  • Video Processing: Segment duration, quality preferences, cookie management
  • TTS Engines: F5-TTS, Chatterbox, Edge TTS, or Google TTS selection
  • Interface: Dark/light themes, time format, waveform visualization
  • Cache Management: Clear caches and monitor storage usage

Technical Stack

  • Frontend: React 18, Material-UI, Styled Components, i18next
  • Video Rendering: Remotion 4 with GPU acceleration (Vulkan/OpenGL)
  • Backend: Node.js/Express, Python Flask, FastAPI
  • AI Integration: Google Gemini API, F5-TTS, Chatterbox TTS
  • Audio/Video: FFmpeg, Web Audio API, yt-dlp, Playwright
  • Performance: React Window virtualization, multi-level caching, hardware acceleration

Performance Features

  • GPU Acceleration: Hardware-accelerated video rendering with Vulkan/OpenGL
  • Virtualized UI: Only renders visible elements for optimal performance with long videos
  • Parallel Processing: Multi-core subtitle generation and video processing
  • Smart Caching: Multi-layer cache system for subtitles, videos, and generated content
  • Optimized Timeline: Hardware-accelerated canvas visualization with adaptive rendering
  • Efficient Memory: Automatic cleanup and smart resource management
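
A cache for generated subtitles is only safe to reuse when both the media and the generation settings are unchanged. One common way to achieve that, shown here as a hedged sketch rather than the project's actual cache design, is to derive the cache key from a hash of the file contents plus the settings; `cache_key` and the settings fields in the test are hypothetical.

```python
import hashlib
import json


def cache_key(media_bytes: bytes, settings: dict) -> str:
    """Derive a stable key from media content and generation settings."""
    digest = hashlib.sha256()
    digest.update(media_bytes)
    digest.update(b"\0")  # separator between media bytes and serialized settings
    # sort_keys makes the key independent of dict insertion order
    digest.update(json.dumps(settings, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()
```

Changing the model or prompt yields a different key, so stale results are never served for new settings.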

Acknowledgements

🎯 Core Technologies

  • React - Modern UI framework with hooks and context
  • Material-UI - Professional design system and components
  • Remotion - Programmatic video creation and rendering
  • Node.js - JavaScript runtime for backend services
  • Express - Web application framework for Node.js

🤖 AI & Machine Learning

🎬 Video & Audio Processing

  • FFmpeg - Comprehensive multimedia framework
  • yt-dlp - Universal video downloader for 1000+ platforms
  • Playwright - Browser automation for complex site interactions
  • Puppeteer - Headless Chrome control for web scraping

🎨 UI & Visualization

🌐 Internationalization & Accessibility

  • i18next - Internationalization framework
  • React i18next - React integration for i18next
  • Material Design 3 - Modern design principles and accessibility standards

🔧 Development & Build Tools

📦 Package Management & Deployment

  • npm - Package manager for JavaScript
  • uv - Fast Python package installer and resolver
  • Python - Backend services for AI processing

🙏 Special Thanks

  • Open source community for maintaining these incredible tools
  • Google DeepMind for advancing AI accessibility
  • Remotion team for revolutionizing programmatic video creation
  • F5-TTS contributors for open-source voice cloning technology
  • All beta testers and contributors who helped improve this application

License

MIT License

Copyright (c) 2024 Subtitles Generator

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
