oneclick-subtitles-generator

🎬 Auto-subtitle videos with AI transcription, translation, voice cloning & professional rendering

Stars: 136

A comprehensive web application for auto-subtitling videos and audio, translating SRT files, generating AI narration with voice cloning, creating background images, and rendering professional subtitled videos. Designed for content creators, educators, and general users who need high-quality subtitle generation and video production capabilities.

README:

One-Click Subtitles Generator

View the Vietnamese version here.


Installation Options Comparison

Choose the right version for your needs:

Feature	OSG Lite	OSG Full	OSG Vercel
AI Subtitle Generation	✅ Gemini AI transcription	✅ Gemini AI transcription	✅ Gemini AI transcription
Video Sources	✅ YouTube, Douyin/TikTok, 1000+ platforms + Upload	✅ YouTube, Douyin/TikTok, 1000+ platforms + Upload	Upload only
Subtitle Editor	✅ Visual timeline, waveform, real-time preview	✅ Visual timeline, waveform, real-time preview	✅ Visual timeline, waveform, real-time preview
Translation	✅ Multi-language with context awareness	✅ Multi-language with context awareness	✅ Multi-language with context awareness
Video Rendering	✅ GPU-accelerated with Remotion	✅ GPU-accelerated with Remotion	❌ Not available
Background Generation	✅ Gemini Native/Nano Banana	✅ Gemini Native/Nano Banana	✅ Gemini Native/Nano Banana
Basic TTS	✅ Gemini Live API, Edge TTS, Google TTS	✅ Gemini Live API, Edge TTS, Google TTS	❌ Not available
Voice Cloning	❌ Not included	✅ F5-TTS, Chatterbox	❌ Not available
Project Folder Size	~2-3 GB	~8-12 GB	N/A (hosted)
GPU Requirements	Any GPU for video rendering	GPU accelerated voice cloning (CPU fallback available)	None (no rendering)

💡 Recommendation:

Choose OSG Lite if you need fast subtitle generation and video rendering without voice cloning
Choose OSG (Full) if you need advanced voice cloning and narration capabilities

Quick Installation Guide

Installation on Windows

Go to Releases and download the latest OSG_installer_Windows.bat.
Open the downloaded .bat file and follow the instructions. The installation is much larger if you include the voice cloning feature (about 8-12 GB instead of 2-3 GB, per the comparison table above).

Installation on macOS and Ubuntu

Clone this repo and run the OSG_installer.sh file:

git clone https://github.com/nganlinh4/oneclick-subtitles-generator.git
cd oneclick-subtitles-generator
chmod +x OSG_installer.sh
./OSG_installer.sh

Follow the on-screen instructions. The installation is much larger if you include the voice cloning feature.

Update or Run Application

Windows

Open OSG_installer_Windows.bat and follow the instructions.

macOS and Ubuntu

Open Terminal and run the OSG_installer.sh file again:
```
./OSG_installer.sh
```
Your browser will open automatically at http://localhost:3030.

Features

🎬 Video & Audio Processing

Multi-source support: Upload video/audio files, YouTube URLs, Douyin/TikTok links, or search YouTube by title
Format compatibility: Supports MP4, AVI, MOV, WebM, WMV, MP3, WAV, AAC, FLAC, and more
Quality scanning: Intelligent video quality detection with cookie-based authentication for premium content
Video compatibility checking: Automatic format conversion for Remotion compatibility

🤖 AI-Powered Subtitle Generation

Google Gemini AI: Uses the latest Gemini 2.5 models (Flash, Pro) for accurate transcription
Multi-language support: Generate subtitles in multiple languages with high accuracy
Parallel processing: Handles long videos (15+ minutes) with intelligent segmentation
Custom prompts: Configurable transcription prompts for specialized content
Retry mechanisms: Smart retry with different models for failed segments

✏️ Advanced Subtitle Editing

Visual timeline editor: Drag-and-drop timing adjustments with waveform visualization
Real-time preview: Live subtitle synchronization with video playback
Sticky timing: Batch adjust multiple subtitles simultaneously
Text editing: Direct text modification with undo/redo functionality
Merge & split: Combine adjacent subtitles or split long ones
Format support: Export to SRT, JSON, or custom formats
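
For readers unfamiliar with the SRT format mentioned above, the following sketch shows how a list of timed cues could be serialized to SRT text. The `Cue` class and function names are illustrative assumptions, not the application's internal data model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: Iterable[Cue]) -> str:
    """Serialize cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n{cue.text}\n"
        )
    return "\n".join(blocks)
```

Each SRT block is a sequence number, a `start --> end` line with a comma before the milliseconds, and the subtitle text.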

🗣️ AI Voice Narration

F5-TTS integration: State-of-the-art voice cloning technology
Chatterbox TTS: High-quality text-to-speech with voice conversion
Edge TTS & Google TTS: Multiple TTS engine options
Reference audio: Upload, record, or extract voice samples from videos
Multi-audio tracks: Combine original audio with AI-generated narration
Volume controls: Independent audio level management
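
To make the multi-track and volume ideas concrete, here is a hedged sketch that builds an FFmpeg command mixing a narration track over a video's original audio, each with its own level. It uses FFmpeg's `volume` and `amix` filters; the file names, default levels, and the function itself are assumptions for illustration and do not describe how the application invokes FFmpeg.

```python
from typing import List


def build_mix_command(
    video: str,
    narration: str,
    output: str,
    original_volume: float = 0.3,   # hypothetical default: duck the original audio
    narration_volume: float = 1.0,  # hypothetical default: narration at full level
) -> List[str]:
    """Build an ffmpeg argument list that mixes narration over the original audio."""
    filter_graph = (
        f"[0:a]volume={original_volume}[orig];"
        f"[1:a]volume={narration_volume}[narr];"
        # duration=first keeps the mix as long as the video's own audio track.
        # Note: depending on the ffmpeg version, amix may rescale input levels by default.
        "[orig][narr]amix=inputs=2:duration=first[mix]"
    )
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-i", narration,
        "-filter_complex", filter_graph,
        "-map", "0:v", "-map", "[mix]",
        "-c:v", "copy",  # leave the video stream untouched
        output,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.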

🌍 Translation & Localization

Multi-language translation: Translate subtitles to any language while preserving timing
Custom formatting: Configurable output formats with brackets, delimiters, and chains
Batch processing: Translate multiple subtitle sets simultaneously
Context awareness: AI-powered translation with video context understanding
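
The sketch below illustrates what "preserving timing" means in practice: only the text of each cue changes, while start and end times are copied through. The bracket option is one possible reading of the custom formatting feature. The `translate` callable is a placeholder for any translation backend, and all names here are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def translate_cues(
    cues: Iterable[Cue],
    translate: Callable[[str], str],
    keep_original: bool = False,
    brackets: Tuple[str, str] = ("[", "]"),
) -> List[Cue]:
    """Return new cues with translated text and unchanged timing."""
    translated_cues = []
    for cue in cues:
        text = translate(cue.text)
        if keep_original:
            # Show the original line followed by the translation in brackets.
            open_b, close_b = brackets
            text = f"{cue.text} {open_b}{text}{close_b}"
        translated_cues.append(replace(cue, text=text))
    return translated_cues
```

Passing `keep_original=True` yields bilingual lines such as `hello [bonjour]`, which some viewers prefer for language learning.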

🎨 Background Image Generation

AI-powered creation: Generate custom backgrounds using Gemini's image generation
Album art integration: Use existing artwork as reference for style consistency
Batch generation: Create multiple variations with unique prompts
Smart prompting: Automatic prompt generation based on lyrics and content

🎥 Professional Video Rendering

Remotion integration: GPU-accelerated video rendering with hardware optimization
Multi-resolution support: 360p to 8K output with automatic aspect ratio detection
Subtitle customization: Extensive styling options including fonts, colors, effects, and animations
Multi-audio support: Combine original video audio with AI narration tracks
Background integration: Use generated images or video backgrounds
Render queue: Batch processing with progress tracking
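
As an illustration of multi-resolution output with aspect ratio handling, this sketch computes output dimensions for a target height while keeping the source aspect ratio. It assumes that labels such as "360p" refer to the output height, which may not match how the renderer treats vertical video; the function name is hypothetical.

```python
from typing import Tuple


def output_size(src_width: int, src_height: int, target_height: int) -> Tuple[int, int]:
    """Scale to target_height, keep the aspect ratio, and round the width up to an even number."""
    if min(src_width, src_height, target_height) <= 0:
        raise ValueError("all dimensions must be positive")
    width = round(src_width * target_height / src_height)
    # Many encoders (for example H.264 with 4:2:0 chroma) require even dimensions.
    width += width % 2
    return width, target_height
```

For a 1920x1080 source, a 360p target gives 640x360, and a 4320-line (8K) target gives 7680x4320.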

How to Use

1. Select Your Content Source

File Upload: Drag & drop or browse for video/audio files
YouTube: Paste URL or search by title with thumbnail preview
Douyin/TikTok: Paste URL for automatic extraction
Other platforms: Use any supported video URL

2. Generate AI Subtitles

Choose your preferred Gemini model (2.5 Flash/Pro recommended)
Configure custom prompts for specialized content
Click "Generate timed subtitles" and monitor progress
Long videos are automatically processed in parallel segments

3. Edit & Refine Subtitles

Visual timeline: Drag timing handles with waveform visualization
Real-time preview: See changes instantly synchronized with video
Text editing: Click to edit subtitle content directly
Batch operations: Use sticky timing for multiple subtitle adjustments
Advanced tools: Merge, split, insert, or delete subtitle segments
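
The editing operations above can be sketched on a simple list of cues. This is an illustrative model only: the `Cue` class is an assumption, the word-proportional split is just one reasonable heuristic, and `shift_from` is one plausible interpretation of sticky timing (moving a cue drags the following cues by the same offset).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge(first: Cue, second: Cue) -> Cue:
    """Combine two adjacent cues into one spanning both."""
    return Cue(first.start, second.end, f"{first.text} {second.text}")


def split(cue: Cue, at: float) -> Tuple[Cue, Cue]:
    """Split a cue at time `at`, dividing its words in proportion to elapsed time."""
    if not cue.start < at < cue.end:
        raise ValueError("split point must fall strictly inside the cue")
    words = cue.text.split()
    ratio = (at - cue.start) / (cue.end - cue.start)
    # Keep at least one word on each side whenever the cue has two or more words.
    cut = min(max(round(len(words) * ratio), 1), max(len(words) - 1, 1))
    return (
        Cue(cue.start, at, " ".join(words[:cut])),
        Cue(at, cue.end, " ".join(words[cut:])),
    )


def shift_from(cues: List[Cue], index: int, offset: float) -> List[Cue]:
    """Move cue `index` and every later cue by `offset` seconds."""
    return [
        Cue(c.start + offset, c.end + offset, c.text) if i >= index else c
        for i, c in enumerate(cues)
    ]
```

All three functions return new cues rather than mutating their inputs, which makes an undo/redo history straightforward to keep.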

4. Translate Content (Optional)

Select target languages for translation
Configure output formatting (brackets, delimiters, chains)
Use context-aware AI translation with video understanding
Preserve original timing while adapting text

5. Generate AI Narration (Optional)

Set up reference audio: Upload, record, or extract from video
Choose TTS engine: F5-TTS (voice cloning), Chatterbox, Edge TTS, or Google TTS
Configure voice settings: Adjust speed, pitch, and style parameters
Generate narration: Create AI voice for original or translated subtitles

6. Create Background Images (Optional)

Upload album art or reference images
Generate AI-powered backgrounds based on content
Create multiple variations with unique prompts
Use generated images in video rendering

7. Render Professional Videos

Open video renderer: Access the integrated Remotion-based renderer
Customize subtitles: Extensive styling options (fonts, colors, effects, animations)
Configure audio: Balance original video audio with AI narration
Set output quality: Choose resolution from 360p to 8K
Render with GPU acceleration: Hardware-optimized processing for fast output

8. Export & Download

Subtitle files: SRT, JSON, or custom formats
Audio files: Generated narration in various formats
Background images: AI-generated artwork
Rendered videos: Professional subtitled videos with custom styling

Configuration

Access settings via the gear icon in the top-right corner:

API Keys: Gemini (required), YouTube (optional for search)
AI Models: Choose between Gemini 2.5 Flash, Pro, or experimental models
Interface languages: English, Vietnamese, Korean
Video Processing: Segment duration, quality preferences, cookie management
TTS Engines: F5-TTS, Chatterbox, Edge TTS, or Google TTS selection
Interface: Dark/light themes, time format, waveform visualization
Cache Management: Clear caches and monitor storage usage

Technical Stack

Frontend: React 18, Material-UI, Styled Components, i18next
Video Rendering: Remotion 4 with GPU acceleration (Vulkan/OpenGL)
Backend: Node.js/Express, Python Flask, FastAPI
AI Integration: Google Gemini API, F5-TTS, Chatterbox TTS
Audio/Video: FFmpeg, Web Audio API, yt-dlp, Playwright
Performance: React Window virtualization, multi-level caching, hardware acceleration

Performance Features

GPU Acceleration: Hardware-accelerated video rendering with Vulkan/OpenGL
Virtualized UI: Only renders visible elements for optimal performance with long videos
Parallel Processing: Multi-core subtitle generation and video processing
Smart Caching: Multi-layer cache system for subtitles, videos, and generated content
Optimized Timeline: Hardware-accelerated canvas visualization with adaptive rendering
Efficient Memory: Automatic cleanup and smart resource management
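
The idea behind a virtualized UI can be shown with a few lines of arithmetic: given the scroll position, work out which rows are on screen and render only those. The project uses React Window for this in the browser; the Python function below is only a language-neutral sketch of the calculation, with hypothetical parameter names and a made-up overscan default.

```python
import math
from typing import Tuple


def visible_range(
    scroll_top: float,
    viewport_height: float,
    row_height: float,
    row_count: int,
    overscan: int = 2,  # extra rows rendered above and below to smooth scrolling
) -> Tuple[int, int]:
    """Return the half-open range [first, last) of row indices worth rendering."""
    if row_height <= 0:
        raise ValueError("row_height must be positive")
    first = max(math.floor(scroll_top / row_height) - overscan, 0)
    last = min(math.ceil((scroll_top + viewport_height) / row_height) + overscan, row_count)
    return first, last
```

With 1,000 subtitle rows of 40 px each and a 200 px viewport, only about nine rows are rendered at any scroll position instead of all 1,000.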

Acknowledgements

🎯 Core Technologies

React - Modern UI framework with hooks and context
Material-UI - Professional design system and components
Remotion - Programmatic video creation and rendering
Node.js - JavaScript runtime for backend services
Express - Web application framework for Node.js

🤖 AI & Machine Learning

Google Gemini AI - Advanced language models for transcription and image generation
F5-TTS - State-of-the-art voice cloning technology
Chatterbox - High-quality TTS and voice conversion
Microsoft Edge TTS - Neural text-to-speech service
Google Text-to-Speech - Cloud-based speech synthesis

🎬 Video & Audio Processing

FFmpeg - Comprehensive multimedia framework
yt-dlp - Universal video downloader for 1000+ platforms
Playwright - Browser automation for complex site interactions
Puppeteer - Headless Chrome control for web scraping

🎨 UI & Visualization

Styled Components - CSS-in-JS styling solution
React Router - Declarative routing for React
React Window - Efficient virtualization for large lists
React Icons - Popular icon libraries for React
HTML5 Canvas - Hardware-accelerated timeline visualization

🌐 Internationalization & Accessibility

i18next - Internationalization framework
React i18next - React integration for i18next
Material Design 3 - Modern design principles and accessibility standards

🔧 Development & Build Tools

TypeScript - Type-safe JavaScript development
Create React App - React application scaffolding
Concurrently - Multi-service development environment
Cross-env - Cross-platform environment variables

📦 Package Management & Deployment

npm - Package manager for JavaScript
uv - Fast Python package installer and resolver
Python - Backend services for AI processing

🙏 Special Thanks

Open source community for maintaining these incredible tools
Google DeepMind for advancing AI accessibility
Remotion team for revolutionizing programmatic video creation
F5-TTS contributors for open-source voice cloning technology
All beta testers and contributors who helped improve this application

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
