GMTalker

GMTalker is an intelligent digital human system that integrates speech recognition, speech synthesis, natural language understanding, lip-sync animation driving, and 3D rendering, providing strong technical support for research, education, and virtual human application development.

README:

GMTalker

English | 中文


GMTalker, an interactive digital human rendered in Unreal Engine, is developed by the Media Intelligence Team at Guangming Laboratory. The system integrates speech recognition, speech synthesis, natural language understanding, and lip-sync animation driving. It supports rapid deployment on Windows and requires only 2GB of VRAM to run the entire project. The project showcases demonstrations of two 3D cartoon digital human avatars, suitable for presentation, extension, and commercial integration.

🧱 Features

  • Supports fully offline, real-time streaming conversation services with millisecond-level response
  • Supports wake-up and interruption during dialogue, and training/cloning of various voice styles
  • Compatible with integration of large models like Qwen and DeepSeek
  • Supports connection to local knowledge bases and customization of Agents
  • Allows customization of characters, lip-sync driving, and facial micro-expressions such as blinking
  • Fully open-source; free of commercial restrictions except for the character, and supports secondary development
  • Provides efficient backend configuration services, enabling effortless startup without downloading any additional dependencies
Feature | Introduction | Demonstration Video
Interrupt | Allows users to interrupt conversations in real time via voice, enhancing interaction flexibility | demo1 / demo2

🔥 NEWS

  • 🗓️ 2025.9.1: Upgraded the DunDun model with a lightweight lip-sync driver and packaged the complete Unreal Engine project into an executable (exe) for rapid deployment on a laptop with 2GB of VRAM.
  • 🗓️ 2025.8.25: Updated UE Import Tutorial, Character Overview and Animation Overview documents: import_tutorial.md | character_overview.md | animation_overview.md
  • 🗓️ 2025.8.19: Released UE5 project files, including the GuangDUNDUN character (jointly developed by Guangming Lab and the Shenzhen Guangming District Government).
  • 🗓️ 2025.8.12: Added WebUI usage guide for quick project deployment.
  • 🗓️ 2025.8.11: Added a detailed deployment guide covering C++ environment, CUDA installation, Unreal Engine installation, and Audio2Face setup.
  • 🗓️ 2025.8.5: Released the backend system of the digital human, supporting both command-line and WebUI startup.
  • 🗓️ 2025.7.22: Added the configuration process for ASR and TTS.
  • 🗓️ 2025.7.15: Announced the open-source release of the 3D interactive emotional digital human, supporting local deployment and UE5 rendering.

💬 Join Our Community

Scan the QR code to join the GMTalker technical exchange group.

Quick Start

  • (Requires backend deployment, GLM3.exe, and the essential local AI services below to run)
  1. Clone the project
git clone https://github.com/feima09/GMTalker.git
  2. One-click start
webui.bat
  3. Access the services
  • Main service: http://127.0.0.1:5002
  • Web configuration interface: http://127.0.0.1:7860

👉 Click here to view the WebUI User Guide webui.md

  4. Download the UE executable
  5. Deploy the essential local AI services
  • Download the FunASR speech recognition one-click package here, then run run_server_2pass.bat to start it.
  • Download the MeloTTS speech synthesis one-click package here, then run start.bat to start it.

👉 If you need to develop from source code, please click here to view the complete installation guide install.md
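Once both services are started, a small Python sketch (illustrative only; the URLs are the defaults listed above, and `is_up` is our helper name) can confirm they respond:

```python
import urllib.error
import urllib.request

# Default service addresses from the Quick Start section above.
SERVICES = {
    "Main service": "http://127.0.0.1:5002",
    "Web configuration interface": "http://127.0.0.1:7860",
}

def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers any HTTP response within the timeout."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # the server answered, even if with an error status
    except OSError:
        return False  # connection refused, timeout, DNS failure, ...

if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name}: {'up' if is_up(url) else 'down'} ({url})")
```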

🔁 System Module Interaction Diagram

  • Frontend Presentation (UE5 Client)
  • Backend Services (AI Digital Human Backend System)
  • AI Core Service Capabilities (Models + APIs)
  • Environment Management and Deployment Layer (Conda + Local Execution)
graph TB
    %% Client Layer
    UE5[UE5 Client]
    
    %% Main Service Layer
    subgraph "AI Digital Human Backend System"
        App[Main Application]
        
        %% Core Service Components
        subgraph "Core Services"
            GPT[GPT Service]
            TTS[TTS Service]
            ASR[ASR Service]
            Player[Player Service]
        end
        
        %% Utility Modules
        subgraph "Utility Modules"
            Config[Configuration Management]
            Logger[Log Management]
            Tokenizer[Text Tokenization]
        end
        
        %% Web UI Control Panel
        subgraph "Web UI Control Panel"
            WebUI[webui.py]
            Dashboard[Process Management]
            ConfigUI[Configuration Interface]
        end
    end
    
    %% External Services
    subgraph "External Services"
        OpenAI[OpenAI API<br/>or other LLM]
        FunASR[FunASR<br/>Speech Recognition]
        GPTSOVITS[GPT-SoVITS<br/>TTS Service]
        Audio2Face[Audio2Face<br/>Facial Animation]
    end
    
    %% Connections
    UE5 -.->|Socket.IO<br/>/ue namespace| App
    UE5 -.->|HTTP REST API<br/>/v1/chat/completions| App
    
    App --> GPT
    App --> TTS
    App --> ASR
    App --> Player
    
    GPT -.->|HTTP/HTTPS| OpenAI
    ASR -.->|WebSocket| FunASR
    TTS -.->|HTTP| GPTSOVITS
    Player -.->|gRPC| Audio2Face
    
    App --> Config
    App --> Logger
    App --> Tokenizer
    
    WebUI --> Dashboard
    WebUI --> ConfigUI
    Dashboard -.->|Process Management| App
    
    %% Styling
    classDef clientStyle fill:#e1f5fe
    classDef serviceStyle fill:#f3e5f5
    classDef utilStyle fill:#e8f5e8
    classDef externalStyle fill:#fff3e0
    classDef configStyle fill:#fce4ec
    
    class UE5 clientStyle
    class GPT,TTS,ASR,Player serviceStyle
    class Config,Logger,Tokenizer utilStyle
    class OpenAI,FunASR,GPTSOVITS,Audio2Face externalStyle

📊 Comparison with Other Open-Source Solutions

Compared features: 3D Avatar, UE5 Rendering, Voice Input, Voice Interruption, Lip Sync, Body Movements, and Local Deployment (Win).

Project Name | Star ⭐
LiveTalking | 6.1k
OpenAvatarChat | 1.6k
MNN | 12.6k
Fay | 11.6k
GMTalker | 🚀

📦 Quick Start

After configuring the backend, download the installation package and launch the application. With FunASR and MeloTTS, it starts with one click; no additional environment setup or dependencies are required.

Hardware Requirements

  • Operating System: Windows 10/11 (recommended)
  • Memory: 8GB+ RAM
  • GPU Support: Minimum 2GB VRAM (NVIDIA GPU with CUDA support recommended)

Main Configuration Files

  • configs/config.yaml - Main configuration file
  • configs/gpt/ - GPT model configuration presets
  • configs/tts/ - TTS service configuration presets
  • configs/hotword.txt - Hotword configuration for wake-up
  • configs/prompt.txt - System prompt configuration
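As a hedged illustration of consuming the text-based config files from Python (`parse_hotwords` is our helper, and the one-entry-per-line layout is an assumption, not a documented spec):

```python
from typing import Iterable, List

def parse_hotwords(lines: Iterable[str]) -> List[str]:
    """Parse hotword.txt-style content: one wake word per line, blanks ignored.

    The one-entry-per-line layout is assumed here for illustration.
    """
    return [w.strip() for w in lines if w.strip()]

if __name__ == "__main__":
    with open("configs/hotword.txt", encoding="utf-8") as f:
        print(parse_hotwords(f))
```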

API Documentation

REST API

POST /v1/chat/completions

Create a new chat session, get AI responses, and play the generated speech.

Request Body:

{
  "messages": [
    {
      "content": "User input text"
    }
  ],
  "stream": true
}

Response:

  • Format: text/event-stream
  • Content: AI reply streaming text

GET /v1/chat/new

Create a new chat session.
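As a sketch, both REST endpoints above can be exercised with Python's standard library (`parse_sse_line` and `chat` are our helper names; we assume the stream is line-oriented SSE as documented, and the OpenAI-style `[DONE]` sentinel is an assumption):

```python
import json
import urllib.request
from typing import Optional

BASE = "http://127.0.0.1:5002"  # main service address from the Quick Start

def parse_sse_line(line: str) -> Optional[str]:
    """Return the payload of a 'data: ...' SSE line, or None for other lines."""
    line = line.strip()
    if line.startswith("data:"):
        return line[len("data:"):].strip()
    return None

def chat(text: str) -> None:
    """POST to /v1/chat/completions and print the streamed reply chunks."""
    body = json.dumps({"messages": [{"content": text}], "stream": True}).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # Content-Type: text/event-stream
        for raw in resp:
            data = parse_sse_line(raw.decode("utf-8"))
            if data and data != "[DONE]":  # '[DONE]' sentinel is assumed
                print(data)

if __name__ == "__main__":
    urllib.request.urlopen(f"{BASE}/v1/chat/new").read()  # start a fresh session
    chat("Hello, GMTalker!")
```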

SocketIO API

Connection Address

ws://127.0.0.1:5002/socket.io

namespace: /ue

Event Types

  • question - Send user question
  • aniplay - Animation playback control
  • connect/disconnect - Connection status
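A minimal python-socketio client sketch for the /ue namespace (the event names come from the list above, but the `question` payload shape is an assumption, as is the field name in `build_question`):

```python
from typing import Dict

def build_question(text: str) -> Dict[str, str]:
    """Assumed payload for the 'question' event; the 'text' field is a guess."""
    return {"text": text}

def main() -> None:
    import socketio  # third-party: pip install "python-socketio[client]"

    sio = socketio.Client()

    @sio.on("aniplay", namespace="/ue")
    def on_aniplay(data):
        # Animation playback control events pushed by the backend
        print("aniplay:", data)

    sio.connect("http://127.0.0.1:5002", namespaces=["/ue"])
    sio.emit("question", build_question("Hello"), namespace="/ue")
    sio.wait()  # keep listening for events

if __name__ == "__main__":
    main()
```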

Service Components

GPT Service (services/gpt/)

  • OpenAI Compatible: Supports OpenAI API format
  • Multi-Model: Supports OpenAI, Qwen, etc.
  • Streaming Response: Real-time text stream generation
  • RAG Support: Configurable Retrieval-Augmented Generation

TTS Service (services/tts/)

  • MeloTTS: High-quality Chinese speech synthesis
  • Asynchronous Processing: Handle multiple TTS requests in parallel
  • Fine-tuning & Inference: Detailed fine-tuning + inference available at MeloTTS
  • Weights: For project-specific voice weights, contact the Contributor

ASR Service (services/asr/)

  • FunASR Integration: Speech recognition based on Alibaba's FunASR
  • Wake Word Detection: Supports custom wake words
  • Real-time Recognition: Continuous speech recognition mode

Player Service (services/player/)

  • Local Playback: Local audio playback based on pygame
  • Lip Sync: Synchronizes speech with facial animation
  • Audio2Face: Requires downloading character models (a VPN may be needed) and loads slowly on first start; version 2023.1.1 is recommended.
  • ovrlipsync: A lightweight lip-sync algorithm with low latency but slightly less expressive results.

🖼️ User Interaction Flowchart

flowchart TD
    Start([User Starts System]) --> Launch{Launch Method}
    
    %% Launch Method Branch
    Launch -->|Script Launch| Script[Run app.bat/app.ps1]
    Launch -->|Command Line Launch| CLI[python app.py]
    Launch -->|Web Control Panel| WebUI[Run webui.bat/webui.ps1]
    
    Script --> InitCheck[System Initialization Check]
    CLI --> InitCheck
    WebUI --> Dashboard[Web Control Panel]
    
    %% Web Control Panel Flow
    Dashboard --> ConfigPanel{Configuration Panel}
    ConfigPanel --> SetGPT[Configure GPT Service]
    ConfigPanel --> SetTTS[Configure TTS Service]
    ConfigPanel --> SetASR[Configure ASR Service]
    ConfigPanel --> SetPlayer[Configure Player]
    
    SetGPT --> StartServices[Start Services]
    SetTTS --> StartServices
    SetASR --> StartServices
    SetPlayer --> StartServices
    
    %% System Initialization
    InitCheck --> LoadConfig[Load Configuration File]
    LoadConfig --> InitServices[Initialize Service Components]
    InitServices --> StartServer[Start HTTP/Socket.IO Server]
    StartServices --> StartServer
    
    %% User Interaction Method
    StartServer --> UserInteraction{User Interaction Method}
    
    %% HTTP API Interaction
    UserInteraction -->|HTTP API| HTTPRequest[Send Chat Request<br/>/v1/chat/completions]
    HTTPRequest --> ProcessMessage[Process User Message]
    
    %% Socket.IO Interaction (UE5)
    UserInteraction -->|UE5 Socket.IO| UEConnect[UE5 Client Connects<br/>/ue namespace]
    UEConnect --> WaitQuestion[Wait for User Question]
    
    %% Voice Interaction
    UserInteraction -->|Voice Interaction| VoiceWake[Voice Wake-up Detection]
    VoiceWake --> WakeDetected{Wake Word Detected?}
    WakeDetected -->|Yes| VoiceInput[Voice Input to Text]
    WakeDetected -->|No| VoiceWake
    VoiceInput --> ProcessMessage
    
    %% Message Processing Flow
    ProcessMessage --> GPTProcess[GPT Generates Response]
    GPTProcess --> TextStream[Text Stream Output]
    TextStream --> SentenceSplit[Sentence Splitting]
    
    %% Parallel Processing
    SentenceSplit --> TTSConvert[TTS Text-to-Speech]
    SentenceSplit --> ResponseOutput[Real-time Text Response]
    
    TTSConvert --> AudioQueue[Audio Queue]
    AudioQueue --> PlayAudio[Audio Playback]
    
    %% Playback Method Branch
    PlayAudio --> PlayMode{Playback Mode}
    PlayMode -->|Local Playback| LocalPlay[Local Audio Playback]
    PlayMode -->|Audio2Face| A2FPlay[Send to Audio2Face<br/>Facial Animation Sync]
    
    %% Socket.IO Events
    VoiceInput -.->|question event| UEConnect
    LocalPlay -.->|aniplay event| UEConnect
    A2FPlay -.->|aniplay event| UEConnect
    
    %% End or Continue
    LocalPlay --> WaitNext[Wait for Next Interaction]
    A2FPlay --> WaitNext
    ResponseOutput --> WaitNext
    
    WaitNext --> UserInteraction
    
    %% System Monitoring and Management
    StartServer -.-> Monitor[System Monitoring]
    Monitor --> LogOutput[Log Output<br/>logs/YYYY-MM-DD.txt]
    Monitor --> StatusCheck[Status Check]
    
    %% Error Handling
    ProcessMessage --> ErrorHandle{Process Successful?}
    ErrorHandle -->|No| ErrorLog[Error Logging]
    ErrorLog --> WaitNext
    ErrorHandle -->|Yes| TextStream
    
    %% Style Definitions
    classDef startStyle fill:#c8e6c9
    classDef processStyle fill:#bbdefb
    classDef decisionStyle fill:#ffe0b2
    classDef endStyle fill:#ffcdd2
    classDef externalStyle fill:#f3e5f5
    
    class Start,Launch startStyle
    class ProcessMessage,GPTProcess,TTSConvert,PlayAudio processStyle
    class UserInteraction,PlayMode,WakeDetected,ErrorHandle decisionStyle
    class WaitNext endStyle
    class UEConnect,A2FPlay,HTTPRequest externalStyle
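The "Text Stream Output → Sentence Splitting → TTS" stage of the flowchart above can be sketched in a few lines of Python (`split_sentences` is our illustrative helper, not project code; the punctuation set is an assumption):

```python
import re
from typing import Iterable, Iterator

# Sentence-terminal punctuation: CJK and ASCII marks (assumed set).
_SENTENCE_END = re.compile(r"([。!?!?.])")

def split_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text chunks and yield complete sentences.

    A sentence is yielded as soon as its terminal punctuation arrives,
    so TTS can start speaking before the LLM finishes its reply.
    """
    buf = ""
    for chunk in chunks:
        buf += chunk
        parts = _SENTENCE_END.split(buf)
        # parts alternates [text, punct, text, punct, ..., remainder]
        for i in range(0, len(parts) - 1, 2):
            sentence = (parts[i] + parts[i + 1]).strip()
            if sentence:
                yield sentence
        buf = parts[-1]
    if buf.strip():
        yield buf.strip()

if __name__ == "__main__":
    stream = ["你好,", "我是光", "明实验室的数字人。", "很高兴认识你!"]
    for s in split_sentences(stream):
        print(s)  # each sentence would be handed to the TTS audio queue here
```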

📚 About Guangming Laboratory

The Guangdong Provincial Laboratory of Artificial Intelligence and Digital Economy (Shenzhen) (hereinafter referred to as Guangming Laboratory) is one of the third batch of Guangdong Provincial Laboratories approved for construction by the Guangdong Provincial Government. The laboratory focuses on cutting-edge theories and future technological trends in global artificial intelligence and the digital economy, dedicated to serving major national development strategies and significant needs.

Relying on Shenzhen's industrial, geographical, and policy advantages, Guangming Laboratory brings together global scientific research forces and fully unleashes the agglomeration effect of scientific and technological innovation resources. Centered around the core task of building a domestic AI computing power ecosystem, and driven by the development of multimodal AI technology and its application ecosystem, the laboratory strives to break through key technologies, produce original achievements, and continuously advance technological innovation and industrial empowerment.

The laboratory's goal is to accelerate the supply of diversified applications and full-scenario penetration of artificial intelligence technology, achieving mutual reinforcement of technological innovation and industrial driving forces, and continuously promoting the generation of new quality productivity powered by AI.


🌐 Contact Us (Project Collaboration)

Acknowledgements
Thanks to all team members and partners who participated in the development and support of the GMTalker project. (Fei Ma, Hongbo Xu, Yiming Luo, Minghui Li, Haijun Zhu, Chao Song, Yiyao Zhuo)

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

You are free to use, modify, and share the code and assets for non-commercial purposes, provided that you give appropriate credit.

🔗 Full License Text
🔍 Human-readable Summary
