
GMTalker
GMTalker is an intelligent digital human system that integrates speech recognition, speech synthesis, natural language understanding, lip-sync animation driving, and 3D rendering, providing strong technical support for research, education, and virtual-human application development.
Stars: 385

GMTalker is an interactive digital human rendered by Unreal Engine, developed by the Media Intelligence Team at Bright Laboratory. The system integrates speech recognition, speech synthesis, natural language understanding, and lip-sync animation driving. It supports rapid deployment on Windows with only 2GB of VRAM required. The project showcases two 3D cartoon digital human avatars suitable for presentations, expansions, and commercial integration.
README:
GMTalker, an interactive digital human rendered by Unreal Engine, is developed by the Media Intelligence Team at Bright Laboratory. The system integrates speech recognition, speech synthesis, natural language understanding, and lip-sync animation driving. It supports rapid deployment on Windows and requires only 2GB of VRAM to run the entire project. This project showcases demonstrations of two 3D cartoon digital human avatars, suitable for presentations, expansions, and commercial integration.
- Supports fully offline, real-time streaming conversation services with millisecond-level response
- Supports wake-up and interruption during dialogue, and training/cloning of various voice styles
- Compatible with integration of large models like Qwen and DeepSeek
- Supports connection to local knowledge bases and customization of Agents
- Allows customization of characters, lip-sync driving, and facial micro-expressions such as blinking
- Fully open-source; free of commercial restrictions except for the character, and supports secondary development
- Provides efficient backend configuration services, enabling effortless startup without downloading any additional dependencies
Feature Introduction | Demonstration Video |
---|---|
Interrupt: allows users to interrupt conversations in real time via voice, enhancing interaction flexibility | |
- 🗓️ 2025.9.1: Upgraded the DunDun model with a lightweight lip-sync driver and packaged the complete Unreal Engine project into an executable (exe) for rapid deployment on a laptop with 2GB VRAM.
- 🗓️ 2025.8.25: Updated UE Import Tutorial, Character Overview and Animation Overview documents: import_tutorial.md | character_overview.md | animation_overview.md
- 🗓️ 2025.8.19: Released UE5 project files, including the GuangDUNDUN character (jointly developed by Guangming Lab and the Shenzhen Guangming District Government).
- 🗓️ 2025.8.12: Added WebUI usage guide for quick project deployment.
- 🗓️ 2025.8.11: Added a detailed deployment guide covering C++ environment, CUDA installation, Unreal Engine installation, and Audio2Face setup.
- 🗓️ 2025.8.5: Released the backend system of the digital human, supporting both command-line and WebUI startup.
- 🗓️ 2025.7.22: Added the configuration process for ASR and TTS.
- 🗓️ 2025.7.15: Announced the open-source release of the 3D interactive emotional digital human, supporting local deployment and UE5 rendering.
Scan QR code to join GMTalker technical exchange group
- (Requires backend deployment, GLM3.exe, and the essential local AI services to run)
- Clone the project: `git clone https://github.com/feima09/GMTalker.git`
- One-click start: `webui.bat`
- Access the services
  - Main service: `http://127.0.0.1:5002`
  - Web configuration interface: `http://127.0.0.1:7860`

👉 Click here to view the WebUI User Guide: webui.md
- Download UE Executable
  - Download and launch GLM3.exe from: Project Address
- Deploy Essential Local AI Services
  - Download the FunASR speech recognition one-click package here, then run `run_server_2pass.bat` to start it with one click.
  - Download the MeloTTS speech synthesis one-click package here, then run `start.bat` to start it with one click.
- Frontend Presentation (UE5 Client)
- Backend Services (AI Digital Human Backend System)
- AI Core Service Capabilities (Models + APIs)
- Environment Management and Deployment Layer (Conda + Local Execution)
graph TB
%% Client Layer
UE5[UE5 Client]
%% Main Service Layer
subgraph "AI Digital Human Backend System"
App[Main Application]
%% Core Service Components
subgraph "Core Services"
GPT[GPT Service]
TTS[TTS Service]
ASR[ASR Service]
Player[Player Service]
end
%% Utility Modules
subgraph "Utility Modules"
Config[Configuration Management]
Logger[Log Management]
Tokenizer[Text Tokenization]
end
%% Web UI Control Panel
subgraph "Web UI Control Panel"
WebUI[webui.py]
Dashboard[Process Management]
ConfigUI[Configuration Interface]
end
end
%% External Services
subgraph "External Services"
OpenAI[OpenAI API<br/>or other LLM]
FunASR[FunASR<br/>Speech Recognition]
GPTSOVITS[GPT-SoVITS<br/>TTS Service]
Audio2Face[Audio2Face<br/>Facial Animation]
end
%% Connections
UE5 -.->|Socket.IO<br/>/ue namespace| App
UE5 -.->|HTTP REST API<br/>/v1/chat/completions| App
App --> GPT
App --> TTS
App --> ASR
App --> Player
GPT -.->|HTTP/HTTPS| OpenAI
ASR -.->|WebSocket| FunASR
TTS -.->|HTTP| GPTSOVITS
Player -.->|gRPC| Audio2Face
App --> Config
App --> Logger
App --> Tokenizer
WebUI --> Dashboard
WebUI --> ConfigUI
Dashboard -.->|Process Management| App
%% Styling
classDef clientStyle fill:#e1f5fe
classDef serviceStyle fill:#f3e5f5
classDef utilStyle fill:#e8f5e8
classDef externalStyle fill:#fff3e0
classDef configStyle fill:#fce4ec
class UE5 clientStyle
class GPT,TTS,ASR,Player serviceStyle
class Config,Logger,Tokenizer utilStyle
class OpenAI,FunASR,GPTSOVITS,Audio2Face externalStyle
Project Name | 3D Avatar | UE5 Rendering | Voice Input | Voice Interruption | Lip Sync | Body Movements | Local Deployment (Win) | Star ⭐ |
---|---|---|---|---|---|---|---|---|
LiveTalking | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | 6.1k |
OpenAvatarChat | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | 1.6k |
MNN | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ❌ | 12.6k |
Fay | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 11.6k |
GMTalker | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚀 |
✅ indicates full support for the feature, while ❌ indicates it is missing or unsupported.
After configuring the backend, download the installation package to launch the application. With FunASR and MeloTTS, everything starts with one click; no additional environment setup or dependencies are required.
- Operating System: Windows 10/11 (recommended)
- Memory: 8GB+ RAM
- GPU Support: Minimum 2GB VRAM (NVIDIA GPU with CUDA support recommended)
- `configs/config.yaml` - Main configuration file
- `configs/gpt/` - GPT model configuration presets
- `configs/tts/` - TTS service configuration presets
- `configs/hotword.txt` - Hotword configuration for wake-up
- `configs/prompt.txt` - System prompt configuration
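As an illustration of how these files fit together, a minimal `configs/config.yaml` might look like the sketch below. Every key name here is a hypothetical assumption for illustration, not the project's actual schema.

```yaml
# Hypothetical sketch only: key names are assumptions, not GMTalker's real schema.
gpt:
  preset: configs/gpt/default.yaml   # which GPT preset to load
tts:
  preset: configs/tts/melotts.yaml   # which TTS preset to load
asr:
  hotword_file: configs/hotword.txt  # wake-up hotwords
prompt_file: configs/prompt.txt      # system prompt
```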
Create a new chat session, get AI responses, and play the generated speech.

Request Body:

```json
{
  "messages": [
    {
      "content": "User input text"
    }
  ],
  "stream": true
}
```

Response:

- Format: `text/event-stream`
- Content: AI's streaming text reply
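A client for this endpoint only needs to POST the JSON body and read the `text/event-stream` reply line by line. The sketch below shows the two offline pieces of that flow, building the request body and parsing one SSE line; the helper names are illustrative assumptions, and actually streaming requires the backend running at `http://127.0.0.1:5002`.

```python
import json

# Build the request body for POST /v1/chat/completions as documented above.
def build_chat_request(user_text):
    return json.dumps({
        "messages": [{"content": user_text}],
        "stream": True,
    })

# Extract the payload from one line of a text/event-stream response.
# ("data:" is the standard SSE line prefix; whether GMTalker attaches
# extra fields per event is not specified here.)
def parse_sse_line(line):
    line = line.strip()
    if line.startswith("data:"):
        return line[len("data:"):].strip()
    return None

print(parse_sse_line("data: Hello from GMTalker"))  # → Hello from GMTalker
```
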
Create a new chat session.
ws://127.0.0.1:5002/socket.io
namespace: /ue
- `question` - Send user question
- `aniplay` - Animation playback control
- `connect`/`disconnect` - Connection status
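A minimal sketch of the `/ue` channel follows. The endpoint, namespace, and event names come from this README; the payload field names are assumptions (the actual schema is defined by the UE5 client), and connecting for real would typically use a Socket.IO client library such as `python-socketio`.

```python
# Connection parameters documented above.
ENDPOINT = "ws://127.0.0.1:5002/socket.io"
NAMESPACE = "/ue"

# Assumed payload shapes, for illustration only.
def question_event(text):
    """'question' event: send a user question to the backend."""
    return "question", {"text": text}

def aniplay_event(action):
    """'aniplay' event: animation playback control."""
    return "aniplay", {"action": action}

event, payload = question_event("Introduce GMTalker")
print(event, payload)  # → question {'text': 'Introduce GMTalker'}
```
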
- OpenAI Compatible: Supports OpenAI API format
- Multi-Model: Supports OpenAI, Qwen, etc.
- Streaming Response: Real-time text stream generation
- RAG Support: Configurable Retrieval-Augmented Generation
- MeloTTS: High-quality Chinese speech synthesis
- Asynchronous Processing: Handle multiple TTS requests in parallel
- Fine-tuning & Inference: Detailed fine-tuning + inference available at MeloTTS
- Weights: For project-specific voice weights, contact the Contributor
- FunASR Integration: Speech recognition based on Alibaba's FunASR
- Wake Word Detection: Supports custom wake words
- Real-time Recognition: Continuous speech recognition mode
- Local Playback: Local audio playback based on pygame
- Lip Sync: Synchronizes speech with facial animation
- Audio2Face: requires downloading character models via VPN and has slow initial loading; version 2023.1.1 is recommended.
- ovrlipsync: a lightweight lip-sync algorithm with lower latency but slightly weaker results.
flowchart TD
Start([User Starts System]) --> Launch{Launch Method}
%% Launch Method Branch
Launch -->|Script Launch| Script[Run app.bat/app.ps1]
Launch -->|Command Line Launch| CLI[python app.py]
Launch -->|Web Control Panel| WebUI[Run webui.bat/webui.ps1]
Script --> InitCheck[System Initialization Check]
CLI --> InitCheck
WebUI --> Dashboard[Web Control Panel]
%% Web Control Panel Flow
Dashboard --> ConfigPanel{Configuration Panel}
ConfigPanel --> SetGPT[Configure GPT Service]
ConfigPanel --> SetTTS[Configure TTS Service]
ConfigPanel --> SetASR[Configure ASR Service]
ConfigPanel --> SetPlayer[Configure Player]
SetGPT --> StartServices[Start Services]
SetTTS --> StartServices
SetASR --> StartServices
SetPlayer --> StartServices
%% System Initialization
InitCheck --> LoadConfig[Load Configuration File]
LoadConfig --> InitServices[Initialize Service Components]
InitServices --> StartServer[Start HTTP/Socket.IO Server]
StartServices --> StartServer
%% User Interaction Method
StartServer --> UserInteraction{User Interaction Method}
%% HTTP API Interaction
UserInteraction -->|HTTP API| HTTPRequest[Send Chat Request<br/>/v1/chat/completions]
HTTPRequest --> ProcessMessage[Process User Message]
%% Socket.IO Interaction (UE5)
UserInteraction -->|UE5 Socket.IO| UEConnect[UE5 Client Connects<br/>/ue namespace]
UEConnect --> WaitQuestion[Wait for User Question]
%% Voice Interaction
UserInteraction -->|Voice Interaction| VoiceWake[Voice Wake-up Detection]
VoiceWake --> WakeDetected{Wake Word Detected?}
WakeDetected -->|Yes| VoiceInput[Voice Input to Text]
WakeDetected -->|No| VoiceWake
VoiceInput --> ProcessMessage
%% Message Processing Flow
ProcessMessage --> GPTProcess[GPT Generates Response]
GPTProcess --> TextStream[Text Stream Output]
TextStream --> SentenceSplit[Sentence Splitting]
%% Parallel Processing
SentenceSplit --> TTSConvert[TTS Text-to-Speech]
SentenceSplit --> ResponseOutput[Real-time Text Response]
TTSConvert --> AudioQueue[Audio Queue]
AudioQueue --> PlayAudio[Audio Playback]
%% Playback Method Branch
PlayAudio --> PlayMode{Playback Mode}
PlayMode -->|Local Playback| LocalPlay[Local Audio Playback]
PlayMode -->|Audio2Face| A2FPlay[Send to Audio2Face<br/>Facial Animation Sync]
%% Socket.IO Events
VoiceInput -.->|question event| UEConnect
LocalPlay -.->|aniplay event| UEConnect
A2FPlay -.->|aniplay event| UEConnect
%% End or Continue
LocalPlay --> WaitNext[Wait for Next Interaction]
A2FPlay --> WaitNext
ResponseOutput --> WaitNext
WaitNext --> UserInteraction
%% System Monitoring and Management
StartServer -.-> Monitor[System Monitoring]
Monitor --> LogOutput[Log Output<br/>logs/YYYY-MM-DD.txt]
Monitor --> StatusCheck[Status Check]
%% Error Handling
ProcessMessage --> ErrorHandle{Process Successful?}
ErrorHandle -->|No| ErrorLog[Error Logging]
ErrorLog --> WaitNext
ErrorHandle -->|Yes| TextStream
%% Style Definitions
classDef startStyle fill:#c8e6c9
classDef processStyle fill:#bbdefb
classDef decisionStyle fill:#ffe0b2
classDef endStyle fill:#ffcdd2
classDef externalStyle fill:#f3e5f5
class Start,Launch startStyle
class ProcessMessage,GPTProcess,TTSConvert,PlayAudio processStyle
class UserInteraction,PlayMode,WakeDetected,ErrorHandle decisionStyle
class WaitNext endStyle
class UEConnect,A2FPlay,HTTPRequest externalStyle
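The text-stream path in the flowchart above (GPT output → sentence splitting → TTS queue) can be sketched in a few lines: buffer the streamed chunks and hand each completed sentence to TTS as soon as its end mark arrives. The delimiter set and function below are illustrative assumptions, not the project's actual tokenizer.

```python
import re

# Sentence-ending marks (ASCII and CJK); an assumed set for illustration.
SENTENCE_END = re.compile(r"([.!?。!?])")

def split_stream(chunks):
    """Yield complete sentences from an iterable of streamed text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            m = SENTENCE_END.search(buffer)
            if not m:
                break
            end = m.end()
            yield buffer[:end].strip()   # a full sentence is ready for TTS
            buffer = buffer[end:]
    if buffer.strip():                   # flush any trailing partial sentence
        yield buffer.strip()

print(list(split_stream(["Hello the", "re! How are", " you?"])))
# → ['Hello there!', 'How are you?']
```
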
The Guangdong Provincial Laboratory of Artificial Intelligence and Digital Economy (Shenzhen) (hereinafter referred to as Guangming Laboratory) is one of the third batch of Guangdong Provincial Laboratories approved for construction by the Guangdong Provincial Government. The laboratory focuses on cutting-edge theories and future technological trends in global artificial intelligence and the digital economy, dedicated to serving major national development strategies and significant needs.
Relying on Shenzhen's industrial, geographical, and policy advantages, Guangming Laboratory brings together global scientific research forces and fully unleashes the agglomeration effect of scientific and technological innovation resources. Centered around the core task of building a domestic AI computing power ecosystem, and driven by the development of multimodal AI technology and its application ecosystem, the laboratory strives to break through key technologies, produce original achievements, and continuously advance technological innovation and industrial empowerment.
The laboratory's goal is to accelerate the supply of diversified applications and full-scenario penetration of artificial intelligence technology, achieving mutual reinforcement of technological innovation and industrial driving forces, and continuously promoting the generation of new quality productivity powered by AI.
- Website: Guangming Laboratory Official Site
- Email: [email protected]/[email protected]
Acknowledgements
Thanks to all team members and partners who participated in the development and support of the GMTalker project. (Fei Ma, Hongbo Xu, Yiming Luo, Minghui Li, Haijun Zhu, Chao Song, Yiyao Zhuo)
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
You are free to use, modify, and share the code and assets for non-commercial purposes, provided that you give appropriate credit.
Alternative AI tools for GMTalker
Similar Open Source Tools


roo-code-memory-bank
Roo Code Memory Bank is a tool designed for AI-assisted development to maintain project context across sessions. It provides a structured memory system integrated with VS Code, ensuring deep understanding of the project for the AI assistant. The tool includes key components such as Memory Bank for persistent storage, Mode Rules for behavior configuration, VS Code Integration for seamless development experience, and Real-time Updates for continuous context synchronization. Users can configure custom instructions, initialize the Memory Bank, and organize files within the project root directory. The Memory Bank structure includes files for tracking session state, technical decisions, project overview, progress tracking, and optional project brief and system patterns documentation. Features include persistent context, smart workflows for specialized tasks, knowledge management with structured documentation, and cross-referenced project knowledge. Pro tips include handling multiple projects, utilizing Debug mode for troubleshooting, and managing session updates for synchronization. The tool aims to enhance AI-assisted development by providing a comprehensive solution for maintaining project context and facilitating efficient workflows.

open-health
OpenHealth is an AI health assistant that helps users manage their health data by leveraging AI and personal health information. It allows users to consolidate health data, parse it smartly, and engage in contextual conversations with GPT-powered AI. The tool supports various data sources like blood test results, health checkup data, personal physical information, family history, and symptoms. OpenHealth aims to empower users to take control of their health by combining data and intelligence for actionable health management.

RookieAI_yolov8
RookieAI_yolov8 is an open-source project designed for developers and users interested in utilizing YOLOv8 models for object detection tasks. The project provides instructions for setting up the required libraries and Pytorch, as well as guidance on using custom or official YOLOv8 models. Users can easily train their own models and integrate them with the software. The tool offers features for packaging the code, managing model files, and organizing the necessary resources for running the software. It also includes updates and optimizations for better performance and functionality, with a focus on FPS game aimbot functionalities. The project aims to provide a comprehensive solution for object detection tasks using YOLOv8 models.

Starmoon
Starmoon is an affordable, compact AI-enabled device that can understand and respond to your emotions with empathy. It offers supportive conversations and personalized learning assistance. The device is cost-effective, voice-enabled, open-source, compact, and aims to reduce screen time. Users can assemble the device themselves using off-the-shelf components and deploy it locally for data privacy. Starmoon integrates various APIs for AI language models, speech-to-text, text-to-speech, and emotion intelligence. The hardware setup involves components like ESP32S3, microphone, amplifier, speaker, LED light, and button, along with software setup instructions for developers. The project also includes a web app, backend API, and background task dashboard for monitoring and management.

cia
CIA is a powerful open-source tool designed for data analysis and visualization. It provides a user-friendly interface for processing large datasets and generating insightful reports. With CIA, users can easily explore data, perform statistical analysis, and create interactive visualizations to communicate findings effectively. Whether you are a data scientist, analyst, or researcher, CIA offers a comprehensive set of features to streamline your data analysis workflow and uncover valuable insights.

superagentx
SuperAgentX is a lightweight open-source AI framework designed for multi-agent applications with Artificial General Intelligence (AGI) capabilities. It offers goal-oriented multi-agents with retry mechanisms, easy deployment through WebSocket, RESTful API, and IO console interfaces, streamlined architecture with no major dependencies, contextual memory using SQL + Vector databases, flexible LLM configuration supporting various Gen AI models, and extendable handlers for integration with diverse APIs and data sources. It aims to accelerate the development of AGI by providing a powerful platform for building autonomous AI agents capable of executing complex tasks with minimal human intervention.

paelladoc
PAELLADOC is an intelligent documentation system that uses AI to analyze code repositories and generate comprehensive technical documentation. It offers a modular architecture with MECE principles, interactive documentation process, key features like Orchestrator and Commands, and a focus on context for successful AI programming. The tool aims to streamline documentation creation, code generation, and product management tasks for software development teams, providing a definitive standard for AI-assisted development documentation.

kweaver
KWeaver is an open-source cognitive intelligence development framework that provides data scientists, application developers, and domain experts with the ability for rapid development, comprehensive openness, and high-performance knowledge network generation and cognitive intelligence large model framework. It offers features such as automated and visual knowledge graph construction, visualization and analysis of knowledge graph data, knowledge graph integration, knowledge graph resource management, large model prompt engineering and debugging, and visual configuration for large model access.

databend
Databend is an open-source cloud data warehouse built in Rust, offering fast query execution and data ingestion for complex analysis of large datasets. It integrates with major cloud platforms, provides high performance with AI-powered analytics, supports multiple data formats, ensures data integrity with ACID transactions, offers flexible indexing options, and features community-driven development. Users can try Databend through a serverless cloud or Docker installation, and perform tasks such as data import/export, querying semi-structured data, managing users/databases/tables, and utilizing AI functions.

eko
Eko is a lightweight and flexible command-line tool for managing environment variables in your projects. It allows you to easily set, get, and delete environment variables for different environments, making it simple to manage configurations across development, staging, and production environments. With Eko, you can streamline your workflow and ensure consistency in your application settings without the need for complex setup or configuration files.

WebMasterLog
WebMasterLog is a comprehensive repository showcasing various web development projects built with front-end and back-end technologies. It highlights interactive user interfaces, dynamic web applications, and a spectrum of web development solutions. The repository encourages contributions in areas such as adding new projects, improving existing projects, updating documentation, fixing bugs, implementing responsive design, enhancing code readability, and optimizing project functionalities. Contributors are guided to follow specific guidelines for project submissions, including directory naming conventions, README file inclusion, project screenshots, and commit practices. Pull requests are reviewed based on criteria such as proper PR template completion, originality of work, code comments for clarity, and sharing screenshots for frontend updates. The repository also participates in various open-source programs like JWOC, GSSoC, Hacktoberfest, KWOC, 24 Pull Requests, IWOC, SWOC, and DWOC, welcoming valuable contributors.

LLM-on-Tabular-Data-Prediction-Table-Understanding-Data-Generation
This repository serves as a comprehensive survey on the application of Large Language Models (LLMs) on tabular data, focusing on tasks such as prediction, data generation, and table understanding. It aims to consolidate recent progress in this field by summarizing key techniques, metrics, datasets, models, and optimization approaches. The survey identifies strengths, limitations, unexplored territories, and gaps in the existing literature, providing insights for future research directions. It also offers code and dataset references to empower readers with the necessary tools and knowledge to address challenges in this rapidly evolving domain.

mem0
Mem0 is a tool that provides a smart, self-improving memory layer for Large Language Models, enabling personalized AI experiences across applications. It offers persistent memory for users, sessions, and agents, self-improving personalization, a simple API for easy integration, and cross-platform consistency. Users can store memories, retrieve memories, search for related memories, update memories, get the history of a memory, and delete memories using Mem0. It is designed to enhance AI experiences by enabling long-term memory storage and retrieval.

lyraios
LYRAIOS (LLM-based Your Reliable AI Operating System) is an advanced AI assistant platform built with FastAPI and Streamlit, designed to serve as an operating system for AI applications. It offers core features such as AI process management, memory system, and I/O system. The platform includes built-in tools like Calculator, Web Search, Financial Analysis, File Management, and Research Tools. It also provides specialized assistant teams for Python and research tasks. LYRAIOS is built on a technical architecture comprising FastAPI backend, Streamlit frontend, Vector Database, PostgreSQL storage, and Docker support. It offers features like knowledge management, process control, and security & access control. The roadmap includes enhancements in core platform, AI process management, memory system, tools & integrations, security & access control, open protocol architecture, multi-agent collaboration, and cross-platform support.
For similar tasks

ai-to-pptx
Ai-to-pptx is a tool that uses AI technology to automatically generate PPTX, and supports online editing and exporting of PPTX. Main functions: - 1 Use large language models such as ChatGPT to generate outlines - 2 The generated content allows users to modify again - 3 Different templates can be selected when generating PPTX - 4 Support online editing of PPTX text content, style, pictures, etc. - 5 Supports exporting PPTX, PDF, PNG and other formats - 6 Support users to set their own LOGO and related background pictures to create their own exclusive PPTX style - 7 Support users to design their own templates and upload them to the sharing platform for others to use

cannoli
Cannoli allows you to build and run no-code LLM scripts using the Obsidian Canvas editor. Cannolis are scripts that leverage the OpenAI API to read/write to your vault, and take actions using HTTP requests. They can be used to automate tasks, create custom llm-chatbots, and more.

awesome-chatgpt
Awesome ChatGPT is an artificial intelligence chatbot developed by OpenAI. It offers a wide range of applications, web apps, browser extensions, CLI tools, bots, integrations, and packages for various platforms. Users can interact with ChatGPT through different interfaces and use it for tasks like generating text, creating presentations, summarizing content, and more. The ecosystem around ChatGPT includes tools for developers, writers, researchers, and individuals looking to leverage AI technology for different purposes.

Powerpointer-For-Local-LLMs
PowerPointer For Local LLMs is a PowerPoint generator that uses python-pptx and local llm's via the Oobabooga Text Generation WebUI api to create beautiful and informative presentations. It runs locally on your computer, eliminating privacy concerns. The tool allows users to select from 7 designs, make placeholders for images, and easily customize presentations within PowerPoint. Users provide information for the PowerPoint, which is then used to generate text using optimized prompts and the text generation webui api. The generated text is converted into a PowerPoint presentation using the python-pptx library.

aippt
Aippt is a commercial-grade AI tool for generating, parsing, and rendering PowerPoint presentations. It offers functionalities such as AI-powered PPT generation, PPT to JSON conversion, and JSON to PPT rendering. Users can experience online editing, upload PPT files for rendering, and download edited PPT files. The tool also supports commercial partnerships for custom industry solutions, native chart and animation support, user-defined templates, and competitive pricing. Aippt is available for commercial use with options for agency support and private deployment. The official website offers open APIs and an open platform for API/UI integration.

aippt_PresentationGen
A SpringBoot web application that generates PPT files using a llm. The tool preprocesses single-page templates and dynamically combines them to generate PPTX files with text replacement functionality. It utilizes technologies such as SpringBoot, MyBatis, MySQL, Redis, WebFlux, Apache POI, Aspose Slides, OSS, and Vue2. Users can deploy the tool by configuring various parameters in the application.yml file and setting up necessary resources like MySQL, OSS, and API keys. The tool also supports integration with open-source image libraries like Unsplash for adding images to the presentations.

PPTAgent
PPTAgent is an innovative system that automatically generates presentations from documents. It employs a two-step process for quality assurance and introduces PPTEval for comprehensive evaluation. With dynamic content generation, smart reference learning, and quality assessment, PPTAgent aims to streamline presentation creation. The tool follows an analysis phase to learn from reference presentations and a generation phase to develop structured outlines and cohesive slides. PPTEval evaluates presentations based on content accuracy, visual appeal, and logical coherence.

Sentient
Sentient is a personal, private, and interactive AI companion developed by Existence. The project aims to build a completely private AI companion that is deeply personalized and context-aware of the user. It utilizes automation and privacy to create a true companion for humans. The tool is designed to remember information about the user and use it to respond to queries and perform various actions. Sentient features a local and private environment, MBTI personality test, integrations with LinkedIn, Reddit, and more, self-managed graph memory, web search capabilities, multi-chat functionality, and auto-updates for the app. The project is built using technologies like ElectronJS, Next.js, TailwindCSS, FastAPI, Neo4j, and various APIs.
For similar jobs

promptflow
**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.

MegaDetector
MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). This model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aims to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our of the field, affectionately titled "Everything I know about machine learning and camera traps".

leapfrogai
LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobuffs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.

llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.

carrot
The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.

TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to project website.

AI-YinMei
AI-YinMei is an AI virtual anchor Vtuber development tool (N card version). It supports fastgpt knowledge base chat dialogue, a complete set of solutions for LLM large language models: [fastgpt] + [one-api] + [Xinference], supports docking bilibili live broadcast barrage reply and entering live broadcast welcome speech, supports Microsoft edge-tts speech synthesis, supports Bert-VITS2 speech synthesis, supports GPT-SoVITS speech synthesis, supports expression control Vtuber Studio, supports painting stable-diffusion-webui output OBS live broadcast room, supports painting picture pornography public-NSFW-y-distinguish, supports search and image search service duckduckgo (requires magic Internet access), supports image search service Baidu image search (no magic Internet access), supports AI reply chat box [html plug-in], supports AI singing Auto-Convert-Music, supports playlist [html plug-in], supports dancing function, supports expression video playback, supports head touching action, supports gift smashing action, supports singing automatic start dancing function, chat and singing automatic cycle swing action, supports multi scene switching, background music switching, day and night automatic switching scene, supports open singing and painting, let AI automatically judge the content.