z.ai2api_python
A high-performance proxy service supporting multiple AI providers, including Z.AI, K2Think, and LongCat
Stars: 210
Z.AI2API Python is a lightweight OpenAI API proxy service that integrates seamlessly with existing applications. It supports the full functionality of GLM-4.5 series models and features high-performance streaming responses, enhanced tool invocation, support for thinking mode, integration with search models, Docker deployment, session isolation for privacy protection, flexible configuration via environment variables, and intelligent upstream model routing.
README:
A high-performance OpenAI-compatible API proxy service built on FastAPI, with a multi-provider architecture supporting the full functionality of GLM-4.5 series, K2Think, LongCat, and other AI models.
- 🔌 Fully OpenAI API compatible - seamless integration with existing applications
- 🏗️ Multi-provider architecture - supports Z.AI, K2Think, LongCat, and other AI providers
- 🤖 Claude Code support - connect to Claude Code via Claude Code Router (upgrade the CCR tool to v1.0.47 or later)
- 🍒 Cherry Studio support - call MCP tools directly from Cherry Studio
- 🚀 High-performance streaming - Server-Sent Events (SSE) support
- 🛠️ Enhanced tool calling - improved Function Call implementation supporting complex tool chains
- 🧠 Thinking mode support - intelligent handling of the model's reasoning process
- 🐳 Docker deployment - one-command containerized deployment (see .env.example for environment variables)
- 🛡️ Session isolation - anonymous mode protects privacy
- 🔧 Flexible configuration - configure everything via environment variables
- 🔄 Token pool management - automatic round-robin, fault recovery, dynamic updates
- 🛡️ Error handling - comprehensive exception capture and retry mechanisms
- 🔒 Service uniqueness - process-name (pname) based uniqueness check prevents duplicate startup
- Python 3.9-3.12
- pip or uv (recommended)
# Clone the project
git clone https://github.com/ZyphrZero/z.ai2api_python.git
cd z.ai2api_python
# Using uv (recommended)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
uv run python main.py
# Or using pip (the Tsinghua mirror is recommended in mainland China)
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
python main.py
🍋🟩 Once the service starts, visit the API docs at http://localhost:8080/docs
💡 Tip: the default port is 8080 and can be changed via the LISTEN_PORT environment variable
⚠️ Note: never share your AUTH_TOKEN with others; use AUTH_TOKENS to configure multiple authentication tokens
Once the service is running, you can call it with any standard OpenAI API client. For detailed usage, see the OpenAI API documentation.
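For example, a chat completion request is plain HTTP against the OpenAI-compatible endpoint. The helper below is a minimal sketch using only the Python standard library; the base URL, model name, and key mirror the defaults shown above and should be replaced with your own values.

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build an OpenAI-compatible /v1/chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # the AUTH_TOKEN you configured
        },
    )

# Usage (requires the proxy to be running locally):
# req = build_chat_request("http://localhost:8080", "sk-your-api-key", "GLM-4.5", "Hello")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Any OpenAI SDK works the same way: point its base URL at the proxy and pass your AUTH_TOKEN as the API key.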
Pull the latest image from Docker Hub:
# Pull the latest version
docker pull zyphrzero/z-ai2api-python:latest
# Or pull a specific version
docker pull zyphrzero/z-ai2api-python:v0.1.0
Quick start:
# Basic startup (default configuration)
docker run -d \
--name z-ai2api \
-p 8080:8080 \
-e AUTH_TOKEN="sk-your-api-key" \
zyphrzero/z-ai2api-python:latest
# Startup with full configuration
docker run -d \
--name z-ai2api \
-p 8080:8080 \
-e AUTH_TOKEN="sk-your-api-key" \
-e ANONYMOUS_MODE="true" \
-e DEBUG_LOGGING="true" \
-e TOOL_SUPPORT="true" \
-v $(pwd)/tokens.txt:/app/tokens.txt \
-v $(pwd)/logs:/app/logs \
  zyphrzero/z-ai2api-python:latest
Using Docker Compose:
Create a docker-compose.yml file:
version: '3.8'
services:
z-ai2api:
image: zyphrzero/z-ai2api-python:latest
container_name: z-ai2api
ports:
- "8080:8080"
environment:
- AUTH_TOKEN=sk-your-api-key
- ANONYMOUS_MODE=true
- DEBUG_LOGGING=true
- TOOL_SUPPORT=true
- LISTEN_PORT=8080
volumes:
- ./tokens.txt:/app/tokens.txt
- ./logs:/app/logs
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
    retries: 3
Then start it:
docker-compose up -d
Or use the compose file in the deploy directory:
cd deploy
docker-compose up -d
- Image: https://hub.docker.com/r/zyphrzero/z-ai2api-python
- Supported architectures: linux/amd64, linux/arm64
- Base image: python:3.11-slim
To persist logs and configuration files, mount the following directories:
# Mount data directories at startup
docker run -d \
--name z-ai2api \
-p 8080:8080 \
-e AUTH_TOKEN="sk-your-api-key" \
-v $(pwd)/tokens.txt:/app/tokens.txt \
-v $(pwd)/logs:/app/logs \
-v $(pwd)/.env:/app/.env \
  zyphrzero/z-ai2api-python:latest
| Model | Upstream ID | Description | Features |
|---|---|---|---|
| GLM-4.5 | 0727-360B-API | Standard model | General-purpose dialogue, balanced performance |
| GLM-4.5-Thinking | 0727-360B-API | Thinking model | Shows the reasoning process, high transparency |
| GLM-4.5-Search | 0727-360B-API | Search model | Real-time web search, up-to-date information |
| GLM-4.5-Air | 0727-106B-API | Lightweight model | Fast responses, efficient inference |
| Model | Description | Features |
|---|---|---|
| MBZUAI-IFM/K2-Think | K2Think model | Fast, high-quality reasoning |
| Model | Description | Features |
|---|---|---|
| LongCat-Flash | Fast-response model | High-speed processing, suited to real-time dialogue |
| LongCat | Standard model | Balanced performance, general-purpose scenarios |
| LongCat-Search | Search-enhanced model | Integrated search, information retrieval |
| Variable | Default | Description |
|---|---|---|
| AUTH_TOKEN | sk-your-api-key | Client authentication key |
| LISTEN_PORT | 8080 | Service listening port |
| DEBUG_LOGGING | true | Debug logging switch |
| ANONYMOUS_MODE | true | Anonymous-user mode switch |
| TOOL_SUPPORT | true | Function Call feature switch |
| SKIP_AUTH_TOKEN | false | Skip authentication token validation |
| SCAN_LIMIT | 200000 | Scan limit |
| AUTH_TOKENS_FILE | tokens.txt | Path to the Z.AI authentication token file |
| Variable | Default | Description |
|---|---|---|
| LONGCAT_PASSPORT_TOKEN | - | Single LongCat authentication token |
| LONGCAT_TOKENS_FILE | - | Path to a file containing multiple LongCat tokens |
💡 See the .env.example file for detailed configuration options
# Z.AI authentication configuration
AUTH_TOKENS_FILE=tokens.txt
ANONYMOUS_MODE=true
# LongCat authentication configuration
LONGCAT_PASSPORT_TOKEN=your_passport_token
# Or use a file containing multiple tokens
LONGCAT_TOKENS_FILE=longcat_tokens.txt
# K2Think handles authentication automatically; no extra configuration is needed
- Load balancing: multiple auth tokens are used in round-robin fashion to spread request load
- Automatic failover: when a token fails, the pool switches to the next available token
- Health monitoring: token types are validated precisely via the role field of the Z.AI API
- Automatic recovery: failed tokens are retried automatically after a timeout
- Dynamic management: the token pool can be updated at runtime
- Smart deduplication: duplicate tokens are detected and removed automatically
- Type validation: only authenticated user tokens (role: "user") are accepted; anonymous tokens (role: "guest") are rejected
- Fallback: if authenticated mode fails, the service falls back to anonymous mode; anonymous mode cannot fall back to authenticated mode
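The round-robin, failover, and deduplication behavior can be pictured roughly as follows. This is an illustrative sketch, not the project's actual implementation; the class and method names are invented for the example.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TokenPool:
    """Illustrative round-robin token pool with failover and deduplication."""
    tokens: list
    retry_after: float = 300.0  # seconds before a failed token is retried
    _index: int = 0
    _failed: dict = field(default_factory=dict)  # token -> failure timestamp

    def __post_init__(self):
        # Smart deduplication: drop repeated tokens, keeping the first occurrence
        self.tokens = list(dict.fromkeys(self.tokens))

    def get(self) -> str:
        """Return the next healthy token, skipping recently failed ones."""
        now = time.time()
        for _ in range(len(self.tokens)):
            token = self.tokens[self._index % len(self.tokens)]
            self._index += 1
            failed_at = self._failed.get(token)
            if failed_at is None or now - failed_at > self.retry_after:
                self._failed.pop(token, None)  # automatic recovery after timeout
                return token
        raise RuntimeError("no healthy tokens available")

    def mark_failed(self, token: str) -> None:
        """Record a failure so subsequent calls fail over to other tokens."""
        self._failed[token] = time.time()
```

In the real service the pool additionally validates each token's role field against the Z.AI API, which this sketch omits.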
Only basic functionality for now; still being improved.
# Check token pool status
curl http://localhost:8080/v1/token-pool/status
# Trigger a manual health check
curl -X POST http://localhost:8080/v1/token-pool/health-check
# Update the token pool at runtime
curl -X POST http://localhost:8080/v1/token-pool/update \
  -H "Content-Type: application/json" \
  -d '["new_token1", "new_token2"]'
- Intelligent customer service: integrate with existing support platforms for 24/7 automated Q&A
- Content generation: automatically produce articles, summaries, translations, and more
- Code assistant: code completion, explanation, and optimization suggestions
- External API integration: connect weather, search, database, and other external services
- Automated workflows: build complex multi-step automation
- Intelligent decision systems: analysis and decisions driven by real-time data
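External API integration works through the standard OpenAI tools field (with TOOL_SUPPORT=true on the proxy). The request below shows the shape of a function declaration; the weather function is a made-up example, not part of this project.

```python
import json

# Hypothetical weather tool, declared in the OpenAI function-calling schema
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body for /v1/chat/completions; the model may respond with a
# tool_calls entry asking your code to run get_weather and reply with the result
request_body = {
    "model": "GLM-4.5",
    "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}

print(json.dumps(request_body, indent=2))
```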
Q: How do I get an AUTH_TOKEN?
A: AUTH_TOKEN is an API key you define yourself and set via the environment variable; the client and server must use the same value.
Q: What if startup reports "service already running"?
A: This is the service-uniqueness check preventing duplicate instances. To resolve it:
- Check whether an instance is already running: ps aux | grep z-ai2api-server
- Stop the existing instance before starting a new one
- If you are sure no instance is running, delete the PID file: rm z-ai2api-server.pid
- Set the SERVICE_NAME environment variable to use a custom service name and avoid conflicts
Q: How do I use this service from Claude Code?
A: Place the zai.js CCR plugin in the ./.claude-code-router/plugins directory, point ./.claude-code-router/config.json at this service's address, and authenticate with your AUTH_TOKEN.
Example configuration:
{
"LOG": false,
"LOG_LEVEL": "debug",
"CLAUDE_PATH": "",
"HOST": "127.0.0.1",
"PORT": 3456,
"APIKEY": "",
"API_TIMEOUT_MS": "600000",
"PROXY_URL": "",
"transformers": [
{
"name": "zai",
"path": "C:\\Users\\Administrator\\.claude-code-router\\plugins\\zai.js",
"options": {}
}
],
"Providers": [
{
"name": "GLM",
"api_base_url": "http://127.0.0.1:8080/v1/chat/completions",
"api_key": "sk-your-api-key",
"models": ["GLM-4.5", "GLM-4.5-Air"],
"transformers": {
"use": ["zai"]
}
}
],
"StatusLine": {
"enabled": false,
"currentStyle": "default",
"default": {
"modules": []
},
"powerline": {
"modules": []
}
},
"Router": {
"default": "GLM,GLM-4.5",
"background": "GLM,GLM-4.5",
"think": "GLM,GLM-4.5",
"longContext": "GLM,GLM-4.5",
"longContextThreshold": 60000,
"webSearch": "GLM,GLM-4.5",
"image": "GLM,GLM-4.5"
},
"CUSTOM_ROUTER_PATH": ""
}
Q: What is anonymous mode?
A: Anonymous mode uses temporary tokens so conversation history is not shared, protecting privacy.
Q: How do I customize the configuration?
A: Through environment variables; using a .env file is recommended.
Q: How do I configure LongCat authentication?
A: There are two ways:
- Single token: set the LONGCAT_PASSPORT_TOKEN environment variable
- Multiple tokens: create a token file and set the LONGCAT_TOKENS_FILE environment variable
To use the full multimodal functionality, you need an official Z.ai API token:
- Open the Z.ai chat interface and log in to your account
- Press F12 to open the developer tools
- Go to "Application" -> "Local Storage" and find the value named token in the "Cookie" list
- Copy the token value into the environment variable, or use an API Key created under your official personal account
❗ Important: the token may expire. Multimodal models require an official, non-anonymous Z.ai API token; anonymous tokens do not support multimedia processing.
You must obtain a LongCat API token to use that service (the official site allows only one anonymous conversation):
- Open the LongCat website and log in with your Meituan account
- Press F12 to open the developer tools
- Go to "Application" -> "Local Storage" and find the value named passport_token_key in the "Cookie" list
- Copy the passport_token_key value into the environment variable
| Component | Technology | Version | Notes |
|---|---|---|---|
| Web framework | FastAPI | 0.116.1 | High-performance async web framework with automatic API docs |
| ASGI server | Granian | 2.5.2 | Rust-based high-performance ASGI server with hot reload |
| HTTP client | HTTPX / Requests | 0.27.0 / 2.32.5 | Async/sync HTTP libraries for upstream API calls |
| Data validation | Pydantic | 2.11.7 | Type-safe data validation and serialization |
| Configuration | Pydantic Settings | 2.10.1 | Pydantic-based configuration management |
| Logging | Loguru | 0.7.3 | High-performance structured logging |
| User agent | Fake UserAgent | 2.2.0 | Dynamic user-agent generation |
┌──────────────┐ ┌─────────────────────────────────────┐ ┌─────────────────┐
│ OpenAI │ │ │ │ │
│ Client │────▶│ FastAPI Server │────▶│ Z.AI API │
└──────────────┘ │ │ │ │
┌──────────────┐ │ ┌─────────────────────────────────┐ │ │ ┌─────────────┐ │
│ Claude Code │ │ │ Provider Router │ │ │ │0727-360B-API│ │
│ Router │────▶│ │ ┌─────────┬─────────┬─────────┐ │ │ │ └─────────────┘ │
└──────────────┘ │ │ │Z.AI │K2Think │LongCat │ │ │ │ ┌─────────────┐ │
│ │ │Provider │Provider │Provider │ │ │────▶│ │0727-106B-API│ │
│ │ └─────────┴─────────┴─────────┘ │ │ │ └─────────────┘ │
│ └─────────────────────────────────┘ │ │ │
│ ┌─────────────────────────────────┐ │ └─────────────────┘
│ │ /v1/chat/completions │ │ ┌─────────────────┐
│ │ /v1/models │ │ │ K2Think API │
│ │ Enhanced Tools │ │────▶│ │
│ └─────────────────────────────────┘ │ └─────────────────┘
└─────────────────────────────────────┘ ┌─────────────────┐
OpenAI Compatible API │ LongCat API │
│ │
└─────────────────┘
If you like this project, please give it a star ⭐
Contributions of all kinds are welcome! Please make sure your code follows PEP 8 and update the relevant documentation.
This project is licensed under the MIT License - see the LICENSE file for details.
- This project is not affiliated with Z.AI, K2Think, LongCat, or any other AI provider
- Make sure you comply with each provider's terms of service before use
- Do not use it for commercial purposes or in ways that violate those terms
- The project is for learning and research purposes only
- Users bear all risk arising from its use