
z.ai2api_python
将Z.ai转换为OpenAI兼容格式的高性能代理,无缝接入 GLM-4.5 系列模型
Stars: 146

Z.AI2API Python is a lightweight OpenAI API proxy service that integrates seamlessly with existing applications. It supports the full functionality of GLM-4.5 series models and features high-performance streaming responses, enhanced tool invocation, support for thinking mode, integration with search models, Docker deployment, session isolation for privacy protection, flexible configuration via environment variables, and intelligent upstream model routing.
README:
轻量级 OpenAI API 兼容代理服务,通过 Claude Code Router 接入 Z.AI,支持 GLM-4.5 系列模型的完整功能。
- 🔌 完全兼容 OpenAI API - 无缝集成现有应用
- 🤖 Claude Code 支持 - 通过 Claude Code Router 接入 Claude Code (CCR 工具请升级到 v1.0.47 以上)
- 🚀 高性能流式响应 - Server-Sent Events (SSE) 支持
- 🛠️ 增强工具调用 - 改进的 Function Call 实现
- 🧠 思考模式支持 - 智能处理模型推理过程
- 🔍 搜索模型集成 - GLM-4.5-Search 网络搜索能力
- 🐳 Docker 部署 - 一键容器化部署
- 🛡️ 会话隔离 - 匿名模式保护隐私
- 🔧 灵活配置 - 环境变量灵活配置
- 📊 多模型映射 - 智能上游模型路由
- Python 3.8+
- pip 或 uv (推荐)
# 克隆项目
git clone https://github.com/ZyphrZero/z.ai2api_python.git
cd z.ai2api_python
# 使用 uv (推荐)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
uv run python main.py
# 或使用 pip (推荐使用清华源)
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
python main.py
服务启动后访问:http://localhost:8080/docs
import openai
# 初始化客户端
client = openai.OpenAI(
base_url="http://localhost:8080/v1",
api_key="your-auth-token" # 替换为你的 AUTH_TOKEN
)
# 普通对话
response = client.chat.completions.create(
model="GLM-4.5",
messages=[{"role": "user", "content": "你好,介绍一下 Python"}],
stream=False
)
print(response.choices[0].message.content)
cd deploy
docker-compose up -d
模型 | 上游 ID | 描述 | 特性 |
---|---|---|---|
GLM-4.5 |
0727-360B-API | 标准模型 | 通用对话,平衡性能 |
GLM-4.5-Thinking |
0727-360B-API | 思考模型 | 显示推理过程,透明度高 |
GLM-4.5-Search |
0727-360B-API | 搜索模型 | 实时网络搜索,信息更新 |
GLM-4.5-Air |
0727-106B-API | 轻量模型 | 快速响应,高效推理 |
GLM-4.5V |
glm-4.5v | ❌ 暂不支持 |
# 定义工具
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "获取天气信息",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "城市名称"}
},
"required": ["city"]
}
}
}]
# 使用工具
response = client.chat.completions.create(
model="GLM-4.5",
messages=[{"role": "user", "content": "北京天气怎么样?"}],
tools=tools,
tool_choice="auto"
)
response = client.chat.completions.create(
model="GLM-4.5-Thinking",
messages=[{"role": "user", "content": "解释量子计算"}],
stream=True
)
for chunk in response:
content = chunk.choices[0].delta.content
reasoning = chunk.choices[0].delta.reasoning_content
if content:
print(content, end="")
if reasoning:
print(f"\n🤔 思考: {reasoning}\n")
变量名 | 默认值 | 说明 |
---|---|---|
AUTH_TOKEN |
sk-your-api-key |
客户端认证密钥 |
API_ENDPOINT |
https://chat.z.ai/api/chat/completions |
上游 API 地址 |
LISTEN_PORT |
8080 |
服务监听端口 |
DEBUG_LOGGING |
true |
调试日志开关 |
THINKING_PROCESSING |
think |
思考内容处理策略 |
ANONYMOUS_MODE |
true |
匿名模式开关 |
TOOL_SUPPORT |
true |
Function Call 功能开关 |
SKIP_AUTH_TOKEN |
false |
跳过认证令牌验证 |
SCAN_LIMIT |
200000 |
扫描限制 |
BACKUP_TOKEN |
eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9... |
Z.ai 固定访问令牌 |
-
think
- 转换为<thinking>
标签(OpenAI 兼容) -
strip
- 移除思考内容 -
raw
- 保留原始格式
# 集成到现有应用
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="your-token"
)
# 智能客服
def chat_with_ai(message):
response = client.chat.completions.create(
model="GLM-4.5",
messages=[{"role": "user", "content": message}]
)
return response.choices[0].message.content
# 结合外部 API
def call_external_api(tool_name, arguments):
# 执行实际工具调用
return result
# 处理工具调用
if response.choices[0].message.tool_calls:
for tool_call in response.choices[0].message.tool_calls:
result = call_external_api(
tool_call.function.name,
json.loads(tool_call.function.arguments)
)
# 将结果返回给模型继续对话
Q: 如何获取 AUTH_TOKEN?
A: AUTH_TOKEN
为自己自定义的 api key,在环境变量中配置,需要保证客户端与服务端一致。
Q: 如何通过 Claude Code 使用本服务?
A: 创建 zai.js 这个 ccr 插件放在./.claude-code-router/plugins
目录下,配置 ./.claude-code-router/config.json
指向本服务地址,使用 AUTH_TOKEN
进行认证。
示例配置:
{
"LOG": false,
"LOG_LEVEL": "debug",
"CLAUDE_PATH": "",
"HOST": "127.0.0.1",
"PORT": 3456,
"APIKEY": "",
"API_TIMEOUT_MS": "600000",
"PROXY_URL": "",
"transformers": [
{
"name": "zai",
"path": "C:\\Users\\Administrator\\.claude-code-router\\plugins\\zai.js",
"options": {}
}
],
"Providers": [
{
"name": "GLM",
"api_base_url": "http://127.0.0.1:8080/v1/chat/completions",
"api_key": "sk-your-api-key",
"models": ["GLM-4.5", "GLM-4.5-Air"],
"transformers": {
"use": ["zai"]
}
}
],
"StatusLine": {
"enabled": false,
"currentStyle": "default",
"default": {
"modules": []
},
"powerline": {
"modules": []
}
},
"Router": {
"default": "GLM,GLM-4.5",
"background": "GLM,GLM-4.5",
"think": "GLM,GLM-4.5",
"longContext": "GLM,GLM-4.5",
"longContextThreshold": 60000,
"webSearch": "GLM,GLM-4.5",
"image": "GLM,GLM-4.5"
},
"CUSTOM_ROUTER_PATH": ""
}
Q: 匿名模式是什么?
A: 匿名模式使用临时 token,避免对话历史共享,保护隐私。
Q: Function Call 如何工作?
A: 通过智能提示注入实现,将工具定义转换为系统提示。
Q: 支持哪些 OpenAI 功能?
A: 支持聊天完成、模型列表、流式响应、工具调用等核心功能。
Q: Function Call 如何优化?
A: 改进了工具调用的请求响应结构,支持更复杂的工具链调用和并行执行。
Q: 如何选择合适的模型?
A:
- GLM-4.5: 通用场景,性能和效果平衡
- GLM-4.5-Thinking: 需要了解推理过程的场景
- GLM-4.5-Search: 需要实时信息的场景
- GLM-4.5-Air: 高并发、低延迟要求的场景
Q: 如何自定义配置?
A: 通过环境变量配置,推荐使用 .env
文件。
要使用完整的多模态功能,需要获取正式的 Z.ai API Token:
- 访问 Z.ai 官网
- 注册账户并登录,进入 Z.ai API Keys 设置页面,在该页面设置 个人 API Token
- 将 Token 放置在
BACKUP_TOKEN
环境变量中
- 打开 Z.ai 聊天界面
- 按 F12 打开开发者工具
- 切换到 "Application" 或 "存储" 标签
- 查看 Local Storage 中的认证 token
- 复制 token 值设置为环境变量
⚠️ 注意: 方式 2 获取的 token 可能有时效性,建议使用方式 1 获取长期有效的 API Token。
❗ 重要提示: 多模态模型需要官方 Z.ai API 非匿名 Token,匿名 token 不支持多媒体处理。
组件 | 技术 | 版本 | 说明 |
---|---|---|---|
Web 框架 | FastAPI | 0.104.1 | 高性能异步 Web 框架,支持自动 API 文档生成 |
ASGI 服务器 | Granian | 2.5.2 | 基于 Rust 的高性能 ASGI 服务器,支持热重载 |
HTTP 客户端 | Requests | 2.32.5 | 简洁易用的 HTTP 库,用于上游 API 调用 |
数据验证 | Pydantic | 2.11.7 | 类型安全的数据验证与序列化 |
配置管理 | Pydantic Settings | 2.10.1 | 基于 Pydantic 的配置管理 |
┌──────────────┐ ┌─────────────────────────┐ ┌─────────────────┐
│ OpenAI │ │ │ │ │
│ Client │────▶│ FastAPI Server │────▶│ Z.AI API │
└──────────────┘ │ │ │ │
┌──────────────┐ │ ┌─────────────────────┐ │ │ ┌─────────────┐ │
│ Claude Code │ │ │ /v1/chat/completions│ │ │ │0727-360B-API│ │
│ Router │────▶│ └─────────────────────┘ │ │ └─────────────┘ │
└──────────────┘ │ ┌─────────────────────┐ │ │ ┌─────────────┐ │
│ │ /v1/models │ │────▶│ │0727-106B-API│ │
│ └─────────────────────┘ │ │ └─────────────┘ │
│ ┌─────────────────────┐ │ │ │
│ │ Enhanced Tools │ │ └─────────────────┘
│ └─────────────────────┘ │
└─────────────────────────┘
OpenAI Compatible API
z.ai2api_python/
├── app/
│ ├── core/
│ │ ├── __init__.py
│ │ ├── config.py # 配置管理
│ │ ├── openai.py # OpenAI API 实现
│ │ └── response_handlers.py # 响应处理器
│ ├── models/
│ │ ├── __init__.py
│ │ └── schemas.py # Pydantic 模型定义
│ ├── utils/
│ │ ├── __init__.py
│ │ ├── helpers.py # 辅助函数
│ │ ├── tools.py # 增强工具调用处理
│ │ └── sse_parser.py # SSE 流式解析器
│ └── __init__.py
├── tests/ # 单元测试
├── deploy/ # Docker 部署配置
├── main.py # FastAPI 应用入口
├── requirements.txt # Python 依赖
├── .env.example # 环境变量示例
└── README.md # 项目文档
我们欢迎所有形式的贡献! 请确保代码符合 PEP 8 规范,并更新相关文档。
本项目采用 MIT 许可证 - 查看 LICENSE 文件了解详情。
- 本项目与 Z.AI 官方无关
- 使用前请确保遵守 Z.AI 服务条款
- 请勿用于商业用途或违反使用条款的场景
- 项目仅供学习和研究使用
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for z.ai2api_python
Similar Open Source Tools

z.ai2api_python
Z.AI2API Python is a lightweight OpenAI API proxy service that integrates seamlessly with existing applications. It supports the full functionality of GLM-4.5 series models and features high-performance streaming responses, enhanced tool invocation, support for thinking mode, integration with search models, Docker deployment, session isolation for privacy protection, flexible configuration via environment variables, and intelligent upstream model routing.

Streamer-Sales
Streamer-Sales is a large model for live streamers that can explain products based on their characteristics and inspire users to make purchases. It is designed to enhance sales efficiency and user experience, whether for online live sales or offline store promotions. The model can deeply understand product features and create tailored explanations in vivid and precise language, sparking user's desire to purchase. It aims to revolutionize the shopping experience by providing detailed and unique product descriptions to engage users effectively.

chatgpt-mirai-qq-bot
Kirara AI is a chatbot that supports mainstream language models and chat platforms. It features various functionalities such as image sending, keyword-triggered replies, multi-account support, content moderation, personality settings, and support for platforms like QQ, Telegram, Discord, and WeChat. It also offers HTTP server capabilities, plugin support, conditional triggers, admin commands, drawing models, voice replies, multi-turn conversations, cross-platform message sending, and custom workflows. The tool can be accessed via HTTP API for integration with other platforms.

LangChain-SearXNG
LangChain-SearXNG is an open-source AI search engine built on LangChain and SearXNG. It supports faster and more accurate search and question-answering functionalities. Users can deploy SearXNG and set up Python environment to run LangChain-SearXNG. The tool integrates AI models like OpenAI and ZhipuAI for search queries. It offers two search modes: Searxng and ZhipuWebSearch, allowing users to control the search workflow based on input parameters. LangChain-SearXNG v2 version enhances response speed and content quality compared to the previous version, providing a detailed configuration guide and showcasing the effectiveness of different search modes through comparisons.

gin-vue-admin
Gin-vue-admin is a full-stack development platform based on Vue and Gin, integrating features like JWT authentication, dynamic routing, dynamic menus, Casbin authorization, form generator, code generator, etc. It provides various example files to help users focus more on business development. The project offers detailed documentation, video tutorials for setup and deployment, and a community for support and contributions. Users need a certain level of knowledge in Golang and Vue to work with this project. It is recommended to follow the Apache2.0 license if using the project for commercial purposes.

kirara-ai
Kirara AI is a chatbot that supports mainstream large language models and chat platforms. It provides features such as image sending, keyword-triggered replies, multi-account support, personality settings, and support for various chat platforms like QQ, Telegram, Discord, and WeChat. The tool also supports HTTP server for Web API, popular large models like OpenAI and DeepSeek, plugin mechanism, conditional triggers, admin commands, drawing models, voice replies, multi-turn conversations, cross-platform message sending, custom workflows, web management interface, and built-in Frpc intranet penetration.

TelegramForwarder
Telegram Forwarder is a message forwarding tool that allows you to forward messages from specified chats to other chats without the need for a bot to enter the corresponding channels/groups to listen. It can be used for information stream integration filtering, message reminders, content archiving, and more. The tool supports multiple sources forwarding, keyword filtering in whitelist and blacklist modes, regular expression matching, message content modification, AI processing using major vendors' AI interfaces, media file filtering, and synchronization with a universal forum blocking plugin to achieve three-end blocking.

SakuraLLM
SakuraLLM is a project focused on building large language models for Japanese to Chinese translation in the light novel and galgame domain. The models are based on open-source large models and are pre-trained and fine-tuned on general Japanese corpora and specific domains. The project aims to provide high-performance language models for galgame/light novel translation that are comparable to GPT3.5 and can be used offline. It also offers an API backend for running the models, compatible with the OpenAI API format. The project is experimental, with version 0.9 showing improvements in style, fluency, and accuracy over GPT-3.5.

ailab
The 'ailab' project is an experimental ground for code generation combining AI (especially coding agents) and Deno. It aims to manage configuration files defining coding rules and modes in Deno projects, enhancing the quality and efficiency of code generation by AI. The project focuses on defining clear rules and modes for AI coding agents, establishing best practices in Deno projects, providing mechanisms for type-safe code generation and validation, applying test-driven development (TDD) workflow to AI coding, and offering implementation examples utilizing design patterns like adapter pattern.

AivisSpeech-Engine
AivisSpeech-Engine is a powerful open-source tool for speech recognition and synthesis. It provides state-of-the-art algorithms for converting speech to text and text to speech. The tool is designed to be user-friendly and customizable, allowing developers to easily integrate speech capabilities into their applications. With AivisSpeech-Engine, users can transcribe audio recordings, create voice-controlled interfaces, and generate natural-sounding speech output. Whether you are building a virtual assistant, developing a speech-to-text application, or experimenting with voice technology, AivisSpeech-Engine offers a comprehensive solution for all your speech processing needs.

ddddocr
ddddocr is a Rust version of a simple OCR API server that provides easy deployment for captcha recognition without relying on the OpenCV library. It offers a user-friendly general-purpose captcha recognition Rust library. The tool supports recognizing various types of captchas, including single-line text, transparent black PNG images, target detection, and slider matching algorithms. Users can also import custom OCR training models and utilize the OCR API server for flexible OCR result control and range limitation. The tool is cross-platform and can be easily deployed.

Speech-AI-Forge
Speech-AI-Forge is a project developed around TTS generation models, implementing an API Server and a WebUI based on Gradio. The project offers various ways to experience and deploy Speech-AI-Forge, including online experience on HuggingFace Spaces, one-click launch on Colab, container deployment with Docker, and local deployment. The WebUI features include TTS model functionality, speaker switch for changing voices, style control, long text support with automatic text segmentation, refiner for ChatTTS native text refinement, various tools for voice control and enhancement, support for multiple TTS models, SSML synthesis control, podcast creation tools, voice creation, voice testing, ASR tools, and post-processing tools. The API Server can be launched separately for higher API throughput. The project roadmap includes support for various TTS models, ASR models, voice clone models, and enhancer models. Model downloads can be manually initiated using provided scripts. The project aims to provide inference services and may include training-related functionalities in the future.

Awesome-ChatTTS
Awesome-ChatTTS is an official recommended guide for ChatTTS beginners, compiling common questions and related resources. It provides a comprehensive overview of the project, including official introduction, quick experience options, popular branches, parameter explanations, voice seed details, installation guides, FAQs, and error troubleshooting. The repository also includes video tutorials, discussion community links, and project trends analysis. Users can explore various branches for different functionalities and enhancements related to ChatTTS.

ChuanhuChatGPT
Chuanhu Chat is a user-friendly web graphical interface that provides various additional features for ChatGPT and other language models. It supports GPT-4, file-based question answering, local deployment of language models, online search, agent assistant, and fine-tuning. The tool offers a range of functionalities including auto-solving questions, online searching with network support, knowledge base for quick reading, local deployment of language models, GPT 3.5 fine-tuning, and custom model integration. It also features system prompts for effective role-playing, basic conversation capabilities with options to regenerate or delete dialogues, conversation history management with auto-saving and search functionalities, and a visually appealing user experience with themes, dark mode, LaTeX rendering, and PWA application support.

AIClient-2-API
AIClient-2-API is a versatile and lightweight API proxy designed for developers, providing ample free API request quotas and comprehensive support for various mainstream large models like Gemini, Qwen Code, Claude, etc. It converts multiple backend APIs into standard OpenAI format interfaces through a Node.js HTTP server. The project adopts a modern modular architecture, supports strategy and adapter patterns, comes with complete test coverage and health check mechanisms, and is ready to use after 'npm install'. By easily switching model service providers in the configuration file, any OpenAI-compatible client or application can seamlessly access different large model capabilities through the same API address, eliminating the hassle of maintaining multiple sets of configurations for different services and dealing with incompatible interfaces.

AI-CloudOps
AI+CloudOps is a cloud-native operations management platform designed for enterprises. It aims to integrate artificial intelligence technology with cloud-native practices to significantly improve the efficiency and level of operations work. The platform offers features such as AIOps for monitoring data analysis and alerts, multi-dimensional permission management, visual CMDB for resource management, efficient ticketing system, deep integration with Prometheus for real-time monitoring, and unified Kubernetes management for cluster optimization.
For similar tasks

holoinsight
HoloInsight is a cloud-native observability platform that provides low-cost and high-performance monitoring services for cloud-native applications. It offers deep insights through real-time log analysis and AI integration. The platform is designed to help users gain a comprehensive understanding of their applications' performance and behavior in the cloud environment. HoloInsight is easy to deploy using Docker and Kubernetes, making it a versatile tool for monitoring and optimizing cloud-native applications. With a focus on scalability and efficiency, HoloInsight is suitable for organizations looking to enhance their observability and monitoring capabilities in the cloud.

metaso-free-api
Metaso AI Free service supports high-speed streaming output, secret tower AI super network search (full network or academic as well as concise, in-depth, research three modes), zero-configuration deployment, multi-token support. Fully compatible with ChatGPT interface. It also has seven other free APIs available for use. The tool provides various deployment options such as Docker, Docker-compose, Render, Vercel, and native deployment. Users can access the tool for chat completions and token live checks. Note: Reverse API is unstable, it is recommended to use the official Metaso AI website to avoid the risk of banning. This project is for research and learning purposes only, not for commercial use.

tribe
Tribe AI is a low code tool designed to rapidly build and coordinate multi-agent teams. It leverages the langgraph framework to customize and coordinate teams of agents, allowing tasks to be split among agents with different strengths for faster and better problem-solving. The tool supports persistent conversations, observability, tool calling, human-in-the-loop functionality, easy deployment with Docker, and multi-tenancy for managing multiple users and teams.

melodisco
Melodisco is an AI music player that allows users to listen to music and manage playlists. It provides a user-friendly interface for music playback and organization. Users can deploy Melodisco with Vercel or Docker for easy setup. Local development instructions are provided for setting up the project environment. The project credits various tools and libraries used in its development, such as Next.js, Tailwind CSS, and Stripe. Melodisco is a versatile tool for music enthusiasts looking for an AI-powered music player with features like authentication, payment integration, and multi-language support.

KB-Builder
KB Builder is an open-source knowledge base generation system based on the LLM large language model. It utilizes the RAG (Retrieval-Augmented Generation) data generation enhancement method to provide users with the ability to enhance knowledge generation and quickly build knowledge bases based on RAG. It aims to be the central hub for knowledge construction in enterprises, offering platform-based intelligent dialogue services and document knowledge base management functionality. Users can upload docx, pdf, txt, and md format documents and generate high-quality knowledge base question-answer pairs by invoking large models through the 'Parse Document' feature.

PDFMathTranslate
PDFMathTranslate is a tool designed for translating scientific papers and conducting bilingual comparisons. It preserves formulas, charts, table of contents, and annotations. The tool supports multiple languages and diverse translation services. It provides a command-line tool, interactive user interface, and Docker deployment. Users can try the application through online demos. The tool offers various installation methods including command-line, portable, graphic user interface, and Docker. Advanced options allow users to customize translation settings. Additionally, the tool supports secondary development through APIs for Python and HTTP. Future plans include parsing layout with DocLayNet based models, fixing page rotation and format issues, supporting non-PDF/A files, and integrating plugins for Zotero and Obsidian.

grps_trtllm
The grps-trtllm repository is a C++ implementation of a high-performance OpenAI LLM service, combining GRPS and TensorRT-LLM. It supports functionalities like Chat, Ai-agent, and Multi-modal. The repository offers advantages over triton-trtllm, including a complete LLM service implemented in pure C++, integrated tokenizer supporting huggingface and sentencepiece, custom HTTP functionality for OpenAI interface, support for different LLM prompt styles and result parsing styles, integration with tensorrt backend and opencv library for multi-modal LLM, and stable performance improvement compared to triton-trtllm.

discord-ai-bot
Discord AI Bot is a chatbot tool designed to interact with Ollama and AUTOMATIC1111 Stable Diffusion on Discord. The bot allows users to set up and configure a Discord bot to communicate with the mentioned AI models. Users can follow step-by-step instructions to install Node.js, Ollama, and the required dependencies, create a Discord bot, and interact with the bot by mentioning it in messages. Additionally, the tool provides set-up instructions for Docker users to easily deploy the bot using Docker containers. Overall, Discord AI Bot simplifies the process of integrating AI chatbots into Discord servers for interactive communication.
For similar jobs

Awesome-LLM-RAG-Application
Awesome-LLM-RAG-Application is a repository that provides resources and information about applications based on Large Language Models (LLM) with Retrieval-Augmented Generation (RAG) pattern. It includes a survey paper, GitHub repo, and guides on advanced RAG techniques. The repository covers various aspects of RAG, including academic papers, evaluation benchmarks, downstream tasks, tools, and technologies. It also explores different frameworks, preprocessing tools, routing mechanisms, evaluation frameworks, embeddings, security guardrails, prompting tools, SQL enhancements, LLM deployment, observability tools, and more. The repository aims to offer comprehensive knowledge on RAG for readers interested in exploring and implementing LLM-based systems and products.

ChatGPT-On-CS
ChatGPT-On-CS is an intelligent chatbot tool based on large models, supporting various platforms like WeChat, Taobao, Bilibili, Douyin, Weibo, and more. It can handle text, voice, and image inputs, access external resources through plugins, and customize enterprise AI applications based on proprietary knowledge bases. Users can set custom replies, utilize ChatGPT interface for intelligent responses, send images and binary files, and create personalized chatbots using knowledge base files. The tool also features platform-specific plugin systems for accessing external resources and supports enterprise AI applications customization.

call-gpt
Call GPT is a voice application that utilizes Deepgram for Speech to Text, elevenlabs for Text to Speech, and OpenAI for GPT prompt completion. It allows users to chat with ChatGPT on the phone, providing better transcription, understanding, and speaking capabilities than traditional IVR systems. The app returns responses with low latency, allows user interruptions, maintains chat history, and enables GPT to call external tools. It coordinates data flow between Deepgram, OpenAI, ElevenLabs, and Twilio Media Streams, enhancing voice interactions.

awesome-LLM-resourses
A comprehensive repository of resources for Chinese large language models (LLMs), including data processing tools, fine-tuning frameworks, inference libraries, evaluation platforms, RAG engines, agent frameworks, books, courses, tutorials, and tips. The repository covers a wide range of tools and resources for working with LLMs, from data labeling and processing to model fine-tuning, inference, evaluation, and application development. It also includes resources for learning about LLMs through books, courses, and tutorials, as well as insights and strategies from building with LLMs.

tappas
Hailo TAPPAS is a set of full application examples that implement pipeline elements and pre-trained AI tasks. It demonstrates Hailo's system integration scenarios on predefined systems, aiming to accelerate time to market, simplify integration with Hailo's runtime SW stack, and provide a starting point for customers to fine-tune their applications. The tool supports both Hailo-15 and Hailo-8, offering various example applications optimized for different common hosts. TAPPAS includes pipelines for single network, two network, and multi-stream processing, as well as high-resolution processing via tiling. It also provides example use case pipelines like License Plate Recognition and Multi-Person Multi-Camera Tracking. The tool is regularly updated with new features, bug fixes, and platform support.

cloudflare-rag
This repository provides a fullstack example of building a Retrieval Augmented Generation (RAG) app with Cloudflare. It utilizes Cloudflare Workers, Pages, D1, KV, R2, AI Gateway, and Workers AI. The app features streaming interactions to the UI, hybrid RAG with Full-Text Search and Vector Search, switchable providers using AI Gateway, per-IP rate limiting with Cloudflare's KV, OCR within Cloudflare Worker, and Smart Placement for workload optimization. The development setup requires Node, pnpm, and wrangler CLI, along with setting up necessary primitives and API keys. Deployment involves setting up secrets and deploying the app to Cloudflare Pages. The project implements a Hybrid Search RAG approach combining Full Text Search against D1 and Hybrid Search with embeddings against Vectorize to enhance context for the LLM.

pixeltable
Pixeltable is a Python library designed for ML Engineers and Data Scientists to focus on exploration, modeling, and app development without the need to handle data plumbing. It provides a declarative interface for working with text, images, embeddings, and video, enabling users to store, transform, index, and iterate on data within a single table interface. Pixeltable is persistent, acting as a database unlike in-memory Python libraries such as Pandas. It offers features like data storage and versioning, combined data and model lineage, indexing, orchestration of multimodal workloads, incremental updates, and automatic production-ready code generation. The tool emphasizes transparency, reproducibility, cost-saving through incremental data changes, and seamless integration with existing Python code and libraries.

wave-apps
Wave Apps is a directory of sample applications built on H2O Wave, allowing users to build AI apps faster. The apps cover various use cases such as explainable hotel ratings, human-in-the-loop credit risk assessment, mitigating churn risk, online shopping recommendations, and sales forecasting EDA. Users can download, modify, and integrate these sample apps into their own projects to learn about app development and AI model deployment.