 
                Langchain-Chatchat
Langchain-Chatchat(原Langchain-ChatGLM)基于 Langchain 与 ChatGLM, Qwen 与 Llama 等语言模型的 RAG 与 Agent 应用 | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Llama) RAG and Agent app with langchain
Stars: 34359
 
    LangChain-Chatchat is an open-source, offline-deployable retrieval-enhanced generation (RAG) large model knowledge base project based on large language models such as ChatGLM and application frameworks such as Langchain. It aims to establish a knowledge base Q&A solution that is friendly to Chinese scenarios, supports open-source models, and can run offline.
README:
📃 LangChain-Chatchat (原 Langchain-ChatGLM)
基于 ChatGLM 等大语言模型与 Langchain 等应用框架实现,开源、可离线部署的 RAG 与 Agent 应用项目。
🤖️ 一种利用 langchain 思想实现的基于本地知识库的问答应用,目标期望建立一套对中文场景与开源模型支持友好、可离线运行的知识库问答解决方案。
💡 受 GanymedeNil 的项目 document.ai 和 AlexZhangji 创建的 ChatGLM-6B Pull Request 启发,建立了全流程可使用开源模型实现的本地知识库问答应用。本项目的最新版本中可使用 Xinference、Ollama 等框架接入 GLM-4-Chat、 Qwen2-Instruct、 Llama3 等模型,依托于 langchain 框架支持通过基于 FastAPI 提供的 API 调用服务,或使用基于 Streamlit 的 WebUI 进行操作。
✅ 本项目支持市面上主流的开源 LLM、 Embedding 模型与向量数据库,可实现全部使用开源模型离线私有部署。与此同时,本项目也支持 OpenAI GPT API 的调用,并将在后续持续扩充对各类模型及模型 API 的接入。
⛓️ 本项目实现原理如下图所示,过程包括加载文件 -> 读取文本 -> 文本分割 -> 文本向量化 -> 问句向量化 ->
在文本向量中匹配出与问句向量最相似的 top k个 -> 匹配出的文本作为上下文和问题一起添加到 prompt中 -> 提交给 LLM生成回答。
📺 原理介绍视频
从文档处理角度来看,实现流程如下:
🚩 本项目未涉及微调、训练过程,但可利用微调或训练对本项目效果进行优化。
🌐 AutoDL 镜像 中 0.3.0
版本所使用代码已更新至本项目 v0.3.0 版本。
🐳 Docker 镜像将会在近期更新。
🧑💻 如果你想对本项目做出贡献,欢迎移步开发指南 获取更多开发部署相关信息。
| 功能 | 0.2.x | 0.3.x | 
|---|---|---|
| 模型接入 | 本地:fastchat 在线:XXXModelWorker | 本地:model_provider,支持大部分主流模型加载框架 在线:oneapi 所有模型接入均兼容openai sdk | 
| Agent | ❌不稳定 | ✅针对ChatGLM3和Qwen进行优化,Agent能力显著提升 | 
| LLM对话 | ✅ | ✅ | 
| 知识库对话 | ✅ | ✅ | 
| 搜索引擎对话 | ✅ | ✅ | 
| 文件对话 | ✅仅向量检索 | ✅统一为File RAG功能,支持BM25+KNN等多种检索方式 | 
| 数据库对话 | ❌ | ✅ | 
| 多模态图片对话 | ❌ | ✅ 推荐使用 qwen-vl-chat | 
| ARXIV文献对话 | ❌ | ✅ | 
| Wolfram对话 | ❌ | ✅ | 
| 文生图 | ❌ | ✅ | 
| 本地知识库管理 | ✅ | ✅ | 
| WEBUI | ✅ | ✅更好的多会话支持,自定义系统提示词... | 
0.3.x 版本的核心功能由 Agent 实现,但用户也可以手动实现工具调用:
| 操作方式 | 实现的功能 | 适用场景 | 
|---|---|---|
| 选中"启用Agent",选择多个工具 | 由LLM自动进行工具调用 | 使用ChatGLM3/Qwen或在线API等具备Agent能力的模型 | 
| 选中"启用Agent",选择单个工具 | LLM仅解析工具参数 | 使用的模型Agent能力一般,不能很好的选择工具 想手动选择功能 | 
| 不选中"启用Agent",选择单个工具 | 不使用Agent功能的情况下,手动填入参数进行工具调用 | 使用的模型不具备Agent能力 | 
| 不选中任何工具,上传一个图片 | 图片对话 | 使用 qwen-vl-chat 等多模态模型 | 
更多功能和更新请实际部署体验.
本项目中已经支持市面上主流的如 GLM-4-Chat 与 Qwen2-Instruct 等新近开源大语言模型和 Embedding 模型,这些模型需要用户自行启动模型部署框架后,通过修改配置信息接入项目,本项目已支持的本地模型部署框架如下:
| 模型部署框架 | Xinference | LocalAI | Ollama | FastChat | 
|---|---|---|---|---|
| OpenAI API 接口对齐 | ✅ | ✅ | ✅ | ✅ | 
| 加速推理引擎 | GPTQ, GGML, vLLM, TensorRT, mlx | GPTQ, GGML, vLLM, TensorRT | GGUF, GGML | vLLM | 
| 接入模型类型 | LLM, Embedding, Rerank, Text-to-Image, Vision, Audio | LLM, Embedding, Rerank, Text-to-Image, Vision, Audio | LLM, Text-to-Image, Vision | LLM, Vision | 
| Function Call | ✅ | ✅ | ✅ | / | 
| 更多平台支持(CPU, Metal) | ✅ | ✅ | ✅ | ✅ | 
| 异构 | ✅ | ✅ | / | / | 
| 集群 | ✅ | ✅ | / | / | 
| 操作文档链接 | Xinference 文档 | LocalAI 文档 | Ollama 文档 | FastChat 文档 | 
| 可用模型 | Xinference 已支持模型 | LocalAI 已支持模型 | Ollama 已支持模型 | FastChat 已支持模型 | 
除上述本地模型加载框架外,项目中也为可接入在线 API 的 One API 框架接入提供了支持,支持包括 OpenAI ChatGPT、Azure OpenAI API、Anthropic Claude、智谱清言、百川 等常用在线 API 的接入使用。
[!Note] 关于 Xinference 加载本地模型: Xinference 内置模型会自动下载,如果想让它加载本机下载好的模型,可以在启动 Xinference 服务后,到项目 tools/model_loaders 目录下执行
streamlit run xinference_manager.py,按照页面提示为指定模型设置本地路径即可.
💡 软件方面,本项目已支持在 Python 3.8-3.11 环境中进行使用,并已在 Windows、macOS、Linux 操作系统中进行测试。
💻 硬件方面,因 0.3.0 版本已修改为支持不同模型部署框架接入,因此可在 CPU、GPU、NPU、MPS 等不同硬件条件下使用。
从 0.3.0 版本起,Langchain-Chatchat 提供以 Python 库形式的安装方式,具体安装请执行:
pip install langchain-chatchat -U[!important] 为确保所使用的 Python 库为最新版,建议使用官方 Pypi 源或清华源。
[!Note] 因模型部署框架 Xinference 接入 Langchain-Chatchat 时需要额外安装对应的 Python 依赖库,因此如需搭配 Xinference 框架使用时,建议使用如下安装方式:
pip install "langchain-chatchat[xinference]" -U
从 0.3.0 版本起,Langchain-Chatchat 不再根据用户输入的本地模型路径直接进行模型加载,涉及到的模型种类包括 LLM、Embedding、Reranker 及后续会提供支持的多模态模型等,均改为支持市面常见的各大模型推理框架接入,如 Xinference、Ollama、LocalAI、FastChat、One API 等。
因此,请确认在启动 Langchain-Chatchat 项目前,首先进行模型推理框架的运行,并加载所需使用的模型。
这里以 Xinference 举例, 请参考 Xinference文档 进行框架部署与模型加载。
[!WARNING]
为避免依赖冲突,请将 Langchain-Chatchat 和模型部署框架如 Xinference 等放在不同的 Python 虚拟环境中, 比如 conda, venv, virtualenv 等。
从 0.3.1 版本起,Langchain-Chatchat 使用本地 yaml 文件的方式进行配置,用户可以直接查看并修改其中的内容,服务器会自动更新无需重启。
- 设置 Chatchat 存储配置文件和数据文件的根目录(可选)
# on linux or macos
export CHATCHAT_ROOT=/path/to/chatchat_data
# on windows
set CHATCHAT_ROOT=/path/to/chatchat_data若不设置该环境变量,则自动使用当前目录。
- 执行初始化
chatchat init该命令会执行以下操作:
- 创建所有需要的数据目录
- 复制 samples 知识库内容
- 生成默认 yaml配置文件
- 修改配置文件
- 
配置模型(model_settings.yaml) 
 需要根据步骤 2. 模型推理框架并加载模型 中选用的模型推理框架与加载的模型进行模型接入配置,具体参考model_settings.yaml中的注释。主要修改以下内容:# 默认选用的 LLM 名称 DEFAULT_LLM_MODEL: qwen1.5-chat # 默认选用的 Embedding 名称 DEFAULT_EMBEDDING_MODEL: bge-large-zh-v1.5 # 将 `LLM_MODEL_CONFIG` 中 `llm_model, action_model` 的键改成对应的 LLM 模型 # 在 `MODEL_PLATFORMS` 中修改对应模型平台信息 
- 
配置知识库路径(basic_settings.yaml)(可选) 
 默认知识库位于CHATCHAT_ROOT/data/knowledge_base,如果你想把知识库放在不同的位置,或者想连接现有的知识库,可以在这里修改对应目录即可。# 知识库默认存储路径 KB_ROOT_PATH: D:\chatchat-test\data\knowledge_base # 数据库默认存储路径。如果使用sqlite,可以直接修改DB_ROOT_PATH;如果使用其它数据库,请直接修改SQLALCHEMY_DATABASE_URI。 DB_ROOT_PATH: D:\chatchat-test\data\knowledge_base\info.db # 知识库信息数据库连接URI SQLALCHEMY_DATABASE_URI: sqlite:///D:\chatchat-test\data\knowledge_base\info.db 
- 
配置知识库(kb_settings.yaml)(可选) 默认使用 FAISS知识库,如果想连接其它类型的知识库,可以修改DEFAULT_VS_TYPE和kbs_config。
[!WARNING]
进行知识库初始化前,请确保已经启动模型推理框架及对应embedding模型,且已按照上述步骤3完成模型接入配置。
chatchat kb -r更多功能可以查看 chatchat kb --help
出现以下日志即为成功:
----------------------------------------------------------------------------------------------------
知识库名称      :samples
知识库类型      :faiss
向量模型:      :bge-large-zh-v1.5
知识库路径      :/root/anaconda3/envs/chatchat/lib/python3.11/site-packages/chatchat/data/knowledge_base/samples
文件总数量      :47
入库文件数      :42
知识条目数      :740
用时            :0:02:29.701002
----------------------------------------------------------------------------------------------------
总计用时        :0:02:33.414425
[!Note] 知识库初始化的常见问题
chatchat start -a出现以下界面即为启动成功:
[!WARNING]
由于 chatchat 配置默认监听地址DEFAULT_BIND_HOST为 127.0.0.1, 所以无法通过其他 ip 进行访问。如需通过机器ip 进行访问(如 Linux 系统), 需要到
basic_settings.yaml中将监听地址修改为 0.0.0.0。
- 数据库对话配置请移步这里 数据库对话配置说明
源码安装部署请参考 开发指南
docker pull chatimage/chatchat:0.3.1.3-93e2c87-20240829
docker pull ccr.ccs.tencentyun.com/langchain-chatchat/chatchat:0.3.1.3-93e2c87-20240829 # 国内镜像[!important] 强烈建议: 使用 docker-compose 部署, 具体参考 README_docker
- 0.3.x 结构改变很大,强烈建议您按照文档重新部署. 以下指南不保证100%兼容和成功. 记得提前备份重要数据!
- 首先按照 安装部署中的步骤配置运行环境,修改配置文件
- 将 0.2.x 项目的 knowledge_base 目录拷贝到配置的 DATA目录下
- 
2023年4月:Langchain-ChatGLM 0.1.0发布,支持基于 ChatGLM-6B 模型的本地知识库问答。
- 
2023年8月:Langchain-ChatGLM改名为Langchain-Chatchat,发布0.2.0版本,使用fastchat作为模型加载方案,支持更多的模型和数据库。
- 
2023年10月:Langchain-Chatchat 0.2.5发布,推出 Agent 内容,开源项目在Founder Park & Zhipu AI & Zilliz举办的黑客马拉松获得三等奖。
- 
2023年12月:Langchain-Chatchat开源项目获得超过 20K stars.
- 
2024年6月:Langchain-Chatchat 0.3.0发布,带来全新项目架构。
- 
🔥 让我们一起期待未来 Chatchat 的故事 ··· 
本项目代码遵循 Apache-2.0 协议。
🎉 Langchain-Chatchat 项目微信交流群,如果你也对本项目感兴趣,欢迎加入群聊参与讨论交流。
🎉 Langchain-Chatchat 项目官方公众号,欢迎扫码关注。
如果本项目有帮助到您的研究,请引用我们:
@software{langchain_chatchat,
    title        = {{langchain-chatchat}},
    author       = {Liu, Qian and Song, Jinke, and Huang, Zhiguo, and Zhang, Yuxuan, and glide-the, and liunux4odoo},
    year         = 2024,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/chatchat-space/Langchain-Chatchat}}
}
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for Langchain-Chatchat
Similar Open Source Tools
 
            
            Langchain-Chatchat
LangChain-Chatchat is an open-source, offline-deployable retrieval-enhanced generation (RAG) large model knowledge base project based on large language models such as ChatGLM and application frameworks such as Langchain. It aims to establish a knowledge base Q&A solution that is friendly to Chinese scenarios, supports open-source models, and can run offline.
 
            
            ChatTTS-Forge
ChatTTS-Forge is a powerful text-to-speech generation tool that supports generating rich audio long texts using a SSML-like syntax and provides comprehensive API services, suitable for various scenarios. It offers features such as batch generation, support for generating super long texts, style prompt injection, full API services, user-friendly debugging GUI, OpenAI-style API, Google-style API, support for SSML-like syntax, speaker management, style management, independent refine API, text normalization optimized for ChatTTS, and automatic detection and processing of markdown format text. The tool can be experienced and deployed online through HuggingFace Spaces, launched with one click on Colab, deployed using containers, or locally deployed after cloning the project, preparing models, and installing necessary dependencies.
 
            
            LLM-TPU
LLM-TPU project aims to deploy various open-source generative AI models on the BM1684X chip, with a focus on LLM. Models are converted to bmodel using TPU-MLIR compiler and deployed to PCIe or SoC environments using C++ code. The project has deployed various open-source models such as Baichuan2-7B, ChatGLM3-6B, CodeFuse-7B, DeepSeek-6.7B, Falcon-40B, Phi-3-mini-4k, Qwen-7B, Qwen-14B, Qwen-72B, Qwen1.5-0.5B, Qwen1.5-1.8B, Llama2-7B, Llama2-13B, LWM-Text-Chat, Mistral-7B-Instruct, Stable Diffusion, Stable Diffusion XL, WizardCoder-15B, Yi-6B-chat, Yi-34B-chat. Detailed model deployment information can be found in the 'models' subdirectory of the project. For demonstrations, users can follow the 'Quick Start' section. For inquiries about the chip, users can contact SOPHGO via the official website.
 
            
            VideoCaptioner
VideoCaptioner is a video subtitle processing assistant based on a large language model (LLM), supporting speech recognition, subtitle segmentation, optimization, translation, and full-process handling. It is user-friendly and does not require high configuration, supporting both network calls and local offline (GPU-enabled) speech recognition. It utilizes a large language model for intelligent subtitle segmentation, correction, and translation, providing stunning subtitles for videos. The tool offers features such as accurate subtitle generation without GPU, intelligent segmentation and sentence splitting based on LLM, AI subtitle optimization and translation, batch video subtitle synthesis, intuitive subtitle editing interface with real-time preview and quick editing, and low model token consumption with built-in basic LLM model for easy use.
 
            
            AIClient-2-API
AIClient-2-API is a versatile and lightweight API proxy designed for developers, providing ample free API request quotas and comprehensive support for various mainstream large models like Gemini, Qwen Code, Claude, etc. It converts multiple backend APIs into standard OpenAI format interfaces through a Node.js HTTP server. The project adopts a modern modular architecture, supports strategy and adapter patterns, comes with complete test coverage and health check mechanisms, and is ready to use after 'npm install'. By easily switching model service providers in the configuration file, any OpenAI-compatible client or application can seamlessly access different large model capabilities through the same API address, eliminating the hassle of maintaining multiple sets of configurations for different services and dealing with incompatible interfaces.
 
            
            agentica
Agentica is a human-centric framework for building large language model agents. It provides functionalities for planning, memory management, tool usage, and supports features like reflection, planning and execution, RAG, multi-agent, multi-role, and workflow. The tool allows users to quickly code and orchestrate agents, customize prompts, and make API calls to various services. It supports API calls to OpenAI, Azure, Deepseek, Moonshot, Claude, Ollama, and Together. Agentica aims to simplify the process of building AI agents by providing a user-friendly interface and a range of functionalities for agent development.
 
            
            build_MiniLLM_from_scratch
This repository aims to build a low-parameter LLM model through pretraining, fine-tuning, model rewarding, and reinforcement learning stages to create a chat model capable of simple conversation tasks. It features using the bert4torch training framework, seamless integration with transformers package for inference, optimized file reading during training to reduce memory usage, providing complete training logs for reproducibility, and the ability to customize robot attributes. The chat model supports multi-turn conversations. The trained model currently only supports basic chat functionality due to limitations in corpus size, model scale, SFT corpus size, and quality.
 
            
            Muice-Chatbot
Muice-Chatbot is an AI chatbot designed to proactively engage in conversations with users. It is based on the ChatGLM2-6B and Qwen-7B models, with a training dataset of 1.8K+ dialogues. The chatbot has a speaking style similar to a 2D girl, being somewhat tsundere but willing to share daily life details and greet users differently every day. It provides various functionalities, including initiating chats and offering 5 available commands. The project supports model loading through different methods and provides onebot service support for QQ users. Users can interact with the chatbot by running the main.py file in the project directory.
 
            
            TelegramForwarder
Telegram Forwarder is a message forwarding tool that allows you to forward messages from specified chats to other chats without the need for a bot to enter the corresponding channels/groups to listen. It can be used for information stream integration filtering, message reminders, content archiving, and more. The tool supports multiple sources forwarding, keyword filtering in whitelist and blacklist modes, regular expression matching, message content modification, AI processing using major vendors' AI interfaces, media file filtering, and synchronization with a universal forum blocking plugin to achieve three-end blocking.
 
            
            Element-Plus-X
Element-Plus-X is an out-of-the-box enterprise-level AI component library based on Vue 3 + Element-Plus. It features built-in scenario components such as chatbots and voice interactions, seamless integration with zero configuration based on Element-Plus design system, and support for on-demand loading with Tree Shaking optimization.
 
            
            Chinese-Mixtral-8x7B
Chinese-Mixtral-8x7B is an open-source project based on Mistral's Mixtral-8x7B model for incremental pre-training of Chinese vocabulary, aiming to advance research on MoE models in the Chinese natural language processing community. The expanded vocabulary significantly improves the model's encoding and decoding efficiency for Chinese, and the model is pre-trained incrementally on a large-scale open-source corpus, enabling it with powerful Chinese generation and comprehension capabilities. The project includes a large model with expanded Chinese vocabulary and incremental pre-training code.
 
            
            DownEdit
DownEdit is a powerful program that allows you to download videos from various social media platforms such as TikTok, Douyin, Kuaishou, and more. With DownEdit, you can easily download videos from user profiles and edit them in bulk. You have the option to flip the videos horizontally or vertically throughout the entire directory with just a single click. Stay tuned for more exciting features coming soon!
 
            
            api-for-open-llm
This project provides a unified backend interface for open large language models (LLMs), offering a consistent experience with OpenAI's ChatGPT API. It supports various open-source LLMs, enabling developers to seamlessly integrate them into their applications. The interface features streaming responses, text embedding capabilities, and support for LangChain, a tool for developing LLM-based applications. By modifying environment variables, developers can easily use open-source models as alternatives to ChatGPT, providing a cost-effective and customizable solution for various use cases.
 
            
            md
The WeChat Markdown editor automatically renders Markdown documents as WeChat articles, eliminating the need to worry about WeChat content layout! As long as you know basic Markdown syntax (now with AI, you don't even need to know Markdown), you can create a simple and elegant WeChat article. The editor supports all basic Markdown syntax, mathematical formulas, rendering of Mermaid charts, GFM warning blocks, PlantUML rendering support, ruby annotation extension support, rich code block highlighting themes, custom theme colors and CSS styles, multiple image upload functionality with customizable configuration of image hosting services, convenient file import/export functionality, built-in local content management with automatic draft saving, integration of mainstream AI models (such as DeepSeek, OpenAI, Tongyi Qianwen, Tencent Hanyuan, Volcano Ark, etc.) to assist content creation.
 
            
            HivisionIDPhotos
HivisionIDPhoto is a practical algorithm for intelligent ID photo creation. It utilizes a comprehensive model workflow to recognize, cut out, and generate ID photos for various user photo scenarios. The tool offers lightweight cutting, standard ID photo generation based on different size specifications, six-inch layout photo generation, beauty enhancement (waiting), and intelligent outfit swapping (waiting). It aims to solve emergency ID photo creation issues.
 
            
            DownEdit
DownEdit is a fast and powerful program for downloading and editing videos from platforms like TikTok, Douyin, and Kuaishou. It allows users to effortlessly grab videos, make bulk edits, and utilize advanced AI features for generating videos, images, and sounds in bulk. The tool offers features like video, photo, and sound editing, downloading videos without watermarks, bulk AI generation, and AI editing for content enhancement.
For similar tasks
 
            
            LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
 
            
            ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
 
            
            onnxruntime-genai
ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built in logits processing like repetition penalties, and allows for easy custom scoring.
 
            
            jupyter-ai
Jupyter AI connects generative AI with Jupyter notebooks. It provides a user-friendly and powerful way to explore generative AI models in notebooks and improve your productivity in JupyterLab and the Jupyter Notebook. Specifically, Jupyter AI offers: * An `%%ai` magic that turns the Jupyter notebook into a reproducible generative AI playground. This works anywhere the IPython kernel runs (JupyterLab, Jupyter Notebook, Google Colab, Kaggle, VSCode, etc.). * A native chat UI in JupyterLab that enables you to work with generative AI as a conversational assistant. * Support for a wide range of generative model providers, including AI21, Anthropic, AWS, Cohere, Gemini, Hugging Face, NVIDIA, and OpenAI. * Local model support through GPT4All, enabling use of generative AI models on consumer grade machines with ease and privacy.
 
            
            khoj
Khoj is an open-source, personal AI assistant that extends your capabilities by creating always-available AI agents. You can share your notes and documents to extend your digital brain, and your AI agents have access to the internet, allowing you to incorporate real-time information. Khoj is accessible on Desktop, Emacs, Obsidian, Web, and Whatsapp, and you can share PDF, markdown, org-mode, notion files, and GitHub repositories. You'll get fast, accurate semantic search on top of your docs, and your agents can create deeply personal images and understand your speech. Khoj is self-hostable and always will be.
 
            
            langchain_dart
LangChain.dart is a Dart port of the popular LangChain Python framework created by Harrison Chase. LangChain provides a set of ready-to-use components for working with language models and a standard interface for chaining them together to formulate more advanced use cases (e.g. chatbots, Q&A with RAG, agents, summarization, extraction, etc.). The components can be grouped into a few core modules: * **Model I/O:** LangChain offers a unified API for interacting with various LLM providers (e.g. OpenAI, Google, Mistral, Ollama, etc.), allowing developers to switch between them with ease. Additionally, it provides tools for managing model inputs (prompt templates and example selectors) and parsing the resulting model outputs (output parsers). * **Retrieval:** assists in loading user data (via document loaders), transforming it (with text splitters), extracting its meaning (using embedding models), storing (in vector stores) and retrieving it (through retrievers) so that it can be used to ground the model's responses (i.e. Retrieval-Augmented Generation or RAG). * **Agents:** "bots" that leverage LLMs to make informed decisions about which available tools (such as web search, calculators, database lookup, etc.) to use to accomplish the designated task. The different components can be composed together using the LangChain Expression Language (LCEL).
 
            
            danswer
Danswer is an open-source Gen-AI Chat and Unified Search tool that connects to your company's docs, apps, and people. It provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your own control. Danswer is MIT licensed and designed to be modular and easily extensible. The system also comes fully ready for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for configuring Personas (AI Assistants) and their Prompts. Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc. By combining LLMs and team specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already supported?" or "Where's the pull request for feature Y?"
 
            
            infinity
Infinity is an AI-native database designed for LLM applications, providing incredibly fast full-text and vector search capabilities. It supports a wide range of data types, including vectors, full-text, and structured data, and offers a fused search feature that combines multiple embeddings and full text. Infinity is easy to use, with an intuitive Python API and a single-binary architecture that simplifies deployment. It achieves high performance, with 0.1 milliseconds query latency on million-scale vector datasets and up to 15K QPS.
For similar jobs
 
            
            weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
 
            
            agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.
 
            
            oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.
 
            
            LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
 
            
            VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
 
            
            kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
 
            
            PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
 
            
            Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.
 
             
                





