Awesome-LLM-RAG-Application
the resources about the application based on LLM with RAG pattern
Stars: 687
Awesome-LLM-RAG-Application is a repository that provides resources and information about applications based on Large Language Models (LLM) with Retrieval-Augmented Generation (RAG) pattern. It includes a survey paper, GitHub repo, and guides on advanced RAG techniques. The repository covers various aspects of RAG, including academic papers, evaluation benchmarks, downstream tasks, tools, and technologies. It also explores different frameworks, preprocessing tools, routing mechanisms, evaluation frameworks, embeddings, security guardrails, prompting tools, SQL enhancements, LLM deployment, observability tools, and more. The repository aims to offer comprehensive knowledge on RAG for readers interested in exploring and implementing LLM-based systems and products.
README:
Awesome LLM RAG Application is a curated list of application resources based on LLM with RAG pattern.
- 论文:Retrieval-Augmented Generation for Large Language Models: A Survey
- Advanced RAG Techniques: an Illustrated Overview
- 高级RAG应用构建指南和总结
- Patterns for Building LLM-based Systems & Products
- RAG大全
-
Open RAG Base
- Open RAG Base 是一个基于公开资料收集整理汇总的RAG知识库。它基于Notion构建,是目前最全面RAG的资料汇总仓库。目的是为读者提提供前沿和全面的RAG知识,提供多维度的分析汇总,涵盖RAG的方方面,包括:
- 学术论文
- 前沿阅读资料
- RAG评估与基准
- 下游任务与数据集
- 工具与技术栈
- 研究学者和机构
- ……更多内容即将上线 (e.g. ,专题研讨、示例代码、基线测试)
- Open RAG Base 是一个基于公开资料收集整理汇总的RAG知识库。它基于Notion构建,是目前最全面RAG的资料汇总仓库。目的是为读者提提供前沿和全面的RAG知识,提供多维度的分析汇总,涵盖RAG的方方面,包括:
- 一个繁体的RAG资料集
- 关于RAG技术的综合合集RAG_Techniques
- Microsoft-Retrieval Augmented Generation (RAG) in Azure AI Search
- azure openai design patterns- RAG
- IBM-What is retrieval-augmented generation-IBM
- Amazon-Retrieval Augmented Generation (RAG)
- Nvidia-What Is Retrieval-Augmented Generation?
- Meta-Retrieval Augmented Generation: Streamlining the creation of intelligent natural language processing models
- Cohere-Introducing Chat with Retrieval-Augmented Generation (RAG)
- Pinecone-Retrieval Augmented Generation
- Milvus-Build AI Apps with Retrieval Augmented Generation (RAG)
- Knowledge Retrieval Takes Center Stage
- Disadvantages of RAG
- Retrieval-Augmented Generation (RAG) or Fine-tuning — Which Is the Best Tool to Boost Your LLM Application?
- 提示工程、RAGs 与微调的对比
- RAG vs Finetuning — Which Is the Best Tool to Boost Your LLM Application?
- A Survey on In-context Learning
- LangChain
- langchain4j
- LlamaIndex
-
GPT-RAG
- GPT-RAG提供了一个强大的架构,专为RAG模式的企业级部署量身定制。它确保了扎实的回应,并建立在零信任安全和负责任的人工智能基础上,确保可用性、可扩展性和可审计性。非常适合正在从探索和PoC阶段过渡到全面生产和MVP的组织。
-
QAnything
- 致力于支持任意格式文件或数据库的本地知识库问答系统,可断网安装使用。任何格式的本地文件都可以往里扔,即可获得准确、快速、靠谱的问答体验。目前已支持格式: PDF,Word(doc/docx),PPT,Markdown,Eml,TXT,图片(jpg,png等),网页链接
-
Quivr
- 您的第二大脑,利用 GenerativeAI 的力量成为您的私人助理!但增强了人工智能功能。
- Quivr
-
Dify
- 融合了 Backend as Service 和 LLMOps 的理念,涵盖了构建生成式 AI 原生应用所需的核心技术栈,包括一个内置 RAG 引擎。使用 Dify,你可以基于任何模型自部署类似 Assistants API 和 GPTs 的能力。
-
Verba
- 这是向量数据库weaviate开源的一款RAG应用,旨在为开箱即用的检索增强生成 (RAG) 提供端到端、简化且用户友好的界面。只需几个简单的步骤,即可在本地或通过 OpenAI、Cohere 和 HuggingFace 等 LLM 提供商轻松探索数据集并提取见解。
-
danswer
- 允许您针对内部文档提出自然语言问题,并获得由源材料中的引用和参考文献支持的可靠答案,以便您始终可以信任您得到的结果。您可以连接到许多常用工具,例如 Slack、GitHub、Confluence 等。
-
RAGFlow
- RAGFlow:基于OCR和文档解析的下一代 RAG 引擎。在文档解析上做了增强,2024年4月1日开源,在数据处理上支持文档结构、图片、表格的深度解析,支持可控分片,可对查询进行深入分析识别关键信息,在检索上提供多路找回/重排能力,界面提供友好的引用参考查看功能。
-
Cognita
- Cognita 在底层使用了Langchain/Llamaindex,并对代码进行了结构化组织,其中每个 RAG 组件都是模块化的、API 驱动的、易于扩展的。Cognita 可在本地设置中轻松使用,同时还能为您提供无代码用户界面支持的生产就绪环境。Cognita 默认还支持增量索引。
-
GraphRAG
- GraphRAG 是一种基于图的检索增强方法,由微软开发并开源。 它通过结合LLM和图机器学习的技术,从非结构化的文本中提取结构化的数据,构建知识图谱,以支持问答、摘要等多种应用场景。
-
Unstructured
- 该库提供了用于摄取和预处理图像和文本文档(如 PDF、HTML、WORD 文档等)的开源组件。 unstructured的使用场景围绕着简化和优化LLM数据处理工作流程, unstructured模块化功能和连接器形成了一个有内聚性的系统,简化了数据摄取和预处理,使其能够适应不同的平台,并有效地将非结构化数据转换为结构化输出。
-
Open Parse
- 对文档进行分块是一项具有挑战性的任务,它支撑着任何 RAG 系统。高质量的结果对于人工智能应用的成功至关重要,但大多数开源库处理复杂文档的能力都受到限制。
- Open Parse 旨在通过提供灵活、易于使用的库来填补这一空白,该库能够直观地识别文档布局并有效地对其进行分块。
-
ExtractThinker
- 使用 LLMs 从文件和文档中提取数据的库。 extract_thinker 在文件和 LLMs 之间提供 ORM 风格的交互,从而实现灵活且强大的文档提取工作流程。
-
OmniParser
- OmniParser 是一个统一的框架,无缝地结合了三个基本的 OCR 任务:文本识别、关键信息提取和表格识别。
-
python-readability
- 给定一个 HTML 文档,提取并清理主体文本和标题。
-
firecrawl
- 将整个网站转变为 LLM 可用的 Markdown 或结构化数据。使用单个 API 进行抓取、爬行和提取。
-
jina-reader
- 它将任何 URL 转换为LLM 友好的输入
-
nougat
- Neural Optical Understanding for Academic Documents.这是学术文档 PDF 解析器,它能理解 LaTeX 数学和表格。但对中文支持不好,需要单独微调。
-
Pix2Struct
- Pix2Struct 是一种预训练的图像到文本模型,专为纯视觉语言理解而设计。
-
Indexify
- Indexify 是一个开源引擎,用于使用可重复使用的提取器进行嵌入、转换和特征提取,为非结构化数据(视频、音频、图像和文档)快速构建数据流水线。当;流水线生成嵌入或结构化数据时,Indexify 会自动更新向量数据库、结构化数据库 (Postgres)。
-
ragas
- Ragas是一个用于评估RAG应用的框架,包括忠诚度(Faithfulness)、答案相关度(Answer Relevance)、上下文精确度(Context Precision)、上下文相关度(Context Relevancy)、上下文召回(Context Recall)
-
tonic_validate
- 一个用于 RAG 开发和实验跟踪的平台,用于评估检索增强生成 (RAG) 应用程序响应质量的指标。
-
deepeval
- 一个简单易用的开源LLM评估框架,适用于LLM应用程序。它与 Pytest 类似,但专门用于单元测试 LLM 应用程序。 DeepEval 使用 LLMs 以及在您的计算机上本地运行的各种其他 NLP 模型,根据幻觉、答案相关性、RAGAS 等指标来评估性能。
-
trulens
- TruLens 提供了一套用于开发和监控神经网络的工具,包括大型语言模型。这包括使用 TruLens-Eval 评估基于 LLMs 和 LLM 的应用程序的工具以及使用 TruLens-Explain 进行深度学习可解释性的工具。 TruLens-Eval 和 TruLens-Explain 位于单独的软件包中,可以独立使用。
-
uptrain
- 用于评估和改进生成式人工智能应用的开源统一平台。提供了20多项预配置检查(涵盖语言、代码、嵌入用例)评分,对失败案例进行根本原因分析,并就如何解决这些问题提出见解。
- 比如prompt注入、越狱检测、整通对话的用户满意度等
- langchain-evaluation
- Llamaindex-evaluation
-
BCEmbedding
- 网易有道开发的双语和跨语种语义表征算法模型库,其中包含 EmbeddingModel和 RerankerModel两类基础模型。EmbeddingModel专门用于生成语义向量,在语义搜索和问答中起着关键作用,而 RerankerModel擅长优化语义搜索结果和语义相关顺序精排。
-
BGE-Embedding
- 北京智源人工智能研究院开源的embeeding通用向量模型,使用retromae 对模型进行预训练,再用对比学习在大规模成对数据上训练模型。
-
bge-reranker-large
- 北京智源人工智能研究院开源,交叉编码器将对查询和答案实时计算相关性分数,这比向量模型(即双编码器)更准确,但比向量模型更耗时。 因此,它可以用来对嵌入模型返回的前k个文档重新排序
-
gte-base-zh
- GTE text embedding GTE中文通用文本表示模型 通义实验室提供
-
- NeMo Guardrails 是一个开源工具包,用于为基于 LLM 的对话应用程序轻松添加可编程的保护轨。Guardrails(简称 "轨")是控制大型语言模型输出的特定方式,例如不谈论政治、以特定方式响应特定用户请求、遵循预定义对话路径、使用特定语言风格、提取结构化数据等。
-
- Guardrails 是一个 Python 框架,通过执行两个关键功能来帮助构建可靠的人工智能应用程序:
- Guardrails 在应用程序中运行输入/输出防护装置,以检测、量化和减轻特定类型风险的存在。要查看全套风险,请访问 Guardrails Hub。
- Guardrails 可帮助您从 LLMs 生成结构化数据。对输入和输出进行检测
- Guardrails 是一个 Python 框架,通过执行两个关键功能来帮助构建可靠的人工智能应用程序:
-
- LLM Guard 是一款旨在增强大型语言模型 (LLMs) 安全性的综合工具。
- 输入(Anonymize 匿名化、BanCode 禁止代码、BanCompetitors 禁止竞争对手、BanSubstrings 禁止子串、BanTopics 禁止话题、PromptInjection 提示词注射 、Toxicity 毒性等)
- 输出(代码、anCompetitors 禁止竞争对手、Deanonymize 去匿名化、JSON、LanguageSame 语言相同、MaliciousURLs 恶意URL、NoRefusal 不可拒绝、FactualConsistency 事实一致性、URLReachability URL可达性等)
- 各个检测功能是利用了huggingface上的各种开源模型
-
- Llama Guard 是一个新的实验模型,可为 LLM 部署提供输入和输出防护栏。Llama Guard 是经过微调的 Llama-7B 模型。
-
- DSPy 是一款功能强大的框架。它可以用来自动优化大型语言模型(LLM)的提示词和响应。还能让我们的 LLM 应用即使在 OpenAI/Gemini/Claude版本升级也能正常使用。无论你有多少数据,它都能帮助你优化模型,获得更高的准确度和性能。通过选择合适的优化器,并根据具体需求进行调优,你可以在各种任务中获得出色的结果。
-
- GenAI 应用程序的自动提示工程助手 YiVal 是一款最先进的工具,旨在简化 GenAI 应用程序提示和循环中任何配置的调整过程。有了 YiVal,手动调整已成为过去。这种以数据驱动和以评估为中心的方法可确保最佳提示、精确的 RAG 配置和微调的模型参数。使用 YiVal 使您的应用程序能够轻松实现增强的结果、减少延迟并最大限度地降低推理成本!
-
- Vanna 是一个MIT许可的开源Python RAG(检索增强生成)框架,用于SQL生成和相关功能。
- Vanna 的工作过程分为两个简单步骤 - 在您的数据上训练 RAG“模型”,然后提出问题,这些问题将返回 SQL 查询。训练的数据主要是一些 DDL schema、业务说明文档以及示例sql等,所谓训练主要是将这些数据embedding化,用于向量检索。
-
- 由前阿里巴巴成员创建并开源,一个智能和多功能的通用SQL客户端和报表工具,集成了ChatGPT功能,,14.3k
-
- 一个可以将自然语言转换为SQL查询的工具,4.1k
-
- 使用AI驱动的数据管理平台,帮助用户将自然语言查询转换为SQL。,3.2k
-
- 一个高效的自然语言到SQL转换工具,支持多种数据库。,1.4k
-
- 由腾讯音乐开源,高性能的SQL生成工具,支持复杂查询的自动生成。1.6k
- vllm
- OpenLLM
-
RAGxplorer
- RAGxplorer 是一种交互式 Streamlit 工具,通过将文档块和的查询问句展示为embedding向量空间中可的视化内容来支持检索增强生成 (RAG) 应用程序的构建。
-
Rule-Based-Retrieval
- rule-based-retrieval是一个 Python 包,使您能够创建和管理具有高级筛选功能的检索增强生成 (RAG) 应用程序。它与用于文本生成的 OpenAI 和用于高效矢量数据库管理的 Pinecone 无缝集成。
-
instructor
- 借助大模型从一段文本中提取为结构化数据的库
1 https://github.com/leptonai/search_with_lepton 2 https://github.com/khoj-ai/khoj 3 https://github.com/YassKhazzan/openperplex_front 4 https://github.com/supermemoryai/opensearch-ai 5 https://github.com/InternLM/MindSearch 6 https://github.com/luyu0279/BrainyAI 7、https://github.com/memfreeme/memfree 8 https://github.com/shadowfax92/Fyin 9 https://github.com/Nutlope/turboseek 10 https://github.com/ItzCrazyKns/Perplexica 11 https://github.com/rashadphz/farfalle 12 https://github.com/yokingma/search_with_ai 13 https://github.com/nashsu/FreeAskInternet 14 https://github.com/jjleng/sensei 15 https://github.com/miurla/morphic 16 https://github.com/nilsherzig/LLocalSearch 17 https://github.com/OcularEngineering/ocular 18 https://github.com/QmiAI/Qmedia?tab=readme-ov-file
-
Kimi Chat
- 支持发送网页链接和上传文件进行回答
-
GPTs
- 支持上传文档进行类似RAG应用
-
百川知识库
- 1.新建知识库后得到知识库 ID;
- 2.上传文件,获取文件 ID;
- 3.通过文件 ID 与知识库 ID 进行知识库文件关联,知识库中可以关联多个文档。
- 4.调用对话接口时通过 knowledge_base 字段传入知识库 ID 列表,大模型使用检索到的知识信息回答问题。
-
COZE
- 应用编辑平台,旨在开发下一代人工智能聊天机器人。无论您是否有编程经验,该平台都可以让您快速创建各种类型的聊天机器人并将其部署在不同的社交平台和消息应用程序上。
-
Devv-ai
- 最懂程序员的新一代 AI 搜索引擎,底层采用了RAG的大模型应用模式,LLM模型为其微调的模型。
- B站大模型×领域RAG:打造高效、智能化的用户服务体验
- 哈啰出行从Copilot到Agent模式的探索-贾立
- 51Talk-AI+Agent+-+在业务增长中的落地实践
- 万科物业科技
- 京东商家助手
- 58同城-灵犀大模型- ppt待提供
- 阿⾥云AI搜索RAG⼤模型优化实践- ppt待提供
- 向量化与文档解析技术加速大模型RAG应用落地 - ppt待提供
- 用户案例|向量引擎在携程酒店搜索中的应用场景和探索
- OpenAI 如何优化 LLM 的效果
- What We Learned from a Year of Building with LLMs 系列
- 构建企业级AI助手的经验教训
- Retrieval Augmented Generation: Streamlining the creation of intelligent natural language processing models
- Lost in the Middle: How Language Models Use Long Contexts
-
论文-设计检索增强生成系统时的七个故障点
- Seven Failure Points When Engineering a Retrieval Augmented Generation System
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents
- Bridging the Preference Gap between Retrievers and LLMs
- Tuning Language Models by Proxy
-
Zero-Shot Listwise Document Reranking with a Large Language Model
- 两种重新排序方法:逐点重新排名、列表重新排名。
- 逐点重新排名是给定文档列表,我们将查询+每个文档单独提供给 LLM 并要求它产生相关性分数。
- 列表重新排名是给定文档列表,我们同时向 LLM 提供查询 + 文档列表,并要求它按相关性对文档进行重新排序。
- 建议对 RAG 检索到的文档按列表重新排序,列表重排优于逐点重排。
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- 高级RAG之Self-RAG框架的原理和内部实现
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
- Corrective Retrieval Augmented Generation
这里列出了一些重要的研究论文,它们揭示了 RAG 领域的关键洞察和最新进展。
洞见 | 参考来源 | 发布日期 |
---|---|---|
提出一种名为纠正检索增强生成(CRAG, Corrective Retrieval Augmented Generation)的方法,旨在提升 RAG 系统生成内容的稳定性和准确性。其核心在于增加一个能够自我修正的组件至检索器中,并优化检索文档的使用,以促进更优质的内容生成。此外,引入了一种检索评估机制,用于评价针对特定查询检索到的文档的整体品质。通过网络搜索和知识的优化利用,能够有效提升文档自我修正和利用的效率。 | 纠正检索增强生成 | 2024年1月 |
RAPTOR 模型通过递归方式嵌入、聚类并总结文本信息,自底向上构建出层次化的总结树。在使用时,该模型能够从这棵树中检索信息,实现对长文档在不同抽象层面上信息的综合利用。 | RAPTOR:递归抽象处理用于树组织检索 | 2024年1月 |
开发了一个通用框架,通过大语言模型(LLM)与检索器之间的多步骤互动,有效处理多标签分类难题。 | 在上下文中学习用于极端多标签分类 | 2024年1月 |
研究表明,通过提取高资源语言中语义相似的提示,可以显著提升多语言预训练语言模型在多种任务上的零样本学习能力。 | 从分类到生成:洞察跨语言检索增强的 ICL | 2023年11月 |
针对 RAGs 模型在处理噪声较多、不相关文档以及未知情境时的稳健性进行了改善,通过为检索文档生成序列化阅读笔记,深入评估其与提问的相关性,并整合信息以构建最终答案。 | 链式笔记:增强检索增强语言模型的鲁棒性 | 2023年11月 |
通过去除可能不会对答案生成贡献关键信息的标记,优化了检索增强阅读模型的处理流程,实现了高达 62.2% 的运行时间缩减,同时保持性能仅降低了2%。 | 通过标记消除优化检索增强阅读器模型 | |
通过对小型语言模型 (LM) 进行指令式微调,我们开发了一个独立的验证器,以验证知识增强语言模型 (knowledge-augmented LMs) 的输出及其知识准确性。这种方法特别有助于解决模型在面对特定查询时未能检索相关知识,或在生成文本中未能准确反映检索到的知识的情况。 | 知识增强语言模型验证 | 2023年10月 |
我们设立了一个基准测试,以分析不同大型语言模型 (LLMs) 在检索增强生成 (RAG) 所需的四项核心能力——噪声容忍、排除不相关信息、信息融合和对反事实情境的适应性——的表现。 | 大型语言模型在检索增强生成中的基准测试 | 2023年10月 |
介绍了一种自我反思的检索增强生成 (Self-RAG) 框架,旨在通过检索和自我反思来提升语言模型的质量和事实性。该框架利用语言模型动态检索信息,并通过反思标记来生成和评估检索到的内容及其自生成内容。 | 自我反思检索增强生成: 通过自我反思学习检索、生成及自我批判 | 2023年10月 |
通过生成增强检索 (GAR) 和检索增强生成 (RAG) 的迭代改善,提高了零样本信息检索的能力。该过程中的改写-检索阶段有效提升了召回率,而重排阶段则显著提高了精度。 | 零样本信息检索中的GAR与RAG相结合的新范式 | 2023年10月 |
通过使用基于 43B GPT 模型的预训练和从 1.2 万亿 Token 中检索信息,我们预训练了一个 48B 的检索模型。进一步通过指令式微调,该模型在多种零样本任务上相比经过指令式微调的 GPT 模型显示出显著的性能提升。 | InstructRetro: 检索增强预训练后的指令式微调 | 2023年10月 |
通过两步精细调整,我们为大型语言模型增加了检索功能:一步是优化预训练的语言模型以更有效利用检索到的信息,另一步则是改进检索器以返回更符合语言模型偏好的相关结果。这种分阶段的微调方法,在要求知识利用和上下文感知的任务中,显著提升了性能。 | 检索增强的双重指令微调 (RA-DIT) | 2023年10月 |
介绍了一种提升 RAGs 在面对不相关内容时鲁棒性的方法。该方法通过在训练期间混合使用相关与不相关的上下文,自动产生数据以微调语言模型,从而有效利用检索到的文段。 | 让基于检索增强的语言模型对无关上下文更加鲁棒 | 2023年10月 |
研究表明,采用简单检索增强技术的 4K 上下文窗口的大语言模型在生成过程中,其表现与通过位置插值对长上下文任务进行微调的 16K 上下文窗口的大语言模型相媲美。 | 当检索遇上长上下文的大语言模型 | 2023年10月 |
在上下文融合前将检索文档压缩为文本摘要,既降低了计算成本,也减轻了模型从长文档中识别关键信息的难度。 | RECOMP: 用压缩和选择性增强提升检索增强语言模型 | 2023年10月 |
提出了一个迭代式的检索与生成协同工作框架,它结合了参数化和非参数化知识,通过检索与生成的互动来寻找正确的推理路径。这一框架特别适合需要多步推理的任务,能够显著提高大语言模型的推理能力。 | 检索与生成的协同作用加强了大语言模型的推理能力 | 2023年10月 |
提出“澄清树”框架,该框架通过少样本提示并借助外部知识,为含糊问题递归构建一个消歧树。然后利用这棵树产生详细的答案。 | 利用检索增强大语言模型回答含糊问题的“澄清树”方法 | 2023年10月 |
介绍了一种使大语言模型能够参考其之前遇到的问题,并在面对新问题时动态调用外部资源的方法。 | 借助自我知识的大语言模型检索增强策略 | 2023年10月 |
提供了一组评估指标,用于从多个维度(如检索系统识别相关及集中上下文段落的能力、大语言模型忠实利用这些段落的能力,以及生成内容本身的质量)评价不同方面,而无需依赖人工注释的真实数据。 | RAGAS: 对检索增强生成进行自动化评估的指标体系 | 2023年9月 |
提出了一种创新方法——生成后阅读(GenRead),它让大型语言模型先根据提问生成相关文档,再从这些文档中提取答案。 | 生成而非检索:大型语言模型作为强大的上下文生成器 | 2023年9月 |
展示了在 RAG 系统中如何使用特定排名器(比如 DiversityRanker 和 LostInTheMiddleRanker)来挑选信息,从而更好地利用大型语言模型的上下文窗口。 | 提升 Haystack 中 RAG 系统的能力:DiversityRanker 和 LostInTheMiddleRanker 的引入 | 2023年8月 |
描述了如何将大型语言模型与不同的知识库结合,以便于知识的检索和储存。通过编程思维的提示来生成知识库的搜索代码,此外,还能够根据用户的需要,将知识储存在个性化的知识库中。 | KnowledGPT: 利用知识库检索和存储功能增强大型语言模型 | 2023年8月 |
提出一种模型,通过结合检索增强掩码语言建模和前缀语言建模,引入上下文融合学习,以此提高少样本学习的效果,使模型能够在不增加训练负担的情况下使用更多上下文示例。 | RAVEN: 借助检索增强编解码器语言模型实现的上下文学习 | 2023年8月 |
RaLLe 是一款开源工具,专门用于开发、评估和提升针对知识密集型任务的 RAG 系统的性能。 | RaLLe: 针对检索增强大型语言模型的开发和评估框架 | 2023年8月 |
研究发现,当相关信息的位置发生变化时,大型语言模型的性能会明显受影响,这揭示了大型语言模型在处理长篇上下文信息时的局限性。 | 中途迷失:大型语言模型处理长篇上下文的方式 | 2023年7月 |
通过迭代的方式,模型能够将检索和生成过程相互协同。模型的输出不仅展示了完成任务所需的内容,还为检索更多相关知识提供了丰富的上下文,从而在下一轮迭代中帮助产生更优的结果。 | 通过迭代检索-生成协同增强检索增强的大语言模型 | 2023年5月 |
介绍了一种新的视角,即在文本生成过程中,系统能够主动决定何时以及检索什么信息。接着,提出了一种名为FLARE的方法,通过预测下一句话来预见未来的内容,利用此内容作为关键词检索相关文档,并在发现不确定的表达时重新生成句子。 | 主动检索增强生成 | 2023年5月 |
提出了一个能够通用应用于各种大语言模型的检索插件,即使在模型未知或不能共同微调的情况下也能提升模型性能。 | 适应增强型检索器改善大语言模型的泛化作为通用插件 | 2023年5月 |
通过两种创新的预训练方法,提高了对结构化数据的密集检索效果。首先,通过对结构化数据和非结构化数据之间的关联进行预训练来提升模型的结构感知能力;其次,通过实现遮蔽实体预测来更好地捕捉结构语义。 | 结构感知的语言模型预训练改善结构化数据上的密集检索 | 2023年5月 |
该框架能够动态地融合来自不同领域的多样化信息源,以提高大语言模型的事实准确性。通过一个自适应的查询生成器,根据不同知识源定制查询,确保信息的准确性逐步得到修正,避免错误信息的累积和传播。 | 知识链:通过动态知识适应异质来源来基础大语言模型 | 2023年5月 |
此框架通过首先检索知识图谱中的相关子图,并通过调整检索到的子图的词嵌入来确保事实的一致性,然后利用对比学习确保生成的对话与知识图谱高度一致,为生成与上下文相关且基于知识的对话提供了新方法。 | 用于知识基础对话生成的知识图谱增强大语言模型 | 2023年5月 |
通过采用小型语言模型作为可训练重写器,以适应黑盒式大语言模型(LLM)的需求。重写器通过强化学习(RL)根据 LLM 的反馈进行训练,从而构建了一个名为“重写-检索-阅读”的新框架,专注于查询优化。 | 为检索增强的大语言模型重写查询 | 2023年5月 |
利用检索增强生成器迭代创建无限记忆池,并通过记忆选择器挑选出适合下一轮生成的记忆。此方法允许模型利用自身产出的记忆,称为“自我记忆”,以提升内容生成质量。 | 自我提升:带有自我记忆的检索增强文本生成 | 2023年5月 |
通过为大语言模型(LLM)装配知识引导模块,让它们在不改变内部参数的情况下,获取相关知识。这一策略显著提高了模型在需要丰富知识的领域任务(如事实知识增加7.9%,表格知识增加11.9%,医学知识增加3.0%,多模态知识增加8.1%)的表现。 | 用参数知识引导增强大语言模型 | 2023年5月 |
为大语言模型(LLM)引入了一个通用的读写记忆单元,允许它们根据任务需要从文本中提取、存储并回忆知识。 | RET-LLM:朝向大语言模型的通用读写记忆 | 2023年5月 |
通过使用任务不可知检索器,构建了一个共享静态索引,有效选出候选证据。随后,设计了一个基于提示的重排机制,根据任务的特定相关性重新排序最相关的证据,为读者提供精准信息。 | 针对非知识密集型任务的提示引导检索增强 | 2023年5月 |
提出了UPRISE(通用提示检索以改善零样本评估),通过调整一个轻量级且多功能的检索器,它能自动为给定零样本任务的输入检索出最合适的提示,以此来改善评估效果。 | UPRISE:改进零样本评估的通用提示检索 | 2023年3月 |
结合了 SLMs 作为过滤器和 LLMs 作为重排器的优势,提出了一个适应性的“过滤-再重排”范式,有效提升了难样本的信息提取与重排效果。 | 大语言模型不是一个理想的少样本信息提取器,但它在重排难样本方面表现出色! | 2023年3月 |
零样本学习指导一款能够遵循指令的大语言模型,创建一个虚拟文档来抓住重要的联系模式。接着,一个名为Contriever的工具会将这份文档转化成嵌入向量,利用这个向量在大数据集的嵌入空间中找到相似文档的聚集地,通过向量的相似度来检索真实文档。 | 无需相关标签的精确零样本密集检索 | 2022年12月 |
提出了一个名为展示-搜索-预测(DSP)的新框架,通过这个框架可以编写高级程序,这些程序能够先展示流程,然后搜索相关信息,并基于这些信息做出预测。它能够将复杂问题分解成小的、更易于解决的步骤。 | 通过检索和语言模型组合,为复杂的自然语言处理任务提供解决方案 | 2022年12月 |
采用了一种新的多步骤问答策略,通过在思维链条的每一步中穿插检索信息,使用检索到的信息来丰富和改善思维链条。这种方法显著提升了解决知识密集型多步问题的效果。 | 结合思维链条推理和信息检索解决复杂多步骤问题 | 2022年12月 |
研究发现,增加检索环节可以有效减轻对已有训练信息的依赖,使得RAG变成一个有效捕捉信息长尾的策略。 | 大语言模型在学习长尾知识方面的挑战 | 2022年11月 |
通过抽样方式,从大语言模型的记忆中提取相关信息段落,进而生成最终答案。 | 通过回忆增强语言模型的能力 | 2022年10月 |
将大语言模型用作少量示例的查询生成器,根据这些生成的数据构建针对特定任务的检索系统。 | Promptagator: 基于少量示例实现密集检索 | 2022年9月 |
介绍了Atlas,这是一个经过预训练的检索增强型语言模型,它能够通过极少数的示例学习掌握知识密集任务。 | Atlas: 借助检索增强型语言模型进行少样本学习 | 2022年8月 |
通过从训练数据中进行智能检索,实现了在多个自然语言生成和理解任务上的性能提升。 | 重新认识训练数据的价值:通过训练数据检索的简单有效方法 | 2022年3月 |
通过在连续的数据存储条目之间建立指针关联,并将这些条目分组成不同的状态,我们近似模拟了数据存储搜索过程。这种方法创造了一个加权有限自动机,在推理时能够在不降低模型预测准确性(困惑度)的情况下,节约高达 83% 的查找最近邻居的计算量。 | 通过自动机增强检索的神经符号语言建模 | 2022 年 1 月 |
通过将自回归语言模型与从大规模文本库中检索的文档块相结合,基于这些文档与前文 Token 的局部相似性,我们实现了模型的显著改进。该策略利用了一个庞大的数据库(2 万亿 Token),大大增强了语言模型的能力。 | 通过从数万亿 Token 中检索来改善语言模型 | 2021 年 12 月 |
我们采用了一种创新的零样本任务处理方法,通过为检索增强生成模型引入严格的负样本和强化训练流程,提升了密集段落检索的效果,用于零样本槽填充任务。 | 用于零样本槽填充的鲁棒检索增强生成 | 2021 年 8 月 |
介绍了 RAG 模型,这是一种结合了预训练的 seq2seq 模型(作为参数记忆)和基于密集向量索引的 Wikipedia(作为非参数记忆)的模型。此模型通过预训练的神经网络检索器访问信息,比较了两种 RAG 设计:一种是在生成过程中始终依赖相同检索的段落,另一种则是每个 Token 都使用不同的段落。 | 用于知识密集型 NLP 任务的检索增强生成 | 2020 年 5 月 |
展示了一种仅通过密集表示实现信息检索的方法,该方法通过简单的双编码框架从少量问题和文本段落中学习嵌入。这种方法为开放域问答提供了一种高效的密集段落检索方案。 | 用于开放域问答的密集段落检索 | 2020 年 4 月 |
- From Good to Great: How Pre-processing Documents Supercharges AI’s Output
- Advanced RAG 02: Unveiling PDF Parsing
- Advanced RAG 07: Exploring RAG for Tables
- 5 Levels Of Text Splitting
- Advanced RAG series: Indexing
- Advanced RAG 06: Exploring Query Rewriting
- Advanced RAG 11: Query Classification and Refinement
- Advanced RAG Series:Routing and Query Construction
- Query Transformations
- Query Construction
-
Foundations of Vector Retrieval
- 这本200多页的专题论文提供了向量检索文献中主要算法里程碑的总结,目的是作为新老研究者可以独立参考的资料。
- Improving Retrieval Performance in RAG Pipelines with Hybrid Search
- Multi-Vector Retriever for RAG on tables, text, and images
- Relevance and ranking in vector search
- Boosting RAG: Picking the Best Embedding & Reranker models
- Azure Cognitive Search: Outperforming vector search with hybrid retrieval and ranking capabilities
- Optimizing Retrieval Augmentation with Dynamic Top-K Tuning for Efficient Question Answering
- Building Production-Ready LLM Apps with LlamaIndex: Document Metadata for Higher Accuracy Retrieval
- dvanced RAG Series: Retrieval
-
How to Cut RAG Costs by 80% Using Prompt Compression
- 第一种压缩方法是 AutoCompressors。它的工作原理是将长文本汇总为短向量表示,称为汇总向量。然后,这些压缩的摘要向量充当模型的软提示。
- LangChain Contextual Compression
-
Bridging the rift in Retrieval Augmented Generation
- 不是直接微调检索器和语言模型等效果不佳的基础模块,而是引入了第三个参与者——位于现有组件之间的中间桥接模块。涉及技术包括排序、压缩、上下文框架、条件推理脚手架、互动询问等 (可参考后续论文)
- Evaluating RAG Applications with RAGAs
- Best Practices for LLM Evaluation of RAG Applications
- Advanced RAG 03: Using RAGAs + LlamaIndex for RAG evaluation
- Exploring End-to-End Evaluation of RAG Pipelines
- Evaluating Multi-Modal Retrieval-Augmented Generation
- RAG Evaluation
-
Evaluation - LlamaIndex
- 评估-LlamaIndex
-
Pinecone的RAG评测
- 不同数据规模下不同模型的RAG忠实度效果
- 不同模型下使用RAG与不是用RAG(仅依靠内部知识)的忠实度效果
- 不同模型下结合内部和外部知识后的RAG忠实度效果
- 不同模型下的RAG的答案相关度效果
- zilliz:Optimizing RAG Applications: A Guide to Methodologies, Metrics, and Evaluation Tools for Enhanced Reliability
- Advanced RAG Series: Generation and Evaluation
- 短课程 Building and Evaluating Advanced RAG Applications
- Retrieval Augmented Generation for Production with LangChain & LlamaIndex
- A Survey of Techniques for Maximizing LLM Performance
- How do domain-specific chatbots work? An overview of retrieval augmented generation (RAG)
- nvidia:Augmenting LLMs Using Retrieval Augmented Generation
- How to Choose a Vector Database
-
中文大模型相关汇总
- 包括数据、微调、推理、评估、体验、RAG、Agent、搜索、书籍和课程等方面的资源:
- Large Language Model (LLM) Disruption of Chatbots
- Gen AI: why does simple Retrieval Augmented Generation (RAG) not work for insurance?
- End-to-End LLMOps Platform
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for Awesome-LLM-RAG-Application
Similar Open Source Tools
Awesome-LLM-RAG-Application
Awesome-LLM-RAG-Application is a repository that provides resources and information about applications based on Large Language Models (LLM) with Retrieval-Augmented Generation (RAG) pattern. It includes a survey paper, GitHub repo, and guides on advanced RAG techniques. The repository covers various aspects of RAG, including academic papers, evaluation benchmarks, downstream tasks, tools, and technologies. It also explores different frameworks, preprocessing tools, routing mechanisms, evaluation frameworks, embeddings, security guardrails, prompting tools, SQL enhancements, LLM deployment, observability tools, and more. The repository aims to offer comprehensive knowledge on RAG for readers interested in exploring and implementing LLM-based systems and products.
HaE
HaE is a framework project in the field of network security (data security) that combines artificial intelligence (AI) large models to achieve highlighting and information extraction of HTTP messages (including WebSocket). It aims to reduce testing time, focus on valuable and meaningful messages, and improve vulnerability discovery efficiency. The project provides a clear and visual interface design, simple interface interaction, and centralized data panel for querying and extracting information. It also features built-in color upgrade algorithm, one-click export/import of data, and integration of AI large models API for optimized data processing.
llm-action
This repository provides a comprehensive guide to large language models (LLMs), covering various aspects such as training, fine-tuning, compression, and applications. It includes detailed tutorials, code examples, and explanations of key concepts and techniques. The repository is maintained by Liguo Dong, an AI researcher and engineer with expertise in LLM research and development.
ChatGPT-Next-Web-Pro
ChatGPT-Next-Web-Pro is a tool that provides an enhanced version of ChatGPT-Next-Web with additional features and functionalities. It offers complete ChatGPT-Next-Web functionality, file uploading and storage capabilities, drawing and video support, multi-modal support, reverse model support, knowledge base integration, translation, customizations, and more. The tool can be deployed with or without a backend, allowing users to interact with AI models, manage accounts, create models, manage API keys, handle orders, manage memberships, and more. It supports various cloud services like Aliyun OSS, Tencent COS, and Minio for file storage, and integrates with external APIs like Azure, Google Gemini Pro, and Luma. The tool also provides options for customizing website titles, subtitles, icons, and plugin buttons, and offers features like voice input, file uploading, real-time token count display, and more.
ai_quant_trade
The ai_quant_trade repository is a comprehensive platform for stock AI trading, offering learning, simulation, and live trading capabilities. It includes features such as factor mining, traditional strategies, machine learning, deep learning, reinforcement learning, graph networks, and high-frequency trading. The repository provides tools for monitoring stocks, stock recommendations, and deployment tools for live trading. It also features new functionalities like sentiment analysis using StructBERT, reinforcement learning for multi-stock trading with a 53% annual return, automatic factor mining with 5000 factors, customized stock monitoring software, and local deep reinforcement learning strategies.
how-to-optim-algorithm-in-cuda
This repository documents how to optimize common algorithms based on CUDA. It includes subdirectories with code implementations for specific optimizations. The optimizations cover topics such as compiling PyTorch from source, NVIDIA's reduce optimization, OneFlow's elementwise template, fast atomic add for half data types, upsample nearest2d optimization in OneFlow, optimized indexing in PyTorch, OneFlow's softmax kernel, linear attention optimization, and more. The repository also includes learning resources related to deep learning frameworks, compilers, and optimization techniques.
END-TO-END-GENERATIVE-AI-PROJECTS
The 'END TO END GENERATIVE AI PROJECTS' repository is a collection of awesome industry projects utilizing Large Language Models (LLM) for various tasks such as chat applications with PDFs, image to speech generation, video transcribing and summarizing, resume tracking, text to SQL conversion, invoice extraction, medical chatbot, financial stock analysis, and more. The projects showcase the deployment of LLM models like Google Gemini Pro, HuggingFace Models, OpenAI GPT, and technologies such as Langchain, Streamlit, LLaMA2, LLaMAindex, and more. The repository aims to provide end-to-end solutions for different AI applications.
lawyer-llama
Lawyer LLaMA is a large language model that has been specifically trained on legal data, including Chinese laws, regulations, and case documents. It has been fine-tuned on a large dataset of legal questions and answers, enabling it to understand and respond to legal inquiries in a comprehensive and informative manner. Lawyer LLaMA is designed to assist legal professionals and individuals with a variety of law-related tasks, including: * **Legal research:** Quickly and efficiently search through vast amounts of legal information to find relevant laws, regulations, and case precedents. * **Legal analysis:** Analyze legal issues, identify potential legal risks, and provide insights on how to proceed. * **Document drafting:** Draft legal documents, such as contracts, pleadings, and legal opinions, with accuracy and precision. * **Legal advice:** Provide general legal advice and guidance on a wide range of legal matters, helping users understand their rights and options. Lawyer LLaMA is a powerful tool that can significantly enhance the efficiency and effectiveness of legal research, analysis, and decision-making. It is an invaluable resource for lawyers, paralegals, law students, and anyone else who needs to navigate the complexities of the legal system.
ChuanhuChatGPT
Chuanhu Chat is a user-friendly web graphical interface that provides various additional features for ChatGPT and other language models. It supports GPT-4, file-based question answering, local deployment of language models, online search, agent assistant, and fine-tuning. The tool offers a range of functionalities including auto-solving questions, online searching with network support, knowledge base for quick reading, local deployment of language models, GPT 3.5 fine-tuning, and custom model integration. It also features system prompts for effective role-playing, basic conversation capabilities with options to regenerate or delete dialogues, conversation history management with auto-saving and search functionalities, and a visually appealing user experience with themes, dark mode, LaTeX rendering, and PWA application support.
Nocode-Wep
Nocode/WEP is a forward-looking office visualization platform that includes modules for document building, web application creation, presentation design, and AI capabilities for office scenarios. It supports features such as configuring bullet comments, global article comments, multimedia content, custom drawing boards, flowchart editor, form designer, keyword annotations, article statistics, custom appreciation settings, JSON import/export, content block copying, and unlimited hierarchical directories. The platform is compatible with major browsers and aims to deliver content value, iterate products, share technology, and promote open-source collaboration.
AI-Competition-Collections
AI-Competition-Collections is a repository that collects and curates various experiences and tips from AI competitions. It includes posts on competition experiences in computer vision, NLP, speech, and other AI-related fields. The repository aims to provide valuable insights and techniques for individuals participating in AI competitions, covering topics such as image classification, object detection, OCR, adversarial attacks, and more.
Awesome-ChatTTS
Awesome-ChatTTS is an official recommended guide for ChatTTS beginners, compiling common questions and related resources. It provides a comprehensive overview of the project, including official introduction, quick experience options, popular branches, parameter explanations, voice seed details, installation guides, FAQs, and error troubleshooting. The repository also includes video tutorials, discussion community links, and project trends analysis. Users can explore various branches for different functionalities and enhancements related to ChatTTS.
llm-resource
llm-resource is a comprehensive collection of high-quality resources for Large Language Models (LLM). It covers various aspects of LLM including algorithms, training, fine-tuning, alignment, inference, data engineering, compression, evaluation, prompt engineering, AI frameworks, AI basics, AI infrastructure, AI compilers, LLM application development, LLM operations, AI systems, and practical implementations. The repository aims to gather and share valuable resources related to LLM for the community to benefit from.
Speech-AI-Forge
Speech-AI-Forge is a project developed around TTS generation models, implementing an API Server and a WebUI based on Gradio. The project offers various ways to experience and deploy Speech-AI-Forge, including online experience on HuggingFace Spaces, one-click launch on Colab, container deployment with Docker, and local deployment. The WebUI features include TTS model functionality, speaker switch for changing voices, style control, long text support with automatic text segmentation, refiner for ChatTTS native text refinement, various tools for voice control and enhancement, support for multiple TTS models, SSML synthesis control, podcast creation tools, voice creation, voice testing, ASR tools, and post-processing tools. The API Server can be launched separately for higher API throughput. The project roadmap includes support for various TTS models, ASR models, voice clone models, and enhancer models. Model downloads can be manually initiated using provided scripts. The project aims to provide inference services and may include training-related functionalities in the future.
ZhiLight
ZhiLight is a highly optimized large language model (LLM) inference engine developed by Zhihu and ModelBest Inc. It accelerates the inference of models like Llama and its variants, especially on PCIe-based GPUs. ZhiLight offers significant performance advantages compared to mainstream open-source inference engines. It supports various features such as custom defined tensor and unified global memory management, optimized fused kernels, support for dynamic batch, flash attention prefill, prefix cache, and different quantization techniques like INT8, SmoothQuant, FP8, AWQ, and GPTQ. ZhiLight is compatible with OpenAI interface and provides high performance on mainstream NVIDIA GPUs with different model sizes and precisions.
BlueLM
BlueLM is a large-scale pre-trained language model developed by vivo AI Global Research Institute, featuring 7B base and chat models. It includes high-quality training data with a token scale of 26 trillion, supporting both Chinese and English languages. BlueLM-7B-Chat excels in C-Eval and CMMLU evaluations, providing strong competition among open-source models of similar size. The models support 32K long texts for better context understanding while maintaining base capabilities. BlueLM welcomes developers for academic research and commercial applications.
For similar tasks
dify
Dify is an open-source LLM app development platform that combines AI workflow, RAG pipeline, agent capabilities, model management, observability features, and more. It allows users to quickly go from prototype to production. Key features include: 1. Workflow: Build and test powerful AI workflows on a visual canvas. 2. Comprehensive model support: Seamless integration with hundreds of proprietary / open-source LLMs from dozens of inference providers and self-hosted solutions. 3. Prompt IDE: Intuitive interface for crafting prompts, comparing model performance, and adding additional features. 4. RAG Pipeline: Extensive RAG capabilities that cover everything from document ingestion to retrieval. 5. Agent capabilities: Define agents based on LLM Function Calling or ReAct, and add pre-built or custom tools. 6. LLMOps: Monitor and analyze application logs and performance over time. 7. Backend-as-a-Service: All of Dify's offerings come with corresponding APIs for easy integration into your own business logic.
intro-to-intelligent-apps
This repository introduces and helps organizations get started with building AI Apps and incorporating Large Language Models (LLMs) into them. The workshop covers topics such as prompt engineering, AI orchestration, and deploying AI apps. Participants will learn how to use Azure OpenAI, Langchain/ Semantic Kernel, Qdrant, and Azure AI Search to build intelligent applications.
runhouse
Runhouse is a tool that allows you to build, run, and deploy production-quality AI apps and workflows on your own compute. It provides simple, powerful APIs for the full lifecycle of AI development, from research to evaluation to production to updates to scaling to management, and across any infra. By automatically packaging your apps into scalable, secure, and observable services, Runhouse can also turn otherwise redundant AI activities into common reusable components across your team or company, which improves cost, velocity, and reproducibility.
Awesome-LLM-RAG-Application
Awesome-LLM-RAG-Application is a repository that provides resources and information about applications based on Large Language Models (LLM) with Retrieval-Augmented Generation (RAG) pattern. It includes a survey paper, GitHub repo, and guides on advanced RAG techniques. The repository covers various aspects of RAG, including academic papers, evaluation benchmarks, downstream tasks, tools, and technologies. It also explores different frameworks, preprocessing tools, routing mechanisms, evaluation frameworks, embeddings, security guardrails, prompting tools, SQL enhancements, LLM deployment, observability tools, and more. The repository aims to offer comprehensive knowledge on RAG for readers interested in exploring and implementing LLM-based systems and products.
sdfx
SDFX is the ultimate no-code platform for building and sharing AI apps with beautiful UI. It enables the creation of user-friendly interfaces for complex workflows by combining Comfy workflow with a UI. The tool is designed to merge the benefits of form-based UI and graph-node based UI, allowing users to create intricate graphs with a high-level UI overlay. SDFX is fully compatible with ComfyUI, abstracting the need for installing ComfyUI. It offers features like animated graph navigation, node bookmarks, UI debugger, custom nodes manager, app and template export, image and mask editor, and more. The tool compiles as a native app or web app, making it easy to maintain and add new features.
Build-Modern-AI-Apps
This repository serves as a hub for Microsoft Official Build & Modernize AI Applications reference solutions and content. It provides access to projects demonstrating how to build Generative AI applications using Azure services like Azure OpenAI, Azure Container Apps, Azure Kubernetes, and Azure Cosmos DB. The solutions include Vector Search & AI Assistant, Real-Time Payment and Transaction Processing, and Medical Claims Processing. Additionally, there are workshops like the Intelligent App Workshop for Microsoft Copilot Stack, focusing on infusing intelligence into traditional software systems using foundation models and design thinking.
RAG_Hack
RAGHack is a hackathon focused on building AI applications using the power of RAG (Retrieval Augmented Generation). RAG combines large language models with search engine knowledge to provide contextually relevant answers. Participants can learn to build RAG apps on Azure AI using various languages and retrievers, explore frameworks like LangChain and Semantic Kernel, and leverage technologies such as agents and vision models. The hackathon features live streams, hack submissions, and prizes for innovative projects.
awesome-llm-webapps
This repository is a curated list of open-source, actively maintained web applications that leverage large language models (LLMs) for various use cases, including chatbots, natural language interfaces, assistants, and question answering systems. The projects are evaluated based on key criteria such as licensing, maintenance status, complexity, and features, to help users select the most suitable starting point for their LLM-based applications. The repository welcomes contributions and encourages users to submit projects that meet the criteria or suggest improvements to the existing list.
For similar jobs
Awesome-LLM-RAG-Application
Awesome-LLM-RAG-Application is a repository that provides resources and information about applications based on Large Language Models (LLM) with Retrieval-Augmented Generation (RAG) pattern. It includes a survey paper, GitHub repo, and guides on advanced RAG techniques. The repository covers various aspects of RAG, including academic papers, evaluation benchmarks, downstream tasks, tools, and technologies. It also explores different frameworks, preprocessing tools, routing mechanisms, evaluation frameworks, embeddings, security guardrails, prompting tools, SQL enhancements, LLM deployment, observability tools, and more. The repository aims to offer comprehensive knowledge on RAG for readers interested in exploring and implementing LLM-based systems and products.
ChatGPT-On-CS
ChatGPT-On-CS is an intelligent chatbot tool based on large models, supporting various platforms like WeChat, Taobao, Bilibili, Douyin, Weibo, and more. It can handle text, voice, and image inputs, access external resources through plugins, and customize enterprise AI applications based on proprietary knowledge bases. Users can set custom replies, utilize ChatGPT interface for intelligent responses, send images and binary files, and create personalized chatbots using knowledge base files. The tool also features platform-specific plugin systems for accessing external resources and supports enterprise AI applications customization.
call-gpt
Call GPT is a voice application that utilizes Deepgram for Speech to Text, elevenlabs for Text to Speech, and OpenAI for GPT prompt completion. It allows users to chat with ChatGPT on the phone, providing better transcription, understanding, and speaking capabilities than traditional IVR systems. The app returns responses with low latency, allows user interruptions, maintains chat history, and enables GPT to call external tools. It coordinates data flow between Deepgram, OpenAI, ElevenLabs, and Twilio Media Streams, enhancing voice interactions.
awesome-LLM-resourses
A comprehensive repository of resources for Chinese large language models (LLMs), including data processing tools, fine-tuning frameworks, inference libraries, evaluation platforms, RAG engines, agent frameworks, books, courses, tutorials, and tips. The repository covers a wide range of tools and resources for working with LLMs, from data labeling and processing to model fine-tuning, inference, evaluation, and application development. It also includes resources for learning about LLMs through books, courses, and tutorials, as well as insights and strategies from building with LLMs.
tappas
Hailo TAPPAS is a set of full application examples that implement pipeline elements and pre-trained AI tasks. It demonstrates Hailo's system integration scenarios on predefined systems, aiming to accelerate time to market, simplify integration with Hailo's runtime SW stack, and provide a starting point for customers to fine-tune their applications. The tool supports both Hailo-15 and Hailo-8, offering various example applications optimized for different common hosts. TAPPAS includes pipelines for single network, two network, and multi-stream processing, as well as high-resolution processing via tiling. It also provides example use case pipelines like License Plate Recognition and Multi-Person Multi-Camera Tracking. The tool is regularly updated with new features, bug fixes, and platform support.
cloudflare-rag
This repository provides a fullstack example of building a Retrieval Augmented Generation (RAG) app with Cloudflare. It utilizes Cloudflare Workers, Pages, D1, KV, R2, AI Gateway, and Workers AI. The app features streaming interactions to the UI, hybrid RAG with Full-Text Search and Vector Search, switchable providers using AI Gateway, per-IP rate limiting with Cloudflare's KV, OCR within Cloudflare Worker, and Smart Placement for workload optimization. The development setup requires Node, pnpm, and wrangler CLI, along with setting up necessary primitives and API keys. Deployment involves setting up secrets and deploying the app to Cloudflare Pages. The project implements a Hybrid Search RAG approach combining Full Text Search against D1 and Hybrid Search with embeddings against Vectorize to enhance context for the LLM.
pixeltable
Pixeltable is a Python library designed for ML Engineers and Data Scientists to focus on exploration, modeling, and app development without the need to handle data plumbing. It provides a declarative interface for working with text, images, embeddings, and video, enabling users to store, transform, index, and iterate on data within a single table interface. Pixeltable is persistent, acting as a database unlike in-memory Python libraries such as Pandas. It offers features like data storage and versioning, combined data and model lineage, indexing, orchestration of multimodal workloads, incremental updates, and automatic production-ready code generation. The tool emphasizes transparency, reproducibility, cost-saving through incremental data changes, and seamless integration with existing Python code and libraries.
wave-apps
Wave Apps is a directory of sample applications built on H2O Wave, allowing users to build AI apps faster. The apps cover various use cases such as explainable hotel ratings, human-in-the-loop credit risk assessment, mitigating churn risk, online shopping recommendations, and sales forecasting EDA. Users can download, modify, and integrate these sample apps into their own projects to learn about app development and AI model deployment.