![kernel-memory](/statics/github-mark.png)
kernel-memory
RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.
Stars: 1732
![screenshot](/screenshots_githubs/microsoft-kernel-memory.jpg)
Kernel Memory (KM) is a multi-modal AI Service specialized in the efficient indexing of datasets through custom continuous data hybrid pipelines, with support for Retrieval Augmented Generation (RAG), synthetic memory, prompt engineering, and custom semantic memory processing. KM is available as a Web Service, as a Docker container, a Plugin for ChatGPT/Copilot/Semantic Kernel, and as a .NET library for embedded applications. Utilizing advanced embeddings and LLMs, the system enables Natural Language querying for obtaining answers from the indexed data, complete with citations and links to the original sources. Designed for seamless integration as a Plugin with Semantic Kernel, Microsoft Copilot and ChatGPT, Kernel Memory enhances data-driven features in applications built for most popular AI platforms.
README:
This repository presents best practices and a reference implementation for Memory in specific AI and LLMs application scenarios. Please note that the code provided serves as a demonstration and is not an officially supported Microsoft offering.
Kernel Memory (KM) is a multi-modal AI Service specialized in the efficient indexing of datasets through custom continuous data hybrid pipelines, with support for Retrieval Augmented Generation (RAG), synthetic memory, prompt engineering, and custom semantic memory processing.
KM is available as a Web Service, as a Docker container, a Plugin for ChatGPT/Copilot/Semantic Kernel, and as a .NET library for embedded applications.
Utilizing advanced embeddings and LLMs, the system enables Natural Language querying for obtaining answers from the indexed data, complete with citations and links to the original sources.
Kernel Memory is designed for seamless integration as a Plugin with Semantic Kernel, Microsoft Copilot and ChatGPT.
Kernel Memory can be deployed in various configurations, including as a Service in Azure. To learn more about deploying Kernel Memory in Azure, please refer to the Azure deployment guide. For detailed instructions on deploying to Azure, you can check the infrastructure documentation.
If you are already familiar with these resources, you can quickly deploy by clicking the following button.
See also: Kernel Memory via Docker and Serverless Kernel Memory with Azure services example.
Kernel Memory can also be run and imported into other projects via .NET Aspire. For example:
var builder = DistributedApplication.CreateBuilder();
builder.AddContainer("kernel-memory", "kernelmemory/service")
.WithEnvironment("KernelMemory__TextGeneratorType", "OpenAI")
.WithEnvironment("KernelMemory__DataIngestion__EmbeddingGeneratorTypes__0", "OpenAI")
.WithEnvironment("KernelMemory__Retrieval__EmbeddingGeneratorType", "OpenAI")
.WithEnvironment("KernelMemory__Services__OpenAI__APIKey", "...your OpenAI key...");
builder.Build().Run();
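The Aspire snippet configures the container purely through environment variables. .NET configuration reads a double underscore (`__`) in a variable name as the `:` section separator. Each variable therefore addresses a nested setting, and numeric segments such as `__0` address array elements.

The Python sketch below folds such variables into the nested shape of an appsettings.json file, which can help when translating between the two. `env_to_config` is a hypothetical helper for illustration and is not part of Kernel Memory.

```python
from typing import Any, Dict


def env_to_config(env: Dict[str, str], prefix: str = "KernelMemory") -> Dict[str, Any]:
    """Fold "A__B__C" environment variables into nested dicts, like .NET configuration.

    Numeric segments (e.g. "__0") denote array indices in .NET; this sketch keeps
    them as string keys for simplicity.
    """
    config: Dict[str, Any] = {}
    for name, value in env.items():
        parts = name.split("__")
        if parts[0] != prefix:
            continue  # skip variables outside the chosen root section
        node = config
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # create intermediate sections on demand
        node[parts[-1]] = value
    return config
```

For example, `KernelMemory__DataIngestion__EmbeddingGeneratorTypes__0` ends up at `config["KernelMemory"]["DataIngestion"]["EmbeddingGeneratorTypes"]["0"]`.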
The examples below show the default document ingestion pipeline:
- Extract text: automatically recognize the file format and extract the information
- Partition the text into small chunks, ready for search and RAG prompts
- Extract embeddings using any LLM embedding generator
- Save embeddings into a vector index such as Azure AI Search, Qdrant or other DBs.
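The four ingestion steps above can be illustrated end to end with a toy, dependency-free Python sketch. Everything in it is a stand-in, and all names are hypothetical:

- `extract_text` assumes UTF-8 plain text instead of detecting the file format.
- `partition` splits on words rather than tokens.
- `embed` is a hashing trick with no semantic meaning, used in place of an LLM embedding generator.
- `InMemoryIndex` replaces a real vector index.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Dict, List


def extract_text(raw: bytes) -> str:
    # Stand-in for format detection and text extraction: assume UTF-8 plain text.
    return raw.decode("utf-8")


def partition(text: str, max_words: int = 50) -> List[str]:
    # Fixed-size windows of words; a real partitioner would typically count tokens.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(chunk: str, dims: int = 8) -> List[float]:
    # Toy "embedding": hash each word into a bucket, then L2-normalize (not semantic).
    vec = [0.0] * dims
    for word in chunk.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryIndex:
    vectors: Dict[str, List[float]] = field(default_factory=dict)
    texts: Dict[str, str] = field(default_factory=dict)

    def save(self, record_id: str, text: str, vector: List[float]) -> None:
        self.vectors[record_id] = vector
        self.texts[record_id] = text


def ingest(doc_id: str, raw: bytes, index: InMemoryIndex, max_words: int = 50) -> int:
    # Extract, partition, embed, save; returns the number of chunks stored.
    chunks = partition(extract_text(raw), max_words)
    for i, chunk in enumerate(chunks):
        index.save(f"{doc_id}/{i}", chunk, embed(chunk))
    return len(chunks)
```

Each stage is a plain function, so swapping one (for example, a token-based partitioner) does not touch the others.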
The same examples also show how to safeguard private information by specifying who owns each document, and how to organize data for search and faceted navigation using tags.
#r "nuget: Microsoft.KernelMemory.WebClient"
var memory = new MemoryWebClient("http://127.0.0.1:9001"); // <== URL of KM web service
// Import a file
await memory.ImportDocumentAsync("meeting-transcript.docx");
// Import a file specifying Document ID and Tags
await memory.ImportDocumentAsync(new Document("doc01")
    .AddFile("business-plan.docx")
    .AddTag("user", "[email protected]")
    .AddTag("collection", "business")
    .AddTag("collection", "plans")
    .AddTag("fiscalYear", "2025"));
import requests

# Files to import
files = {
    "file1": ("business-plan.docx", open("business-plan.docx", "rb")),
}

# Tags to apply, used by queries to filter memory
data = {
    "documentId": "doc01",
    "tags": [
        "user:[email protected]",
        "collection:business",
        "collection:plans",
        "fiscalYear:2025"
    ]
}

response = requests.post("http://127.0.0.1:9001/upload", files=files, data=data)
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
    .Build<MemoryServerless>();
// Import a file
await memory.ImportDocumentAsync("meeting-transcript.docx");
// Import a file specifying Document ID and Tags
await memory.ImportDocumentAsync(new Document("doc01")
    .AddFile("business-plan.docx")
    .AddTag("collection", "business")
    .AddTag("collection", "plans")
    .AddTag("fiscalYear", "2025"));
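In the Python upload example, tags travel to the web service as plain `key:value` strings, one entry per value. A small helper can build those form fields from a dictionary.

`upload_form_fields` is hypothetical. Rejecting `:` inside keys is a choice of this sketch, so that the first colon always separates key from value; it is not a documented Kernel Memory rule.

```python
from typing import Dict, List, Union


def upload_form_fields(document_id: str, tags: Dict[str, Union[str, List[str]]]) -> Dict[str, object]:
    """Build upload form fields: one "key:value" string per tag value."""
    flat: List[str] = []
    for key, values in tags.items():
        if ":" in key:
            # Sketch-only rule: keep the key/value separator unambiguous.
            raise ValueError(f"tag key must not contain ':': {key!r}")
        for value in ([values] if isinstance(values, str) else values):
            flat.append(f"{key}:{value}")  # repeated keys become repeated entries
    return {"documentId": document_id, "tags": flat}
```

The resulting dictionary has the same shape as `data` in the upload example and could be passed as the `data=` argument of `requests.post`.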
Asking questions, running RAG prompts, and filtering by user and other criteria is simple. Answers include citations and all the information needed to verify their accuracy, pointing to the documents that ground the response.
Questions can be asked targeting the entire memory set, or a subset using filters, e.g. to implement security filters.
var answer1 = await memory.AskAsync("How many people attended the meeting?");
var answer2 = await memory.AskAsync("what's the project timeline?", filter: MemoryFilters.ByTag("user", "[email protected]"));
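How filters narrow the memory set can be sketched locally. This Python sketch rests on three assumptions:

- All conditions inside one filter must match (AND).
- A document is eligible when it matches any filter in a list (OR).
- An empty list means the whole memory set.

That is our reading of the filter model and is worth confirming against the Kernel Memory documentation. The function names and example users are hypothetical.

```python
from typing import Dict, List

Tags = Dict[str, List[str]]  # tag key -> values, for a document or a filter


def matches_filter(doc_tags: Tags, flt: Tags) -> bool:
    # Assumed AND semantics: every key/value pair in the filter must be on the document.
    return all(value in doc_tags.get(key, []) for key, values in flt.items() for value in values)


def matches_any(doc_tags: Tags, filters: List[Tags]) -> bool:
    # Assumed OR semantics across filters; no filters means no restriction.
    return not filters or any(matches_filter(doc_tags, f) for f in filters)
```

A per-user security filter then becomes a single-condition filter such as `{"user": ["alice"]}` applied to every query.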
When generating answers with LLMs, the result includes a token usage report.
// tokenUsage: the token usage report included with the answer
foreach (var report in tokenUsage)
{
    Console.WriteLine($"{report.ServiceType}: {report.ModelName} ({report.ModelType})");
    Console.WriteLine($"- Input : {report.ServiceTokensIn} tokens");
    Console.WriteLine($"- Output: {report.ServiceTokensOut} tokens");
}
Output:
Azure OpenAI: gpt-4o (TextGeneration)
- Input : 24356 tokens
- Output: 103 tokens
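If a request produces several usage reports, it can help to total them per service and model. This Python sketch assumes each report is a dictionary whose keys reuse the C# property names from the snippet above; the field names in the web service's JSON may differ. `total_tokens` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def total_tokens(reports: Iterable[Mapping[str, object]]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Sum (input, output) tokens per (service, model) across usage reports."""
    totals: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for r in reports:
        key = (str(r["ServiceType"]), str(r["ModelName"]))
        totals[key][0] += int(r.get("ServiceTokensIn") or 0)   # missing counts treated as 0
        totals[key][1] += int(r.get("ServiceTokensOut") or 0)
    return {k: (v[0], v[1]) for k, v in totals.items()}
```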
await memory.ImportDocumentAsync("NASA-news.pdf");
var answer = await memory.AskAsync("Any news from NASA about Orion?");
Console.WriteLine(answer.Result + "\n");
foreach (var x in answer.RelevantSources)
{
    Console.WriteLine($" * {x.SourceName} -- {x.Partitions.First().LastUpdate:D}");
}
Output:
Yes, there is news from NASA about the Orion spacecraft. NASA has invited the media to see a new test version [......] For more information about the Artemis program, you can visit the NASA website.
- NASA-news.pdf -- Tuesday, August 1, 2023
import requests
import json

data = {
    "question": "what's the project timeline?",
    "filters": [ {"user": ["[email protected]"]} ]
}

response = requests.post(
    "http://127.0.0.1:9001/ask",
    headers={"Content-Type": "application/json"},
    data=json.dumps(data),
).json()

print(response["text"])
curl http://127.0.0.1:9001/ask -d'{"query":"Any news from NASA about Orion?"}' -H 'Content-Type: application/json'
{
  "Query": "Any news from NASA about Orion?",
  "Text": "Yes, there is news from NASA about the Orion spacecraft. NASA has invited the media to see a new test version [......] For more information about the Artemis program, you can visit the NASA website.",
  "RelevantSources": [
    {
      "Link": "...",
      "SourceContentType": "application/pdf",
      "SourceName": "file5-NASA-news.pdf",
      "Partitions": [
        {
          "Text": "Skip to main content\nJul 28, 2023\nMEDIA ADVISORY M23-095\nNASA Invites Media to See Recovery Craft for\nArtemis Moon Mission\n(/sites/default/files/thumbnails/image/ksc-20230725-ph-fmx01_0003orig.jpg)\nAboard the [......] to Mars (/topics/moon-to-\nmars/),Orion Spacecraft (/exploration/systems/orion/index.html)\nNASA Invites Media to See Recovery Craft for Artemis Moon Miss... https://www.nasa.gov/press-release/nasa-invites-media-to-see-recov...\n2 of 3 7/28/23, 4:51 PM",
          "Relevance": 0.8430657,
          "SizeInTokens": 863,
          "LastUpdate": "2023-08-01T08:15:02-07:00"
        }
      ]
    }
  ]
}
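The JSON response above carries everything needed to render citations. The Python sketch below turns it into one line per source, keeping each source's best partition relevance.

It assumes the PascalCase field names shown in the sample; other service versions may serialize them differently. `citations` and its `min_relevance` parameter are hypothetical.

```python
import json
from typing import List


def citations(response_json: str, min_relevance: float = 0.0) -> List[str]:
    """Turn an /ask response (shaped like the sample above) into citation lines."""
    answer = json.loads(response_json)
    lines: List[str] = []
    for source in answer.get("RelevantSources", []):
        # Keep only partitions at or above the relevance threshold.
        partitions = [p for p in source.get("Partitions", []) if p.get("Relevance", 0.0) >= min_relevance]
        if not partitions:
            continue  # drop sources with no sufficiently relevant partition
        best = max(p.get("Relevance", 0.0) for p in partitions)
        lines.append(f'{source["SourceName"]} (relevance {best:.2f}, {len(partitions)} partition(s))')
    return lines
```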
The OpenAPI schema ("swagger") is available at http://127.0.0.1:9001/swagger/index.html when running the service locally with OpenAPI enabled. Here's a copy.
If you want to give the service a quick test, use the following command to start the Kernel Memory Service using OpenAI:
docker run -e OPENAI_API_KEY="..." -it --rm -p 9001:9001 kernelmemory/service
If you prefer using custom settings and services such as Azure OpenAI, Azure Document Intelligence, etc., create an appsettings.Development.json file that overrides the default values set in appsettings.json, or use the included configuration wizard:
cd service/Service
dotnet run setup
Then run this command to start the Docker image with the configuration just created:
on Windows:
docker run --volume .\appsettings.Development.json:/app/appsettings.Production.json -it --rm -p 9001:9001 kernelmemory/service
on Linux / macOS:
docker run --volume ./appsettings.Development.json:/app/appsettings.Production.json -it --rm -p 9001:9001 kernelmemory/service
Depending on your scenarios, you might want to run all the code remotely through an asynchronous and scalable service, or locally inside your process.
If you're importing small files, use only .NET, and can afford to block the application process while importing documents, then local in-process execution using the MemoryServerless class described below can be fine.
However, if you are in one of these scenarios:
- My app is written in TypeScript, Java, Rust, or some other language
- I'd just like a web service to import data and send questions to answer
- I'm importing big documents that can require minutes to process, and I don't want to block the user interface
- I need memory import to run independently, supporting failures and retry logic
- I want to define custom pipelines mixing multiple languages like Python, TypeScript, etc.
then you're likely looking for a Memory Service. You can deploy Kernel Memory as a backend service, with the default ingestion logic or a custom workflow that includes steps coded in Python/TypeScript/Java/etc., and take advantage of the asynchronous, non-blocking memory encoding process. Documents are then uploaded and questions asked using the MemoryWebClient.
Here you can find a complete set of instructions on how to run the Kernel Memory service.
Kernel Memory works and scales best when running as an asynchronous Web Service, allowing you to ingest thousands of documents without blocking your app.
However, Kernel Memory can also run in serverless mode, embedding a MemoryServerless class instance in .NET backend/console/desktop apps in synchronous mode. Each request is processed immediately, although calling clients are responsible for handling transient errors.
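Because calling clients handle transient errors themselves in serverless mode, each request is typically wrapped in retry logic. Below is a minimal sketch of exponential backoff with jitter, written in Python; a .NET caller would follow the same structure.

Several choices here are assumptions: which exceptions count as transient (`TimeoutError` and `ConnectionError`), the attempt count, and the delays. `with_retries` is a hypothetical helper.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    transient: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except transient:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * (2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Non-transient exceptions propagate immediately, so programming errors are not masked by retries. The injectable `sleep` makes the helper easy to test.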
Kernel Memory relies on external services to run stateful pipelines, store data, handle embeddings, and generate text responses. The project includes extensions that allow customization of file storage, queues, vector stores, and LLMs to fit specific requirements.
- AI: Azure OpenAI, OpenAI, ONNX, Ollama, Anthropic, Azure AI Document Intelligence, Azure AI Content Safety
- Vector Store: Azure AI Search, Postgres, SQL Server, Elasticsearch, Qdrant, Redis, MongoDB Atlas, In memory store
- File Storage: Azure Blob storage, AWS S3, MongoDB Atlas, Local disk, In memory storage
- Ingestion pipelines: Azure Queues, RabbitMQ, In memory queues
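Each of these categories is a swappable backend behind a common contract. The Python sketch below illustrates the idea for the vector store: application code depends only on a small `VectorStore` protocol, and a brute-force in-memory implementation ranks records by cosine similarity. The protocol and class names are hypothetical and are not Kernel Memory's actual .NET interfaces.

```python
import math
from typing import Dict, List, Protocol, Tuple


class VectorStore(Protocol):
    def upsert(self, record_id: str, vector: List[float]) -> None: ...
    def search(self, query: List[float], limit: int) -> List[Tuple[str, float]]: ...


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0  # zero vectors score 0


class InMemoryVectorStore:
    """Brute-force store: fine for tests and demos, not for large collections."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, record_id: str, vector: List[float]) -> None:
        self._vectors[record_id] = vector

    def search(self, query: List[float], limit: int) -> List[Tuple[str, float]]:
        scored = [(rid, cosine(query, v)) for rid, v in self._vectors.items()]
        scored.sort(key=lambda item: item[1], reverse=True)  # most similar first
        return scored[:limit]


def top_match(store: VectorStore, query: List[float]) -> str:
    # Depends only on the protocol, so a different backend can be swapped in.
    return store.search(query, limit=1)[0][0]
```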
Document ingestion operates as a stateful pipeline, executing steps in a defined sequence. By default, Kernel Memory employs a pipeline to extract text, chunk content, vectorize, and store data.
If you need a custom data pipeline, you can modify the sequence, add new steps, or replace existing ones by providing custom "handlers" for each desired stage. This allows complete flexibility in defining how data is processed. For example:
// Memory setup, e.g. how to calculate and where to store embeddings
var memoryBuilder = new KernelMemoryBuilder()
.WithoutDefaultHandlers()
.WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY"));
var memory = memoryBuilder.Build<MemoryServerless>();
// Plug in custom .NET handlers
memory.Orchestrator.AddHandler<MyHandler1>("step1");
memory.Orchestrator.AddHandler<MyHandler2>("step2");
memory.Orchestrator.AddHandler<MyHandler3>("step3");
// Use the custom handlers with the memory object
await memory.ImportDocumentAsync(
new Document("mytest001")
.AddFile("file1.docx")
.AddFile("file2.pdf"),
steps: new[] { "step1", "step2", "step3" });
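The control flow of a custom pipeline can be sketched in a few lines: named steps, each served by a registered handler, executed in the order requested. This Python sketch is hypothetical. `Orchestrator`, `add_handler`, and `run` only mirror the shape of the C# example above; they keep state in memory and stop at the first failing step. The stateful, queue-backed execution described earlier is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical handler contract: receives the pipeline state, returns True on success.
Handler = Callable[[Dict[str, object]], bool]


@dataclass
class Orchestrator:
    handlers: Dict[str, Handler] = field(default_factory=dict)

    def add_handler(self, step: str, handler: Handler) -> None:
        self.handlers[step] = handler

    def run(self, document_id: str, files: List[str], steps: List[str]) -> Dict[str, object]:
        completed: List[str] = []
        state: Dict[str, object] = {"document_id": document_id, "files": files, "completed": completed}
        for step in steps:
            handler = self.handlers.get(step)
            if handler is None:
                raise KeyError(f"no handler registered for step {step!r}")
            if not handler(state):
                # Stop here: later steps never run on a failed document.
                raise RuntimeError(f"step {step!r} failed")
            completed.append(step)
        return state
```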
Semantic Kernel is an SDK for C#, Python, and Java used to develop solutions with AI. SK includes libraries that wrap direct calls to databases, supporting vector search.
Semantic Kernel is maintained in three languages, while the list of supported storage engines (known as "connectors") varies across languages.
Kernel Memory (KM) is a SERVICE built on Semantic Kernel, with additional features developed for RAG, Security, and Cloud deployment. As a service, KM can be used from any language, tool, or platform, e.g. browser extensions and ChatGPT assistants.
Kernel Memory provides several features outside the scope of Semantic Kernel that would usually be developed manually, such as storing files, extracting text from documents, providing a framework to secure users' data, content moderation, etc.
Kernel Memory is also used to explore new AI patterns, which are sometimes backported to Semantic Kernel and Microsoft libraries, for instance flexible vector store schemas, advanced filtering, and authentication.
Here's a comparison table:
Feature | Kernel Memory | Semantic Kernel |
---|---|---|
Runtime | Memory as a Service, Web service | SDK packages |
Data formats | Web pages, PDF, Images, Word, PowerPoint, Excel, Markdown, Text, JSON | Text only |
Language support | Any language | .NET, Python, Java |
RAG | Yes | - |
Cloud deployment | Yes | - |
- Collection of Jupyter notebooks with various scenarios
- Using Kernel Memory web service to upload documents and answer questions
- Importing files and asking questions without running the service (serverless mode)
- Kernel Memory RAG with Azure services
- Kernel Memory with .NET Aspire
- Using KM Plugin for Semantic Kernel
- Customizations
- Processing files with custom logic (custom handlers) in serverless mode
- Processing files with custom logic (custom handlers) in asynchronous mode
- Customizing RAG and summarization prompts
- Custom partitioning/text chunking options
- Using a custom embedding/vector generator
- Using custom content decoders
- Using a custom web scraper to fetch web pages
- Writing and using a custom ingestion handler
- Using Context Parameters to customize RAG prompt during a request
- Local models and external connectors
- Upload files and ask questions from command line using curl
- Summarizing documents, using synthetic memories
- Hybrid Search with Azure AI Search
- Running a single asynchronous pipeline handler as a standalone service
- Integrating Memory with ASP.NET applications and controllers
- Sample code showing how to extract text from files
- .NET configuration and logging
- Expanding chunks retrieving adjacent partitions
- Creating a Memory instance without KernelMemoryBuilder
- Intent Detection
- Fetching data from Discord
- Test project using KM package from nuget.org
- .NET appsettings.json generator
- Curl script to upload files
- Curl script to ask questions
- Curl script to search documents
- Script to start Qdrant for development tasks
- Script to start Elasticsearch for development tasks
- Script to start MS SQL Server for development tasks
- Script to start Redis for development tasks
- Script to start RabbitMQ for development tasks
- Script to start MongoDB Atlas for development tasks
- Microsoft.KernelMemory.WebClient: .NET web client to call a running instance of the Kernel Memory web service.
- Microsoft.KernelMemory: Kernel Memory library including all extensions and clients; it can be used to build custom pipelines and handlers. It also contains the serverless client, to use memory synchronously without the web service.
- Microsoft.KernelMemory.Service.AspNetCore: an extension to load Kernel Memory into your ASP.NET apps.
- Microsoft.KernelMemory.SemanticKernelPlugin: a Memory plugin for Semantic Kernel, replacing the original Semantic Memory available in SK.
- Microsoft.KernelMemory.* packages: Kernel Memory Core and all KM extensions, split into distinct packages.
Kernel Memory service offers a Web API out of the box, including the OpenAPI swagger documentation that you can leverage to test the API and create custom web clients. For instance, after starting the service locally, see http://127.0.0.1:9001/swagger/index.html.
A .NET Web Client and a Semantic Kernel plugin are available; see the NuGet packages above.
For Python, TypeScript, Java and other languages we recommend leveraging the Web Service. We also welcome PR contributions to support more languages.
aaronpowell | afederici75 | akordowski | alexibraimov | alexmg | alkampfergit |
amomra | anthonypuppo | aportillo83 | carlodek | chaelli | cherchyk |
coryisakson | crickman | dependabot[bot] | dluc | DM-98 | EelcoKoster |
Foorcee | GraemeJones104 | imranshams | jurepurgar | JustinRidings | kbeaugrand |
koteus | KSemenenko | lecramr | luismanez | marcominerva | neel015 |
pascalberger | pawarsum12 | pradeepr-roboticist | qihangnet | roldengarm | setuc |
slapointe | slorello89 | snakex64 | spenavajr | TaoChenOSU | tarekgh |
teresaqhoang | tomasz-skarzynski | v-msamovendyuk | Valkozaur | vicperdana | walexee |
westdavidr | xbotter |
Alternative AI tools for kernel-memory
Similar Open Source Tools
![datachain Screenshot](/screenshots_githubs/iterative-datachain.jpg)
datachain
DataChain is an open-source Python library for processing and curating unstructured data at scale. It supports AI-driven data curation using local ML models and LLM APIs, handles large datasets, and is Python-friendly with Pydantic objects. It excels at optimizing batch operations and is designed for offline data processing, curation, and ETL. Typical use cases include Computer Vision data curation, LLM analytics, and validation.
![starwhale Screenshot](/screenshots_githubs/star-whale-starwhale.jpg)
starwhale
Starwhale is an MLOps/LLMOps platform that brings efficiency and standardization to machine learning operations. It streamlines the model development lifecycle, enabling teams to optimize workflows around key areas like model building, evaluation, release, and fine-tuning. Starwhale abstracts Model, Runtime, and Dataset as first-class citizens, providing tailored capabilities for common workflow scenarios including Models Evaluation, Live Demo, and LLM Fine-tuning. It is an open-source platform designed for clarity and ease of use, empowering developers to build customized MLOps features tailored to their needs.
![mobius Screenshot](/screenshots_githubs/ray-project-mobius.jpg)
mobius
Mobius is an AI infra platform including realtime computing and training. It is built on Ray, a distributed computing framework, and provides a number of features that make it well-suited for online machine learning tasks. These features include: * **Cross Language**: Mobius can run in multiple languages (only Python and Java are supported currently) with high efficiency. You can implement your operator in different languages and run them in one job. * **Single Node Failover**: Mobius has a special failover mechanism that only needs to rollback the failed node itself, in most cases, to recover the job. This is a huge benefit if your job is sensitive about failure recovery time. * **AutoScaling**: Mobius can generate a new graph with different configurations in runtime without stopping the job. * **Fusion Training**: Mobius can combine TensorFlow/Pytorch and streaming, then building an e2e online machine learning pipeline. Mobius is still under development, but it has already been used to power a number of real-world applications, including: * A real-time recommendation system for a major e-commerce company * A fraud detection system for a large financial institution * A personalized news feed for a major news organization If you are interested in using Mobius for your own online machine learning projects, you can find more information in the documentation.
![vecs Screenshot](/screenshots_githubs/supabase-vecs.jpg)
vecs
vecs is a Python client for managing and querying vector stores in PostgreSQL with the pgvector extension. It allows users to create collections of vectors with associated metadata, index the collections for fast search performance, and query the collections based on specified filters. The tool simplifies the process of working with vector data in a PostgreSQL database, making it easier to store, retrieve, and analyze vector information.
![llm-interface Screenshot](/screenshots_githubs/samestrin-llm-interface.jpg)
llm-interface
LLM Interface is an npm module that streamlines interactions with various Large Language Model (LLM) providers in Node.js applications. It offers a unified interface for switching between providers and models, supporting 36 providers and hundreds of models. Features include chat completion, streaming, error handling, extensibility, response caching, retries, JSON output, and repair. The package relies on npm packages like axios, @google/generative-ai, dotenv, jsonrepair, and loglevel. Installation is done via npm, and usage involves sending prompts to LLM providers. Tests can be run using npm test. Contributions are welcome under the MIT License.
![sophia Screenshot](/screenshots_githubs/TrafficGuard-sophia.jpg)
sophia
Sophia is an open-source TypeScript platform designed for autonomous AI agents and LLM based workflows. It aims to automate processes, review code, assist with refactorings, and support various integrations. The platform offers features like advanced autonomous agents, reasoning/planning inspired by Google's Self-Discover paper, memory and function call history, adaptive iterative planning, and more. Sophia supports multiple LLMs/services, CLI and web interface, human-in-the-loop interactions, flexible deployment options, observability with OpenTelemetry tracing, and specific agents for code editing, software engineering, and code review. It provides a flexible platform for the TypeScript community to expand and support various use cases and integrations.
![gpustack Screenshot](/screenshots_githubs/gpustack-gpustack.jpg)
gpustack
GPUStack is an open-source GPU cluster manager designed for running large language models (LLMs). It supports a wide variety of hardware, scales with GPU inventory, offers lightweight Python package with minimal dependencies, provides OpenAI-compatible APIs, simplifies user and API key management, enables GPU metrics monitoring, and facilitates token usage and rate metrics tracking. The tool is suitable for managing GPU clusters efficiently and effectively.
![ExtractThinker Screenshot](/screenshots_githubs/enoch3712-ExtractThinker.jpg)
ExtractThinker
ExtractThinker is a library designed for extracting data from files and documents using Large Language Models (LLMs). It offers ORM-style interaction between files and LLMs, supporting multiple document loaders such as Tesseract OCR, Azure Form Recognizer, AWS Textract, and Google Document AI. Users can customize extraction using contract definitions, process documents asynchronously, handle various document formats efficiently, and split and process documents. The project is inspired by the LangChain ecosystem and focuses on Intelligent Document Processing (IDP) using LLMs to achieve high accuracy in document extraction tasks.
![lance Screenshot](/screenshots_githubs/lancedb-lance.jpg)
lance
Lance is a modern columnar data format optimized for ML workflows and datasets. It offers high-performance random access, vector search, zero-copy automatic versioning, and ecosystem integrations with Apache Arrow, Pandas, Polars, and DuckDB. Lance is designed to address the challenges of the ML development cycle, providing a unified data format for collection, exploration, analytics, feature engineering, training, evaluation, deployment, and monitoring. It aims to reduce data silos and streamline the ML development process.
![glide Screenshot](/screenshots_githubs/EinStack-glide.jpg)
glide
Glide is a cloud-native LLM gateway that provides a unified REST API for accessing various large language models (LLMs) from different providers. It handles LLMOps tasks such as model failover, caching, key management, and more, making it easy to integrate LLMs into applications. Glide supports popular LLM providers like OpenAI, Anthropic, Azure OpenAI, AWS Bedrock (Titan), Cohere, Google Gemini, OctoML, and Ollama. It offers high availability, performance, and observability, and provides SDKs for Python and NodeJS to simplify integration.
![cognee Screenshot](/screenshots_githubs/topoteretes-cognee.jpg)
cognee
Cognee is an open-source framework designed for creating self-improving deterministic outputs for Large Language Models (LLMs) using graphs, LLMs, and vector retrieval. It provides a platform for AI engineers to enhance their models and generate more accurate results. Users can leverage Cognee to add new information, utilize LLMs for knowledge creation, and query the system for relevant knowledge. The tool supports various LLM providers and offers flexibility in adding different data types, such as text files or directories. Cognee aims to streamline the process of working with LLMs and improving AI models for better performance and efficiency.
![rss-can Screenshot](/screenshots_githubs/soulteary-rss-can.jpg)
rss-can
RSS Can is a tool designed to simplify and improve RSS feed management. It supports various systems and architectures, including Linux and macOS. Users can download the binary from the GitHub release page or use the Docker image for easy deployment. The tool provides CLI parameters and environment variables for customization. It offers features such as memory and Redis cache services, web service configuration, and rule directory settings. The project aims to support RSS pipeline flow, NLP tasks, integration with open-source software rules, and tools like a quick RSS rules generator.
![lancedb Screenshot](/screenshots_githubs/lancedb-lancedb.jpg)
lancedb
LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering, and management of embeddings. The key features of LanceDB include: Production-scale vector search with no servers to manage. Store, query, and filter vectors, metadata, and multi-modal data (text, images, videos, point clouds, and more). Support for vector similarity search, full-text search, and SQL. Native Python and Javascript/Typescript support. Zero-copy, automatic versioning, manage versions of your data without needing extra infrastructure. GPU support in building vector index(*). Ecosystem integrations with LangChain, LlamaIndex, Apache-Arrow, Pandas, Polars, DuckDB, and more on the way. LanceDB's core is written in Rust and is built using Lance, an open-source columnar format designed for performant ML workloads.
![rag-chat Screenshot](/screenshots_githubs/upstash-rag-chat.jpg)
rag-chat
The `@upstash/rag-chat` package simplifies the development of retrieval-augmented generation (RAG) chat applications by providing Next.js compatibility with streaming support, built-in vector store, optional Redis compatibility for fast chat history management, rate limiting, and disableRag option. Users can easily set up the environment variables and initialize RAGChat to interact with AI models, manage knowledge base, chat history, and enable debugging features. Advanced configuration options allow customization of RAGChat instance with built-in rate limiting, observability via Helicone, and integration with Next.js route handlers and Vercel AI SDK. The package supports OpenAI models, Upstash-hosted models, and custom providers like TogetherAi and Replicate.
![clearml-serving Screenshot](/screenshots_githubs/allegroai-clearml-serving.jpg)
clearml-serving
ClearML Serving is a command line utility for model deployment and orchestration, enabling model deployment including serving and preprocessing code to a Kubernetes cluster or custom container based solution. It supports machine learning models like Scikit Learn, XGBoost, LightGBM, and deep learning models like TensorFlow, PyTorch, ONNX. It provides a customizable RestAPI for serving, online model deployment, scalable solutions, multi-model per container, automatic deployment, canary A/B deployment, model monitoring, usage metric reporting, metric dashboard, and model performance metrics. ClearML Serving is modular, scalable, flexible, customizable, and open source.
For similar tasks
![swirl-search Screenshot](/screenshots_githubs/swirlai-swirl-search.jpg)
swirl-search
Swirl is an open-source software that allows users to simultaneously search multiple content sources and receive AI-ranked results. It connects to various data sources, including databases, public data services, and enterprise sources, and utilizes AI and LLMs to generate insights and answers based on the user's data. Swirl is easy to use, requiring only the download of a YML file, starting in Docker, and searching with Swirl. Users can add credentials to preloaded SearchProviders to access more sources. Swirl also offers integration with ChatGPT as a configured AI model. It adapts and distributes user queries to anything with a search API, re-ranking the unified results using Large Language Models without extracting or indexing anything. Swirl includes five Google Programmable Search Engines (PSEs) to get users up and running quickly. Key features of Swirl include Microsoft 365 integration, SearchProvider configurations, query adaptation, synchronous or asynchronous search federation, optional subscribe feature, pipelining of Processor stages, results stored in SQLite3 or PostgreSQL, built-in Query Transformation support, matching on word stems and handling of stopwords, duplicate detection, re-ranking of unified results using Cosine Vector Similarity, result mixers, page through all results requested, sample data sets, optional spell correction, optional search/result expiration service, easily extensible Connector and Mixer objects, and a welcoming community for collaboration and support.
![paper-qa Screenshot](/screenshots_githubs/whitead-paper-qa.jpg)
paper-qa
PaperQA is a minimal package for question answering from PDFs or text files, designed to give high-quality answers with in-text citations. It uses OpenAI Embeddings to embed and search documents, and it follows this process: embed the documents and the query, search for the top passages, create summaries, score and select the relevant summaries, put those summaries into the prompt, and generate an answer. Users can customize the prompts and use various models for embeddings and LLMs. The tool can be used asynchronously and supports adding documents from paths, files, or URLs.
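The embed, search, summarize, score, and prompt steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PaperQA's API: the bag-of-words `embed` stands in for OpenAI Embeddings, and `summarize` and `score` are hypothetical placeholders for the LLM calls made at those steps. The sketch stops at building the final prompt, which an LLM would then answer.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(passage: str, question: str) -> str:
    """Hypothetical placeholder for an LLM summary: keeps the first sentence."""
    return passage.split(". ")[0].rstrip(".")


def score(summary: str, question: str) -> int:
    """Hypothetical placeholder for an LLM relevance score: number of shared words."""
    return len(set(embed(summary)) & set(embed(question)))


def build_prompt(question: str, passages: list[dict], k: int = 3, min_score: int = 1) -> str:
    q_vec = embed(question)
    # Embed the passages and keep the k most similar to the question.
    top = sorted(passages, key=lambda p: cosine(q_vec, embed(p["text"])), reverse=True)[:k]
    # Summarize each top passage, score the summary, and keep only the relevant ones.
    context = []
    for p in top:
        summary = summarize(p["text"], question)
        if score(summary, question) >= min_score:
            context.append(f"{summary} ({p['citation']})")
    # Put the selected summaries into the prompt; an LLM would generate the answer from it.
    return (
        "Answer the question using the context below and cite sources.\n"
        "Context:\n" + "\n".join(context) + f"\nQuestion: {question}"
    )
```

Raising `min_score` drops passages whose summaries look unrelated to the question, which mirrors the score-and-select step; each kept summary carries its citation key into the prompt so the generated answer can cite it.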
![quick-start-connectors Screenshot](/screenshots_githubs/cohere-ai-quick-start-connectors.jpg)
quick-start-connectors
Cohere's Build-Your-Own-Connector framework lets Cohere's Command LLM, via the Chat API endpoint, integrate with any datastore or software that holds text information and exposes a search endpoint. This allows user queries to be grounded in proprietary information. Use cases include question answering, knowledge work, communications summarization, and research. The repository provides code for popular datastores and a template connector. It requires Python 3.11+ and Poetry. Connectors can be built and deployed using Docker, authorization values are set through environment variables, and pre-commit hooks are used for linting. The connectors are tailored to integrate with Cohere's Chat API for creating chatbots: they return documents as JSON objects, which Cohere's API uses to generate answers with citations.
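A connector's core job is to turn a search request into a list of documents returned as JSON. The minimal sketch below assumes a request body of the form `{"query": ...}` and a response of the form `{"results": [...]}`; verify that contract against Cohere's connector documentation before relying on it. The in-memory `DOCUMENTS` list and the `handle_search` function are hypothetical, and a real connector would wrap this logic in a web server and query the actual datastore.

```python
import json

# Hypothetical in-memory datastore standing in for a real backend with a search endpoint.
DOCUMENTS = [
    {"id": 1, "title": "Onboarding guide", "text": "New hires get laptops on day one."},
    {"id": 2, "title": "Expense policy", "text": "Submit receipts within 30 days."},
]


def handle_search(request_body: str) -> str:
    """Toy connector search handler: JSON {"query": ...} in, JSON {"results": [...]} out."""
    query = json.loads(request_body).get("query", "").lower()
    terms = query.split()
    hits = [
        # Convert every field to a string so each document is a flat string-to-string map.
        {key: str(value) for key, value in doc.items()}
        for doc in DOCUMENTS
        if any(term in f"{doc['title']} {doc['text']}".lower() for term in terms)
    ]
    return json.dumps({"results": hits})


print(handle_search('{"query": "receipts"}'))
```

Keeping the handler a pure function of the request body makes it easy to unit-test before deploying it behind an HTTP endpoint in Docker.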
![llm-rag-workshop Screenshot](/screenshots_githubs/alexeygrigorev-llm-rag-workshop.jpg)
llm-rag-workshop
The LLM RAG Workshop repository provides a workshop on using Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to generate and understand text in a human-like manner. It includes instructions for setting up the environment, indexing the Zoomcamp FAQ documents, building a Q&A system, and using OpenAI to generate answers from the retrieved information. The repository focuses on enhancing language model responses with information retrieved from external sources, such as document databases or search engines, to improve the factual accuracy and relevance of the generated text.
![RAGMeUp Screenshot](/screenshots_githubs/AI-Commandos-RAGMeUp.jpg)
RAGMeUp
RAG Me Up is a generic framework that enables users to easily perform Retrieval-Augmented Generation (RAG) on their own datasets. It consists of a small server and UIs for interacting with it. It runs best on a GPU with 16 GB of VRAM. Users can combine RAG with fine-tuning using the LLaMa2Lang repository. The tool lets users configure the LLM, data, LLM parameters, prompt, and document splitting. The project is seeking funding to democratize AI and advance its applications.
![local-genAI-search Screenshot](/screenshots_githubs/nikolamilosevic86-local-genAI-search.jpg)
local-genAI-search
Local-GenAI Search is a local generative search engine powered by the Llama 3 model that lets users ask questions about their local files and receive concise answers with references to the relevant documents. It uses MS MARCO embeddings for semantic search and can run locally on a laptop or computer with 32 GB of RAM. The tool can index local documents, search them for information, and provide generative search through a user interface.
![nanoPerplexityAI Screenshot](/screenshots_githubs/Yusuke710-nanoPerplexityAI.jpg)
nanoPerplexityAI
nanoPerplexityAI is an open-source implementation of an LLM-based question-answering service that fetches information from Google. Its architecture is simple: the language model checks the user query, the query is reformulated for Google search, and an answer is generated and saved to a markdown file. The tool requires minimal setup and is designed to make answers easy to view.
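The query, search, answer, and markdown flow can be sketched with the model and the search backend passed in as plain callables, so no particular LLM or Google client library is assumed. The `answer_to_markdown` function and its prompt wording are hypothetical, and the sketch folds the check and reformulate steps into a single model call, which may differ from what nanoPerplexityAI actually does.

```python
from pathlib import Path
from typing import Callable


def answer_to_markdown(
    query: str,
    llm: Callable[[str], str],            # hypothetical: prompt in, text out
    search: Callable[[str], list[dict]],  # hypothetical: returns dicts with title, url, snippet
    out_path: Path,
) -> str:
    # 1. Ask the model to check the question and rewrite it as a concise web-search query.
    search_query = llm(f"Rewrite as a Google search query: {query}").strip()
    # 2. Fetch search results and number them so the answer can cite them.
    results = search(search_query)
    context = "\n".join(f"[{i}] {r['snippet']}" for i, r in enumerate(results, 1))
    # 3. Ask the model to answer using only the numbered snippets.
    answer = llm(f"Using these sources, answer with [n] citations.\n{context}\nQuestion: {query}")
    # 4. Save the answer plus a numbered source list as a markdown file.
    sources = "\n".join(f"{i}. [{r['title']}]({r['url']})" for i, r in enumerate(results, 1))
    markdown = f"# {query}\n\n{answer}\n\n## Sources\n\n{sources}\n"
    out_path.write_text(markdown, encoding="utf-8")
    return markdown
```

Injecting `llm` and `search` keeps the pipeline testable with fakes and lets any model or search API be swapped in without changing the flow.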
For similar jobs
![weave Screenshot](/screenshots_githubs/wandb-weave.jpg)
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
![LLMStack Screenshot](/screenshots_githubs/trypromptly-LLMStack.jpg)
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
![VisionCraft Screenshot](/screenshots_githubs/VisionCraft-org-VisionCraft.jpg)
VisionCraft
The VisionCraft API is a free API for using more than 100 different AI models, covering everything from images to sound.
![kaito Screenshot](/screenshots_githubs/Azure-kaito.jpg)
kaito
Kaito is an operator that automates AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, removes the need to tune deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) when the license allows. Kaito greatly simplifies the workflow of onboarding large AI inference models in Kubernetes.
![PyRIT Screenshot](/screenshots_githubs/Azure-PyRIT.jpg)
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI red teaming tasks so that operators can focus on more complicated and time-consuming work, and it can identify security harms such as misuse (e.g., malware generation, jailbreaking) as well as privacy harms (e.g., identity theft). The goal is to give researchers a baseline of how well their model and entire inference pipeline perform against different harm categories, and to let them compare that baseline with future iterations of their model. This gives them empirical data on how well their model performs today and lets them detect any performance degradation introduced by future changes.
![tabby Screenshot](/screenshots_githubs/TabbyML-tabby.jpg)
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
* Self-contained, with no need for a DBMS or cloud service.
* OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
* Supports consumer-grade GPUs.
![spear Screenshot](/screenshots_githubs/isl-org-spear.jpg)
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
![Magick Screenshot](/screenshots_githubs/Oneirocom-Magick.jpg)
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.