RAGMeUp


Generic RAG framework to apply the power of LLMs to any given dataset


RAG Me Up is a generic framework that lets users easily perform Retrieval-Augmented Generation (RAG) on their own dataset. It consists of a small server and UIs for communicating with it, and is best run on a GPU with at least 16GB of vRAM. RAG can be combined with fine-tuning using the LLaMa2Lang repository. The tool allows configuration of the LLM, data, LLM parameters, prompts, and document splitting. Funding is sought to democratize AI and advance its applications.

README:

RAG Me Up

RAG Me Up is a generic framework (server + UIs) that enables you to do RAG on your own dataset easily. Its essence is a small and lightweight server and a couple of ways to run UIs that communicate with the server (or you can write your own).

RAG Me Up can run on CPU but is best run on any GPU with at least 16GB of vRAM when using the default instruct model.

Combine the power of RAG with the power of fine-tuning: check out our LLaMa2Lang repository on fine-tuning LLMs, which can then be used in RAG Me Up.

Updates

  • 2024-09-23 Hybrid retrieval with Postgres only (dense vectors with pgvector and sparse BM25 with pg_search)
  • 2024-09-06 Implemented Re2
  • 2024-09-04 Added an evaluation script that uses Ragas to evaluate your RAG pipeline
  • 2024-08-30 Added Ollama compatibility
  • 2024-08-27 Using cross encoders now so you can specify your own reranking model
  • 2024-07-30 Added multiple provenance attribution methods
  • 2024-06-26 Updated readme, added more file types, robust self-inflection
  • 2024-06-05 Upgraded to Langchain v0.2

Installation

Server

git clone https://github.com/UnderstandLingBV/RAGMeUp.git
cd RAGMeUp/server
pip install -r requirements.txt

Then run the server using python server.py from the server subfolder.

Scala UI

Make sure you have JDK 17+. Download and install SBT and run sbt run from the server/scala directory, or alternatively download the compiled binary and run bin/ragemup(.bat)

Using Postgres (advised for production)

RAG Me Up supports Postgres as a hybrid retrieval database, with both pgvector and pg_search installed. To run Postgres instead of Milvus, follow these steps.

  • In the postgres folder is a Dockerfile, build it using docker build -t ragmeup-pgvector-pgsearch .
  • Run the container using docker run --name ragmeup-pgvector-pgsearch -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d ragmeup-pgvector-pgsearch
  • Once in use, our custom PostgresBM25Retriever will automatically create the right indexes for you.
  • pgvector, however, will not do this automatically, so you have to create the indexes yourself (perhaps after loading the documents first so the right tables are created):
    • Make sure the vector column is an actual vector (it's not by default): ALTER TABLE langchain_pg_embedding ALTER COLUMN embedding TYPE vector(384);
    • Create the index (may take a while with a lot of data): CREATE INDEX ON langchain_pg_embedding USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
  • Be sure to set the right connection strings in your .env file: vector_store_uri='postgresql+psycopg://langchain:langchain@localhost:6024/langchain' and vector_store_sparse_uri='postgresql://langchain:langchain@localhost:6024/langchain'

How does RAG Me Up work?

RAG Me Up aims to provide a robust RAG pipeline that is configurable without necessarily writing any code. To achieve this, a couple of strategies are used to make sure that the user query can be accurately answered through the documents provided.

The RAG pipeline is visualized in the RAG pipeline drawing in the repository README.

The following steps are executed. Note that some steps are optional and can be turned off by configuring the .env file.

Top part - Indexing

  1. You collect and make your documents available to RAG Me Up.
  2. Using different file type loaders, RAG Me Up will read the contents of your documents. Note that for some document types like JSON and XML, you need to specify additional configuration to tell RAG Me Up what to extract.
  3. Your documents get chunked using a recursive splitter.
  4. The chunks get converted into document (chunk) embeddings using an embedding model. Note that this model is usually a different one than the LLM you intend to use for chat.
  5. RAG Me Up uses a hybrid search strategy, combining dense vectors in the vector database with sparse vectors using BM25. By default, RAG Me Up uses a local Milvus database (a minimal sketch of these steps follows after this list).
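
As a rough illustration of steps 2 through 5, the sketch below loads a PDF, chunks it, embeds the chunks and combines a dense (Milvus) retriever with a sparse (BM25) one using Langchain. This is not the actual RAG Me Up code; the file name, model name and parameters are placeholders.

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Milvus
from langchain.retrievers import EnsembleRetriever

# Load and chunk a document (steps 2-3).
docs = PyPDFLoader("example.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=20).split_documents(docs)

# Embed the chunks and index them densely and sparsely (steps 4-5).
# Assumes a Milvus instance is reachable at its default address.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
dense = Milvus.from_documents(chunks, embeddings).as_retriever(search_kwargs={"k": 10})
sparse = BM25Retriever.from_documents(chunks)
hybrid = EnsembleRetriever(retrievers=[dense, sparse], weights=[0.5, 0.5])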

Bottom part - Inference

  1. Inference starts with a user asking a query. This query can either be an initial query or a follow-up query with an associated history and previously retrieved documents. Note that both (chat history and documents) need to be passed on by a UI to properly handle follow-up querying.
  2. A check is done to see whether new documents need to be fetched. This can happen in one of two cases (a minimal sketch of this decision follows after this list):
    • There is no history given, in which case we always need to fetch documents
    • [OPTIONAL] The LLM itself judges whether the question - in isolation - is phrased in such a way that new documents should be fetched, or whether it is a follow-up question on the existing documents. A flag called fetch_new_documents is set to indicate whether or not new documents need to be fetched.
  3. Documents are fetched from both the vector database (dense) and the BM25 index (sparse). Only executed if fetch_new_documents is set.
  4. [OPTIONAL] Reranking is applied to extract the most relevant documents returned by the previous step. Only executed if fetch_new_documents is set.
  5. [OPTIONAL] The LLM is asked to judge whether or not the documents retrieved contain an accurate answer to the user's query. Only executed if fetch_new_documents is set.
    • If this is not the case, the LLM is used to rewrite the query with the instruction to optimize it for distance-based similarity search. This is then fed back into step 3, but only once, to avoid lengthy or infinite loops.
  6. The documents are injected into the prompt with the user query. The documents can come from:
    • The retrieval and reranking of the document databases, if fetch_new_documents is set.
    • The history passed on with the initial user query, if fetch_new_documents is not set.
  7. The LLM is asked to answer the query with the given chat history and documents.
  8. The answer, chat history and documents are returned.
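
As a minimal sketch of the document-fetch decision in step 2 (this is not the actual server logic; llm is a hypothetical callable that returns the model's text response, and the real prompts come from rag_fetch_new_instruction and rag_fetch_new_question):

def needs_new_documents(llm, query: str, history: list) -> bool:
    # No chat history: the first question always triggers retrieval (case 1).
    if not history:
        return True
    # Otherwise ask the LLM to judge the question in isolation (case 2),
    # instructing it to answer with yes or no only.
    verdict = llm("Answer with yes or no only: does this question require fetching new documents?\n" + query)
    return verdict.strip().lower().startswith("yes")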

Configuration

RAG Me Up uses a .env file for configuration, see .env.template. The following fields can be configured:

LLM configuration

  • llm_model This is the main LLM (instruct or chat) model to use that you will converse with. Default is LLaMa3-8B
  • llm_assistant_token This should contain the unique query (sub)string that indicates where in a prompt template the assistant's answer starts
  • embedding_model The model used to convert your documents' chunks into vectors that will be stored in the vector store
  • trust_remote_code Set this to true if your LLM needs to execute remote code
  • force_cpu When set to True, forces RAG Me Up to run fully on CPU (not recommended)
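
For example, a minimal LLM section of a .env could look like the snippet below. The values are illustrative only; the actual defaults and the exact assistant token depend on your model and on .env.template.

llm_model=meta-llama/Meta-Llama-3-8B-Instruct
llm_assistant_token=<the substring where your prompt template starts the assistant's answer>
embedding_model=sentence-transformers/all-MiniLM-L6-v2
trust_remote_code=False
force_cpu=False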

Use OpenAI

If you want to use OpenAI as LLM backend, make sure to set use_openai to True and make sure you (externally) set the environment variable OPENAI_API_KEY to be your OpenAI API Key.

Use Gemini

If you want to use Gemini as LLM backend, make sure to set use_gemini to True and make sure you (externally) set the environment variable GOOGLE_API_KEY to be your Gemini API Key.

Use Azure OpenAI

If you want to use Azure OpenAI as LLM backend, make sure to set use_azure to True and make sure you (externally) set the following environment variables:

  • AZURE_OPENAI_API_KEY
  • AZURE_OPENAI_API_VERSION
  • AZURE_OPENAI_ENDPOINT
  • AZURE_OPENAI_CHAT_DEPLOYMENT_NAME
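
For example, on Linux or macOS you could export these before starting the server (placeholder values), in addition to setting use_azure to True in your .env:

export AZURE_OPENAI_API_KEY=<your key>
export AZURE_OPENAI_API_VERSION=<api version>
export AZURE_OPENAI_ENDPOINT=<endpoint url>
export AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=<deployment name>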

Use Ollama

If you want to use Ollama as LLM backend, make sure to install Ollama and set use_ollama to True. The model to use should be given in ollama_model.

RAG Provenance

One of the biggest, arguably unsolved, challenges of RAG is good provenance attribution: tracking which of the source documents retrieved from your database contributed (most) to the LLM's generated answer. RAG Me Up implements several ways of achieving this, each with its own pros and cons.

The following environment variables can be set for provenance attribution.

  • provenance_method Can be one of rerank, attention, similarity, llm. Provenance attribution is turned off completely if provenance_method is not one of these values, or if it is set to rerank while rerank itself is set to False
  • provenance_similarity_llm If provenance_method is set to similarity, this model will be used to compute the similarity scores
  • provenance_include_query Set to True or False to include the query itself when attributing provenance
  • provenance_llm_prompt If provenance_method is set to llm, this prompt will be used to let the LLM attribute the provenance of each document in isolation.
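
For example, to attribute provenance with the reranker, a .env could contain the following (illustrative values; rerank must be enabled for this method):

rerank=True
provenance_method=rerank
provenance_include_query=False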

The different provenance attribution metrics are described below.

provenance_method=rerank (preferred for closed LLMs)

This uses the reranker as the provenance method. While reranking is already applied when retrieving documents (if reranking is turned on), that step only applies the reranker's cross-attention to the documents and the query. For provenance attribution, we use the same reranker to apply cross-attention to the answer (and potentially the query too).

provenance_method=attention (preferred for OS LLMs)

This is probably the most accurate way of tracking provenance, but it can only be used with OS LLMs that can return attention weights. We track provenance by looking at the actual attention weights (of the last attention layer in the model) for each token, from the answer to the document and vice versa. Optionally, we do the same for the query if provenance_include_query=True.

provenance_method=similarity

This method uses a sentence transformer (LM) to get dense vectors for each document as well as for the answer (and potentially the query). We then use cosine similarity to score each document vector against the answer (+ query).
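
As a rough sketch of what similarity-based attribution boils down to (not the actual RAG Me Up implementation; the model name and texts are placeholders):

from sentence_transformers import SentenceTransformer, util

# Embed the answer and the documents with a sentence transformer,
# then rank documents by cosine similarity to the answer.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
answer_vec = model.encode("the generated answer", convert_to_tensor=True)
doc_vecs = model.encode(["document one", "document two"], convert_to_tensor=True)
scores = util.cos_sim(answer_vec, doc_vecs)  # higher score = stronger attribution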

provenance_method=llm

The LLM that is used to generate messages is now also used to attribute the provenance of each document in isolation. We use the provenance_llm_prompt as the prompt to ask the LLM to perform this task. Note that the outcome of this provenance method is highly influenced by the prompt and the strength of the model. As a good practice, make sure you force the LLM to return numbers on a relatively small scale (e.g. a score from 1 to 3). Using something like a percentage for each document will likely result in random outcomes.

Data configuration

  • data_directory The directory that contains your (initial) documents to load into the vector store
  • file_types Comma-separated list of file types to load. Supported file types: PDF, JSON, DOCX, XLSX, PPTX, CSV, XML
  • json_schema If you are loading JSON, this should be the schema (using jq_schema). For example, use . for the root of a JSON object if your data contains JSON objects only and .[0] for the first element in each JSON array if your data contains JSON arrays with one JSON object in them
  • json_text_content Whether the JSON data should be loaded as textual content or as structured content (in the case of a JSON object)
  • xml_xpath If you are loading XML, this should be the XPath of the documents to load (the tags that contain your text)
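
As an illustration, a .env for a directory of JSON and XML files could look like this (all values are hypothetical and depend on your data):

data_directory=./data
file_types=json,xml
json_schema=.[0]
json_text_content=False
xml_xpath=//article/text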

Retrieval configuration

  • vector_store_uri RAG Me Up caches your vector store on disk if possible to make loading faster the next time. This is the location where the vector store is stored. Remove this file to force a reload of all your documents
  • vector_store_k The number of documents to retrieve from the vector store
  • rerank Set to either True or False to enable reranking
  • rerank_k The number of documents to keep after reranking. Note that if you use reranking, this should be your final target for k and vector_store_k should be set (significantly) higher. For example, set vector_store_k to 10 and rerank_k to 3
  • rerank_model The cross encoder reranking retrieval model to use. Sensible defaults are cross-encoder/ms-marco-TinyBERT-L-2-v2 for speed and colbert-ir/colbertv2.0 for accuracy (antoinelouis/colbert-xm for multilingual). Set this value to flashrank to use the FlashrankReranker.
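
Following the guidance above, a retrieval section of a .env could look like this (illustrative values):

vector_store_k=10
rerank=True
rerank_k=3
rerank_model=cross-encoder/ms-marco-TinyBERT-L-2-v2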

LLM parameters

  • temperature The chat LLM's temperature. Increase this to create more diverse answers
  • repetition_penalty The penalty for repeating outputs in the chat answers. Some models are very sensitive to this parameter and need a value bigger than 1.0 (penalty) while others benefit from inverting it (lower than 1.0)
  • max_new_tokens This caps how many tokens the LLM can generate in its answer. More tokens means slower throughput and more memory usage

Prompt configuration

  • rag_instruction An instruction message for the LLM to let it know what to do. It should mention that the LLM is performing RAG and that documents will be given as input context to generate the answer from.
  • rag_question_initial The initial question prompt that will be given to the LLM only for the first question a user asks, that is, without chat history
  • rag_question_followup The prompt used when the user asks a follow-up question. The context will still be populated by RAG from the vector store, but if chat history is present, this prompt is used instead of rag_question_initial

Document retrieval

  • rag_fetch_new_instruction RAG Me Up automatically determines whether or not new documents should be fetched from the vector store or whether the user is asking a follow-up question on the already fetched documents by leveraging the same LLM that is used for chat. This environment variable determines the prompt to use to make this decision. Be very sure to instruct your LLM to answer with yes or no only and make sure your LLM is capable enough to follow this instruction
  • rag_fetch_new_question The question prompt used in conjunction with rag_fetch_new_instruction to decide if new documents should be fetched or not

Rewriting (self-inflection)

  • user_rewrite_loop Set to either True or False to enable the rewriting of the initial query. Note that a rewrite will occur at most once
  • rewrite_query_instruction This is the instruction of the prompt that is used to ask the LLM to judge whether a rewrite is necessary or not. Make sure you force the LLM to answer with yes or no only
  • rewrite_query_question This is the actual query part of the prompt that is used to ask the LLM to judge a rewrite
  • rewrite_query_prompt If the rewrite loop is on and the LLM judges a rewrite is required, this is the instruction, together with the question, asked to the LLM to rewrite the user query into a phrasing more optimized for RAG. Make sure to instruct your model adequately.

Re2

  • use_re2 Set to either True or False to enable Re2 (Re-reading) which repeats the question, generally improving the quality of the answer generated by the LLM.
  • re2_prompt The prompt used in between the question and the repeated question to signal that we are re-asking.
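
With Re2 enabled, the question part of the prompt is roughly laid out as question, re2_prompt, repeated question. For example (the wording of re2_prompt is configurable; this one is an illustrative value in the spirit of the Re-reading technique):

What is the refund policy?
Read the question again: What is the refund policy?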

Document splitting configuration

  • splitter The Langchain document splitter to use. Supported splitters are RecursiveCharacterTextSplitter and SemanticChunker.
  • chunk_size The chunk size to use when splitting up documents for RecursiveCharacterTextSplitter
  • chunk_overlap The chunk overlap for RecursiveCharacterTextSplitter
  • breakpoint_threshold_type Sets the breakpoint threshold type when using the SemanticChunker (see the Langchain documentation). Can be one of: percentile, standard_deviation, interquartile, gradient
  • breakpoint_threshold_amount The amount to use for the threshold type (a float). Set to None to use the default
  • number_of_chunks The number of chunks to use for the threshold type (an int). Set to None to use the default
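
For example, a splitting section of a .env using the recursive splitter could look like this (illustrative values):

splitter=RecursiveCharacterTextSplitter
chunk_size=512
chunk_overlap=20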

Evaluation

While RAG evaluation is difficult and subjective to begin with, frameworks such as Ragas can give some metrics as to how well your RAG pipeline and its prompts are working, allowing you to benchmark one approach against another quantitatively.

RAG Me Up uses Ragas to evaluate your pipeline. You can run an evaluation based on your .env using python Ragas_eval.py. The following configuration parameters can be set for evaluation:

  • ragas_sample_size The number of document chunks to use in evaluation. These are sampled from your data directory after chunking.
  • ragas_qa_pairs Ragas works on pairs of questions and ground-truth answers. This parameter sets the number of such pairs to create from the sampled document chunks.
  • ragas_question_instruction The instruction prompt used to generate the questions of the Ragas input pairs.
  • ragas_question_query The query prompt used to generate the questions of the Ragas input pairs.
  • ragas_answer_instruction The instruction prompt used to generate the answers of the Ragas input pairs.
  • ragas_answer_query The query prompt used to generate the answers of the Ragas input pairs.
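
For example, an evaluation could be configured and run as follows (illustrative values):

ragas_sample_size=100
ragas_qa_pairs=25

python Ragas_eval.py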

Funding

We are actively looking for funding to democratize AI and advance its applications. Contact us at [email protected] if you want to invest.
