raglite
RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite
Stars: 657
![screenshot](/screenshots_githubs/superlinear-ai-raglite.jpg)
RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite. It offers configurable options for choosing LLM providers, database types, and rerankers. The toolkit is fast and permissive, relying on lightweight dependencies and hardware acceleration. RAGLite provides features such as PDF to Markdown conversion, multi-vector chunk embedding, optimal semantic chunking, hybrid search, adaptive retrieval, and improved output quality. It is extensible with a built-in Model Context Protocol server, a customizable ChatGPT-like frontend, document conversion to Markdown, and evaluation tools. Users can apply RAGLite to tasks such as inserting documents, running RAG pipelines, computing query adapters, evaluating retrieval and generation, running MCP servers, and serving frontends.
README:
RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite.
- Choose any LLM provider with LiteLLM, including local llama-cpp-python models
- Choose either PostgreSQL or SQLite as a keyword & vector search database
- Choose any reranker with rerankers, including multilingual FlashRank as the default
- Only lightweight and permissive open source dependencies (e.g., no PyTorch or LangChain)
- Acceleration with Metal on macOS, and CUDA on Linux and Windows
- PDF to Markdown conversion on top of pdftext and pypdfium2
- Multi-vector chunk embedding with late chunking and contextual chunk headings
- Optimal level 4 semantic chunking by solving a binary integer programming problem
- Hybrid search with the database's native keyword & vector search (tsvector+pgvector, FTS5+sqlite-vec[^1])
- Adaptive retrieval where the LLM decides whether to and what to retrieve based on the query
- Improved cost and latency with a prompt caching-aware message array structure
- Improved output quality with Anthropic's long-context prompt format
- Optimal closed-form linear query adapter by solving an orthogonal Procrustes problem
- A built-in Model Context Protocol (MCP) server that any MCP client like Claude desktop can connect with
- Optional customizable ChatGPT-like frontend for web, Slack, and Teams with Chainlit
- Optional conversion of any input document to Markdown with Pandoc
- Optional evaluation of retrieval and generation performance with Ragas
[!TIP] It is optional but recommended to install an accelerated llama-cpp-python precompiled binary with:
# Configure which llama-cpp-python precompiled binary to install (on macOS only v0.3.2 is supported right now):
LLAMA_CPP_PYTHON_VERSION=0.3.2
PYTHON_VERSION=310|311|312
ACCELERATOR=metal|cu121|cu122|cu123|cu124
PLATFORM=macosx_11_0_arm64|linux_x86_64|win_amd64
# Install llama-cpp-python:
pip install "https://github.com/abetlen/llama-cpp-python/releases/download/v$LLAMA_CPP_PYTHON_VERSION-$ACCELERATOR/llama_cpp_python-$LLAMA_CPP_PYTHON_VERSION-cp$PYTHON_VERSION-cp$PYTHON_VERSION-$PLATFORM.whl"
Install RAGLite with:
pip install raglite
To add support for a customizable ChatGPT-like frontend, use the chainlit extra:
pip install raglite[chainlit]
To add support for filetypes other than PDF, use the pandoc extra:
pip install raglite[pandoc]
To add support for evaluation, use the ragas extra:
pip install raglite[ragas]
- Configuring RAGLite
- Inserting documents
- Retrieval-Augmented Generation (RAG)
- Computing and using an optimal query adapter
- Evaluation of retrieval and generation
- Running a Model Context Protocol (MCP) server
- Serving a customizable ChatGPT-like frontend
[!TIP] RAGLite extends LiteLLM with support for llama.cpp models using llama-cpp-python. To select a llama.cpp model (e.g., from bartowski's collection), use a model identifier of the form "llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>", where n_ctx is an optional parameter that specifies the context size of the model.
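For illustration, here is how such an identifier breaks down; this mirrors the local config example further below:

```python
# A llama.cpp model identifier of the form "llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>":
#   prefix:       llama-cpp-python
#   repo id:      bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
#   filename:     *Q4_K_M.gguf  (a wildcard pattern, as in the config examples below)
#   context size: 8192          (the optional n_ctx parameter)
llm = "llama-cpp-python/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/*Q4_K_M.gguf@8192"
```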
[!TIP] You can create a PostgreSQL database in a few clicks at neon.tech.
First, configure RAGLite with your preferred PostgreSQL or SQLite database and any LLM supported by LiteLLM:
from raglite import RAGLiteConfig
# Example 'remote' config with a PostgreSQL database and an OpenAI LLM:
my_config = RAGLiteConfig(
db_url="postgresql://my_username:my_password@my_host:5432/my_database",
llm="gpt-4o-mini", # Or any LLM supported by LiteLLM.
embedder="text-embedding-3-large", # Or any embedder supported by LiteLLM.
)
# Example 'local' config with a SQLite database and a llama.cpp LLM:
my_config = RAGLiteConfig(
db_url="sqlite:///raglite.db",
llm="llama-cpp-python/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/*Q4_K_M.gguf@8192",
embedder="llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf@1024", # A context size of 1024 tokens is the sweet spot for bge-m3.
)
You can also configure any reranker supported by rerankers:
from rerankers import Reranker
# Example remote API-based reranker:
my_config = RAGLiteConfig(
db_url="postgresql://my_username:my_password@my_host:5432/my_database",
reranker=Reranker("cohere", lang="en", api_key=COHERE_API_KEY)
)
# Example local cross-encoder reranker per language (this is the default):
my_config = RAGLiteConfig(
db_url="sqlite:///raglite.db",
reranker=(
("en", Reranker("ms-marco-MiniLM-L-12-v2", model_type="flashrank")), # English
("other", Reranker("ms-marco-MultiBERT-L-12", model_type="flashrank")), # Other languages
)
)
[!TIP] To insert documents other than PDF, install the pandoc extra with pip install raglite[pandoc].
Next, insert some documents into the database. RAGLite will take care of the conversion to Markdown, optimal level 4 semantic chunking, and multi-vector embedding with late chunking:
# Insert documents:
from pathlib import Path
from raglite import insert_document
insert_document(Path("On the Measure of Intelligence.pdf"), config=my_config)
insert_document(Path("Special Relativity.pdf"), config=my_config)
Now you can run an adaptive RAG pipeline that consists of adding the user prompt to the message history and streaming the LLM response:
from raglite import rag
# Create a user message:
messages = [] # Or start with an existing message history.
messages.append({
"role": "user",
"content": "How is intelligence measured?"
})
# Adaptively decide whether to retrieve and then stream the response:
chunk_spans = []
stream = rag(messages, on_retrieval=lambda x: chunk_spans.extend(x), config=my_config)
for update in stream:
print(update, end="")
# Access the documents referenced in the RAG context:
documents = [chunk_span.document for chunk_span in chunk_spans]
The LLM will adaptively decide whether to retrieve information based on the complexity of the user prompt. If retrieval is necessary, the LLM generates the search query and RAGLite applies hybrid search and reranking to retrieve the most relevant chunk spans (each of which is a list of consecutive chunks). The retrieval results are sent to the on_retrieval callback and are appended to the message history as a tool output. Finally, the assistant response is streamed and appended to the message history.
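Because rag streams the assistant response and appends it to the message history, a follow-up turn only needs a new user message. A minimal multi-turn sketch reusing the objects from the example above (the follow-up prompt is hypothetical):

```python
# Ask a follow-up question in the same conversation (hypothetical prompt):
messages.append({"role": "user", "content": "And who proposed that measure of intelligence?"})
stream = rag(messages, on_retrieval=lambda x: chunk_spans.extend(x), config=my_config)
for update in stream:
    print(update, end="")
# chunk_spans now accumulates the spans retrieved across both turns.
```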
If you need manual control over the RAG pipeline, you can run a basic but powerful pipeline that consists of retrieving the most relevant chunk spans with hybrid search and reranking, converting the user prompt to a RAG instruction and appending it to the message history, and finally generating the RAG response:
from raglite import create_rag_instruction, rag, retrieve_rag_context
# Retrieve relevant chunk spans with hybrid search and reranking:
user_prompt = "How is intelligence measured?"
chunk_spans = retrieve_rag_context(query=user_prompt, num_chunks=5, config=my_config)
# Append a RAG instruction based on the user prompt and context to the message history:
messages = [] # Or start with an existing message history.
messages.append(create_rag_instruction(user_prompt=user_prompt, context=chunk_spans))
# Stream the RAG response and append it to the message history:
stream = rag(messages, config=my_config)
for update in stream:
print(update, end="")
# Access the documents referenced in the RAG context:
documents = [chunk_span.document for chunk_span in chunk_spans]
[!TIP] Reranking can significantly improve the output quality of a RAG application. To add reranking to your application: first search for a larger set of 20 relevant chunks, then rerank them with a rerankers reranker, and finally keep the top 5 chunks.
RAGLite also offers more advanced control over the individual steps of a full RAG pipeline:
- Searching for relevant chunks with keyword, vector, or hybrid search
- Retrieving the chunks from the database
- Reranking the chunks and selecting the top 5 results
- Extending the chunks with their neighbors and grouping them into chunk spans
- Converting the user prompt to a RAG instruction and appending it to the message history
- Streaming an LLM response to the message history
- Accessing the cited documents from the chunk spans
A full RAG pipeline is straightforward to implement with RAGLite:
# Search for chunks:
from raglite import hybrid_search, keyword_search, vector_search
user_prompt = "How is intelligence measured?"
chunk_ids_vector, _ = vector_search(user_prompt, num_results=20, config=my_config)
chunk_ids_keyword, _ = keyword_search(user_prompt, num_results=20, config=my_config)
chunk_ids_hybrid, _ = hybrid_search(user_prompt, num_results=20, config=my_config)
# Retrieve chunks:
from raglite import retrieve_chunks
chunks_hybrid = retrieve_chunks(chunk_ids_hybrid, config=my_config)
# Rerank chunks and keep the top 5 (optional, but recommended):
from raglite import rerank_chunks
chunks_reranked = rerank_chunks(user_prompt, chunks_hybrid, config=my_config)
chunks_reranked = chunks_reranked[:5]
# Extend chunks with their neighbors and group them into chunk spans:
from raglite import retrieve_chunk_spans
chunk_spans = retrieve_chunk_spans(chunks_reranked, config=my_config)
# Append a RAG instruction based on the user prompt and context to the message history:
from raglite import create_rag_instruction
messages = [] # Or start with an existing message history.
messages.append(create_rag_instruction(user_prompt=user_prompt, context=chunk_spans))
# Stream the RAG response and append it to the message history:
from raglite import rag
stream = rag(messages, config=my_config)
for update in stream:
print(update, end="")
# Access the documents referenced in the RAG context:
documents = [chunk_span.document for chunk_span in chunk_spans]
RAGLite can compute and apply an optimal closed-form query adapter to the prompt embedding to improve the output quality of RAG. To benefit from this, first generate a set of evals with insert_evals and then compute and store the optimal query adapter with update_query_adapter:
# Improve RAG with an optimal query adapter:
from raglite import insert_evals, update_query_adapter
insert_evals(num_evals=100, config=my_config)
update_query_adapter(config=my_config) # From here, every vector search will use the query adapter.
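Once the query adapter is stored, subsequent retrieval calls pick it up transparently. A quick sketch with an example query, discarding the second return value as in the search examples above:

```python
from raglite import vector_search

# Any vector search run after update_query_adapter goes through the stored query adapter:
chunk_ids, _ = vector_search("How is intelligence measured?", num_results=5, config=my_config)
print(chunk_ids)
```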
If you installed the ragas extra, you can use RAGLite to answer the evals and then evaluate the quality of both the retrieval and generation steps of RAG using Ragas:
# Evaluate retrieval and generation:
from raglite import answer_evals, evaluate, insert_evals
insert_evals(num_evals=100, config=my_config)
answered_evals_df = answer_evals(num_evals=10, config=my_config)
evaluation_df = evaluate(answered_evals_df, config=my_config)
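To get a quick overview of the results, the returned DataFrame can be summarized with standard pandas calls. A sketch, assuming evaluation_df is a pandas DataFrame with the Ragas metrics as numeric columns:

```python
# Summarize the evaluation results (assumes the Ragas metrics are numeric columns):
print(evaluation_df.describe())
print(evaluation_df.mean(numeric_only=True))  # Per-metric means across the answered evals
```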
RAGLite comes with an MCP server implemented with FastMCP that exposes a search_knowledge_base tool. To use the server:
- Install Claude desktop
- Install uv so that Claude desktop can start the server
- Configure Claude desktop to use uv to start the MCP server with:
raglite \
--db-url sqlite:///raglite.db \
--llm llama-cpp-python/bartowski/Llama-3.2-3B-Instruct-GGUF/*Q4_K_M.gguf@4096 \
--embedder llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf@1024 \
mcp install
To use an API-based LLM, make sure to include your credentials in a .env file or supply them inline:
OPENAI_API_KEY=sk-... raglite --llm gpt-4o-mini --embedder text-embedding-3-large mcp install
Now, when you start Claude desktop, you should see a 🔨 icon at the bottom right of your prompt, indicating that Claude has successfully connected with the MCP server.
When relevant, Claude will suggest using the search_knowledge_base tool that the MCP server provides. You can also explicitly ask Claude to search the knowledge base if you want to be certain that it does.
If you installed the chainlit extra, you can serve a customizable ChatGPT-like frontend with:
raglite chainlit
The application is also deployable to web, Slack, and Teams.
You can specify the database URL, LLM, and embedder directly in the Chainlit frontend, or with the CLI as follows:
raglite \
--db-url sqlite:///raglite.db \
--llm llama-cpp-python/bartowski/Llama-3.2-3B-Instruct-GGUF/*Q4_K_M.gguf@4096 \
--embedder llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf@1024 \
chainlit
To use an API-based LLM, make sure to include your credentials in a .env file or supply them inline:
OPENAI_API_KEY=sk-... raglite --llm gpt-4o-mini --embedder text-embedding-3-large chainlit
Prerequisites
1. Set up Git to use SSH
- Generate an SSH key and add the SSH key to your GitHub account.
- Configure SSH to automatically load your SSH keys:
cat << EOF >> ~/.ssh/config
Host *
  AddKeysToAgent yes
  IgnoreUnknown UseKeychain
  UseKeychain yes
  ForwardAgent yes
EOF
2. Install Docker
- Install Docker Desktop.
- Linux only: export your user's user id and group id so that files created in the Dev Container are owned by your user:
cat << EOF >> ~/.bashrc
export UID=$(id --user)
export GID=$(id --group)
EOF
3. Install VS Code or PyCharm
- Install VS Code and VS Code's Dev Containers extension. Alternatively, install PyCharm.
- Optional: install a Nerd Font such as FiraCode Nerd Font and configure VS Code or configure PyCharm to use it.
Development environments
The following development environments are supported:
- GitHub Codespaces: click on Code and select Create codespace to start a Dev Container with GitHub Codespaces.
- Dev Container (with container volume): click on Open in Dev Containers to clone this repository in a container volume and create a Dev Container with VS Code.
- Dev Container: clone this repository, open it with VS Code, and run Ctrl/⌘ + ⇧ + P → Dev Containers: Reopen in Container.
- PyCharm: clone this repository, open it with PyCharm, and configure Docker Compose as a remote interpreter with the dev service.
- Terminal: clone this repository, open it with your terminal, and run docker compose up --detach dev to start a Dev Container in the background, and then run docker compose exec dev zsh to open a shell prompt in the Dev Container.
Developing
- This project follows the Conventional Commits standard to automate Semantic Versioning and Keep A Changelog with Commitizen.
- Run poe from within the development environment to print a list of Poe the Poet tasks available to run on this project.
- Run poetry add {package} from within the development environment to install a run time dependency and add it to pyproject.toml and poetry.lock. Add --group test or --group dev to install a CI or development dependency, respectively.
- Run poetry update from within the development environment to upgrade all dependencies to the latest versions allowed by pyproject.toml.
- Run cz bump to bump the package's version, update the CHANGELOG.md, and create a git tag.
[^1]: We use PyNNDescent until sqlite-vec is more mature.
Alternative AI tools for raglite
Similar Open Source Tools
![llmgraph Screenshot](/screenshots_githubs/dylanhogg-llmgraph.jpg)
llmgraph
llmgraph is a tool that enables users to create knowledge graphs in GraphML, GEXF, and HTML formats by extracting world knowledge from large language models (LLMs) like ChatGPT. It supports various entity types and relationships, offers cache support for efficient graph growth, and provides insights into LLM costs. Users can customize the model used and interact with different LLM providers. The tool allows users to generate interactive graphs based on a specified entity type and Wikipedia link, making it a valuable resource for knowledge graph creation and exploration.
![labo Screenshot](/screenshots_githubs/sakamotoktr-labo.jpg)
labo
LABO is a time series forecasting and analysis framework that integrates pre-trained and fine-tuned LLMs with multi-domain agent-based systems. It allows users to create and tune agents easily for various scenarios, such as stock market trend prediction and web public opinion analysis. LABO requires a specific runtime environment setup, including system requirements, Python environment, dependency installations, and configurations. Users can fine-tune their own models using LABO's Low-Rank Adaptation (LoRA) for computational efficiency and continuous model updates. Additionally, LABO provides a Python library for building model training pipelines and customizing agents for specific tasks.
![oasis Screenshot](/screenshots_githubs/camel-ai-oasis.jpg)
oasis
OASIS is a scalable, open-source social media simulator that integrates large language models with rule-based agents to realistically mimic the behavior of up to one million users on platforms like Twitter and Reddit. It facilitates the study of complex social phenomena such as information spread, group polarization, and herd behavior, offering a versatile tool for exploring diverse social dynamics and user interactions in digital environments. With features like scalability, dynamic environments, diverse action spaces, and integrated recommendation systems, OASIS provides a comprehensive platform for simulating social media interactions at a large scale.
![model2vec Screenshot](/screenshots_githubs/MinishLab-model2vec.jpg)
model2vec
Model2Vec is a technique to turn any sentence transformer into a really small static model, reducing model size by 15x and making the models up to 500x faster, with a small drop in performance. It outperforms other static embedding models like GLoVe and BPEmb, is lightweight with only `numpy` as a major dependency, offers fast inference, dataset-free distillation, and is integrated into Sentence Transformers, txtai, and Chonkie. Model2Vec creates powerful models by passing a vocabulary through a sentence transformer model, reducing dimensionality using PCA, and weighting embeddings using zipf weighting. Users can distill their own models or use pre-trained models from the HuggingFace hub. Evaluation can be done using the provided evaluation package. Model2Vec is licensed under MIT.
![node-llama-cpp Screenshot](/screenshots_githubs/withcatai-node-llama-cpp.jpg)
node-llama-cpp
node-llama-cpp is a tool that allows users to run AI models locally on their machines. It provides pre-built bindings with the option to build from source using cmake. Users can interact with text generation models, chat with models using a chat wrapper, and force models to generate output in a parseable format like JSON. The tool supports Metal and CUDA, offers CLI functionality for chatting with models without coding, and ensures up-to-date compatibility with the latest version of llama.cpp. Installation includes pre-built binaries for macOS, Linux, and Windows, with the option to build from source if binaries are not available for the platform.
![paxml Screenshot](/screenshots_githubs/google-paxml.jpg)
paxml
Pax is a framework to configure and run machine learning experiments on top of Jax.
![pebblo Screenshot](/screenshots_githubs/daxa-ai-pebblo.jpg)
pebblo
Pebblo enables developers to safely load data and promote their Gen AI app to deployment without worrying about the organizationβs compliance and security requirements. The project identifies semantic topics and entities found in the loaded data and summarizes them on the UI or a PDF report.
![llama_index Screenshot](/screenshots_githubs/run-llama-llama_index.jpg)
llama_index
LlamaIndex is a data framework for building LLM applications. It provides tools for ingesting, structuring, and querying data, as well as integrating with LLMs and other tools. LlamaIndex is designed to be easy to use for both beginner and advanced users, and it provides a comprehensive set of features for building LLM applications.
![wllama Screenshot](/screenshots_githubs/ngxson-wllama.jpg)
wllama
Wllama is a WebAssembly binding for llama.cpp, a high-performance and lightweight language model library. It enables you to run inference directly on the browser without the need for a backend or GPU. Wllama provides both high-level and low-level APIs, allowing you to perform various tasks such as completions, embeddings, tokenization, and more. It also supports model splitting, enabling you to load large models in parallel for faster download. With its Typescript support and pre-built npm package, Wllama is easy to integrate into your React Typescript projects.
![react-native-fast-tflite Screenshot](/screenshots_githubs/mrousavy-react-native-fast-tflite.jpg)
react-native-fast-tflite
A high-performance TensorFlow Lite library for React Native that utilizes JSI for power, zero-copy ArrayBuffers for efficiency, and low-level C/C++ TensorFlow Lite core API for direct memory access. It supports swapping out TensorFlow Models at runtime and GPU-accelerated delegates like CoreML/Metal/OpenGL. Easy VisionCamera integration allows for seamless usage. Users can load TensorFlow Lite models, interpret input and output data, and utilize GPU Delegates for faster computation. The library is suitable for real-time object detection, image classification, and other machine learning tasks in React Native applications.
![FlexFlow Screenshot](/screenshots_githubs/flexflow-FlexFlow.jpg)
FlexFlow
FlexFlow Serve is an open-source compiler and distributed system for **low latency**, **high performance** LLM serving. FlexFlow Serve outperforms existing systems by 1.3-2.0x for single-node, multi-GPU inference and by 1.4-2.4x for multi-node, multi-GPU inference.
![clarifai-python-grpc Screenshot](/screenshots_githubs/Clarifai-clarifai-python-grpc.jpg)
clarifai-python-grpc
This is the official Clarifai gRPC Python client for interacting with their recognition API. Clarifai offers a platform for data scientists, developers, researchers, and enterprises to utilize artificial intelligence for image, video, and text analysis through computer vision and natural language processing. The client allows users to authenticate, predict concepts in images, and access various functionalities provided by the Clarifai API. It follows a versioning scheme that aligns with the backend API updates and includes specific instructions for installation and troubleshooting. Users can explore the Clarifai demo, sign up for an account, and refer to the documentation for detailed information.
![Jlama Screenshot](/screenshots_githubs/tjake-Jlama.jpg)
Jlama
Jlama is a modern Java inference engine designed for large language models. It supports various model types such as Gemma, Llama, Mistral, GPT-2, BERT, and more. The tool implements features like Flash Attention, Mixture of Experts, and supports different model quantization formats. Built with Java 21 and utilizing the new Vector API for faster inference, Jlama allows users to add LLM inference directly to their Java applications. The tool includes a CLI for running models, a simple UI for chatting with LLMs, and examples for different model types.
![lhotse Screenshot](/screenshots_githubs/lhotse-speech-lhotse.jpg)
lhotse
Lhotse is a Python library designed to make speech and audio data preparation flexible and accessible. It aims to attract a wider community to speech processing tasks by providing a Python-centric design and an expressive command-line interface. Lhotse offers standard data preparation recipes, PyTorch Dataset classes for speech tasks, and efficient data preparation for model training with audio cuts. It supports data augmentation, feature extraction, and feature-space cut mixing. The tool extends Kaldi's data preparation recipes with seamless PyTorch integration, human-readable text manifests, and convenient Python classes.
![codellm-devkit Screenshot](/screenshots_githubs/IBM-codellm-devkit.jpg)
codellm-devkit
Codellm-devkit (CLDK) is a Python library that serves as a multilingual program analysis framework bridging traditional static analysis tools and Large Language Models (LLMs) specialized for code (CodeLLMs). It simplifies the process of analyzing codebases across multiple programming languages, enabling the extraction of meaningful insights and facilitating LLM-based code analysis. The library provides a unified interface for integrating outputs from various analysis tools and preparing them for effective use by CodeLLMs. Codellm-devkit aims to enable the development and experimentation of robust analysis pipelines that combine traditional program analysis tools and CodeLLMs, reducing friction in multi-language code analysis and ensuring compatibility across different tools and LLM platforms. It is designed to seamlessly integrate with popular analysis tools like WALA, Tree-sitter, LLVM, and CodeQL, acting as a crucial intermediary layer for efficient communication between these tools and CodeLLMs. The project is continuously evolving to include new tools and frameworks, maintaining its versatility for code analysis and LLM integration.
For similar tasks
![Co-LLM-Agents Screenshot](/screenshots_githubs/UMass-Foundation-Model-Co-LLM-Agents.jpg)
Co-LLM-Agents
This repository contains code for building cooperative embodied agents modularly with large language models. The agents are trained to perform tasks in two different environments: ThreeDWorld Multi-Agent Transport (TDW-MAT) and Communicative Watch-And-Help (C-WAH). TDW-MAT is a multi-agent environment where agents must transport objects to a goal position using containers. C-WAH is an extension of the Watch-And-Help challenge, which enables agents to send messages to each other. The code in this repository can be used to train agents to perform tasks in both of these environments.
![GPT4Point Screenshot](/screenshots_githubs/Pointcept-GPT4Point.jpg)
GPT4Point
GPT4Point is a unified framework for point-language understanding and generation. It aligns 3D point clouds with language, providing a comprehensive solution for tasks such as 3D captioning and controlled 3D generation. The project includes an automated point-language dataset annotation engine, a novel object-level point cloud benchmark, and a 3D multi-modality model. Users can train and evaluate models using the provided code and datasets, with a focus on improving models' understanding capabilities and facilitating the generation of 3D objects.
![asreview Screenshot](/screenshots_githubs/asreview-asreview.jpg)
asreview
The ASReview project implements active learning for systematic reviews, utilizing AI-aided pipelines to assist in finding relevant texts for search tasks. It accelerates the screening of textual data with minimal human input, saving time and increasing output quality. The software offers three modes: Oracle for interactive screening, Exploration for teaching purposes, and Simulation for evaluating active learning models. ASReview LAB is designed to support decision-making in any discipline or industry by improving efficiency and transparency in screening large amounts of textual data.
![Groma Screenshot](/screenshots_githubs/FoundationVision-Groma.jpg)
Groma
Groma is a grounded multimodal assistant that excels in region understanding and visual grounding. It can process user-defined region inputs and generate contextually grounded long-form responses. The tool presents a unique paradigm for multimodal large language models, focusing on visual tokenization for localization. Groma achieves state-of-the-art performance in referring expression comprehension benchmarks. The tool provides pretrained model weights and instructions for data preparation, training, inference, and evaluation. Users can customize training by starting from intermediate checkpoints. Groma is designed to handle tasks related to detection pretraining, alignment pretraining, instruction finetuning, instruction following, and more.
![amber-train Screenshot](/screenshots_githubs/LLM360-amber-train.jpg)
amber-train
Amber is the first model in the LLM360 family, an initiative for comprehensive and fully open-sourced LLMs. It is a 7B English language model with the LLaMA architecture. The model type is a language model with the same architecture as LLaMA-7B. It is licensed under Apache 2.0. The resources available include training code, data preparation, metrics, and fully processed Amber pretraining data. The model has been trained on various datasets like Arxiv, Book, C4, Refined-Web, StarCoder, StackExchange, and Wikipedia. The hyperparameters include a total of 6.7B parameters, hidden size of 4096, intermediate size of 11008, 32 attention heads, 32 hidden layers, RMSNorm ε of 1e^-6, max sequence length of 2048, and a vocabulary size of 32000.
![kan-gpt Screenshot](/screenshots_githubs/AdityaNG-kan-gpt.jpg)
kan-gpt
The KAN-GPT repository is a PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling. It provides a model for generating text based on prompts, with a focus on improving performance compared to traditional MLP-GPT models. The repository includes scripts for training the model, downloading datasets, and evaluating model performance. Development tasks include integrating with other libraries, testing, and documentation.
![LLM-SFT Screenshot](/screenshots_githubs/yongzhuo-LLM-SFT.jpg)
LLM-SFT
LLM-SFT is a Chinese large model fine-tuning tool that supports models such as ChatGLM, LlaMA, Bloom, Baichuan-7B, and frameworks like LoRA, QLoRA, DeepSpeed, UI, and TensorboardX. It facilitates tasks like fine-tuning, inference, evaluation, and API integration. The tool provides pre-trained weights for various models and datasets for Chinese language processing. It requires specific versions of libraries like transformers and torch for different functionalities.
For similar jobs
![weave Screenshot](/screenshots_githubs/wandb-weave.jpg)
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
![LLMStack Screenshot](/screenshots_githubs/trypromptly-LLMStack.jpg)
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
![VisionCraft Screenshot](/screenshots_githubs/VisionCraft-org-VisionCraft.jpg)
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
![kaito Screenshot](/screenshots_githubs/Azure-kaito.jpg)
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
![PyRIT Screenshot](/screenshots_githubs/Azure-PyRIT.jpg)
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
![tabby Screenshot](/screenshots_githubs/TabbyML-tabby.jpg)
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.
![spear Screenshot](/screenshots_githubs/isl-org-spear.jpg)
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
![Magick Screenshot](/screenshots_githubs/Oneirocom-Magick.jpg)
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.