
sage
Chat with any codebase with 2 commands
Stars: 705

Sage is a tool that allows users to chat with any codebase, providing a chat interface for code understanding and integration. It simplifies the process of learning how a codebase works by offering heavily documented answers sourced directly from the code. Users can set up Sage locally or on the cloud with minimal effort. The tool is designed to be easily customizable, allowing users to swap components of the pipeline and improve the algorithms powering code understanding and generation.
README:
Using pipx (recommended)
Make sure pipx is installed on your system (see instructions), then run:
pipx install git+https://github.com/Storia-AI/sage.git@main
Using venv and pip
Alternatively, you can manually create a virtual environment and install Code Sage via pip:
python -m venv sage-venv
source sage-venv/bin/activate
pip install git+https://github.com/Storia-AI/sage.git@main
sage performs two steps:
- Indexes your codebase (requiring an embedder and a vector store)
- Enables chatting via LLM + RAG (requiring access to an LLM)
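As a rough illustration of these two steps, here is a toy sketch in Python. It is not how sage is implemented: the bag-of-words `embed` function and the in-memory `index` dictionary are stand-ins (assumptions) for a real embedder and vector store, the file contents are made up, and the final LLM call is left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedder': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(files: dict[str, str]) -> dict[str, Counter]:
    """Step 1: embed every file and keep the vectors in a (toy) vector store."""
    return {path: embed(content) for path, content in files.items()}


def retrieve(index: dict[str, Counter], query: str, top_k: int = 2) -> list[str]:
    """Step 2 (retrieval half of RAG): return the paths most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda path: cosine(index[path], q), reverse=True)
    return ranked[:top_k]


# Hypothetical mini-codebase used only for illustration.
files = {
    "tokenizer.py": "def tokenize text split words into tokens",
    "trainer.py": "def train model with optimizer and learning rate",
    "README.md": "project overview and install instructions",
}
index = build_index(files)
context = retrieve(index, "how does the tokenizer split text", top_k=1)
print(context)  # ['tokenizer.py']
```

In a real setup the retrieved files would be passed to the LLM as context for the answer.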
Running locally (lower quality)
- To index the codebase locally, we use the open-source project Marqo, which is both an embedder and a vector store. To bring up a Marqo instance:
docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest
This will open a persistent Marqo console window. This should take around 2-3 minutes on a fresh install.
- To chat with an LLM locally, we use Ollama:
  - Head over to ollama.com to download the appropriate binary for your machine.
  - Open a new terminal window.
  - Pull the desired model, e.g. ollama pull llama3.1.
Using external providers (higher quality)
- For embeddings, we support OpenAI and Voyage. According to our experiments, OpenAI is better quality. Their batch API is also faster, with more generous rate limits. Export the API key of the desired provider:
export OPENAI_API_KEY=...
# or
export VOYAGE_API_KEY=...
- We use Pinecone for the vector store, so you will need an API key:
export PINECONE_API_KEY=...
If you want to reuse an existing Pinecone index, specify it. Otherwise we'll create a new one called sage.
export PINECONE_INDEX_NAME=...
- For reranking, we support NVIDIA, Voyage, Cohere, and Jina.
  - According to our experiments, NVIDIA performs best. To get an API key, follow these instructions. Note that NVIDIA's API keys are model-specific. We recommend using nvidia/nv-rerankqa-mistral-4b-v3.
  - Export the API key of the desired provider:
export NVIDIA_API_KEY=...
# or
export VOYAGE_API_KEY=...
# or
export COHERE_API_KEY=...
# or
export JINA_API_KEY=...
- For chatting with an LLM, we support OpenAI and Anthropic. For the latter, set an additional API key:
export ANTHROPIC_API_KEY=...
For easier configuration, adapt the entries within the sample .sage-env (change the API key names based on your desired setup) and run:
source .sage-env
If you are planning on indexing GitHub issues in addition to the codebase, you will need a GitHub token:
export GITHUB_TOKEN=...
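Before indexing, it can help to check that the variables for your chosen setup are actually exported. The sketch below is a hypothetical helper, not part of sage: the role names, the provider labels, and the mapping of the OpenAI LLM to OPENAI_API_KEY are assumptions made for illustration. Only the variable names themselves come from the instructions above.

```python
import os

# Variable names are taken from the instructions above. The grouping by role
# and the provider labels are assumptions made for this sketch.
REQUIRED_KEYS = {
    "embeddings": {"openai": "OPENAI_API_KEY", "voyage": "VOYAGE_API_KEY"},
    "vector_store": {"pinecone": "PINECONE_API_KEY"},
    "reranker": {
        "nvidia": "NVIDIA_API_KEY",
        "voyage": "VOYAGE_API_KEY",
        "cohere": "COHERE_API_KEY",
        "jina": "JINA_API_KEY",
    },
    # Assumption: an OpenAI LLM reuses OPENAI_API_KEY.
    "llm": {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"},
}


def missing_keys(setup: dict[str, str], env=None) -> list[str]:
    """Return the variable names the chosen setup needs but `env` lacks."""
    env = os.environ if env is None else env
    needed: list[str] = []
    for role, provider in setup.items():
        name = REQUIRED_KEYS[role][provider]
        if name not in needed:  # a key shared by two roles is listed once
            needed.append(name)
    return [name for name in needed if not env.get(name)]


setup = {
    "embeddings": "openai",
    "vector_store": "pinecone",
    "reranker": "nvidia",
    "llm": "anthropic",
}
# A fake environment so the example does not depend on your shell.
fake_env = {"OPENAI_API_KEY": "sk-example", "PINECONE_API_KEY": "pc-example"}
print(missing_keys(setup, fake_env))  # ['NVIDIA_API_KEY', 'ANTHROPIC_API_KEY']
```

Calling `missing_keys(setup)` without the second argument checks the real process environment instead.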
- Select your desired repository:
export GITHUB_REPO=huggingface/transformers
- Index the repository. This might take a few minutes, depending on its size.
sage-index $GITHUB_REPO
To use external providers instead of running locally, set --mode=remote.
- Chat with the repository, once it's indexed:
sage-chat $GITHUB_REPO
To use external providers instead of running locally, set --mode=remote.
  - To get a public URL for your chat app, set --share=true.
  - You can override the default settings (e.g. desired embedding model or LLM) via command line flags. Run sage-index --help or sage-chat --help for a full list.
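If you drive these commands from a script, one option is to assemble the argument list in Python and hand it to `subprocess`. This is only a sketch: `build_command` is a hypothetical helper, and it uses just the flags mentioned above (`--mode=remote` and `--share=true`). Check `--help` for the options your installed version accepts.

```python
import shlex


def build_command(tool: str, repo: str, remote: bool = False, share: bool = False) -> list[str]:
    """Assemble an argv list for sage-index or sage-chat from the flags described above."""
    if tool not in ("sage-index", "sage-chat"):
        raise ValueError(f"unknown tool: {tool}")
    argv = [tool, repo]
    if remote:
        argv.append("--mode=remote")
    if share:
        # The text above mentions --share only in the context of the chat app.
        if tool != "sage-chat":
            raise ValueError("--share is described only for sage-chat")
        argv.append("--share=true")
    return argv


index_cmd = build_command("sage-index", "huggingface/transformers", remote=True)
chat_cmd = build_command("sage-chat", "huggingface/transformers", remote=True, share=True)
print(shlex.join(index_cmd))  # sage-index huggingface/transformers --mode=remote
print(shlex.join(chat_cmd))   # sage-chat huggingface/transformers --mode=remote --share=true

# To actually run a command (requires sage to be installed):
#   import subprocess
#   subprocess.run(index_cmd, check=True)
```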
Working with private repositories
To index and chat with a private repository, simply set the GITHUB_TOKEN environment variable. To obtain this token, go to github.com > click on your profile icon > Settings > Developer settings > Personal access tokens. You can either make a fine-grained token for the desired repository, or a classic token.
export GITHUB_TOKEN=...
Control which files get indexed
You can specify an inclusion or exclusion file in the following format:
# This is a comment
ext:.my-ext-1
ext:.my-ext-2
ext:.my-ext-3
dir:my-dir-1
dir:my-dir-2
dir:my-dir-3
file:my-file-1.md
file:my-file-2.py
file:my-file-3.cpp
where:
- ext specifies a file extension.
- dir specifies a directory. This is not a full path. For instance, if you specify dir:tests in an exclusion file, then a file like /path/to/my/tests/file.py will be ignored.
- file specifies a file name. This is also not a full path. For instance, if you specify file:__init__.py, then a file like /path/to/my/__init__.py will be ignored.
To specify an inclusion file (i.e. only index the specified files):
sage-index $GITHUB_REPO --include=/path/to/inclusion/file
To specify an exclusion file (i.e. index all files, except for the ones specified):
sage-index $GITHUB_REPO --exclude=/path/to/exclusion/file
By default, we use the exclusion file sample-exclude.txt.
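To make the rule format concrete, here is a small sketch of how such a file could be parsed and applied. It reflects our reading of the description above (exact match on extension, on any parent directory name, or on file name) and is not sage's actual matching code. `parse_rules` and `matches` are hypothetical names, and the sample rules are made up.

```python
from pathlib import PurePosixPath


def parse_rules(text: str) -> dict[str, set[str]]:
    """Parse 'ext:', 'dir:' and 'file:' lines; skip blank lines and '#' comments."""
    rules: dict[str, set[str]] = {"ext": set(), "dir": set(), "file": set()}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        kind, _, value = line.partition(":")
        if kind not in rules or not value.strip():
            raise ValueError(f"unrecognized rule: {raw!r}")
        rules[kind].add(value.strip())
    return rules


def matches(path: str, rules: dict[str, set[str]]) -> bool:
    """True if the extension, any parent directory name, or the file name is listed."""
    p = PurePosixPath(path)
    return (
        p.suffix in rules["ext"]
        or p.name in rules["file"]
        or any(part in rules["dir"] for part in p.parts[:-1])
    )


exclusion = parse_rules("""
# This is a comment
ext:.png
dir:tests
file:__init__.py
""")

print(matches("/path/to/my/tests/file.py", exclusion))  # True  (dir:tests)
print(matches("/path/to/my/__init__.py", exclusion))    # True  (file:__init__.py)
print(matches("/path/to/my/model.py", exclusion))       # False
```

With an exclusion file, a path is skipped when `matches` returns True. With an inclusion file, only matching paths would be kept.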
Index open GitHub issues
You will need a GitHub token first:
export GITHUB_TOKEN=...
To index GitHub issues without comments:
sage-index $GITHUB_REPO --index-issues
To index GitHub issues with comments:
sage-index $GITHUB_REPO --index-issues --index-issue-comments
To index GitHub issues, but not the codebase:
sage-index $GITHUB_REPO --index-issues --no-index-repo
Experiment with retrieval strategies
Retrieving the right files from the vector database is arguably the quality bottleneck of the system. We are actively experimenting with various retrieval strategies and documenting our findings here.
Currently, we support the following types of retrieval:
- Vanilla RAG from a vector database (nearest neighbor between dense embeddings). This is the default.
- Hybrid RAG that combines dense retrieval (embeddings-based) with sparse retrieval (BM25). Use --retrieval-alpha to weigh the two strategies.
  - A value of 1 means dense-only retrieval and 0 means BM25-only retrieval.
  - Note this is not available when running locally, only when using Pinecone as a vector store.
  - Contrary to Anthropic's findings, we find that BM25 is actually damaging performance on codebases, because it gives undeserved advantage to Markdown files.
- Multi-query retrieval performs multiple query rewrites, makes a separate retrieval call for each, and takes the union of the retrieved documents. You can activate it by passing --multi-query-retrieval. This can be combined with both vanilla and hybrid RAG.
  - We find that on our benchmark this only marginally improves retrieval quality (from 0.44 to 0.46 R-precision) while being significantly slower and more expensive due to LLM calls. But your mileage may vary.
- LLM-only retrieval completely circumvents indexing the codebase. We simply enumerate all file paths and pass them to an LLM together with the user query. We ask the LLM which files are likely to be relevant for the user query, solely based on their filenames. You can activate it by passing --llm-retriever.
  - We find that on our benchmark the performance is comparable with vector database solutions (R-precision is 0.44 for both). This is quite remarkable, since we've saved so much effort by not indexing the codebase. However, we are reluctant to claim that these findings generalize, for the following reasons:
    - Our (artificial) dataset occasionally contains explicit path names in the query, making it trivial for the LLM. Sample query: "Alice is managing a series of machine learning experiments. Please explain in detail how main in examples/pytorch/image-pretraining/run_mim.py allows her to organize the outputs of each experiment in separate directories."
    - Our benchmark focuses on the Transformers library, which is well-maintained and the file paths are often meaningful. This might not be the case for all codebases.
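The ideas behind the strategies above can be sketched in a few lines of Python. This is a simplified illustration, not sage's or Pinecone's implementation. `hybrid_scores` assumes the dense and sparse scores are already on comparable scales, `multi_query_union` takes any retrieval callable, and `r_precision` follows the standard definition (precision within the top R results, where R is the number of relevant documents). All file names and scores are made up.

```python
def hybrid_scores(dense: dict[str, float], sparse: dict[str, float], alpha: float) -> dict[str, float]:
    """Blend per-document scores: alpha=1 is dense-only, alpha=0 is sparse (BM25)-only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    docs = dense.keys() | sparse.keys()
    return {d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0) for d in docs}


def multi_query_union(retrieve, queries: list[str]) -> list[str]:
    """Run one retrieval call per query rewrite and take the order-preserving union."""
    seen: dict[str, None] = {}
    for query in queries:
        for doc in retrieve(query):
            seen.setdefault(doc, None)
    return list(seen)


def r_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the top-R retrieved documents that are relevant, with R = len(relevant)."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(doc in relevant for doc in retrieved[:r]) / r


# Hybrid: made-up scores for three files.
dense = {"run_mim.py": 0.9, "README.md": 0.4}
sparse = {"README.md": 0.8, "trainer.py": 0.3}
dense_only = hybrid_scores(dense, sparse, alpha=1.0)
bm25_only = hybrid_scores(dense, sparse, alpha=0.0)
print(max(dense_only, key=dense_only.get))  # run_mim.py
print(max(bm25_only, key=bm25_only.get))    # README.md

# Multi-query: a fake retriever backed by a lookup table.
fake_results = {"q1": ["a.py", "b.py"], "q2": ["b.py", "c.py"]}
merged = multi_query_union(fake_results.get, ["q1", "q2"])
print(merged)  # ['a.py', 'b.py', 'c.py']

# R-precision: two relevant files, so only the top 2 results are scored.
print(r_precision(merged, {"a.py", "c.py"}))  # 0.5
```

In this toy data the BM25-only ranking puts the Markdown file first, which is the kind of effect described above.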
Sometimes you just want to learn how a codebase works and how to integrate it, without spending hours sifting through the code itself. sage is like an open-source GitHub Copilot with the most up-to-date information about your repo.
Features:
- Dead-simple set-up. Run two scripts and you have a functional chat interface for your code. That's really it.
- Heavily documented answers. Every response shows where in the code the context for the answer was pulled from. Let's build trust in the AI.
- Runs locally or on the cloud.
- Plug-and-play. Want to improve the algorithms powering the code understanding/generation? We've made every component of the pipeline easily swappable. Google-grade engineering standards allow you to customize to your heart's content.
- 2024-09-16: Renamed repo2vec to sage.
- 2024-09-03: Support for indexing GitHub issues.
- 2024-08-30: Support for running everything locally (Marqo for embeddings, Ollama for LLMs).
We're working to make all code on the internet searchable and understandable for devs. You can check out our early product, Code Sage. We pre-indexed a slew of OSS repos, and you can index your desired ones by simply pasting a GitHub URL.
If you're the maintainer of an OSS repo and would like a dedicated page on Code Sage (e.g. sage.storia.ai/your-repo), then send us a message at [email protected]. We'll do it for free!
We built the code purposefully modular so that you can plug in your desired embedding, LLM, and vector store providers by simply implementing the relevant abstract classes.
Feel free to send feature requests to [email protected] or make a pull request!
Similar Open Source Tools

safety-tooling
This repository, safety-tooling, is designed to be shared across various AI Safety projects. It provides an LLM API with a common interface for OpenAI, Anthropic, and Google models. The aim is to facilitate collaboration among AI Safety researchers, especially those with limited software engineering backgrounds, by offering a platform for contributing to a larger codebase. The repo can be used as a git submodule for easy collaboration and updates. It also supports pip installation for convenience. The repository includes features for installation, secrets management, linting, formatting, Redis configuration, testing, dependency management, inference, finetuning, API usage tracking, and various utilities for data processing and experimentation.

gpt-cli
gpt-cli is a command-line interface tool for interacting with various chat language models like ChatGPT, Claude, and others. It supports model customization, usage tracking, keyboard shortcuts, multi-line input, markdown support, predefined messages, and multiple assistants. Users can easily switch between different assistants, define custom assistants, and configure model parameters and API keys in a YAML file for easy customization and management.

vector-inference
This repository provides an easy-to-use solution for running inference servers on Slurm-managed computing clusters using vLLM. All scripts in this repository run natively on the Vector Institute cluster environment. Users can deploy models as Slurm jobs, check server status and performance metrics, and shut down models. The repository also supports launching custom models with specific configurations. Additionally, users can send inference requests and set up an SSH tunnel to run inference from a local device.

garak
Garak is a vulnerability scanner designed for LLMs (Large Language Models) that checks for various weaknesses such as hallucination, data leakage, prompt injection, misinformation, toxicity generation, and jailbreaks. It combines static, dynamic, and adaptive probes to explore vulnerabilities in LLMs. Garak is a free tool developed for red-teaming and assessment purposes, focusing on making LLMs or dialog systems fail. It supports various LLM models and can be used to assess their security and robustness.

garak
Garak is a free tool that checks if a Large Language Model (LLM) can be made to fail in a way that is undesirable. It probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. Garak's a free tool. We love developing it and are always interested in adding functionality to support applications.

HuggingFaceGuidedTourForMac
HuggingFaceGuidedTourForMac is a guided tour on how to install optimized pytorch and optionally Apple's new MLX, JAX, and TensorFlow on Apple Silicon Macs. The repository provides steps to install homebrew, pytorch with MPS support, MLX, JAX, TensorFlow, and Jupyter lab. It also includes instructions on running large language models using HuggingFace transformers. The repository aims to help users set up their Macs for deep learning experiments with optimized performance.

just-chat
Just-Chat is a containerized application that allows users to easily set up and chat with their AI agent. Users can customize their AI assistant using a YAML file, add new capabilities with Python tools, and interact with the agent through a chat web interface. The tool supports various modern models like DeepSeek Reasoner, ChatGPT, LLAMA3.3, etc. Users can also use semantic search capabilities with MeiliSearch to find and reference relevant information based on meaning. Just-Chat requires Docker or Podman for operation and provides detailed installation instructions for both Linux and Windows users.

ai-starter-kit
SambaNova AI Starter Kits is a collection of open-source examples and guides designed to facilitate the deployment of AI-driven use cases for developers and enterprises. The kits cover various categories such as Data Ingestion & Preparation, Model Development & Optimization, Intelligent Information Retrieval, and Advanced AI Capabilities. Users can obtain a free API key using SambaNova Cloud or deploy models using SambaStudio. Most examples are written in Python but can be applied to any programming language. The kits provide resources for tasks like text extraction, fine-tuning embeddings, prompt engineering, question-answering, image search, post-call analysis, and more.

reader
Reader is a tool that converts any URL to an LLM-friendly input with a simple prefix `https://r.jina.ai/`. It improves the output for your agent and RAG systems at no cost. Reader supports image reading, captioning all images at the specified URL and adding `Image [idx]: [caption]` as an alt tag. This enables downstream LLMs to interact with the images in reasoning, summarizing, etc. Reader offers a streaming mode, useful when the standard mode provides an incomplete result. In streaming mode, Reader waits a bit longer until the page is fully rendered, providing more complete information. Reader also supports a JSON mode, which contains three fields: `url`, `title`, and `content`. Reader is backed by Jina AI and licensed under Apache-2.0.

fabric
Fabric is an open-source framework for augmenting humans using AI. It provides a structured approach to breaking down problems into individual components and applying AI to them one at a time. Fabric includes a collection of pre-defined Patterns (prompts) that can be used for a variety of tasks, such as extracting the most interesting parts of YouTube videos and podcasts, writing essays, summarizing academic papers, creating AI art prompts, and more. Users can also create their own custom Patterns. Fabric is designed to be easy to use, with a command-line interface and a variety of helper apps. It is also extensible, allowing users to integrate it with their own AI applications and infrastructure.

frontend
Nuclia frontend apps and libraries repository contains various frontend applications and libraries for the Nuclia platform. It includes components such as Dashboard, Widget, SDK, Sistema (design system), NucliaDB admin, CI/CD Deployment, and Maintenance page. The repository provides detailed instructions on installation, dependencies, and usage of these components for both Nuclia employees and external developers. It also covers deployment processes for different components and tools like ArgoCD for monitoring deployments and logs. The repository aims to facilitate the development, testing, and deployment of frontend applications within the Nuclia ecosystem.

hash
HASH is a self-building, open-source database which grows, structures and checks itself. With it, we're creating a platform for decision-making, which helps you integrate, understand and use data in a variety of different ways.

dir-assistant
Dir-assistant is a tool that allows users to interact with their current directory's files using local or API Language Models (LLMs). It supports various platforms and provides API support for major LLM APIs. Users can configure and customize their local LLMs and API LLMs using the tool. Dir-assistant also supports model downloads and configurations for efficient usage. It is designed to enhance file interaction and retrieval using advanced language models.

generative-models
Generative Models by Stability AI is a repository that provides various generative models for research purposes. It includes models like Stable Video 4D (SV4D) for video synthesis, Stable Video 3D (SV3D) for multi-view synthesis, SDXL-Turbo for text-to-image generation, and more. The repository focuses on modularity and implements a config-driven approach for building and combining submodules. It supports training with PyTorch Lightning and offers inference demos for different models. Users can access pre-trained models like SDXL-base-1.0 and SDXL-refiner-1.0 under a CreativeML Open RAIL++-M license. The codebase also includes tools for invisible watermark detection in generated images.

mark
Mark is a CLI tool that allows users to interact with large language models (LLMs) using Markdown format. It enables users to seamlessly integrate GPT responses into Markdown files, supports image recognition, scraping of local and remote links, and image generation. Mark focuses on using Markdown as both a prompt and response medium for LLMs, offering a unique and flexible way to interact with language models for various use cases in development and documentation processes.
For similar tasks

chat-with-code
Chat-with-code is a codebase chatbot that enables users to interact with their codebase using the OpenAI Language Model. It provides a user-friendly chat interface where users can ask questions and interact with their code. The tool clones, chunks, and embeds the codebase, allowing for natural language interactions. It is designed to assist users in exploring and understanding their codebase more intuitively.

Devon
Devon is an open-source pair programmer tool designed to facilitate collaborative coding sessions. It provides features such as multi-file editing, codebase exploration, test writing, bug fixing, and architecture exploration. The tool supports Anthropic, OpenAI, and Groq APIs, with plans to add more models in the future. Devon is community-driven, with ongoing development goals including multi-model support, plugin system for tool builders, self-hostable Electron app, and setting SOTA on SWE-bench Lite. Users can contribute to the project by developing core functionality, conducting research on agent performance, providing feedback, and testing the tool.

brokk
Brokk is a code assistant designed to understand code semantically, allowing LLMs to work effectively on large codebases. It offers features like agentic search, summarizing related classes, parsing stack traces, adding source for usages, and autonomously fixing errors. Users can interact with Brokk through different panels and commands, enabling them to manipulate context, ask questions, search codebase, run shell commands, and more. Brokk helps with tasks like debugging regressions, exploring codebase, AI-powered refactoring, and working with dependencies. It is particularly useful for making complex, multi-file edits with o1pro.

ai-cookbook
The AI Cookbook is a collection of examples and tutorials designed to assist developers in building AI systems. It provides ready-to-use code snippets that can be easily integrated into various projects. The content covers practical guidance on creating AI solutions that are functional in real-world scenarios. The repository aims to support learners, freelancers, and businesses seeking AI expertise by offering valuable resources and insights.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.

oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer's subscription using the CAPE tool within a matter of a few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.