r2ai
local language model for radare2
Stars: 118
r2ai is a tool designed to run a language model locally without internet access. It can be used to entertain users or assist in answering questions related to radare2 or reverse engineering. The tool allows users to prompt the language model, index large codebases, slurp file contents, embed the output of an r2 command, define different system-level assistant roles, set environment variables, and more. It is accessible as an r2lang-python plugin and can be scripted from various languages. Users can use different models, adjust query templates dynamically, load multiple models, and make them communicate with each other.
README:
[r2ai ASCII-art logo]
Run a language model to entertain you or help answering questions about radare2 or reverse engineering in general. The language model may be local (running without Internet on your host) or remote (e.g. if you have an API key). Note that the models used by r2ai are pulled from external sources, so they may behave differently or return unreliable information. That's why there is an ongoing effort to improve the post-finetuning using memgpt-like techniques, which can't get better without your help!
R2AI is structured into four independent components:
- r2ai (wip r2ai-native rewrite in C)
  - r2-like repl using r2pipe to communicate with r2
  - supports auto solving mode
  - client and server openapi protocol
  - download and manage models from huggingface
- decai
  - lightweight r2js plugin
  - focus on decompilation
  - talks to r2ai, r2ai-server, openai, anthropic or ollama
- r2ai-plugin
  - not recommended because of python version pains; hopefully soon rewritten in C
  - requires r2lang-python
  - adds the r2ai command inside r2
- r2ai-server
  - list and select models downloaded from r2ai
  - simple cli tool to start local openapi webservers
  - supports llamafile, llamacpp, r2ai-w and koboldcpp
Features:
- Auto mode to solve tasks using function calling
- Use local and remote language models (llama, ollama, openai, anthropic, ..)
- Support for OpenAI, Anthropic, Bedrock
- Index large codebases or markdown books using a vector database
- Slurp a file and perform actions on its contents
- Embed the output of an r2 command and resolve questions on the given data
- Define different system-level assistant roles
- Set environment variables to provide context to the language model
- Interactive repl and batch mode, from the cli or the r2 prompt
- Scriptable from python, bash, r2pipe, and javascript (r2papi); see the Python sketch after this list
- Use different models, dynamically adjust the query template
- Load multiple models and make them talk to each other
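As a rough illustration of the scripting support listed above, here is a minimal Python sketch that drives r2ai through r2pipe. It assumes radare2 and the r2pipe module are installed and that the r2ai plugin (or the native rewrite) provides the r2ai command inside r2; the binary path, model name and prompt are placeholders, and the exact command syntax is assumed from the repl examples later in this README.

import r2pipe

r2 = r2pipe.open("/bin/ls")   # any binary you want to analyze
r2.cmd("aaa")                 # let r2 analyze it first
# Select a model and ask a question through the r2ai command
# (syntax assumed from the repl examples below, not a documented API).
r2.cmd("r2ai -m anthropic:claude-3-5-sonnet-20241022")
print(r2.cmd("r2ai explain what the entry0 function does"))
r2.quit()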
Install the various components via r2pm:
r2pm -ci r2ai
r2pm -ci decai
r2pm -ci r2ai-server
Running make will set up a python virtual environment in the current directory, install all the necessary dependencies, and drop you into a shell to run r2ai.
The installation is split into separate targets:
- make install will place a symlink in $BINDIR/r2ai
- make install-decai will install the decai r2js decompiler plugin
- make install-server will install the r2ai-server
On Windows you can follow the same instructions; just make sure you have the right python environment ready and create the venv to use:
git clone https://github.com/radareorg/r2ai
cd r2ai
set PATH=C:\Users\YOURUSERNAME\Local\Programs\Python\Python39\;%PATH%
python3 -m pip install .
python3 main.py
Launch r2ai:
- If you installed via r2pm, you can execute it like this:
r2pm -r r2ai
- Otherwise, run:
./r2ai.sh
Selecting the model:
- List all downloaded models: -m
- Get a short list of models: -MM
- Help: -h
Example using Claude 3.5 Sonnet 20241022:
First, put your API key in ~/.r2ai.anthropic-key:
$ cat ~/.r2ai.anthropic-key
sk-ant-api03-CENSORED
Then, launch r2ai, select a model and ask questions to the AI:
[r2ai:0x00006aa0]> -m anthropic:claude-3-5-sonnet-20241022
[r2ai:0x00006aa0]> compute 4+5
4 + 5 = 9
Example using ChatGPT 4:
Put your API key in ~/.r2ai.openai-key. Then, launch r2ai, select the model and question the AI:
[r2ai:0x00006aa0]> -m openai:gpt-4
[r2ai:0x00006aa0]> draw me a pancake in ASCII art
Sure, here's a simple ASCII pancake:
_____
( )
( )
-----
Example using a free local AI: Mistral 7B v0.2
Launch r2ai, select the model and ask a question. If the model isn't downloaded yet, r2ai will ask you which precise version to download.
[r2ai:0x00006aa0]> -m TheBloke/Mistral-7B-Instruct-v0.2-GGUF
[r2ai:0x00006aa0]> give me a short algorithm to test prime numbers
Select TheBloke/Mistral-7B-Instruct-v0.2-GGUF model. See -M and -m flags
[?] Quality (smaller is faster):
> Small | Size: 2.9 GB, Estimated RAM usage: 5.4 GB
Medium | Size: 3.9 GB, Estimated RAM usage: 6.4 GB
Large | Size: 7.2 GB, Estimated RAM usage: 9.7 GB
See More
[?] Quality (smaller is faster):
> mistral-7b-instruct-v0.2.Q2_K.gguf | Size: 2.9 GB, Estimated RAM usage: 5.4 GB
mistral-7b-instruct-v0.2.Q3_K_L.gguf | Size: 3.6 GB, Estimated RAM usage: 6.1 GB
mistral-7b-instruct-v0.2.Q3_K_M.gguf | Size: 3.3 GB, Estimated RAM usage: 5.8 GB
mistral-7b-instruct-v0.2.Q3_K_S.gguf | Size: 2.9 GB, Estimated RAM usage: 5.4 GB
mistral-7b-instruct-v0.2.Q4_0.gguf | Size: 3.8 GB, Estimated RAM usage: 6.3 GB
mistral-7b-instruct-v0.2.Q4_K_M.gguf | Size: 4.1 GB, Estimated RAM usage: 6.6 GB
mistral-7b-instruct-v0.2.Q4_K_S.gguf | Size: 3.9 GB, Estimated RAM usage: 6.4 GB
mistral-7b-instruct-v0.2.Q5_0.gguf | Size: 4.7 GB, Estimated RAM usage: 7.2 GB
mistral-7b-instruct-v0.2.Q5_K_M.gguf | Size: 4.8 GB, Estimated RAM usage: 7.3 GB
mistral-7b-instruct-v0.2.Q5_K_S.gguf | Size: 4.7 GB, Estimated RAM usage: 7.2 GB
mistral-7b-instruct-v0.2.Q6_K.gguf | Size: 5.5 GB, Estimated RAM usage: 8.0 GB
mistral-7b-instruct-v0.2.Q8_0.gguf | Size: 7.2 GB, Estimated RAM usage: 9.7 GB
[?] Use this model by default? ~/.r2ai.model:
> Yes
No
[?] Download to ~/.local/share/r2ai/models? (Y/n): Y
Here's a simple algorithm to test if a number is likely prime using the trial division method, which checks if a
number is divisible by smaller prime numbers up to its square root:
1. Input the number `n` to be checked.
2. If `n` is less than 2, it is not a prime number.
3. If `n` is equal to 2, it is a prime number.
4. If `n` is even, it is not a prime number (unless it is equal to 2).
5. For each prime number `p` from 3 to the square root of `n`, do the following:
a. If `n` is divisible by `p`, it is not a prime number.
b. If `n` is not divisible by `p`, go to the next prime number.
6. If the loop completes without finding a divisor, then `n` is a prime number.
This algorithm is not foolproof, as it can only determine if a number is likely prime, not definitely prime. For
example, it cannot determine if 15 is a prime number, but it can determine that 29 is a prime number. For larger
numbers, more sophisticated algorithms like the Miller-Rabin primality test or the AKS primality test are
required.
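For reference, the trial-division method the model describes can be written as a short Python function; this is a plain transcription of those steps (testing odd divisors instead of only primes, the usual simplification), not part of r2ai itself.

import math

def is_prime(n: int) -> bool:
    if n < 2:                 # step 2: below 2 is never prime
        return False
    if n == 2:                # step 3: 2 is prime
        return True
    if n % 2 == 0:            # step 4: other even numbers are not
        return False
    for p in range(3, math.isqrt(n) + 1, 2):   # step 5: odd divisors up to sqrt(n)
        if n % p == 0:
            return False
    return True               # step 6: no divisor found

print(is_prime(29))  # True
print(is_prime(15))  # False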
Using r2ai-server:
- Get usage: r2pm -r r2ai-server
- List available servers: r2pm -r r2ai-server -l
- List available models: r2pm -r r2ai-server -m
On Linux, models are stored in ~/.r2ai.models/. The file ~/.r2ai.model lists the default model and the other available models.
Example launching a local Mistral AI server:
$ r2pm -r r2ai-server -l r2ai -m mistral-7b-instruct-v0.2.Q2_K
[12/13/24 10:35:22] INFO r2ai.server - INFO - [R2AI] Serving at port 8080 web.py:336
Decai is used from r2 (e.g. r2 ./mybinary). Get help with decai -h:
[0x00406cac]> decai -h
Usage: decai (-h) ...
decai -H - help setting up r2ai
decai -d [f1 ..] - decompile given functions
decai -dr - decompile function and its called ones (recursive)
decai -dd [..] - same as above, but ignoring cache
decai -D [query] - decompile current function with given extra query
decai -e - display and change eval config vars
decai -h - show this help
decai -i [f] [q] - include given file and query
decai -n - suggest better function name
decai -q [text] - query language model with given text
decai -Q [text] - query on top of the last output
decai -r - change role prompt (same as: decai -e prompt)
decai -s - function signature
decai -v - show local variables
decai -V - find vulnerabilities
decai -x - eXplain current function
List the configuration variables with decai -e:
[0x00406cac]> decai -e
decai -e api=r2
decai -e host=http://localhost
decai -e port=8080
decai -e prompt=Rewrite this function and respond ONLY with code, NO explanations, NO markdown, Change 'goto' into if/else/for/while, Simplify as much as possible, use better variable names, take function arguments and and strings from comments like 'string:'
decai -e ctxfile=
decai -e cmds=pdc
decai -e cache=false
decai -e lang=C
decai -e hlang=English
decai -e debug=false
decai -e model=
List the possible API backends with decai -e api=?:
[0x00406cac]> decai -e api=?
r2ai
claude
openapi
openai
gemini
xai
hf
For example, assuming we have a local Mistral AI server running on port 8080 with r2ai-server, we can decompile a given function with decai -d.
The server shows it received the question:
GET
CUSTOM
RUNLINE: -R
127.0.0.1 - - [13/Dec/2024 10:40:49] "GET /cmd/-R HTTP/1.1" 200 -
GET
CUSTOM
RUNLINE: -i /tmp/.pdc.txt Rewrite this function and respond ONLY with code, NO explanations, NO markdown, Change goto into if/else/for/while, Simplify as much as possible, use better variable names, take function arguments and and strings from comments like string:. Transform this pseudocode into C
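Since r2ai-server speaks plain HTTP, the requests decai sends (visible in the log above) can also be reproduced by hand. A rough Python sketch follows; the /cmd/ endpoint shape is an assumption inferred from that log, not a documented API.

import urllib.parse
import urllib.request

base = "http://localhost:8080"
# Re-send the same runline decai issues first (logged above as "RUNLINE: -R").
url = base + "/cmd/" + urllib.parse.quote("-R")
with urllib.request.urlopen(url) as resp:
    print(resp.status, resp.read().decode())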
Example with ChatGPT 4:
[0x00406cac]> decai -e api=openai
[0x00406cac]> decai -d
#include <stdio.h>
#include <unistd.h>
void daemonize() {
daemon(1, 0);
}
...
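The whole decai flow can also be scripted. Below is a minimal sketch using r2pipe from Python, assuming decai is installed (r2pm -ci decai) and a local r2ai-server is listening on port 8080 as configured earlier; ./mybinary and main are placeholders.

import r2pipe

r2 = r2pipe.open("./mybinary")
r2.cmd("aaa")                            # analyze the binary first
r2.cmd("decai -e api=r2")                # use the local r2ai-server backend
r2.cmd("decai -e host=http://localhost")
r2.cmd("decai -e port=8080")
r2.cmd("s main")                         # seek to the function of interest
print(r2.cmd("decai -d"))                # AI-assisted decompilation
r2.quit()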
Development: just run make.
TODO:
- add an "undo" command to drop the last message
- dump / restore conversational states (see the -L command)
- implement ~, | and > and other r2shell features
Alternative AI tools for r2ai
Similar Open Source Tools
AGiXT
AGiXT is a dynamic Artificial Intelligence Automation Platform engineered to orchestrate efficient AI instruction management and task execution across a multitude of providers. Our solution infuses adaptive memory handling with a broad spectrum of commands to enhance AI's understanding and responsiveness, leading to improved task completion. The platform's smart features, like Smart Instruct and Smart Chat, seamlessly integrate web search, planning strategies, and conversation continuity, transforming the interaction between users and AI. By leveraging a powerful plugin system that includes web browsing and command execution, AGiXT stands as a versatile bridge between AI models and users. With an expanding roster of AI providers, code evaluation capabilities, comprehensive chain management, and platform interoperability, AGiXT is consistently evolving to drive a multitude of applications, affirming its place at the forefront of AI technology.
DeepPavlov
DeepPavlov is an open-source conversational AI library built on PyTorch. It is designed for the development of production-ready chatbots and complex conversational systems, as well as for research in the area of NLP and dialog systems. The library offers a wide range of models for tasks such as Named Entity Recognition, Intent/Sentence Classification, Question Answering, Sentence Similarity/Ranking, Syntactic Parsing, and more. DeepPavlov also provides embeddings like BERT, ELMo, and FastText for various languages, along with AutoML capabilities and integrations with REST API, Socket API, and Amazon AWS.
upgini
Upgini is an intelligent data search engine with a Python library that helps users find and add relevant features to their ML pipeline from various public, community, and premium external data sources. It automates the optimization of connected data sources by generating an optimal set of machine learning features using large language models, GraphNNs, and recurrent neural networks. The tool aims to simplify feature search and enrichment for external data to make it a standard approach in machine learning pipelines. It democratizes access to data sources for the data science community.
stark
STaRK is a large-scale semi-structure retrieval benchmark on Textual and Relational Knowledge Bases. It provides natural-sounding and practical queries crafted to incorporate rich relational information and complex textual properties, closely mirroring real-life scenarios. The benchmark aims to assess how effectively large language models can handle the interplay between textual and relational requirements in queries, using three diverse knowledge bases constructed from public sources.
react-native-fast-tflite
A high-performance TensorFlow Lite library for React Native that utilizes JSI for power, zero-copy ArrayBuffers for efficiency, and low-level C/C++ TensorFlow Lite core API for direct memory access. It supports swapping out TensorFlow Models at runtime and GPU-accelerated delegates like CoreML/Metal/OpenGL. Easy VisionCamera integration allows for seamless usage. Users can load TensorFlow Lite models, interpret input and output data, and utilize GPU Delegates for faster computation. The library is suitable for real-time object detection, image classification, and other machine learning tasks in React Native applications.
mistral-inference
Mistral Inference repository contains minimal code to run 7B, 8x7B, and 8x22B models. It provides model download links, installation instructions, and usage guidelines for running models via CLI or Python. The repository also includes information on guardrailing, model platforms, deployment, and references. Users can interact with models through commands like mistral-demo, mistral-chat, and mistral-common. Mistral AI models support function calling and chat interactions for tasks like testing models, chatting with models, and using Codestral as a coding assistant. The repository offers detailed documentation and links to blogs for further information.
doc-comments-ai
doc-comments-ai is a tool designed to automatically generate code documentation using language models. It allows users to easily create documentation comment blocks for methods in various programming languages such as Python, Typescript, Javascript, Java, Rust, and more. The tool supports both OpenAI and local LLMs, ensuring data privacy and security. Users can generate documentation comments for methods in files, inline comments in method bodies, and choose from different models like GPT-3.5-Turbo, GPT-4, and Azure OpenAI. Additionally, the tool provides support for Treesitter integration and offers guidance on selecting the appropriate model for comprehensive documentation needs.
jina
Jina is a tool that allows users to build multimodal AI services and pipelines using cloud-native technologies. It provides a Pythonic experience for serving ML models and transitioning from local deployment to advanced orchestration frameworks like Docker-Compose, Kubernetes, or Jina AI Cloud. Users can build and serve models for any data type and deep learning framework, design high-performance services with easy scaling, serve LLM models while streaming their output, integrate with Docker containers via Executor Hub, and host on CPU/GPU using Jina AI Cloud. Jina also offers advanced orchestration and scaling capabilities, a smooth transition to the cloud, and easy scalability and concurrency features for applications. Users can deploy to their own cloud or system with Kubernetes and Docker Compose integration, and even deploy to JCloud for autoscaling and monitoring.
BentoML
BentoML is an open-source model serving library for building performant and scalable AI applications with Python. It comes with everything you need for serving optimization, model packaging, and production deployment.
screen-pipe
Screen-pipe is a Rust + WASM tool that allows users to turn their screen into actions using Large Language Models (LLMs). It enables users to record their screen 24/7, extract text from frames, and process text and images for tasks like analyzing sales conversations. The tool is still experimental and aims to simplify the process of recording screens, extracting text, and integrating with various APIs for tasks such as filling CRM data based on screen activities. The project is open-source and welcomes contributions to enhance its functionalities and usability.
code2prompt
Code2Prompt is a powerful command-line tool that generates comprehensive prompts from codebases, designed to streamline interactions between developers and Large Language Models (LLMs) for code analysis, documentation, and improvement tasks. It bridges the gap between codebases and LLMs by converting projects into AI-friendly prompts, enabling users to leverage AI for various software development tasks. The tool offers features like holistic codebase representation, intelligent source tree generation, customizable prompt templates, smart token management, Gitignore integration, flexible file handling, clipboard-ready output, multiple output options, and enhanced code readability.
shellChatGPT
ShellChatGPT is a shell wrapper for OpenAI's ChatGPT, DALL-E, Whisper, and TTS, featuring integration with LocalAI, Ollama, Gemini, Mistral, Groq, and GitHub Models. It provides text and chat completions, vision, reasoning, and audio models, voice-in and voice-out chatting mode, text editor interface, markdown rendering support, session management, instruction prompt manager, integration with various service providers, command line completion, file picker dialogs, color scheme personalization, stdin and text file input support, and compatibility with Linux, FreeBSD, MacOS, and Termux for a responsive experience.
gpt-translate
Markdown Translation BOT is a GitHub action that translates markdown files into multiple languages using various AI models. It supports markdown, markdown-jsx, and json files only. The action can be executed by individuals with write permissions to the repository, preventing API abuse by non-trusted parties. Users can set up the action by providing their API key and configuring the workflow settings. The tool allows users to create comments with specific commands to trigger translations and automatically generate pull requests or add translated files to existing pull requests. It supports multiple file translations and can interpret any language supported by GPT-4 or GPT-3.5.
llm-vscode
llm-vscode is an extension designed for all things LLM, utilizing llm-ls as its backend. It offers features such as code completion with 'ghost-text' suggestions, the ability to choose models for code generation via HTTP requests, ensuring prompt size fits within the context window, and code attribution checks. Users can configure the backend, suggestion behavior, keybindings, llm-ls settings, and tokenization options. Additionally, the extension supports testing models like Code Llama 13B, Phind/Phind-CodeLlama-34B-v2, and WizardLM/WizardCoder-Python-34B-V1.0. Development involves cloning llm-ls, building it, and setting up the llm-vscode extension for use.
Vitron
Vitron is a unified pixel-level vision LLM designed for comprehensive understanding, generating, segmenting, and editing static images and dynamic videos. It addresses challenges in existing vision LLMs such as superficial instance-level understanding, lack of unified support for images and videos, and insufficient coverage across various vision tasks. The tool requires Python >= 3.8, Pytorch == 2.1.0, and CUDA Version >= 11.8 for installation. Users can deploy Gradio demo locally and fine-tune their models for specific tasks.
For similar tasks
lmql
LMQL is a programming language designed for large language models (LLMs) that offers a unique way of integrating traditional programming with LLM interaction. It allows users to write programs that combine algorithmic logic with LLM calls, enabling model reasoning capabilities within the context of the program. LMQL provides features such as Python syntax integration, rich control-flow options, advanced decoding techniques, powerful constraints via logit masking, runtime optimization, sync and async API support, multi-model compatibility, and extensive applications like JSON decoding and interactive chat interfaces. The tool also offers library integration, flexible tooling, and output streaming options for easy model output handling.
LLM_Web_search
LLM_Web_search project gives local LLMs the ability to search the web by outputting a specific command. It uses regular expressions to extract search queries from model output and then utilizes duckduckgo-search to search the web. LangChain's Contextual compression and Okapi BM25 or SPLADE are used to extract relevant parts of web pages in search results. The extracted results are appended to the model's output.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: it is self-contained, with no need for a DBMS or cloud service; it exposes an OpenAPI interface that is easy to integrate with existing infrastructure (e.g. a Cloud IDE); and it supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.