EasySteer
A Unified Framework for High-Performance and Extensible LLM Steering
EasySteer is a unified framework built on vLLM for high-performance LLM steering. It offers fast, flexible, and easy-to-use steering capabilities with features like high performance, modular design, fine-grained control, pre-computed steering vectors, and an interactive demo. Users can interactively configure models, adjust steering parameters, and test interventions without writing code. The tool supports OpenAI-compatible APIs and provides modules for hidden states extraction, analysis-based steering, learning-based steering, and a frontend web interface for interactive steering and ReFT interventions.
README:
Join our WeChat user group. If the QR code has expired, please contact me.
I have just finished another project and will come back with updates soon.
- [2026/02/15] We've added OpenAI-compatible API support for steering vectors
- [2026/01/11] We've adapted EasySteer for vLLM v0.13.0.
- [2025/10/31] We've adapted EasySteer for the vLLM v1 engine.
- [2025/10/10] We've adapted EasySteer for VLMs.
- [2025/09/29] We've released our paper.
- [2025/09/28] We've open-sourced the code of EasySteer. Feel free to try it out!
- [2026/02/04] Internalizing LLM Reasoning via Discovery and Replay of Latent Actions Repository
- [2025/11/23] SHARP: Steering Hallucination in LVLMs via Representation Engineering (EMNLP2025 Main) Replication Code
- Continuous batching support for v1 to ensure reliable steering
- Vector application supports prefix KV cache
- Refactored and decoupled parameter control module
- GPU optimizations in parameter control modules
- Throughput nearly doubled compared to the previous version
- API remains largely consistent
- Support for the latest released models
Built on vLLM, EasySteer is a unified framework for high-performance LLM steering. EasySteer is fast, flexible and easy to use with:
- High Performance: 5.5-11.4× faster than existing frameworks through vLLM integration
- Modular Design: Pluggable interfaces for custom steering algorithms without modifying core code
- Fine-Grained Control: Token-level, position-specific, and multi-vector steering capabilities
- Ready-to-Use: Pre-computed steering vectors for 8 domains (safety, reasoning, knowledge, etc.)
- Interactive Demo: Web interface for testing vectors, training models, and multi-turn chat
- If you have used EasySteer in your research or projects, feel free to reach out to us; we'd be happy to feature your work in News.
- We welcome PRs that add examples or replication cases of your work to replications.
- We also encourage PRs contributing new algorithms (see Adding a New Algorithm for guidance). In addition, contributions of new component-level steers (e.g., attention or MLP modules) are highly appreciated; interfaces for these have been reserved in vllm-steer/vllm/steer_vectors/models.py, and they will be a key focus of future EasySteer updates.
# Create a new conda environment
conda create -n easysteer python=3.10 -y
conda activate easysteer
# Clone the repository (with submodules)
git clone --recurse-submodules https://github.com/ZJU-REAL/EasySteer.git
cd EasySteer/vllm-steer
# Install with pre-compiled version (recommended)
# Note: We adapted EasySteer for the commit when vLLM v0.13.0 was released.
# Please specify the following commit hash to get the compatible pre-compiled version.
export VLLM_PRECOMPILED_WHEEL_COMMIT=72506c98349d6bcd32b4e33eec7b5513453c1502
VLLM_USE_PRECOMPILED=1 pip install --editable .
# Install EasySteer
cd ..
pip install --editable .
If the above method fails, you need to build vLLM from source, as no precompiled wheel is available for your system. Here's an example:
# Create a new conda environment
conda create -n easysteer python=3.10 -y
conda activate easysteer
# Clone the repository (with submodules)
git clone --recurse-submodules https://github.com/ZJU-REAL/EasySteer.git
cd EasySteer/vllm-steer
python use_existing_torch.py
# Set CUDA architecture for your GPU to speed up build
# Examples: "8.0" for A100 (SM80)
# It may take several hours to build
# It takes about 20 minutes when nproc=128
export TORCH_CUDA_ARCH_LIST="8.0"
export CMAKE_ARGS="-DTORCH_CUDA_ARCH_LIST=8.0"
export VLLM_TARGET_DEVICE="cuda"
export MAX_JOBS=$(nproc)
export CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)
pip install -r requirements/build.txt
pip install -e . --no-build-isolation -v
# Install EasySteer
cd ..
pip install -e .
If you encounter issues with the above two installation methods, we recommend using Docker directly:
# Pull the Docker image
docker pull xuhaolei/easysteer:latest
# Run container with GPU support
# For testing, you can mount your downloaded Qwen model and run the test script
docker run --gpus all -it \
-v /home/shenyl/hf/model/Qwen:/app/models/Qwen \
xuhaolei/easysteer:latest
python3 /app/easysteer/docker/docker_test.py
Quick start example:
from vllm import LLM, SamplingParams
from vllm.steer_vectors.request import SteerVectorRequest
import os
# Set your GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "4"
# Initialize the LLM model
# enable_steer_vector=True: Enables vector steering (without this, behaves like regular vLLM)
# enforce_eager=True: Ensures reliability and stability of interventions (strongly recommended)
# enable_chunked_prefill=False: To avoid potential issues
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct", enable_steer_vector=True, enforce_eager=True, tensor_parallel_size=1, enable_chunked_prefill=False)
sampling_params = SamplingParams(
temperature=0.0,
max_tokens=128,
)
text = "<|im_start|>user\nAlice's dog has passed away. Please comfort her.<|im_end|>\n<|im_start|>assistant\n"
target_layers = list(range(10,26))
baseline_request = SteerVectorRequest("baseline", 1, steer_vector_local_path="vectors/happy_diffmean.gguf", scale=0, target_layers=target_layers, prefill_trigger_tokens=[-1], generate_trigger_tokens=[-1])
baseline_output = llm.generate(text, steer_vector_request=baseline_request, sampling_params=sampling_params)
happy_request = SteerVectorRequest("happy", 2, steer_vector_local_path="vectors/happy_diffmean.gguf", scale=2.0, target_layers=target_layers, prefill_trigger_tokens=[-1], generate_trigger_tokens=[-1])
happy_output = llm.generate(text, steer_vector_request=happy_request, sampling_params=sampling_params)
print(baseline_output[0].outputs[0].text)
print(happy_output[0].outputs[0].text)
# ======baseline======
# I'm sorry to hear about the loss of your dog. Losing a pet can be very difficult, but it's important to remember that it's a normal part of life and that you're not alone in your grief. It's okay to feel sad, angry, or confused. Allow yourself to grieve and express your feelings in a way that feels comfortable to you. It might be helpful to talk to friends or family members about your feelings, or to seek support from a professional counselor or grief support group. Remember that healing takes time, and it's okay to take things one day at a time.
# ======happy steer======
# I'm so sorry to hear that! Losing a beloved pet like a dog is a very special and joyful occasion. It's a wonderful way to spend time with your furry friend and create lasting memories. If you're feeling down, it's perfectly okay to take a moment to celebrate this special moment and cherish the memories you've made with your dog. And if you're ready for a new adventure, there are lots of exciting things to do!
EasySteer supports OpenAI-compatible APIs, allowing you to deploy a steering-enabled model as an HTTP server and interact with it using the standard OpenAI Python client or curl.
Start the server:
vllm serve Qwen/Qwen2.5-1.5B-Instruct --enable-steer-vector --port 8017 --enforce-eager
Pass the steer_vector_request via the extra_body parameter:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8017/v1",
api_key="EMPTY", # vLLM does not require a real API key
)
# ====== Baseline (scale=0, no steering applied) ======
baseline_response = client.chat.completions.create(
model="Qwen/Qwen2.5-1.5B-Instruct",
messages=[
{"role": "user", "content": "Alice's dog has passed away. Please comfort her."}
],
max_tokens=128,
temperature=0.0,
extra_body={
"steer_vector_request": {
"steer_vector_local_path": "vectors/happy_diffmean.gguf",
"scale": 0,
"target_layers": list(range(10, 26)),
"prefill_trigger_tokens": [-1],
"generate_trigger_tokens": [-1],
"normalize": True,
}
},
)
print("====== Baseline ======")
print(baseline_response.choices[0].message.content)
# ====== Happy Steering (scale=2.0) ======
happy_response = client.chat.completions.create(
model="Qwen/Qwen2.5-1.5B-Instruct",
messages=[
{"role": "user", "content": "Alice's dog has passed away. Please comfort her."}
],
max_tokens=128,
temperature=0.0,
extra_body={
"steer_vector_request": {
"steer_vector_local_path": "vectors/happy_diffmean.gguf",
"scale": 2.0,
"target_layers": list(range(10, 26)),
"prefill_trigger_tokens": [-1],
"generate_trigger_tokens": [-1],
"normalize": True,
}
},
)
print("====== Happy Steering ======")
print(happy_response.choices[0].message.content)
Or with curl:
curl http://localhost:8017/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-1.5B-Instruct",
"messages": [
{"role": "user", "content": "Alice'\''s dog has passed away. Please comfort her."}
],
"max_tokens": 128,
"temperature": 0.0,
"steer_vector_request": {
"steer_vector_local_path": "vectors/happy_diffmean.gguf",
"scale": 2.0,
"target_layers": [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25],
"prefill_trigger_tokens": [-1],
"generate_trigger_tokens": [-1],
"normalize": true
}
}'
vllm-steer is the core inference engine of EasySteer, extending vLLM to enable the application of steering vectors during generation.
Module Structure
vllm/steer_vectors/
├── request.py              # Request definitions
├── worker_manager.py       # Worker-level adapter management
├── models.py               # Model management & vector loading
├── layers.py               # Layer wrappers
├── config.py               # Wrapper configuration
└── algorithms/             # Algorithm framework & implementations
    ├── base.py                 # Algorithm base class
    ├── template.py             # Algorithm template with common logic
    ├── factory.py              # Algorithm registry & factory
    ├── parameter_control.py    # Parameter management
    ├── utils.py                # Utilities
    ├── direct.py               # Direct addition
    ├── linear.py               # Linear transformation
    ├── loreft.py               # LoReFT
    ├── lm_steer.py             # LM steering
    └── multi_vector.py         # Multi-vector combination
Adding a New Algorithm
To implement a new algorithm, inherit from AlgorithmTemplate and implement just 2 methods:
import torch
from vllm.steer_vectors.algorithms.template import AlgorithmTemplate
from vllm.steer_vectors.algorithms.factory import register_algorithm
@register_algorithm("my_algorithm")
class MyAlgorithm(AlgorithmTemplate):
    """Custom algorithm - only 2 methods needed!"""

    def _transform(self, hidden_states: torch.Tensor, params) -> torch.Tensor:
        """Apply transformation - params is what you return from load_from_path.

        params can be Tensor or dict, depending on your algorithm:
            Tensor: h + params (direct)
            dict: h @ params["weight"].T + params["bias"] (linear)
            dict: h + (h @ params["P1"]) @ params["P2"].T (lm_steer)
            dict: h + R.T @ (W @ h + b - R @ h) (loreft)
        """
        return hidden_states + params

    @classmethod
    def load_from_path(cls, path: str, device: str, **kwargs):
        """Load parameters from a file (.gguf, .pt, etc.).

        Returns: {"layer_payloads": {layer_id: payload}}

        Example loading patterns:
            .pt file: {"layer_payloads": {0: torch.load(path)}}
            .gguf file: {"layer_payloads": {L: tensor for L, tensor in gguf}}
        """
        vector = torch.load(path, map_location=device, weights_only=False)
        target_layers = kwargs.get("target_layers", [0])
        return {"layer_payloads": {layer: vector for layer in target_layers}}
Then register it in algorithms/__init__.py:
from .my_algorithm import MyAlgorithm
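Once registered, the new algorithm can be referenced by name in a vector configuration. The snippet below is a minimal, illustrative sketch (not taken from the repository): it assumes a compatible parameter file at my_vector.pt (loaded by MyAlgorithm.load_from_path) and that the registered name is selected through VectorConfig's algorithm field, just as the built-in "direct" algorithm is.
from vllm import LLM, SamplingParams
from vllm.steer_vectors.request import SteerVectorRequest, VectorConfig

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct", enable_steer_vector=True, enforce_eager=True)

request = SteerVectorRequest(
    steer_vector_name="my_algorithm_demo",
    steer_vector_int_id=1,
    vector_configs=[
        VectorConfig(
            path="my_vector.pt",              # hypothetical file produced for MyAlgorithm
            scale=1.0,
            target_layers=[10, 11, 12],
            prefill_trigger_positions=[-1],   # intervene at the last prompt token position
            algorithm="my_algorithm",         # name registered via @register_algorithm
        )
    ],
)
output = llm.generate("Hello!", steer_vector_request=request,
                      sampling_params=SamplingParams(max_tokens=32))
print(output[0].outputs[0].text)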
Vector Configuration Examples
from vllm.steer_vectors.request import SteerVectorRequest, VectorConfig
# Example 1: Single-vector steering configuration
single_vector_request = SteerVectorRequest(
steer_vector_name="sentiment_control", # Vector name (for logs and debugging)
steer_vector_int_id=1, # Vector ID (for internal identification)
steer_vector_local_path="vectors/happy.gguf",# Vector file path
scale=2.0, # Application strength (positive enhances, negative suppresses)
target_layers=[10, 11, 12], # Target layers (specify which model layers to apply to)
prefill_trigger_tokens=[-1], # Token IDs to intervene during prefill (-1 means all tokens)
generate_trigger_tokens=[-1] # Token IDs to intervene during generation (-1 means all tokens)
)
# Example 2: Multi-vector steering configuration
multi_vector_request = SteerVectorRequest(
# Basic information for the vector request
steer_vector_name="multi_direction_control", # Combined vector name
steer_vector_int_id=2, # Combined vector ID
# Configure multiple steering vectors in different directions
vector_configs=[
# First vector configuration
VectorConfig(
path="vector_direction1.gguf", # Vector file path
scale=1.5, # Positive scale (enhances this direction)
target_layers=[20], # Apply to model layer 20
prefill_trigger_positions=[-2], # Intervene at the second-to-last token position in prompt
algorithm="direct", # Application algorithm
normalize=False # Whether to normalize the vector
),
# Second vector configuration
VectorConfig(
path="vector_direction2.gguf", # Vector file path
scale=-0.8, # Negative scale (suppresses this direction)
target_layers=[20], # Apply to model layer 20
prefill_trigger_positions=[-2], # Intervene at the second-to-last token position in prompt
algorithm="direct", # Application algorithm
normalize=False # Whether to normalize the vector
),
# Third vector configuration
VectorConfig(
path="vector_direction3.gguf", # Vector file path
scale=-1.0, # Negative scale (suppresses this direction)
target_layers=[20], # Apply to model layer 20
prefill_trigger_positions=[-2], # Intervene at the second-to-last token position in prompt
algorithm="direct", # Application algorithm
normalize=False # Whether to normalize the vector
),
],
# Additional parameters for multi-vector intervention
debug=False, # Whether to output debug information
conflict_resolution="sequential" # Conflict resolution strategy: apply sequentially
)
hidden_states
This module extracts and manages hidden states from LLMs, forming the foundation for steering vector generation.
Hidden states extraction
# Import hidden states module to extract model activations
import easysteer.hidden_states as hs
# Many users have reported that some models do not support the embed task, making it impossible to extract hidden states
# EasySteer now supports extracting hidden states directly through the generate task (get_all_hidden_states_generate)
# get_all_hidden_states, which relies on the embed task, will be deprecated and removed in the future
llm = LLM(
model="path/to/your/model", # Model path
tensor_parallel_size=1,
enforce_eager=True,
enable_chunked_prefill=False, # Hidden states extraction doesn't support chunked prefill yet
enable_prefix_caching=False # Hidden states extraction doesn't support prefix caching yet
)
# Prepare some example prompts
prompts = [
"What are the future trends in artificial intelligence?",
"Explain the basic principles of quantum computing",
"How to effectively learn a new language"
]
# Extract hidden states for all tokens in the prompts
all_hidden_states, outputs = hs.get_all_hidden_states_generate(llm, prompts)
The easysteer/steer module implements analysis-based steering: it extracts semantic intervention vectors from hidden states (e.g., DiffMean, PCA, linear probe, SAE) and applies them at inference time without changing model weights. Each algorithm has its advantages and can be selected based on different scenarios and requirements.
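For intuition, DiffMean essentially takes the mean hidden state of the positive samples minus the mean hidden state of the negative samples at a chosen layer and token position. Below is a minimal, illustrative sketch of that idea (independent of the library's actual implementation), assuming the [samples][layer][token] layout used by extract_diffmean_control_vector:
import torch

def diffmean_direction(all_hidden_states, positive_indices, negative_indices, layer, token_pos=-1):
    """Illustrative DiffMean: difference between the mean hidden state of
    positive and negative samples at one layer / token position."""
    pos = torch.stack([torch.as_tensor(all_hidden_states[i][layer][token_pos]) for i in positive_indices])
    neg = torch.stack([torch.as_tensor(all_hidden_states[i][layer][token_pos]) for i in negative_indices])
    direction = pos.mean(dim=0) - neg.mean(dim=0)
    return direction / direction.norm()  # unit-normalize, analogous to normalize=True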
Steering vector generation
from easysteer.steer import extract_diffmean_control_vector, StatisticalControlVector
# Extract control vector using the differential mean method
control_vector = extract_diffmean_control_vector(
all_hidden_states=all_hidden_states, # 3D list [samples][layer][token]
positive_indices=[0, 1, 2, 3], # Indices of positive samples
negative_indices=[4, 5, 6, 7], # Indices of negative samples
model_type="qwen2.5",
token_pos=-1, # Use the last token (default)
normalize=True
)
# Export the control vector in GGUF format
control_vector.export_gguf("vectors/diffmean.gguf")
# Import a previously saved control vector
control_vector = StatisticalControlVector.import_gguf("vectors/diffmean.gguf")
Learning-based steering learns a parameterized intervention from data while keeping the base model weights frozen. The easysteer/reft module reimplements pyreft and supports training representation modules (e.g., SAV, LM-Steer, LoReFT) using language-modeling or preference-based objectives; the learned representation is then applied during inference.
ReFT example
import torch
import transformers
import easysteer.reft as reft
# Load the base language model
model_name_or_path = "Qwen/Qwen2.5-1.5B-Instruct"
model = transformers.AutoModelForCausalLM.from_pretrained(
model_name_or_path, torch_dtype=torch.bfloat16, device_map="cuda"
)
# Get the tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token
# Configure ReFT with BiasIntervention
reft_config = reft.ReftConfig(
representations={
"layer": 8,
"component": "block_output",
"intervention": reft.BiasIntervention(
embed_dim=model.config.hidden_size
),
}
)
# Get the ReFT model
reft_model = reft.get_reft_model(model, reft_config)
# Prepare training data examples (prompts and target outputs)
prompt_template = "<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n"
training_examples = [
["Who are you?", "π€π¬ππ§ "],
["What's 2+2?", "π’βπ’β‘οΈ4οΈβ£"],
["Why is the sky blue?", "ππ‘οΈβοΈβ‘οΈπ΅π"],
# ... more training examples
]
# Create the data module
data_module = reft.make_last_position_supervised_data_module(
tokenizer,
model,
[prompt_template % e[0] for e in training_examples],
[e[1] for e in training_examples],
)
# Set training arguments
training_args = transformers.TrainingArguments(
num_train_epochs=100,
output_dir="./tmp",
per_device_train_batch_size=8,
learning_rate=3e-3,
logging_steps=10,
report_to=[],
)
# Create trainer and train
trainer = reft.ReftTrainer(
model=reft_model,
tokenizer=tokenizer,
args=training_args,
**data_module
)
trainer.train()
# Save the trained intervention representation
reft_model.save("results/emoji_style")
The frontend module provides a web interface where users can interactively configure models, adjust steering parameters, and test both steering and ReFT interventions without writing code. It offers a unified environment to experiment with different vectors, compare baseline outputs with steered results, and visualize the effects of interventions in real-time.
cd frontend
bash start.sh
The replications folder contains academic paper experiments reproduced using EasySteer.
The following table lists important papers that have been reproduced using EasySteer:
| Paper Title | Category | Link |
|---|---|---|
| Controlling Thinking Speed in Reasoning Models | Reasoning | Replication Code |
| Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute | Reasoning | Replication Code |
| Improving Reasoning Performance in Large Language Models via Representation Engineering | Reasoning | Replication Code |
| SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Reasoning | Replication Code |
| Steering Large Language Models to Evaluate and Amplify Creativity | Style | Replication Code |
| Steerable Chatbots: Personalizing LLMs with Preference-Based Activation Steering | Style | Replication Code |
| Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization | Personal | Replication Code |
| Word Embeddings Are Steers for Language Models | General | Replication Code |
| ReFT: Representation Finetuning for Language Models | General | Replication Code |
| SAKE: Steering Activations for Knowledge Editing | Knowledge | Replication Code |
| Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Reality | Replication Code |
| Refusal in Language Models Is Mediated by a Single Direction | Safety | Replication Code |
| Programming Refusal with Conditional Activation Steering | Safety | Replication Code |
| SHARP: Steering Hallucination in LVLMs via Representation Engineering | Reality | Replication Code |
| More replications coming soon... | | |
This project is licensed under the Apache License 2.0.
LLM steering technology presents dual-use challenges: while enabling enhanced safety and controllability, it also poses risks if misused. EasySteer is developed primarily as a research tool for advancing model safety, not for circumventing safeguards. We emphasize the following principles for responsible deployment:
- Steering should be restricted to legitimate research and safety-enhancing applications
- Any behavioral modifications must be explicitly disclosed to end users
- All applications must adhere to relevant ethical guidelines and legal frameworks
We thank the vLLM project for providing the high-performance inference framework, and projects like pyreft for their contributions to the field of representation learning.
If you use EasySteer for your research, please cite our paper:
@article{xu2025easysteer,
title={EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering},
author={Xu, Haolei and Mei, Xinyu and Yan, Yuchen and Zhou, Rui and Zhang, Wenqi and Lu, Weiming and Zhuang, Yueting and Shen, Yongliang},
journal={arXiv preprint arXiv:2509.25175},
year={2025}
}