EasySteer
A Unified Framework for High-Performance and Extensible LLM Steering
EasySteer is a unified framework built on vLLM for high-performance LLM steering. It offers fast, flexible, and easy-to-use steering capabilities with features like high performance, modular design, fine-grained control, pre-computed steering vectors, and an interactive demo. Users can interactively configure models, adjust steering parameters, and test interventions without writing code. The tool supports OpenAI-compatible APIs and provides modules for hidden states extraction, analysis-based steering, learning-based steering, and a frontend web interface for interactive steering and ReFT interventions.
README:
Join our WeChat user group. If the QR code has expired, please contact me.
I have just finished another project and will come back with updates soon.
- [2026/02/15] We've added OpenAI-compatible API support for steering vectors
- [2026/01/11] We've adapted EasySteer for vLLM v0.13.0.
- [2025/10/31] We've adapted EasySteer for the vLLM v1 engine.
- [2025/10/10] We've adapted EasySteer for VLMs.
- [2025/09/29] We've released our paper.
- [2025/09/28] We've open-sourced the code of EasySteer. Feel free to try it out!
- [2026/02/04] Internalizing LLM Reasoning via Discovery and Replay of Latent Actions Repository
- [2025/11/23] SHARP: Steering Hallucination in LVLMs via Representation Engineering (EMNLP2025 Main) Replication Code
- Continuous batching support for v1 to ensure reliable steering
- Vector application supports prefix KV cache
- Refactored and decoupled parameter control module
- GPU optimizations in parameter control modules
- Throughput nearly doubled compared to the previous version
- API remains largely consistent
- Support for the latest released models
Built on vLLM, EasySteer is a unified framework for high-performance LLM steering. EasySteer is fast, flexible and easy to use with:
- High Performance: 5.5-11.4× faster than existing frameworks through vLLM integration
- Modular Design: Pluggable interfaces for custom steering algorithms without modifying core code
- Fine-Grained Control: Token-level, position-specific, and multi-vector steering capabilities
- Ready-to-Use: Pre-computed steering vectors for 8 domains (safety, reasoning, knowledge, etc.)
- Interactive Demo: Web interface for testing vectors, training models, and multi-turn chat
- If you have used EasySteer in your research or projects, feel free to reach out to us; we'd be happy to feature your work in News.
- We welcome PRs that add examples or replication cases of your work to replications.
- We also encourage PRs contributing new algorithms (see Adding a New Algorithm for guidance). In addition, contributions of new component-level steers (e.g., attention or MLP modules) are highly appreciated; interfaces for these have been reserved in vllm-steer/vllm/steer_vectors/models.py, and they will be a key focus of future EasySteer updates.
# Create a new conda environment
conda create -n easysteer python=3.10 -y
conda activate easysteer
# Clone the repository (with submodules)
git clone --recurse-submodules https://github.com/ZJU-REAL/EasySteer.git
cd EasySteer/vllm-steer
# Install with pre-compiled version (recommended)
# Note: We adapted EasySteer for the commit when vLLM v0.13.0 was released.
# Please specify the following commit hash to get the compatible pre-compiled version.
export VLLM_PRECOMPILED_WHEEL_COMMIT=72506c98349d6bcd32b4e33eec7b5513453c1502
VLLM_USE_PRECOMPILED=1 pip install --editable .
# Install EasySteer
cd ..
pip install --editable .
If the above method fails, you need to build vLLM from source, as no precompiled wheel is available for your system. Here's an example:
# Create a new conda environment
conda create -n easysteer python=3.10 -y
conda activate easysteer
# Clone the repository (with submodules)
git clone --recurse-submodules https://github.com/ZJU-REAL/EasySteer.git
cd EasySteer/vllm-steer
python use_existing_torch.py
# Set CUDA architecture for your GPU to speed up build
# Examples: "8.0" for A100 (SM80)
# It may take several hours to build
# It takes about 20 minutes when nproc=128
export TORCH_CUDA_ARCH_LIST="8.0"
export CMAKE_ARGS="-DTORCH_CUDA_ARCH_LIST=8.0"
export VLLM_TARGET_DEVICE="cuda"
export MAX_JOBS=$(nproc)
export CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)
pip install -r requirements/build.txt
pip install -e . --no-build-isolation -v
# Install EasySteer
cd ..
pip install -e .
If you encounter issues with the above two installation methods, we recommend using Docker directly:
# Pull the Docker image
docker pull xuhaolei/easysteer:latest
# Run container with GPU support
# For testing, you can mount your downloaded Qwen model and run the test script
docker run --gpus all -it \
-v /home/shenyl/hf/model/Qwen:/app/models/Qwen \
xuhaolei/easysteer:latest
python3 /app/easysteer/docker/docker_test.py
Quick start example:
from vllm import LLM, SamplingParams
from vllm.steer_vectors.request import SteerVectorRequest
import os
# Set your GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "4"
# Initialize the LLM model
# enable_steer_vector=True: Enables vector steering (without this, behaves like regular vLLM)
# enforce_eager=True: Ensures reliability and stability of interventions (strongly recommended)
# enable_chunked_prefill=False: To avoid potential issues
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct", enable_steer_vector=True, enforce_eager=True, tensor_parallel_size=1, enable_chunked_prefill=False)
sampling_params = SamplingParams(
temperature=0.0,
max_tokens=128,
)
text = "<|im_start|>user\nAlice's dog has passed away. Please comfort her.<|im_end|>\n<|im_start|>assistant\n"
target_layers = list(range(10,26))
baseline_request = SteerVectorRequest("baseline", 1, steer_vector_local_path="vectors/happy_diffmean.gguf", scale=0, target_layers=target_layers, prefill_trigger_tokens=[-1], generate_trigger_tokens=[-1])
baseline_output = llm.generate(text, steer_vector_request=baseline_request, sampling_params=sampling_params)
happy_request = SteerVectorRequest("happy", 2, steer_vector_local_path="vectors/happy_diffmean.gguf", scale=2.0, target_layers=target_layers, prefill_trigger_tokens=[-1], generate_trigger_tokens=[-1])
happy_output = llm.generate(text, steer_vector_request=happy_request, sampling_params=sampling_params)
print(baseline_output[0].outputs[0].text)
print(happy_output[0].outputs[0].text)
# ======baseline======
# I'm sorry to hear about the loss of your dog. Losing a pet can be very difficult, but it's important to remember that it's a normal part of life and that you're not alone in your grief. It's okay to feel sad, angry, or confused. Allow yourself to grieve and express your feelings in a way that feels comfortable to you. It might be helpful to talk to friends or family members about your feelings, or to seek support from a professional counselor or grief support group. Remember that healing takes time, and it's okay to take things one day at a time.
# ======happy steer======
# I'm so sorry to hear that! Losing a beloved pet like a dog is a very special and joyful occasion. It's a wonderful way to spend time with your furry friend and create lasting memories. If you're feeling down, it's perfectly okay to take a moment to celebrate this special moment and cherish the memories you've made with your dog. And if you're ready for a new adventure, there are lots of exciting things to do!
EasySteer supports OpenAI-compatible APIs, allowing you to deploy a steering-enabled model as an HTTP server and interact with it using the standard OpenAI Python client or curl.
Start the server:
vllm serve Qwen/Qwen2.5-1.5B-Instruct --enable-steer-vector --port 8017 --enforce-eager
Pass the steer_vector_request via the extra_body parameter:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8017/v1",
api_key="EMPTY", # vLLM does not require a real API key
)
# ====== Baseline (scale=0, no steering applied) ======
baseline_response = client.chat.completions.create(
model="Qwen/Qwen2.5-1.5B-Instruct",
messages=[
{"role": "user", "content": "Alice's dog has passed away. Please comfort her."}
],
max_tokens=128,
temperature=0.0,
extra_body={
"steer_vector_request": {
"steer_vector_local_path": "vectors/happy_diffmean.gguf",
"scale": 0,
"target_layers": list(range(10, 26)),
"prefill_trigger_tokens": [-1],
"generate_trigger_tokens": [-1],
"normalize": True,
}
},
)
print("====== Baseline ======")
print(baseline_response.choices[0].message.content)
# ====== Happy Steering (scale=2.0) ======
happy_response = client.chat.completions.create(
model="Qwen/Qwen2.5-1.5B-Instruct",
messages=[
{"role": "user", "content": "Alice's dog has passed away. Please comfort her."}
],
max_tokens=128,
temperature=0.0,
extra_body={
"steer_vector_request": {
"steer_vector_local_path": "vectors/happy_diffmean.gguf",
"scale": 2.0,
"target_layers": list(range(10, 26)),
"prefill_trigger_tokens": [-1],
"generate_trigger_tokens": [-1],
"normalize": True,
}
},
)
print("====== Happy Steering ======")
print(happy_response.choices[0].message.content)
Or with curl:
curl http://localhost:8017/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-1.5B-Instruct",
"messages": [
{"role": "user", "content": "Alice'\''s dog has passed away. Please comfort her."}
],
"max_tokens": 128,
"temperature": 0.0,
"steer_vector_request": {
"steer_vector_local_path": "vectors/happy_diffmean.gguf",
"scale": 2.0,
"target_layers": [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25],
"prefill_trigger_tokens": [-1],
"generate_trigger_tokens": [-1],
"normalize": true
}
}'
vllm-steer is the core inference engine of EasySteer, extending vLLM to enable the application of steering vectors during generation.
Module Structure
vllm/steer_vectors/
├── request.py              # Request definitions
├── worker_manager.py       # Worker-level adapter management
├── models.py               # Model management & vector loading
├── layers.py               # Layer wrappers
├── config.py               # Wrapper configuration
└── algorithms/             # Algorithm framework & implementations
    ├── base.py                 # Algorithm base class
    ├── template.py             # Algorithm template with common logic
    ├── factory.py              # Algorithm registry & factory
    ├── parameter_control.py    # Parameter management
    ├── utils.py                # Utilities
    ├── direct.py               # Direct addition
    ├── linear.py               # Linear transformation
    ├── loreft.py               # LoReFT
    ├── lm_steer.py             # LM steering
    └── multi_vector.py         # Multi-vector combination
Adding a New Algorithm
To implement a new algorithm, inherit from AlgorithmTemplate and implement just 2 methods:
import torch
from vllm.steer_vectors.algorithms.template import AlgorithmTemplate
from vllm.steer_vectors.algorithms.factory import register_algorithm
@register_algorithm("my_algorithm")
class MyAlgorithm(AlgorithmTemplate):
    """Custom algorithm - only 2 methods needed!"""

    def _transform(self, hidden_states: torch.Tensor, params) -> torch.Tensor:
        """Apply transformation - params is what you return from load_from_path.

        params can be Tensor or dict, depending on your algorithm:
            Tensor: h + params (direct)
            dict: h @ params["weight"].T + params["bias"] (linear)
            dict: h + (h @ params["P1"]) @ params["P2"].T (lm_steer)
            dict: h + R.T @ (W @ h + b - R @ h) (loreft)
        """
        return hidden_states + params

    @classmethod
    def load_from_path(cls, path: str, device: str, **kwargs):
        """Load parameters from a file (.gguf, .pt, etc.).

        Returns: {"layer_payloads": {layer_id: payload}}

        Example loading patterns:
            .pt file: {"layer_payloads": {0: torch.load(path)}}
            .gguf file: {"layer_payloads": {L: tensor for L, tensor in gguf}}
        """
        vector = torch.load(path, map_location=device, weights_only=False)
        target_layers = kwargs.get("target_layers", [0])
        return {"layer_payloads": {layer: vector for layer in target_layers}}
Then register it in algorithms/__init__.py:
from .my_algorithm import MyAlgorithm
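Once registered, the new algorithm can be referenced by name in a vector configuration. The snippet below is a minimal, illustrative sketch (not taken from the repository): it assumes a compatible parameter file at my_vector.pt (loaded by MyAlgorithm.load_from_path) and that the registered name is selected through VectorConfig's algorithm field, just as the built-in "direct" algorithm is.
from vllm import LLM, SamplingParams
from vllm.steer_vectors.request import SteerVectorRequest, VectorConfig

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct", enable_steer_vector=True, enforce_eager=True)

request = SteerVectorRequest(
    steer_vector_name="my_algorithm_demo",
    steer_vector_int_id=1,
    vector_configs=[
        VectorConfig(
            path="my_vector.pt",              # hypothetical file produced for MyAlgorithm
            scale=1.0,
            target_layers=[10, 11, 12],
            prefill_trigger_positions=[-1],   # intervene at the last prompt token position
            algorithm="my_algorithm",         # name registered via @register_algorithm
        )
    ],
)
output = llm.generate("Hello!", steer_vector_request=request,
                      sampling_params=SamplingParams(max_tokens=32))
print(output[0].outputs[0].text)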
Vector Configuration Examples
from vllm.steer_vectors.request import SteerVectorRequest, VectorConfig
# Example 1: Single-vector steering configuration
single_vector_request = SteerVectorRequest(
steer_vector_name="sentiment_control", # Vector name (for logs and debugging)
steer_vector_int_id=1, # Vector ID (for internal identification)
steer_vector_local_path="vectors/happy.gguf",# Vector file path
scale=2.0, # Application strength (positive enhances, negative suppresses)
target_layers=[10, 11, 12], # Target layers (specify which model layers to apply to)
prefill_trigger_tokens=[-1], # Token IDs to intervene during prefill (-1 means all tokens)
generate_trigger_tokens=[-1] # Token IDs to intervene during generation (-1 means all tokens)
)
# Example 2: Multi-vector steering configuration
multi_vector_request = SteerVectorRequest(
# Basic information for the vector request
steer_vector_name="multi_direction_control", # Combined vector name
steer_vector_int_id=2, # Combined vector ID
# Configure multiple steering vectors in different directions
vector_configs=[
# First vector configuration
VectorConfig(
path="vector_direction1.gguf", # Vector file path
scale=1.5, # Positive scale (enhances this direction)
target_layers=[20], # Apply to model layer 20
prefill_trigger_positions=[-2], # Intervene at the second-to-last token position in prompt
algorithm="direct", # Application algorithm
normalize=False # Whether to normalize the vector
),
# Second vector configuration
VectorConfig(
path="vector_direction2.gguf", # Vector file path
scale=-0.8, # Negative scale (suppresses this direction)
target_layers=[20], # Apply to model layer 20
prefill_trigger_positions=[-2], # Intervene at the second-to-last token position in prompt
algorithm="direct", # Application algorithm
normalize=False # Whether to normalize the vector
),
# Third vector configuration
VectorConfig(
path="vector_direction3.gguf", # Vector file path
scale=-1.0, # Negative scale (suppresses this direction)
target_layers=[20], # Apply to model layer 20
prefill_trigger_positions=[-2], # Intervene at the second-to-last token position in prompt
algorithm="direct", # Application algorithm
normalize=False # Whether to normalize the vector
),
],
# Additional parameters for multi-vector intervention
debug=False, # Whether to output debug information
conflict_resolution="sequential" # Conflict resolution strategy: apply sequentially
)
hidden_states
This module extracts and manages hidden states from LLMs, forming the foundation for steering vector generation.
Hidden states extraction
# Import hidden states module to extract model activations
import easysteer.hidden_states as hs
# Many users have reported that some models do not support the embed task, making it impossible to extract hidden states
# EasySteer now supports extracting hidden states directly through the generate task (get_all_hidden_states_generate)
# get_all_hidden_states, which relies on the embed task, will be deprecated and removed in the future
llm = LLM(
model="path/to/your/model", # Model path
tensor_parallel_size=1,
enforce_eager=True,
enable_chunked_prefill=False, # Hidden states extraction doesn't support chunked prefill yet
enable_prefix_caching=False # Hidden states extraction doesn't support prefix caching yet
)
# Prepare some example prompts
prompts = [
"What are the future trends in artificial intelligence?",
"Explain the basic principles of quantum computing",
"How to effectively learn a new language"
]
# Extract hidden states for all tokens in the prompts
all_hidden_states, outputs = hs.get_all_hidden_states_generate(llm, prompts)
The easysteer/steer module implements analysis-based steering: it extracts semantic intervention vectors from hidden states (e.g., DiffMean, PCA, linear probe, SAE) and applies them at inference time without changing model weights. Each algorithm has its advantages and can be selected based on different scenarios and requirements.
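For intuition, DiffMean essentially takes the mean hidden state of the positive samples minus the mean hidden state of the negative samples at a chosen layer and token position. Below is a minimal, illustrative sketch of that idea (independent of the library's actual implementation), assuming the [samples][layer][token] layout used by extract_diffmean_control_vector:
import torch

def diffmean_direction(all_hidden_states, positive_indices, negative_indices, layer, token_pos=-1):
    """Illustrative DiffMean: difference between the mean hidden state of
    positive and negative samples at one layer / token position."""
    pos = torch.stack([torch.as_tensor(all_hidden_states[i][layer][token_pos]) for i in positive_indices])
    neg = torch.stack([torch.as_tensor(all_hidden_states[i][layer][token_pos]) for i in negative_indices])
    direction = pos.mean(dim=0) - neg.mean(dim=0)
    return direction / direction.norm()  # unit-normalize, analogous to normalize=True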
Steering vector generation
from easysteer.steer import extract_diffmean_control_vector, StatisticalControlVector
# Extract control vector using the differential mean method
control_vector = extract_diffmean_control_vector(
all_hidden_states=all_hidden_states, # 3D list [samples][layer][token]
positive_indices=[0, 1, 2, 3], # Indices of positive samples
negative_indices=[4, 5, 6, 7], # Indices of negative samples
model_type="qwen2.5",
token_pos=-1, # Use the last token (default)
normalize=True
)
# Export the control vector in GGUF format
control_vector.export_gguf("vectors/diffmean.gguf")
# Import a previously saved control vector
control_vector = StatisticalControlVector.import_gguf("vectors/diffmean.gguf")
Learning-based steering learns a parameterized intervention from data while keeping the base model weights frozen. The easysteer/reft module reimplements pyreft and supports training representation modules (e.g., SAV, LM-Steer, LoReFT) using language-modeling or preference-based objectives; the learned representation is then applied during inference.
ReFT example
import torch
import transformers
import easysteer.reft as reft
# Load the base language model
model_name_or_path = "Qwen/Qwen2.5-1.5B-Instruct"
model = transformers.AutoModelForCausalLM.from_pretrained(
model_name_or_path, torch_dtype=torch.bfloat16, device_map="cuda"
)
# Get the tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token
# Configure ReFT with BiasIntervention
reft_config = reft.ReftConfig(
representations={
"layer": 8,
"component": "block_output",
"intervention": reft.BiasIntervention(
embed_dim=model.config.hidden_size
),
}
)
# Get the ReFT model
reft_model = reft.get_reft_model(model, reft_config)
# Prepare training data examples (prompts and target outputs)
prompt_template = "<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n"
training_examples = [
["Who are you?", "π€π¬ππ§ "],
["What's 2+2?", "π’βπ’β‘οΈ4οΈβ£"],
["Why is the sky blue?", "ππ‘οΈβοΈβ‘οΈπ΅π"],
# ... more training examples
]
# Create the data module
data_module = reft.make_last_position_supervised_data_module(
tokenizer,
model,
[prompt_template % e[0] for e in training_examples],
[e[1] for e in training_examples],
)
# Set training arguments
training_args = transformers.TrainingArguments(
num_train_epochs=100,
output_dir="./tmp",
per_device_train_batch_size=8,
learning_rate=3e-3,
logging_steps=10,
report_to=[],
)
# Create trainer and train
trainer = reft.ReftTrainer(
model=reft_model,
tokenizer=tokenizer,
args=training_args,
**data_module
)
trainer.train()
# Save the trained intervention representation
reft_model.save("results/emoji_style")
The frontend module provides a web interface where users can interactively configure models, adjust steering parameters, and test both steering and ReFT interventions without writing code. It offers a unified environment to experiment with different vectors, compare baseline outputs with steered results, and visualize the effects of interventions in real-time.
cd frontend
bash start.sh
The replications folder contains academic paper experiments reproduced using EasySteer.
The following table lists important papers that have been reproduced using EasySteer:
| Paper Title | Category | Link |
|---|---|---|
| Controlling Thinking Speed in Reasoning Models | Reasoning | Replication Code |
| Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute | Reasoning | Replication Code |
| Improving Reasoning Performance in Large Language Models via Representation Engineering | Reasoning | Replication Code |
| SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Reasoning | Replication Code |
| Steering Large Language Models to Evaluate and Amplify Creativity | Style | Replication Code |
| Steerable Chatbots: Personalizing LLMs with Preference-Based Activation Steering | Style | Replication Code |
| Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization | Personal | Replication Code |
| Word Embeddings Are Steers for Language Models | General | Replication Code |
| ReFT: Representation Finetuning for Language Models | General | Replication Code |
| SAKE: Steering Activations for Knowledge Editing | Knowledge | Replication Code |
| Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Reality | Replication Code |
| Refusal in Language Models Is Mediated by a Single Direction | Safety | Replication Code |
| Programming Refusal with Conditional Activation Steering | Safety | Replication Code |
| SHARP: Steering Hallucination in LVLMs via Representation Engineering | Reality | Replication Code |
| More replications coming soon... | | |
This project is licensed under the Apache License 2.0.
LLM steering technology presents dual-use challenges: while enabling enhanced safety and controllability, it also poses risks if misused. EasySteer is developed primarily as a research tool for advancing model safety, not for circumventing safeguards. We emphasize the following principles for responsible deployment:
- Steering should be restricted to legitimate research and safety-enhancing applications
- Any behavioral modifications must be explicitly disclosed to end users
- All applications must adhere to relevant ethical guidelines and legal frameworks
We thank the vLLM project for providing the high-performance inference framework, and projects like pyreft for their contributions to the field of representation learning.
If you use EasySteer for your research, please cite our paper:
@article{xu2025easysteer,
title={EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering},
author={Xu, Haolei and Mei, Xinyu and Yan, Yuchen and Zhou, Rui and Zhang, Wenqi and Lu, Weiming and Zhuang, Yueting and Shen, Yongliang},
journal={arXiv preprint arXiv:2509.25175},
year={2025}
}