agents

A collection of production-ready subagents for Claude Code

Stars: 13451


The 'agents' repository is a comprehensive collection of 83 specialized AI subagents for Claude Code, providing domain-specific expertise across software development, infrastructure, and business operations. Each subagent incorporates current industry best practices, production-ready patterns, deep domain expertise, modern technology stacks, and optimized model selection based on task complexity.

README:

Claude Code Subagents Collection

A comprehensive collection of 83 specialized AI subagents for Claude Code, providing domain-specific expertise across software development, infrastructure, and business operations.

Overview

This repository provides production-ready subagents that extend Claude Code's capabilities with specialized knowledge. Each subagent incorporates:

Current industry best practices and standards (2024/2025)
Production-ready patterns and enterprise architectures
Deep domain expertise with 8-12 capability areas per agent
Modern technology stacks and frameworks
Optimized model selection based on task complexity

Agent Categories

Architecture & System Design

Core Architecture

Agent	Model	Description
backend-architect	opus	RESTful API design, microservice boundaries, database schemas
frontend-developer	sonnet	React components, responsive layouts, client-side state management
graphql-architect	opus	GraphQL schemas, resolvers, federation architecture
architect-reviewer	opus	Architectural consistency analysis and pattern validation
cloud-architect	opus	AWS/Azure/GCP infrastructure design and cost optimization
hybrid-cloud-architect	opus	Multi-cloud strategies across cloud and on-premises environments
kubernetes-architect	opus	Cloud-native infrastructure with Kubernetes and GitOps

UI/UX & Mobile

Agent	Model	Description
ui-ux-designer	sonnet	Interface design, wireframes, design systems
ui-visual-validator	sonnet	Visual regression testing and UI verification
mobile-developer	sonnet	React Native and Flutter application development
ios-developer	sonnet	Native iOS development with Swift/SwiftUI
flutter-expert	sonnet	Advanced Flutter development with state management

Programming Languages

Systems & Low-Level

Agent	Model	Description
c-pro	sonnet	System programming with memory management and OS interfaces
cpp-pro	sonnet	Modern C++ with RAII, smart pointers, STL algorithms
rust-pro	sonnet	Memory-safe systems programming with ownership patterns
golang-pro	sonnet	Concurrent programming with goroutines and channels

Web & Application

Agent	Model	Description
javascript-pro	sonnet	Modern JavaScript with ES6+, async patterns, Node.js
typescript-pro	sonnet	Advanced TypeScript with type systems and generics
python-pro	sonnet	Python development with advanced features and optimization
ruby-pro	sonnet	Ruby with metaprogramming, Rails patterns, gem development
php-pro	sonnet	Modern PHP with frameworks and performance optimization

Enterprise & JVM

Agent	Model	Description
java-pro	sonnet	Modern Java with streams, concurrency, JVM optimization
scala-pro	sonnet	Enterprise Scala with functional programming and distributed systems
csharp-pro	sonnet	C# development with .NET frameworks and patterns

Specialized Platforms

Agent	Model	Description
elixir-pro	sonnet	Elixir with OTP patterns and Phoenix frameworks
unity-developer	sonnet	Unity game development and optimization
minecraft-bukkit-pro	sonnet	Minecraft server plugin development
sql-pro	sonnet	Complex SQL queries and database optimization

Infrastructure & Operations

DevOps & Deployment

Agent	Model	Description
devops-troubleshooter	sonnet	Production debugging, log analysis, deployment troubleshooting
deployment-engineer	sonnet	CI/CD pipelines, containerization, cloud deployments
terraform-specialist	opus	Infrastructure as Code with Terraform modules and state management
dx-optimizer	sonnet	Developer experience optimization and tooling improvements

Database Management

Agent	Model	Description
database-optimizer	opus	Query optimization, index design, migration strategies
database-admin	sonnet	Database operations, backup, replication, monitoring

Incident Response & Network

Agent	Model	Description
incident-responder	opus	Production incident management and resolution
network-engineer	sonnet	Network debugging, load balancing, traffic analysis

Quality Assurance & Security

Code Quality & Review

Agent	Model	Description
code-reviewer	opus	Code review with security focus and production reliability
security-auditor	opus	Vulnerability assessment and OWASP compliance
backend-security-coder	opus	Secure backend coding practices, API security implementation
frontend-security-coder	opus	XSS prevention, CSP implementation, client-side security
mobile-security-coder	opus	Mobile security patterns, WebView security, biometric auth
architect-reviewer	opus	Architectural consistency and pattern validation

Testing & Debugging

Agent	Model	Description
test-automator	sonnet	Comprehensive test suite creation (unit, integration, e2e)
tdd-orchestrator	sonnet	Test-Driven Development methodology guidance
debugger	sonnet	Error resolution and test failure analysis
error-detective	sonnet	Log analysis and error pattern recognition

Performance & Observability

Agent	Model	Description
performance-engineer	opus	Application profiling and optimization
observability-engineer	opus	Production monitoring, distributed tracing, SLI/SLO management
search-specialist	haiku	Advanced web research and information synthesis

Data & AI

Data Engineering & Analytics

Agent	Model	Description
data-scientist	opus	Data analysis, SQL queries, BigQuery operations
data-engineer	sonnet	ETL pipelines, data warehouses, streaming architectures

Machine Learning & AI

Agent	Model	Description
ai-engineer	opus	LLM applications, RAG systems, prompt pipelines
ml-engineer	opus	ML pipelines, model serving, feature engineering
mlops-engineer	opus	ML infrastructure, experiment tracking, model registries
prompt-engineer	opus	LLM prompt optimization and engineering

Documentation & Technical Writing

Agent	Model	Description
docs-architect	opus	Comprehensive technical documentation generation
api-documenter	sonnet	OpenAPI/Swagger specifications and developer docs
reference-builder	haiku	Technical references and API documentation
tutorial-engineer	sonnet	Step-by-step tutorials and educational content
mermaid-expert	sonnet	Diagram creation (flowcharts, sequences, ERDs)

Business & Operations

Business Analysis & Finance

Agent	Model	Description
business-analyst	sonnet	Metrics analysis, reporting, KPI tracking
quant-analyst	opus	Financial modeling, trading strategies, market analysis
risk-manager	sonnet	Portfolio risk monitoring and management

Marketing & Sales

Agent	Model	Description
content-marketer	sonnet	Blog posts, social media, email campaigns
sales-automator	haiku	Cold emails, follow-ups, proposal generation

Support & Legal

Agent	Model	Description
customer-support	sonnet	Support tickets, FAQ responses, customer communication
hr-pro	opus	HR operations, policies, employee relations
legal-advisor	opus	Privacy policies, terms of service, legal documentation

Specialized Domains

Agent	Model	Description
blockchain-developer	sonnet	Web3 apps, smart contracts, DeFi protocols
payment-integration	sonnet	Payment processor integration (Stripe, PayPal)
legacy-modernizer	sonnet	Legacy code refactoring and modernization
context-manager	haiku	Multi-agent context management

SEO & Content Optimization

Agent	Model	Description
seo-content-auditor	sonnet	Content quality analysis, E-E-A-T signals assessment
seo-meta-optimizer	haiku	Meta title and description optimization
seo-keyword-strategist	haiku	Keyword analysis and semantic variations
seo-structure-architect	haiku	Content structure and schema markup
seo-snippet-hunter	haiku	Featured snippet formatting
seo-content-refresher	haiku	Content freshness analysis
seo-cannibalization-detector	haiku	Keyword overlap detection
seo-authority-builder	sonnet	E-E-A-T signal analysis
seo-content-writer	sonnet	SEO-optimized content creation
seo-content-planner	haiku	Content planning and topic clusters

Model Configuration

Agents are assigned to specific Claude models based on task complexity and computational requirements. The system uses three model tiers:

Model Distribution Summary

Model	Agent Count	Use Case
Haiku	11	Quick, focused tasks with minimal computational overhead
Sonnet	46	Standard development and specialized engineering tasks
Opus	22	Complex reasoning, architecture, and critical analysis

Haiku Model Agents

Category	Agents
Context & Reference	`context-manager`, `reference-builder`, `sales-automator`, `search-specialist`
SEO Optimization	`seo-meta-optimizer`, `seo-keyword-strategist`, `seo-structure-architect`, `seo-snippet-hunter`, `seo-content-refresher`, `seo-cannibalization-detector`, `seo-content-planner`
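
As a quick cross-check, the two category rows above can be tallied against the Haiku count in the summary table. This is a small sketch; the dictionary only transcribes the rows listed above, and the variable names are illustrative.

```python
# Agent names transcribed from the two Haiku category rows above.
HAIKU_AGENTS = {
    "Context & Reference": [
        "context-manager", "reference-builder", "sales-automator", "search-specialist",
    ],
    "SEO Optimization": [
        "seo-meta-optimizer", "seo-keyword-strategist", "seo-structure-architect",
        "seo-snippet-hunter", "seo-content-refresher", "seo-cannibalization-detector",
        "seo-content-planner",
    ],
}

# Count agents per category, then sum to get the tier total.
per_category = {category: len(agents) for category, agents in HAIKU_AGENTS.items()}
total = sum(per_category.values())
print(per_category, total)
```

The total agrees with the Haiku row of the Model Distribution Summary.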

Sonnet Model Agents

Category	Count	Agents
Programming Languages	18	All language-specific agents (JavaScript, Python, Java, C++, etc.)
Frontend & UI	5	`frontend-developer`, `ui-ux-designer`, `ui-visual-validator`, `mobile-developer`, `ios-developer`
Infrastructure	8	`devops-troubleshooter`, `deployment-engineer`, `dx-optimizer`, `database-admin`, `network-engineer`, `flutter-expert`, `api-documenter`, `tutorial-engineer`
Quality & Testing	4	`test-automator`, `tdd-orchestrator`, `debugger`, `error-detective`
Business & Support	6	`business-analyst`, `risk-manager`, `content-marketer`, `customer-support`, `mermaid-expert`, `legacy-modernizer`
Data & Content	5	`data-engineer`, `payment-integration`, `seo-content-auditor`, `seo-authority-builder`, `seo-content-writer`

Opus Model Agents

Category	Count	Agents
Architecture & Design	7	`architect-reviewer`, `backend-architect`, `cloud-architect`, `hybrid-cloud-architect`, `kubernetes-architect`, `graphql-architect`, `terraform-specialist`
Critical Analysis	6	`code-reviewer`, `security-auditor`, `performance-engineer`, `observability-engineer`, `incident-responder`, `database-optimizer`
AI/ML Complex	5	`ai-engineer`, `ml-engineer`, `mlops-engineer`, `data-scientist`, `prompt-engineer`
Business Critical	4	`docs-architect`, `hr-pro`, `legal-advisor`, `quant-analyst`

Installation

Clone the repository from inside ~/.claude; the clone creates the ~/.claude/agents directory:

cd ~/.claude
git clone https://github.com/wshobson/agents.git

Claude Code picks up the subagents automatically once they are in the ~/.claude/agents/ directory.
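
To confirm the installation, the agent files can be listed with a few lines of Python. This is a minimal sketch that assumes each subagent is a top-level .md file in the directory and that README.md is not an agent; `list_subagents` is a hypothetical helper, not part of Claude Code.

```python
from pathlib import Path


def list_subagents(agents_dir: Path) -> list[str]:
    """Return the sorted names of subagent files (without the .md suffix)."""
    if not agents_dir.is_dir():
        return []
    # Assumption: every top-level .md file except README.md defines one subagent.
    return sorted(
        p.stem for p in agents_dir.glob("*.md") if p.name.lower() != "readme.md"
    )


if __name__ == "__main__":
    for name in list_subagents(Path.home() / ".claude" / "agents"):
        print(name)
```

An empty result suggests the clone landed somewhere other than ~/.claude/agents.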

Usage

Automatic Delegation

Claude Code analyzes each request and automatically delegates it to the most suitable subagent, based on the task context and requirements.

Explicit Invocation

Specify a subagent by name to use a particular specialist:

"Use code-reviewer to analyze the recent changes"
"Have security-auditor scan for vulnerabilities"
"Get performance-engineer to optimize this bottleneck"

Usage Examples

Code Quality & Security

code-reviewer: Analyze component for best practices
security-auditor: Check for OWASP compliance
tdd-orchestrator: Implement feature with test-first approach
performance-engineer: Profile and optimize bottlenecks

Development & Architecture

backend-architect: Design authentication API
frontend-developer: Create responsive dashboard
graphql-architect: Design federated GraphQL schema
mobile-developer: Build cross-platform mobile app

Infrastructure & Operations

devops-troubleshooter: Analyze production logs
cloud-architect: Design scalable AWS architecture
network-engineer: Debug SSL certificate issues
database-admin: Configure backup and replication
terraform-specialist: Write infrastructure modules

Data & Machine Learning

data-scientist: Analyze customer behavior dataset
ai-engineer: Build RAG system for document search
mlops-engineer: Set up experiment tracking
ml-engineer: Deploy model to production

Business & Documentation

business-analyst: Create metrics dashboard
docs-architect: Generate technical documentation
api-documenter: Write OpenAPI specifications
content-marketer: Create SEO-optimized content

Multi-Agent Workflows

For complex tasks, subagents coordinate automatically: Claude Code sequences multiple specialists based on the task requirements.

Common Workflow Patterns

Feature Development

"Implement user authentication"
→ backend-architect → frontend-developer → test-automator → security-auditor

Performance Optimization

"Optimize checkout process"
→ performance-engineer → database-optimizer → frontend-developer

Production Incidents

"Debug high memory usage"
→ incident-responder → devops-troubleshooter → error-detective → performance-engineer

Infrastructure Setup

"Set up disaster recovery"
→ database-admin → database-optimizer → terraform-specialist

ML Pipeline Development

"Build ML pipeline with monitoring"
→ mlops-engineer → ml-engineer → data-engineer → performance-engineer

Integration with Claude Code Commands

For more sophisticated multi-agent orchestration, use the Claude Code Commands collection, which provides 52 pre-built slash commands:

/full-stack-feature   # Coordinates 8+ agents for complete feature development
/incident-response    # Activates incident management workflow
/ml-pipeline          # Sets up end-to-end ML infrastructure
/security-hardening   # Implements security best practices across stack

Subagent Format

Each subagent is defined as a Markdown file with frontmatter:

---
name: subagent-name
description: Activation criteria for this subagent
model: haiku|sonnet|opus  # Optional: Model selection
tools: tool1, tool2       # Optional: Tool restrictions
---

System prompt defining the subagent's expertise and behavior
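
To make the layout above concrete, here is a minimal sketch of splitting such a file into its frontmatter fields and system prompt. It is illustrative only: `parse_subagent` is a hypothetical helper, it handles just flat `key: value` lines rather than full YAML, and the sample file content is invented for the example.

```python
def parse_subagent(text: str) -> tuple[dict[str, str], str]:
    """Split a subagent file into (frontmatter fields, system prompt)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("missing closing '---' frontmatter delimiter") from None
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        # Drop trailing "# ..." comments (simplification: values containing " #" are unsupported).
        line = line.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        fields[key.strip()] = value.strip()
    # Everything after the closing delimiter is the system prompt.
    prompt = "\n".join(lines[end + 1:]).strip()
    return fields, prompt


# Invented sample file, following the format described above.
EXAMPLE = """---
name: code-reviewer
description: Code review with security focus
model: opus  # Optional: Model selection
---

You are a senior code reviewer.
"""

fields, prompt = parse_subagent(EXAMPLE)
print(fields["name"], fields["model"])
```

A real tool would more likely use a YAML library; the hand-rolled parser only keeps the sketch dependency-free.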

Model Selection Criteria

haiku: Simple, deterministic tasks with minimal reasoning
sonnet: Standard development and engineering tasks
opus: Complex analysis, architecture, and critical operations

Agent Orchestration Patterns

Sequential Processing

Agents execute in sequence, passing context forward:

backend-architect → frontend-developer → test-automator → security-auditor
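
Conceptually, this hand-off is a loop in which each agent receives the context built so far and returns an extended copy. The sketch below illustrates the pattern only: the stub agents are hypothetical stand-ins, not a Claude Code API, and it does not show how Claude Code itself delegates work.

```python
from typing import Callable

Context = dict[str, str]
Agent = Callable[[Context], Context]


def make_stub_agent(name: str) -> Agent:
    """Build a hypothetical agent that records its output under its own name."""
    def agent(context: Context) -> Context:
        seen = ", ".join(context) or "nothing"
        # Return a new dict so the context from earlier steps is never mutated.
        return {**context, name: f"{name} output (saw: {seen})"}
    return agent


def run_sequence(agents: list[Agent], context: Context) -> Context:
    """Run agents in order, passing each one the context built so far."""
    for agent in agents:
        context = agent(context)
    return context


pipeline = [
    make_stub_agent(name)
    for name in ("backend-architect", "frontend-developer", "test-automator", "security-auditor")
]
result = run_sequence(pipeline, {"request": "Implement user authentication"})
print(list(result))
```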

Parallel Execution

Multiple agents work simultaneously on different aspects:

performance-engineer + database-optimizer → Merged analysis
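
The fan-out and merge can be sketched with a thread pool: each agent analyzes the same task independently, and the partial results are combined afterwards. This is an illustration of the pattern under stated assumptions; `analyze` is a hypothetical stand-in for an agent call and is not part of Claude Code.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze(agent_name: str, task: str) -> dict[str, str]:
    """Hypothetical stand-in for one agent's independent analysis."""
    return {agent_name: f"findings for {task!r}"}


def run_parallel(agent_names: list[str], task: str) -> dict[str, str]:
    """Run every agent on the same task concurrently and merge the results."""
    merged: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(agent_names))) as pool:
        # map() yields results in input order, so the merge order is deterministic.
        for partial in pool.map(lambda name: analyze(name, task), agent_names):
            merged.update(partial)
    return merged


merged = run_parallel(["performance-engineer", "database-optimizer"], "slow checkout")
print(merged)
```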

Conditional Routing

Dynamic agent selection based on analysis:

debugger → [backend-architect | frontend-developer | devops-troubleshooter]
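
Routing of this kind amounts to a classification step followed by a lookup. In the sketch below, the keyword-based `diagnose` function is a deliberately crude, hypothetical stand-in for the debugger's analysis, and the route keys are invented for the example; only the agent names come from the pattern above.

```python
# Diagnosis category -> follow-up agent (category names are illustrative).
ROUTES = {
    "backend": "backend-architect",
    "frontend": "frontend-developer",
    "deployment": "devops-troubleshooter",
}


def diagnose(error_report: str) -> str:
    """Hypothetical stand-in for the debugger's analysis: classify by keyword."""
    text = error_report.lower()
    if any(word in text for word in ("css", "render", "browser")):
        return "frontend"
    if any(word in text for word in ("deploy", "container", "pipeline")):
        return "deployment"
    # Default when no frontend or deployment keyword matches.
    return "backend"


def route(error_report: str) -> str:
    """Pick the follow-up agent from the diagnosis."""
    return ROUTES[diagnose(error_report)]


print(route("Button fails to render in Safari"))
```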

Validation Pipeline

Primary work followed by specialized review:

payment-integration → security-auditor → Validated implementation
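
One way to picture this pipeline is as a gate: the primary agent's work is released only if the reviewer approves it. The sketch below is illustrative; `implement` and `security_review` are hypothetical stand-ins for the two agents, and the single check on the word "hardcoded" is a placeholder for a real audit.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    approved: bool
    findings: list[str] = field(default_factory=list)


def implement(task: str) -> str:
    """Hypothetical stand-in for the primary agent (payment-integration)."""
    return f"implementation of {task}; api_key loaded from environment"


def security_review(work: str) -> Review:
    """Hypothetical stand-in for the reviewer (security-auditor)."""
    findings = []
    if "hardcoded" in work:
        findings.append("secret appears to be hardcoded")
    return Review(approved=not findings, findings=findings)


def validated(task: str) -> str:
    """Return the work only if the review passes; otherwise raise."""
    work = implement(task)
    review = security_review(work)
    if not review.approved:
        raise ValueError("; ".join(review.findings))
    return work


print(validated("Stripe checkout"))
```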

Agent Selection Guide

Architecture & Planning

Task	Recommended Agent	Key Capabilities
API Design	`backend-architect`	RESTful APIs, microservices, database schemas
Cloud Infrastructure	`cloud-architect`	AWS/Azure/GCP design, scalability planning
UI/UX Design	`ui-ux-designer`	Interface design, wireframes, design systems
System Architecture	`architect-reviewer`	Pattern validation, consistency analysis

Development by Language

Language Category	Agents	Primary Use Cases
Systems Programming	`c-pro`, `cpp-pro`, `rust-pro`, `golang-pro`	OS interfaces, embedded systems, high performance
Web Development	`javascript-pro`, `typescript-pro`, `python-pro`, `ruby-pro`, `php-pro`	Full-stack web applications, APIs, scripting
Enterprise	`java-pro`, `csharp-pro`, `scala-pro`	Large-scale applications, enterprise systems
Mobile	`ios-developer`, `flutter-expert`, `mobile-developer`	Native and cross-platform mobile apps
Specialized	`elixir-pro`, `unity-developer`, `minecraft-bukkit-pro`	Domain-specific development

Operations & Infrastructure

Task	Recommended Agent	Key Capabilities
Production Issues	`devops-troubleshooter`	Log analysis, deployment debugging
Critical Incidents	`incident-responder`	Outage response, immediate mitigation
Database Performance	`database-optimizer`	Query optimization, indexing strategies
Database Operations	`database-admin`	Backup, replication, disaster recovery
Infrastructure as Code	`terraform-specialist`	Terraform modules, state management
Network Issues	`network-engineer`	Network debugging, load balancing

Quality & Security

Task	Recommended Agent	Key Capabilities
Code Review	`code-reviewer`	Security focus, best practices
Security Audit	`security-auditor`	Vulnerability scanning, OWASP compliance
Test Creation	`test-automator`	Unit, integration, E2E test suites
Performance Issues	`performance-engineer`	Profiling, optimization
Bug Investigation	`debugger`	Error resolution, root cause analysis

Data & Machine Learning

Task	Recommended Agent	Key Capabilities
Data Analysis	`data-scientist`	SQL queries, statistical analysis
LLM Applications	`ai-engineer`	RAG systems, prompt pipelines
ML Development	`ml-engineer`	Model training, feature engineering
ML Operations	`mlops-engineer`	ML infrastructure, experiment tracking

Documentation & Business

Task	Recommended Agent	Key Capabilities
Technical Docs	`docs-architect`	Comprehensive documentation generation
API Documentation	`api-documenter`	OpenAPI/Swagger specifications
Business Metrics	`business-analyst`	KPI tracking, reporting
Legal Compliance	`legal-advisor`	Privacy policies, terms of service

Best Practices

Task Delegation

Automatic selection - Let Claude Code analyze context and select optimal agents
Clear requirements - Specify constraints, tech stack, and quality standards
Trust specialization - Each agent is optimized for its specific domain

Multi-Agent Workflows

High-level requests - Allow agents to coordinate complex multi-step tasks
Context preservation - Ensure agents have necessary background information
Integration review - Verify how different agents' outputs work together

Explicit Control

Direct invocation - Specify agents when you need particular expertise
Strategic combination - Use multiple specialists for validation
Review patterns - Request specific review workflows (e.g., "security-auditor reviews API design")

Performance Optimization

Monitor effectiveness - Track which agents work best for your use cases
Iterative refinement - Use agent feedback to improve requirements
Complexity matching - Align task complexity with agent capabilities

Contributing

To add a new subagent:

Create a new .md file with appropriate frontmatter
Use lowercase, hyphen-separated naming convention
Write clear activation criteria in the description
Define comprehensive system prompt with expertise areas
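
Before submitting, the frontmatter can be sanity-checked against the conventions above. This is a minimal sketch: `check_frontmatter` is a hypothetical helper, and allowing digits in names is an assumption, since the convention only states lowercase and hyphen-separated.

```python
import re

# Lowercase words separated by single hyphens, e.g. "code-reviewer".
# Assumption: digits are allowed alongside lowercase letters.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
ALLOWED_MODELS = {"haiku", "sonnet", "opus"}


def check_frontmatter(fields: dict[str, str]) -> list[str]:
    """Return a list of problems found in a subagent's frontmatter fields."""
    problems = []
    name = fields.get("name", "")
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"name {name!r} is not lowercase and hyphen-separated")
    if not fields.get("description", "").strip():
        problems.append("description (activation criteria) is missing")
    model = fields.get("model")  # optional field
    if model is not None and model not in ALLOWED_MODELS:
        problems.append(f"model {model!r} is not one of {sorted(ALLOWED_MODELS)}")
    return problems


print(check_frontmatter({"name": "Rust_Pro", "description": ""}))
```

An empty list means the fields passed these checks; it says nothing about the quality of the system prompt itself.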

Troubleshooting

Agent Not Activating

Ensure request clearly indicates the domain
Be specific about task type and requirements
Use explicit invocation if automatic selection fails

Unexpected Agent Selection

Provide more context about tech stack
Include specific requirements in request
Use direct agent naming for precise control

Conflicting Recommendations

Normal behavior - specialists have different priorities
Request reconciliation between specific agents
Consider trade-offs based on project requirements

Missing Context

Include background information in requests
Reference previous work or patterns
Provide project-specific constraints

License

MIT License - see LICENSE file for details.

