agentsys
AI writes code. This automates everything else · 13 plugins · 42 agents · 28 skills · for Claude Code, OpenCode, Codex.
Stars: 424
AgentSys is a modular runtime and orchestration system for AI agents, with 13 plugins, 42 agents, and 28 skills that compose into structured pipelines for software development. It handles task selection, branch management, code review, artifact cleanup, CI, PR comments, and deployment. The system runs on Claude Code, OpenCode, and Codex CLI, providing a functional software suite and runtime for AI agent orchestration.
README:
A modular runtime and orchestration system for AI agents.
Renamed from `awesome-slash`. The `awesome-` prefix implies a curated list of links, but this project is a functional software suite and runtime. Please update your installs: `npm install -g agentsys`
13 plugins · 42 agents · 28 skills · 26k lines of lib code · 3,357 tests · 3 platforms
Commands · Installation · Website · Discussions
Built for Claude Code · Codex CLI · OpenCode
New skills, agents, and integrations ship constantly. Follow the repository for real-time updates.
AI models can write code. That's not the hard part anymore. The hard part is everything around it — task selection, branch management, code review, artifact cleanup, CI, PR comments, deployment. AgentSys is the runtime that orchestrates agents to handle all of it — structured pipelines, gated phases, specialized agents, and persistent state that survives session boundaries.
Building custom skills, agents, hooks, or MCP tools? agnix is the CLI + LSP linter that catches config errors before they fail silently: real-time IDE validation, auto suggestions, auto-fix, and 155 rules for Cursor, Claude Code, Cline, Copilot, Codex, Windsurf, and more.
An agent orchestration system — 13 plugins, 42 agents, and 28 skills that compose into structured pipelines for software development.
Each agent has a single responsibility, a specific model assignment, and defined inputs/outputs. Pipelines enforce phase gates so agents can't skip steps. State persists across sessions so work survives interruptions.
The system runs on Claude Code, OpenCode, and Codex CLI. Install the plugins, get the runtime.
Code does code work. AI does AI work.
- Detection: regex, AST analysis, static analysis—fast, deterministic, no tokens wasted
- Judgment: LLM calls for synthesis, planning, review—where reasoning matters
- Result: 77% fewer tokens for /drift-detect vs multi-agent approaches, certainty-graded findings throughout
Certainty levels exist because not all findings are equal:
| Level | Meaning | Action |
|---|---|---|
| HIGH | Definitely a problem | Safe to auto-fix |
| MEDIUM | Probably a problem | Needs context |
| LOW | Might be a problem | Needs human judgment |
This came from testing on 1,000+ repositories.
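A minimal sketch of how certainty-graded findings can be represented and routed (the `Finding` shape and routing helper are illustrative, not the actual AgentSys API):

```typescript
// Hypothetical finding shape: each detector emits findings tagged with a
// certainty level, and only HIGH-certainty findings are eligible for auto-fix.
type Certainty = "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  file: string;
  line: number;
  message: string;
  certainty: Certainty;
  fix?: () => void; // present only when a mechanical fix is known
}

function route(findings: Finding[]): { autoFixed: Finding[]; forReview: Finding[] } {
  const autoFixed: Finding[] = [];
  const forReview: Finding[] = [];
  for (const f of findings) {
    if (f.certainty === "HIGH" && f.fix) {
      f.fix();           // definitely a problem: safe to auto-fix
      autoFixed.push(f);
    } else {
      forReview.push(f); // MEDIUM/LOW: needs context or human judgment
    }
  }
  return { autoFixed, forReview };
}
```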
| Command | What it does |
|---|---|
| /next-task | Task → exploration → plan → implementation → review → ship |
| /agnix | Lint agent configs: 155 rules for Skills, Memory, Hooks, MCP across 10+ AI tools |
| /ship | Branch → PR → CI → reviews addressed → merge → cleanup |
| /deslop | 3-phase detection pipeline, certainty-graded findings |
| /perf | 10-phase performance investigation with baselines and profiling |
| /drift-detect | AST-based plan vs code analysis, finds what's documented but not built |
| /audit-project | Multi-agent code review, iterates until issues resolved |
| /enhance | Analyzes prompts, agents, plugins, docs, hooks, skills |
| /repo-map | AST symbol and import mapping via ast-grep |
| /sync-docs | Finds outdated references, stale examples, missing CHANGELOG entries |
| /learn | Research any topic, gather online sources, create learning guide with RAG index |
| /consult | Consult another AI CLI tool for a second opinion: cross-check ideas, get alternative approaches, or validate decisions with Gemini, Codex, Claude, OpenCode, or Copilot |
| /debate | Structured debate between two AI tools to stress-test ideas. Proposer/Challenger format with a verdict |
Each command works standalone. Together, they compose into end-to-end pipelines.
28 skills included across the plugins:
| Category | Skills |
|---|---|
| Performance | perf:perf-analyzer, perf:perf-baseline-manager, perf:perf-benchmarker, perf:perf-code-paths, perf:perf-investigation-logger, perf:perf-profiler, perf:perf-theory-gatherer, perf:perf-theory-tester |
| Enhancement | enhance:enhance-agent-prompts, enhance:enhance-claude-memory, enhance:enhance-cross-file, enhance:enhance-docs, enhance:enhance-hooks, enhance:enhance-orchestrator, enhance:enhance-plugins, enhance:enhance-prompts, enhance:enhance-skills |
| Workflow | next-task:discover-tasks, next-task:orchestrate-review, next-task:validate-delivery |
| Cleanup | deslop:deslop, sync-docs:sync-docs |
| Analysis | debate:debate, drift-detect:drift-analysis, repo-map:repo-mapping |
| Productivity | consult:consult |
| Learning | learn:learn |
| Linting | agnix:agnix |
Skills are the reusable implementation units. Agents invoke skills; commands orchestrate agents. When you install a plugin, its skills become available to all agents in that session.
| Section | What's there |
|---|---|
| The Approach | Why it's built this way |
| Commands | All 13 commands overview |
| Skills | 28 skills across plugins |
| Command Details | Deep dive into each command |
| How Commands Work Together | Standalone vs integrated |
| Design Philosophy | The thinking behind the architecture |
| Installation | Get started |
| Research & Testing | What went into building this |
| Documentation | Links to detailed docs |
Purpose: Complete task-to-production automation.
What happens when you run it:
1. Policy Selection - Choose task source (GitHub issues, GitLab, local file), priority filter, stopping point
2. Task Discovery - Shows top 5 prioritized tasks, you pick one
3. Worktree Setup - Creates isolated branch and working directory
4. Exploration - Deep codebase analysis to understand context
5. Planning - Designs implementation approach
6. User Approval - You review and approve the plan (last human interaction)
7. Implementation - Executes the plan
8. Pre-Review - Runs deslop-agent and test-coverage-checker
9. Review Loop - Multi-agent review iterates until clean
10. Delivery Validation - Verifies tests pass, build passes, requirements met
11. Docs Update - Updates CHANGELOG and related documentation
12. Ship - Creates PR, monitors CI, addresses comments, merges
Phase 9 uses the orchestrate-review skill to spawn parallel reviewers (code quality, security, performance, test coverage) plus conditional specialists.
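As a rough sketch, a gated pipeline with persisted state might look like the following (phase names match the list above; the state-file name and layout are assumptions, not the real AgentSys format):

```typescript
import { readFileSync, writeFileSync, existsSync } from "node:fs";

// Hypothetical state file; AgentSys's real on-disk format may differ.
const STATE_FILE = ".agentsys-state.json";
const PHASES = [
  "policy-selection", "task-discovery", "worktree-setup", "exploration",
  "planning", "user-approval", "implementation", "pre-review",
  "review-loop", "delivery-validation", "docs-update", "ship",
] as const;

type Phase = (typeof PHASES)[number];
interface State { task: string; phase: Phase; }

function loadState(task: string): State {
  if (existsSync(STATE_FILE)) return JSON.parse(readFileSync(STATE_FILE, "utf8"));
  return { task, phase: PHASES[0] };
}

// Gate: a phase may only advance to its immediate successor, so agents
// cannot skip review or validation. Progress survives session restarts.
function advance(state: State): State {
  const i = PHASES.indexOf(state.phase);
  if (i < 0 || i === PHASES.length - 1) return state;
  const next: State = { ...state, phase: PHASES[i + 1] };
  writeFileSync(STATE_FILE, JSON.stringify(next, null, 2));
  return next;
}
```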
Agents involved:
| Agent | Model | Role |
|---|---|---|
| task-discoverer | sonnet | Finds and ranks tasks from your source |
| worktree-manager | haiku | Creates git worktrees and branches |
| exploration-agent | opus | Deep codebase analysis before planning |
| planning-agent | opus | Designs step-by-step implementation plan |
| implementation-agent | opus | Writes the actual code |
| test-coverage-checker | sonnet | Validates tests exist and are meaningful |
| delivery-validator | sonnet | Final checks before shipping |
| ci-monitor | haiku | Watches CI status |
| ci-fixer | sonnet | Fixes CI failures and review comments |
| simple-fixer | haiku | Executes mechanical edits |
Cross-plugin agents:
| Agent | Plugin | Role |
|---|---|---|
| deslop-agent | deslop | Removes AI artifacts before review |
| sync-docs-agent | sync-docs | Updates documentation |
Usage:
/next-task # Start new workflow
/next-task --resume # Resume interrupted workflow
/next-task --status # Check current state
/next-task --abort # Cancel and cleanup
Purpose: Lint agent configurations before they break your workflow. The first dedicated linter for AI agent configs.
agnix is a standalone open-source project that provides the validation engine. This plugin integrates it into your workflow.
The problem it solves:
Agent configurations are code. They affect behavior, security, and reliability. But unlike application code, they have no linting. You find out your SKILL.md is malformed when the agent fails. You discover your hooks have security issues when they're exploited. You realize your CLAUDE.md has conflicting rules when the AI behaves unexpectedly.
agnix catches these issues before they cause problems.
What it validates:
| Category | What It Checks |
|---|---|
| Structure | Required fields, valid YAML/JSON, proper frontmatter |
| Security | Prompt injection vectors, overpermissive tools, exposed secrets |
| Consistency | Conflicting rules, duplicate definitions, broken references |
| Best Practices | Tool restrictions, model selection, trigger phrase quality |
| Cross-Platform | Compatibility across Claude Code, Cursor, Copilot, Codex, OpenCode, Gemini CLI, Cline, and more |
155 validation rules (57 auto-fixable) derived from:
- Official tool specifications (Claude Code, Cursor, GitHub Copilot, Codex CLI, OpenCode, Gemini CLI, and more)
- Research papers on agent reliability and prompt injection
- Real-world testing across 500+ repositories
- Community-reported issues and edge cases
Supported files:
| File Type | Examples |
|---|---|
| Skills | SKILL.md, */SKILL.md |
| Memory | CLAUDE.md, AGENTS.md, .github/CLAUDE.md |
| Hooks | .claude/settings.json, hooks configuration |
| MCP | *.mcp.json, MCP server configs |
| Cursor | .cursor/rules/*.mdc, .cursorrules |
| Copilot | .github/copilot-instructions.md |
CI/CD Integration:
agnix outputs SARIF format for GitHub Code Scanning. Add it to your workflow:
- name: Lint agent configs
  run: agnix --format sarif > results.sarif
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
Usage:
/agnix # Validate current project
/agnix --fix # Auto-fix fixable issues
/agnix --strict # Treat warnings as errors
/agnix --target claude-code # Only Claude Code rules
/agnix --format sarif # Output for GitHub Code Scanning
Agent: agnix-agent (sonnet model)
External tool: Requires agnix CLI
npm install -g agnix # Install via npm
# or
cargo install agnix-cli # Install via Cargo
# or
brew install agnix # Install via Homebrew (macOS)
Why use agnix:
- Catch config errors before they cause agent failures
- Enforce security best practices across your team
- Maintain consistency as your agent configs grow
- Integrate validation into CI/CD pipelines
- Support multiple AI tools from one linter
Purpose: Takes your current branch from "ready to commit" to "merged PR."
What happens when you run it:
- Pre-flight - Detects CI platform, deployment platform, branch strategy
- Commit - Stages and commits with generated message (if uncommitted changes)
- Push & PR - Pushes branch, creates pull request
- CI Monitor - Waits for CI, retries on transient failures
- Review Wait - Waits 3 minutes for auto-reviewers (Copilot, Claude, Gemini, Codex)
- Address Comments - Handles every comment from every reviewer
- Merge - Merges when all comments resolved and CI passes
- Deploy - Deploys and validates (if multi-branch workflow)
- Cleanup - Removes worktree, closes issue, deletes branch
Platform Detection:
| Type | Detected |
|---|---|
| CI | GitHub Actions, GitLab CI, CircleCI, Jenkins, Travis |
| Deploy | Railway, Vercel, Netlify, Fly.io, Render |
| Project | Node.js, Python, Rust, Go, Java |
Review Comment Handling:
Every comment gets addressed. No exceptions. The workflow categorizes comments and handles each:
- Code fixes get implemented
- Style suggestions get applied
- Questions get answered
- False positives get explained
If something can't be fixed, the workflow replies explaining why and resolves the thread.
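A sketch of what that categorization could look like in code (the keyword heuristics are illustrative; the real workflow uses LLM judgment, not bare regexes):

```typescript
// Hypothetical triage: every review comment gets exactly one outcome,
// mirroring the "every comment gets addressed, no exceptions" rule.
type Outcome = "implement-fix" | "apply-style" | "answer" | "explain-and-resolve";

interface ReviewComment { id: string; body: string; }

function triage(c: ReviewComment): Outcome {
  if (/\b(bug|error|breaks|incorrect)\b/i.test(c.body)) return "implement-fix";
  if (/\b(style|naming|formatting)\b/i.test(c.body)) return "apply-style";
  if (c.body.trim().endsWith("?")) return "answer";
  return "explain-and-resolve"; // false positive: reply with why, resolve thread
}
```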
Usage:
/ship # Full workflow
/ship --dry-run # Preview without executing
/ship --strategy rebase # Use rebase instead of squash
Purpose: Finds AI slop (debug statements, placeholder text, verbose comments, TODOs) and removes it.
How detection works:
Three phases run in sequence:
- Phase 1: Regex Patterns (HIGH certainty)
  - `console.log`, `print()`, `dbg!()`, `println!()`
  - `// TODO`, `// FIXME`, `// HACK`
  - Empty catch blocks, disabled linters
  - Hardcoded secrets (API keys, tokens)
- Phase 2: Multi-Pass Analyzers (MEDIUM certainty)
  - Doc-to-code ratio (excessive comments)
  - Verbosity ratio (AI preambles)
  - Over-engineering patterns
  - Buzzword inflation
  - Dead code detection
  - Stub functions
- Phase 3: CLI Tools (LOW certainty, optional)
  - jscpd, madge, escomplex (JS/TS)
  - pylint, radon (Python)
  - golangci-lint (Go)
  - clippy (Rust)
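To make Phase 1 concrete, here is a small sketch of regex-based detection (a handful of illustrative patterns; the real rule set is larger and pre-indexed):

```typescript
// Illustrative Phase 1 patterns (HIGH certainty). Deterministic regex
// matching: fast, reproducible, and no LLM tokens spent.
const PHASE1_PATTERNS: Array<{ name: string; re: RegExp }> = [
  { name: "debug-log",    re: /\bconsole\.log\s*\(|\bprint\s*\(|\bdbg!\s*\(|\bprintln!\s*\(/ },
  { name: "todo-comment", re: /\/\/\s*(TODO|FIXME|HACK)\b/ },
  { name: "empty-catch",  re: /catch\s*\([^)]*\)\s*\{\s*\}/ },
];

function scanLine(file: string, line: string, lineNo: number) {
  return PHASE1_PATTERNS
    .filter(p => p.re.test(line))
    .map(p => ({ file, line: lineNo, rule: p.name, certainty: "HIGH" as const }));
}
```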
Languages supported: JavaScript/TypeScript, Python, Rust, Go, Java
Usage:
/deslop # Report only (safe)
/deslop apply # Fix HIGH certainty issues
/deslop apply src/ 10 # Fix 10 issues in src/
Thoroughness levels:
- `quick` - Phase 1 only (fastest)
- `normal` - Phase 1 + Phase 2 (default)
- `deep` - All phases if tools available
Purpose: Structured performance investigation with baselines, profiling, and evidence-backed decisions.
10-phase methodology (based on recorded real performance investigation sessions):
1. Setup - Confirm scenario, success criteria, benchmark command
2. Baseline - 60s minimum runs, PERF_METRICS markers required
3. Breaking Point - Binary search to find failure threshold
4. Constraints - CPU/memory limits, measure delta vs baseline
5. Hypotheses - Generate up to 5 hypotheses with evidence and confidence
6. Code Paths - Use repo-map to identify entrypoints and hot files
7. Profiling - Language-specific tools (--cpu-prof, JFR, cProfile, pprof)
8. Optimization - One change per experiment, 2+ validation passes
9. Decision - Continue or stop based on measurable improvement
10. Consolidation - Final baseline, evidence log, investigation complete
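For illustration, baselines can be captured by having the benchmark print machine-readable marker lines; the exact `PERF_METRICS` line format below is an assumption, not the documented one:

```typescript
// Emit (inside the benchmark): one marker line per sample, e.g.
//   console.log(`PERF_METRICS ${JSON.stringify({ opsPerSec, p99Ms })}`);

// Parse (in the investigation harness): collect samples from stdout so
// baseline and post-change runs can be compared numerically.
function parsePerfMetrics(stdout: string): Array<Record<string, number>> {
  return stdout
    .split("\n")
    .filter(l => l.startsWith("PERF_METRICS "))
    .map(l => JSON.parse(l.slice("PERF_METRICS ".length)));
}
```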
Agents and skills:
| Component | Role |
|---|---|
| perf-orchestrator | Coordinates all phases |
| perf-theory-gatherer | Generates hypotheses from git history and code |
| perf-theory-tester | Validates hypotheses with controlled experiments |
| perf-analyzer | Synthesizes findings into recommendations |
| perf-code-paths | Maps entrypoints and likely hot paths |
| perf-investigation-logger | Structured evidence logging |
Usage:
/perf # Start new investigation
/perf --resume # Resume previous investigation
Phase flags (advanced):
/perf --phase baseline --command "npm run bench" --version v1.2.0
/perf --phase breaking-point --param-min 1 --param-max 500
/perf --phase constraints --cpu 1 --memory 1GB
/perf --phase hypotheses --hypotheses-file perf-hypotheses.json
/perf --phase optimization --change "reduce allocations"
/perf --phase decision --verdict stop --rationale "no measurable improvement"
Purpose: Compares your documentation and plans to what's actually in the code.
The problem it solves:
Your roadmap says "user authentication: done." But is it actually implemented? Your GitHub issue says "add dark mode." Is it already in the codebase? Plans drift from reality. This command finds the drift.
How it works:
1. JavaScript collectors gather data (fast, token-efficient)
   - GitHub issues and their labels
   - Documentation files
   - Actual code exports and implementations
2. Single Opus call performs semantic analysis
   - Matches concepts, not strings ("user auth" matches `auth/`, `login.js`, `session.ts`)
   - Identifies implemented but not documented
   - Identifies documented but not implemented
   - Finds stale issues that should be closed
Why this approach:
Multi-agent collection wastes tokens on coordination. JavaScript collectors are fast and deterministic. One well-prompted LLM call does the actual analysis. Result: 77% token reduction vs multi-agent approaches.
Tested on 1,000+ repositories before release.
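The collect-then-single-call shape, sketched (the collector and model call are hypothetical stand-ins for the real collectors and Opus invocation):

```typescript
// Sketch of the "deterministic collection, one LLM call" pattern.
interface DriftInput { issues: string[]; docs: string[]; exports: string[]; }

async function driftDetect(
  collect: () => Promise<DriftInput>,            // JavaScript collectors: no tokens spent
  callOpus: (prompt: string) => Promise<string>, // one semantic-analysis call
): Promise<string> {
  const input = await collect();
  const prompt = `Compare plans/docs to code and report drift:\n${JSON.stringify(input)}`;
  return callOpus(prompt);
}
```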
Usage:
/drift-detect # Full analysis
/drift-detect --depth quick # Quick scan
Purpose: Multi-agent code review that iterates until issues are resolved.
What happens when you run it:
Up to 10 specialized role-based agents run based on your project:
| Agent | When Active | Focus Area |
|---|---|---|
| code-quality-reviewer | Always | Code quality, error handling |
| security-expert | Always | Vulnerabilities, auth, secrets |
| performance-engineer | Always | N+1 queries, memory, blocking ops |
| test-quality-guardian | Always | Coverage, edge cases, mocking |
| architecture-reviewer | If 50+ files | Modularity, patterns, SOLID |
| database-specialist | If DB detected | Queries, indexes, transactions |
| api-designer | If API detected | REST, errors, pagination |
| frontend-specialist | If frontend detected | Components, state, UX |
| backend-specialist | If backend detected | Services, domain logic |
| devops-reviewer | If CI/CD detected | Pipelines, configs, secrets |
Findings are collected and categorized by severity (critical/high/medium/low). All non-false-positive issues get fixed automatically. The loop repeats until no open issues remain.
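A minimal sketch of the iterate-until-clean loop (the reviewer and fixer functions are stand-ins for the parallel reviewer agents and the auto-fix step):

```typescript
// Findings carry a severity; false positives are excluded, everything
// else is fixed and the review re-runs until nothing remains open.
type Severity = "critical" | "high" | "medium" | "low";
interface Issue { severity: Severity; falsePositive: boolean; }

async function auditLoop(
  reviewOnce: () => Promise<Issue[]>,
  fix: (issues: Issue[]) => Promise<void>,
  maxRounds = 10,
): Promise<void> {
  for (let round = 0; round < maxRounds; round++) {
    const open = (await reviewOnce()).filter(i => !i.falsePositive);
    if (open.length === 0) return; // no open issues: done
    await fix(open);               // fix everything, then re-review
  }
}
```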
Usage:
/audit-project # Full review
/audit-project --quick # Single pass
/audit-project --resume # Resume from queue file
/audit-project --domain security # Security focus only
/audit-project --recent # Only recent changes
Purpose: Analyzes your prompts, plugins, agents, docs, hooks, and skills for improvement opportunities.
Seven analyzers run in parallel:
| Analyzer | What it checks |
|---|---|
| plugin-enhancer | Plugin structure, MCP tool definitions, security patterns |
| agent-enhancer | Agent frontmatter, prompt quality |
| claudemd-enhancer | CLAUDE.md/AGENTS.md structure, token efficiency |
| docs-enhancer | Documentation readability, RAG optimization |
| prompt-enhancer | Prompt engineering patterns, clarity, examples |
| hooks-enhancer | Hook frontmatter, structure, safety |
| skills-enhancer | SKILL.md structure, trigger phrases |
Each finding includes:
- Certainty level (HIGH/MEDIUM/LOW)
- Specific location (file:line)
- What's wrong
- How to fix it
- Whether it can be auto-fixed
Auto-learning: Detects obvious false positives (pattern docs, workflow gates) and saves them for future runs. Reduces noise over time without manual suppression files.
Usage:
/enhance # Run all analyzers
/enhance --focus=agent # Just agent prompts
/enhance --apply # Apply HIGH certainty fixes
/enhance --show-suppressed # Show what's being filtered
/enhance --no-learn # Analyze but don't save false positives
Purpose: Builds an AST-based map of symbols and imports for fast repo analysis.
What it generates:
- Cached file→symbols map (exports, functions, classes)
- Import graph for dependency hints
Output is cached at {state-dir}/repo-map.json and exposed via the MCP repo_map tool.
Why it matters:
Tools like /drift-detect and planners can use the map instead of re-scanning the repo every time.
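For example, a consumer could query the cached map directly instead of re-scanning (the JSON schema shown is an assumption, not the documented repo-map.json format):

```typescript
import { readFileSync } from "node:fs";

// Hypothetical shape of the cached map; the real schema may differ.
interface RepoMap {
  files: Record<string, { symbols: string[]; imports: string[] }>;
}

// Find every file that exports a given symbol, without touching the repo.
function filesExporting(symbol: string, mapPath: string): string[] {
  const map: RepoMap = JSON.parse(readFileSync(mapPath, "utf8"));
  return Object.entries(map.files)
    .filter(([, info]) => info.symbols.includes(symbol))
    .map(([file]) => file);
}
```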
Usage:
/repo-map init # First-time map generation
/repo-map update # Incremental update
/repo-map status # Check freshness
Required: ast-grep (`sg`) must be installed.
Purpose: Sync documentation with actual code changes—find outdated refs, update CHANGELOG, flag stale examples.
The problem it solves:
You refactor auth.js into auth/index.js. Your README still says import from './auth'. You rename a function. Three docs still reference the old name. You ship a feature. CHANGELOG doesn't mention it. Documentation drifts from code. This command finds the drift.
What it detects:
| Category | Examples |
|---|---|
| Broken references | Imports to moved/renamed files, deleted exports |
| Version mismatches | Doc says v2.0, package.json says v2.1 |
| Stale code examples | Import paths that no longer exist |
| Missing CHANGELOG | feat: and fix: commits without entries |
Auto-fixable vs flagged:
| Auto-fixable (apply mode) | Flagged for review |
|---|---|
| Version number updates | Removed exports referenced in docs |
| CHANGELOG entries for commits | Code examples needing context |
| Function renames | |
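As an illustration, missing-CHANGELOG detection can be as simple as diffing commit subjects against the changelog text (a simplification of the real matcher):

```typescript
// Flag feat:/fix: commits that have no matching CHANGELOG entry.
// Subjects would come from `git log --format=%s`; matching by subject
// string is a deliberate simplification for the sketch.
function missingChangelogEntries(subjects: string[], changelog: string): string[] {
  return subjects
    .filter(s => /^(feat|fix)(\(.+\))?:/.test(s))
    .filter(s => !changelog.includes(s.replace(/^(feat|fix)(\(.+\))?:\s*/, "")));
}
```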
Usage:
/sync-docs # Check what docs need updates (safe)
/sync-docs apply # Apply safe fixes
/sync-docs report src/ # Check docs related to src/
/sync-docs --all # Full codebase scan
Purpose: Research any topic online and create a comprehensive learning guide with RAG-optimized indexes.
What it does:
- Progressive Discovery - Uses funnel approach (broad → specific → deep) to find quality sources
- Quality Scoring - Scores sources by authority, recency, depth, examples, uniqueness
- Just-In-Time Extraction - Fetches only high-scoring sources to save tokens
- Synthesis - Creates structured learning guide with examples and best practices
- RAG Index - Updates CLAUDE.md/AGENTS.md master index for future lookups
- Enhancement - Runs enhance:enhance-docs and enhance:enhance-prompts
Depth levels:
| Depth | Sources | Use Case |
|---|---|---|
| brief | 10 | Quick overview |
| medium | 20 | Default, balanced |
| deep | 40 | Comprehensive |
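A sketch of how quality scoring over those five signals might work (weights and field names are illustrative, not the actual /learn heuristics):

```typescript
// Hypothetical scoring: each source is rated on the five signals named
// above, combined into a single 0..1 quality score.
interface Source {
  authority: number;  // 0..1, e.g. official docs > random blog post
  recency: number;    // 0..1, newer is better
  depth: number;      // 0..1, substantive coverage
  hasExamples: boolean;
  uniqueness: number; // 0..1, non-duplicate information
}

function qualityScore(s: Source): number {
  return 0.3 * s.authority + 0.2 * s.recency + 0.2 * s.depth +
         0.15 * (s.hasExamples ? 1 : 0) + 0.15 * s.uniqueness;
}
```

Only sources above a score threshold would be fetched in full, which is what makes the just-in-time extraction step token-efficient.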
Output structure:
agent-knowledge/
  CLAUDE.md                    # Master index (updated each run)
  AGENTS.md                    # Index for OpenCode/Codex
  recursion.md                 # Topic-specific guide
  resources/
    recursion-sources.json     # Source metadata with quality scores
Usage:
/learn recursion # Default (20 sources)
/learn react hooks --depth=deep # Comprehensive (40 sources)
/learn kubernetes --depth=brief # Quick overview (10 sources)
/learn python async --no-enhance # Skip enhancement pass
Agent: learn-agent (opus model for research quality)
Purpose: Get a second opinion from another AI CLI tool without leaving your current session.
What it does:
- Tool Detection - Detects which AI CLI tools are installed (cross-platform)
- Interactive Picker - If no tool specified, shows only installed tools to choose from
- Effort Mapping - Maps effort levels to per-provider models and reasoning flags
- Execution - Runs the consultation with safe-mode defaults and 120s timeout
- Session Continuity - Saves session state for Claude and Gemini (supports `--continue`)
Supported tools:
| Tool | Default Model (high) | Reasoning Control |
|---|---|---|
| Claude | opus | max-turns |
| Gemini | gemini-3-pro-preview | built-in |
| Codex | gpt-5.3-codex | model_reasoning_effort |
| OpenCode | github-copilot/claude-opus-4-6 | --variant |
| Copilot | (default) | none |
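The effort mapping from the table, sketched as data (flag spellings are illustrative, not the exact CLI invocations):

```typescript
// Per-provider defaults taken from the table above; how each value is
// actually passed to the underlying CLI is an assumption.
type Effort = "low" | "high" | "max";

const TOOLS: Record<string, { model?: string; reasoningControl?: string }> = {
  claude:   { model: "opus",                           reasoningControl: "max-turns" },
  gemini:   { model: "gemini-3-pro-preview" },         // reasoning built in
  codex:    { model: "gpt-5.3-codex",                  reasoningControl: "model_reasoning_effort" },
  opencode: { model: "github-copilot/claude-opus-4-6", reasoningControl: "--variant" },
  copilot:  {},                                        // default model, no control
};

function flagsFor(tool: string, effort: Effort): string[] {
  const t = TOOLS[tool];
  const flags = t?.model ? [`--model=${t.model}`] : [];
  if (t?.reasoningControl) flags.push(`${t.reasoningControl}=${effort}`);
  return flags;
}
```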
Usage:
/consult "Is this the right approach?" --tool=gemini --effort=high
/consult "Review for performance issues" --tool=codex
/consult "Suggest alternatives" --tool=claude --effort=max
/consult "Continue from where we left off" --continue
/consult "Explain this error" --context=diff --tool=geminiAgent: consult-agent (sonnet model for orchestration)
Standalone use:
/deslop apply # Just clean up your code
/sync-docs # Just check if docs need updates
/ship # Just ship this branch
/audit-project # Just review the codebase
Integrated workflow:
When you run /next-task, it orchestrates everything:
/next-task picks task → explores codebase → plans implementation
↓
implementation-agent writes code
↓
deslop-agent cleans AI artifacts
↓
Phase 9 review loop iterates until approved
↓
delivery-validator checks requirements
↓
sync-docs-agent syncs documentation
↓
/ship creates PR → monitors CI → merges
The workflow tracks state so you can resume from any point.
Architecture decisions and trade-offs (click to expand)
Frontier models write good code. That's solved. What's not solved:
- Context management - Models forget what they're doing mid-session
- Compaction amnesia - Long sessions get summarized, losing critical state
- Task drift - Without structure, agents wander from the actual goal
- Skipped steps - Agents skip reviews, tests, or cleanup when not enforced
- Token waste - Using LLM calls for work that static analysis can do faster
- Babysitting - Manually orchestrating each phase of development
- Repetitive requests - Asking for the same workflow every single session
1. One agent, one job, done extremely well
Same principle as good code: single responsibility. The exploration-agent explores. The implementation-agent implements. Phase 9 spawns multiple focused reviewers. No agent tries to do everything. Specialized agents, each with narrow scope and clear success criteria.
2. Pipeline with gates, not a monolith
Same principle as DevOps. Each step must pass before the next begins. Can't push before review. Can't merge before CI passes. Hooks enforce this—agents literally cannot skip phases.
3. Tools do tool work, agents do agent work
If static analysis, regex, or a shell command can do it, don't ask an LLM. Pattern detection uses pre-indexed regex. File discovery uses glob. Platform detection uses file existence checks. The LLM only handles what requires judgment.
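For example, platform detection really is just file-existence checks, along these lines (the marker files shown are common conventions, not necessarily AgentSys's exact list):

```typescript
import { existsSync } from "node:fs";

// Deterministic detection: no LLM call needed for work a stat() can do.
function detectCI(): string | null {
  if (existsSync(".github/workflows")) return "GitHub Actions";
  if (existsSync(".gitlab-ci.yml")) return "GitLab CI";
  if (existsSync(".circleci/config.yml")) return "CircleCI";
  if (existsSync("Jenkinsfile")) return "Jenkins";
  return null;
}

function detectProject(): string | null {
  if (existsSync("package.json")) return "Node.js";
  if (existsSync("Cargo.toml")) return "Rust";
  if (existsSync("go.mod")) return "Go";
  if (existsSync("pyproject.toml")) return "Python";
  return null;
}
```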
4. Agents don't need to know how tools work
The slop detector returns findings with certainty levels. The agent doesn't need to understand the three-phase pipeline, the regex patterns, or the analyzer heuristics. Good tool design means the consumer doesn't need implementation details.
5. Build tools where tools don't exist
Many tasks lack existing tools. JavaScript collectors for drift-detect. Multi-pass analyzers for slop detection. The result: agents receive structured data, not raw problems to figure out.
6. Research-backed prompt engineering
Documented techniques that measurably improve results:
- Progressive disclosure - Agents see only what's needed for the current step
- Structured output - JSON between delimiters, XML tags for sections
- Explicit constraints - What agents MUST NOT do matters as much as what they do
- Few-shot examples - Where patterns aren't obvious
- Tool calling over generation - Let the model use tools rather than generate tool-like output
7. Validate plan and results, not every step
Approve the plan. See the results. The middle is automated. One plan approval unlocks autonomous execution through implementation, review, cleanup, and shipping.
8. Right model for the task
Match model capability to task complexity:
- opus - Exploration, planning, implementation, review orchestration
- sonnet - Pattern matching, validation, discovery
- haiku - Git operations, file moves, CI polling
Quality compounds. Poor exploration → poor plan → poor implementation → review cycles. Early phases deserve the best model.
9. Persistent state survives sessions
Two JSON files track everything: what task, what phase. Sessions can die and resume. Multiple sessions run in parallel on different tasks using separate worktrees.
10. Delegate everything automatable
Agents don't just write code. They:
- Clean their own output (deslop-agent)
- Update documentation (sync-docs-agent)
- Fix CI failures (ci-fixer)
- Respond to review comments
- Check for plan drift (/drift-detect)
- Analyze their own prompts (/enhance)
If it can be specified, it can be delegated.
11. Orchestrator stays high-level
The main workflow orchestrator doesn't read files, search code, or write implementations. It launches specialized agents and receives their outputs. Keeps the orchestrator's context window available for coordination rather than filled with file contents.
12. Composable, not monolithic
Every command works standalone. /deslop cleans code without needing /next-task. /ship merges PRs without needing the full workflow. Pieces compose together, but each piece is useful on its own.
- Run multiple sessions - Different tasks in different worktrees, no interference
- Fast iteration - Approve plan, check results, repeat
- Stay in the interesting parts - Policy decisions, architecture choices, edge cases
- Minimal review burden - Most issues caught and fixed before you see the output
- No repetitive requests - The workflow you want, without asking each time
- Scale horizontally - More sessions, more tasks, same oversight level
/plugin marketplace add avifenesh/agentsys
/plugin install next-task@agentsys
/plugin install ship@agentsys

npm install -g agentsys && agentsys

Interactive installer for Claude Code, OpenCode, and Codex CLI.
# Non-interactive install
agentsys --tool claude # Single tool
agentsys --tools "claude,opencode" # Multiple tools
agentsys --development # Dev mode (bypasses marketplace)
Required:
- Git
- Node.js 18+
For GitHub workflows:
- GitHub CLI (`gh`) authenticated
For GitLab workflows:
- GitLab CLI (`glab`) authenticated
For /repo-map:
- ast-grep (`sg`) installed
For /agnix:
- `agnix` CLI installed (`cargo install agnix-cli` or `brew install agnix`)
Local diagnostics (optional):
npm run detect # Platform detection (CI, deploy, project type)
npm run verify # Tool availability + versions
The system is built on research, not guesswork.
Knowledge base (agent-docs/): 8,000 lines of curated documentation from Anthropic, OpenAI, Google, and Microsoft covering:
- Agent architecture and design patterns
- Prompt engineering techniques
- Function calling and tool use
- Context efficiency and token optimization
- Multi-agent systems and orchestration
- Instruction following reliability
Testing:
- 1,818 tests passing
- Drift-detect validated on 1,000+ repositories
- E2E workflow testing across all commands
- Cross-platform validation (Claude Code, OpenCode, Codex CLI)
Methodology:
- `/perf` investigation phases based on recorded real performance investigation sessions
- Certainty levels derived from pattern analysis across repositories
- Token optimization measured and validated (77% reduction in drift-detect)
| Topic | Link |
|---|---|
| Installation | docs/INSTALLATION.md |
| Cross-Platform Setup | docs/CROSS_PLATFORM.md |
| Usage Examples | docs/USAGE.md |
| Architecture | docs/ARCHITECTURE.md |
| Workflow | Link |
|---|---|
| /next-task Flow | docs/workflows/NEXT-TASK.md |
| /ship Flow | docs/workflows/SHIP.md |
| Topic | Link |
|---|---|
| Slop Patterns | docs/reference/SLOP-PATTERNS.md |
| Agent Reference | docs/reference/AGENTS.md |
- Issues: github.com/avifenesh/agentsys/issues
- Discussions: github.com/avifenesh/agentsys/discussions
MIT License | Made by Avi Fenesh
Similar Open Source Tools
awesome-slash
Automate the entire development workflow beyond coding. awesome-slash provides production-ready skills, agents, and commands for managing tasks, branches, reviews, CI, and deployments. It automates the entire workflow, including task exploration, planning, implementation, review, and shipping. The tool includes 11 plugins, 40 agents, 26 skills, and 26k lines of lib code, with 3,357 tests and support for 3 platforms. It works with Claude Code, OpenCode, and Codex CLI, offering specialized capabilities through skills and agents.
empirica
Empirica is an epistemic self-awareness framework for AI agents to understand their knowledge boundaries. It introduces epistemic vectors to measure knowledge state and uncertainty, enabling honest communication. The tool emerged from 600+ real working sessions across various AI systems, providing cognitive infrastructure for distinguishing between confident knowledge and guessing. Empirica's 13 foundational vectors cover engagement, domain knowledge depth, execution capability, information access, understanding clarity, coherence, signal-to-noise ratio, information richness, working state, progress rate, task completion level, work significance, and explicit doubt tracking. It is applicable across industries like software development, research, healthcare, legal, education, and finance, aiding in tasks such as code review, hypothesis testing, diagnostic confidence, case analysis, learning assessment, and risk assessment.
rhesis
Rhesis is a comprehensive test management platform designed for Gen AI teams, offering tools to create, manage, and execute test cases for generative AI applications. It ensures the robustness, reliability, and compliance of AI systems through features like test set management, automated test generation, edge case discovery, compliance validation, integration capabilities, and performance tracking. The platform is open source, emphasizing community-driven development, transparency, extensible architecture, and democratizing AI safety. It includes components such as backend services, frontend applications, SDK for developers, worker services, chatbot applications, and Polyphemus for uncensored LLM service. Rhesis enables users to address challenges unique to testing generative AI applications, such as non-deterministic outputs, hallucinations, edge cases, ethical concerns, and compliance requirements.
nothumanallowed
NotHumanAllowed is a security-first platform built exclusively for AI agents. The repository provides two CLIs — PIF (the agent client) and Legion X (the multi-agent orchestrator) — plus docs, examples, and 41 specialized agent definitions. Every agent authenticates via Ed25519 cryptographic signatures, ensuring no passwords or bearer tokens are used. Legion X orchestrates 41 specialized AI agents through a 9-layer Geth Consensus pipeline, with zero-knowledge protocol ensuring API keys stay local. The system learns from each session, with features like task decomposition, neural agent routing, multi-round deliberation, and weighted authority synthesis. The repository also includes CLI commands for orchestration, agent management, tasks, sandbox execution, Geth Consensus, knowledge search, configuration, system health check, and more.
claudia
Claudia is a personal AI assistant that tracks relationships and commitments, helping users remember important details and connections across their network. It catches commitments, remembers context, warns before things slip, and shows the source of information. Claudia syncs memory to an Obsidian vault, ensuring data privacy by running fully locally. Users can try a demo mode with pre-populated data or install Claudia for personalized workspace creation. It is suitable for roles like consultants, executives, founders, solo professionals, and creators, offering features like client folders, deliverable tracking, leadership tools, and collaboration tracking.
airunner
AI Runner is a multi-modal AI interface that allows users to run open-source large language models and AI image generators on their own hardware. The tool provides features such as voice-based chatbot conversations, text-to-speech, speech-to-text, vision-to-text, text generation with large language models, image generation capabilities, image manipulation tools, utility functions, and more. It aims to provide a stable and user-friendly experience with security updates, a new UI, and a streamlined installation process. The application is designed to run offline on users' hardware without relying on a web server, offering a smooth and responsive user experience.
Evaluator
NeMo Evaluator SDK is an open-source platform for robust, reproducible, and scalable evaluation of Large Language Models. It enables running hundreds of benchmarks across popular evaluation harnesses against any OpenAI-compatible model API. The platform ensures auditable and trustworthy results by executing evaluations in open-source Docker containers. NeMo Evaluator SDK is built on four core principles: Reproducibility by Default, Scale Anywhere, State-of-the-Art Benchmarking, and Extensible and Customizable.
explain-openclaw
Explain OpenClaw is a comprehensive documentation repository for the OpenClaw framework, a self-hosted AI assistant platform. It covers various aspects such as plain English explanations, technical architecture, deployment scenarios, privacy and safety measures, security audits, worst-case security scenarios, optimizations, and AI model comparisons. The repository serves as a living knowledge base with beginner-friendly explanations and detailed technical insights for contributors.
orchestkit
OrchestKit is a powerful and flexible orchestration tool designed to streamline and automate complex workflows. It provides a user-friendly interface for defining and managing orchestration tasks, allowing users to easily create, schedule, and monitor workflows. With support for various integrations and plugins, OrchestKit enables seamless automation of tasks across different systems and applications. Whether you are a developer looking to automate deployment processes or a system administrator managing complex IT operations, OrchestKit offers a comprehensive solution to simplify and optimize your workflow management.
pocketpaw
PocketPaw is a lightweight and user-friendly tool designed for managing and organizing your digital assets. It provides a simple interface for users to easily categorize, tag, and search for files across different platforms. With PocketPaw, you can efficiently organize your photos, documents, and other files in a centralized location, making it easier to access and share them. Whether you are a student looking to organize your study materials, a professional managing project files, or a casual user wanting to declutter your digital space, PocketPaw is the perfect solution for all your file management needs.
netdata
Netdata is an open-source, real-time infrastructure monitoring platform that provides instant insights, zero configuration deployment, ML-powered anomaly detection, efficient monitoring with minimal resource usage, and secure & distributed data storage. It offers real-time, per-second updates and clear insights at a glance. Netdata's origin story involves addressing the limitations of existing monitoring tools and led to a fundamental shift in infrastructure monitoring. It is recognized as the most energy-efficient tool for monitoring Docker-based systems according to a study by the University of Amsterdam.
deepfabric
DeepFabric is a CLI tool and SDK designed for researchers and developers to generate high-quality synthetic datasets at scale using large language models. It leverages a graph and tree-based architecture to create diverse and domain-specific datasets while minimizing redundancy. The tool supports generating Chain of Thought datasets for step-by-step reasoning tasks and offers multi-provider support for using different language models. DeepFabric also allows for automatic dataset upload to Hugging Face Hub and uses YAML configuration files for flexibility in dataset generation.
agentfield
AgentField is an open-source control plane designed for autonomous AI agents, providing infrastructure for agents to make decisions beyond chatbots. It offers features like scaling infrastructure, routing & discovery, async execution, durable state, observability, trust infrastructure with cryptographic identity, verifiable credentials, and policy enforcement. Users can write agents in Python, Go, TypeScript, or interact via REST APIs. The tool enables the creation of AI backends that reason autonomously within defined boundaries, offering predictability and flexibility. AgentField aims to bridge the gap between AI frameworks and production-ready infrastructure for AI agents.
openakita
OpenAkita is a self-evolving AI Agent framework that autonomously learns new skills, performs daily self-checks and repairs, accumulates experience from task execution, and persists until the task is done. It auto-generates skills, installs dependencies, learns from mistakes, and remembers preferences. The framework is standards-based, multi-platform, and provides a Setup Center GUI for intuitive installation and configuration. It features self-learning and evolution mechanisms, a Ralph Wiggum Mode for persistent execution, multi-LLM endpoints, multi-platform IM support, desktop automation, multi-agent architecture, scheduled tasks, identity and memory management, a tool system, and a guided wizard for setup.
dotclaude
A sophisticated multi-agent configuration system for Claude Code that provides specialized agents and command templates to accelerate code review, refactoring, security audits, tech-lead-guidance, and UX evaluations. It offers essential commands, directory structure details, agent system overview, command templates, usage patterns, collaboration philosophy, sync management, advanced usage guidelines, and FAQ. The tool aims to streamline development workflows, enhance code quality, and facilitate collaboration between developers and AI agents.
For similar tasks
chat-with-code
Chat-with-code is a codebase chatbot that enables users to interact with their codebase using the OpenAI Language Model. It provides a user-friendly chat interface where users can ask questions and interact with their code. The tool clones, chunks, and embeds the codebase, allowing for natural language interactions. It is designed to assist users in exploring and understanding their codebase more intuitively.
Devon
Devon is an open-source pair programmer tool designed to facilitate collaborative coding sessions. It provides features such as multi-file editing, codebase exploration, test writing, bug fixing, and architecture exploration. The tool supports Anthropic, OpenAI, and Groq APIs, with plans to add more models in the future. Devon is community-driven, with ongoing development goals including multi-model support, plugin system for tool builders, self-hostable Electron app, and setting SOTA on SWE-bench Lite. Users can contribute to the project by developing core functionality, conducting research on agent performance, providing feedback, and testing the tool.
sage
Sage is a tool that allows users to chat with any codebase, providing a chat interface for code understanding and integration. It simplifies the process of learning how a codebase works by offering heavily documented answers sourced directly from the code. Users can set up Sage locally or on the cloud with minimal effort. The tool is designed to be easily customizable, allowing users to swap components of the pipeline and improve the algorithms powering code understanding and generation.
brokk
Brokk is a code assistant designed to understand code semantically, allowing LLMs to work effectively on large codebases. It offers features like agentic search, summarizing related classes, parsing stack traces, adding source for usages, and autonomously fixing errors. Users can interact with Brokk through different panels and commands, enabling them to manipulate context, ask questions, search codebase, run shell commands, and more. Brokk helps with tasks like debugging regressions, exploring codebase, AI-powered refactoring, and working with dependencies. It is particularly useful for making complex, multi-file edits with o1pro.
AI-Governor-Framework
The AI Governor Framework is a system designed to govern AI assistants in coding projects by providing rules and workflows to ensure consistency, respect architectural decisions, and enforce coding standards. It leverages Context Engineering to provide the AI with the right information at the right time, using an In-Repo approach to keep governance rules and architectural context directly inside the repository. The framework consists of two core components: The Governance Engine for passive rules and the Operator's Playbook for active protocols. It follows a 4-step Operator's Playbook to move features from idea to production with clarity and control.
awesome-slash
Automate the entire development workflow beyond coding. awesome-slash provides production-ready skills, agents, and commands for managing tasks, branches, reviews, CI, and deployments. It automates the entire workflow, including task exploration, planning, implementation, review, and shipping. The tool includes 11 plugins, 40 agents, 26 skills, and 26k lines of lib code, with 3,357 tests and support for 3 platforms. It works with Claude Code, OpenCode, and Codex CLI, offering specialized capabilities through skills and agents.
smithers
Smithers is a tool for declarative AI workflow orchestration using React components. It allows users to define complex multi-agent workflows as component trees, ensuring composability, durability, and error handling. The tool leverages React's re-rendering mechanism to persist outputs to SQLite, enabling crashed workflows to resume seamlessly. Users can define schemas for task outputs, create workflow instances, define agents, build workflow trees, and run workflows programmatically or via CLI. Smithers supports components for pipeline stages, structured output validation with Zod, MDX prompts, validation loops with Ralph, dynamic branching, and various built-in tools like read, edit, bash, grep, and write. The tool follows a clear workflow execution process involving defining, rendering, executing, re-rendering, and repeating tasks until completion, all while storing task results in SQLite for fault tolerance.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.
