cordum
Cordum (cordum.io) is a platform-only control plane for autonomous AI agents and external workers. It uses NATS for the bus, Redis for state and payload pointers, and CAP v2 wire contracts for jobs, results, and heartbeats. Workers and product packs live outside this repo.
Cordum is a control plane for AI agents designed to close the Trust Gap by providing safety, observability, and control features. It allows teams to deploy autonomous agents with built-in governance mechanisms, including safety policies, workflow orchestration, job routing, observability, and human-in-the-loop approvals. The tool aims to address the challenges of deploying AI agents in production by offering visibility, safety rails, audit trails, and approval mechanisms for sensitive operations.
README:
AI Agent Governance Control Plane
Deploy autonomous agents with built-in safety, observability, and control.
AI agents are powerful. They're also unpredictable.
Teams deploying agents in production face the Trust Gap: the distance between what an agent can do and what you're confident letting it do unsupervised.
Without governance, you're flying blind:
- No visibility into what agents are doing
- No safety rails before dangerous actions
- No audit trail when things go wrong
- No way to require human approval for sensitive operations
Cordum is a control plane for AI agents that closes the Trust Gap.
```
┌─────────────────────────────────────────────────────────────────┐
│                             Cordum                              │
│                                                                 │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────────┐  │
│  │   API    │──▶│ Scheduler│──▶│  Safety  │──▶│ Worker Pools │  │
│  │ Gateway  │   │          │   │  Kernel  │   │              │  │
│  └──────────┘   └──────────┘   └──────────┘   └──────────────┘  │
│       │              │              │                │          │
│       ▼              ▼              ▼                ▼          │
│  [Dashboard]    [Workflows]    [Policies]      [Your Agents]    │
└─────────────────────────────────────────────────────────────────┘
```
What Cordum does:
- Safety Kernel — Policy checks (allow/deny/throttle/human-approve) before any job runs
- Workflow Engine — Orchestrate multi-step agent workflows with retries, approvals, and timeouts
- Job Routing — Distribute work across agent pools with capability-based routing
- Observability — Full audit trail, traces, and real-time dashboard
- Human-in-the-Loop — Require approval for sensitive operations
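A minimal Go sketch of the kind of decision the Safety Kernel makes before any job runs. This is illustrative only; the type names and rule ordering here are assumptions, not Cordum's actual policy engine.

```go
package main

import "fmt"

// Decision mirrors the four outcomes a safety check can return.
type Decision int

const (
	Allow Decision = iota
	Deny
	Throttle
	RequireApproval
)

type Job struct {
	Type      string
	Sensitive bool
}

// Policy is a toy rule set: a denylist of job types plus an in-flight cap.
type Policy struct {
	DeniedTypes map[string]bool
	MaxInFlight int
}

// Evaluate applies the rules in order: hard denials first, then load
// throttling, then human-approval gating; everything else is allowed.
func (p Policy) Evaluate(j Job, inFlight int) Decision {
	if p.DeniedTypes[j.Type] {
		return Deny
	}
	if inFlight >= p.MaxInFlight {
		return Throttle
	}
	if j.Sensitive {
		return RequireApproval
	}
	return Allow
}

func main() {
	p := Policy{
		DeniedTypes: map[string]bool{"job.drop_database": true},
		MaxInFlight: 10,
	}
	fmt.Println(p.Evaluate(Job{Type: "job.summarize"}, 3) == Allow)
	fmt.Println(p.Evaluate(Job{Type: "job.drop_database"}, 3) == Deny)
	fmt.Println(p.Evaluate(Job{Type: "job.deploy", Sensitive: true}, 3) == RequireApproval)
}
```

The point of the ordering is that a denial can never be bypassed by approval: a denied job type is rejected before the sensitivity check even runs.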
Prerequisites: Docker, Docker Compose, Go 1.24+

```shell
# Clone the repo
git clone https://github.com/cordum-io/cordum.git
cd cordum

# Set an API key
export CORDUM_API_KEY="$(openssl rand -hex 32)"

# Start everything (auto-generates TLS certs on first run)
go run ./cmd/cordumctl up

# Open dashboard
open http://localhost:8082
```

Or use the quickstart script:

```shell
export CORDUM_API_KEY="$(openssl rand -hex 32)"
./tools/scripts/quickstart.sh
```

That's it. You have a running Cordum instance with API, scheduler, safety kernel, dashboard, and TLS enabled by default. System configuration is auto-bootstrapped on first startup.
Cordum uses CAP (Cordum Agent Protocol) for all agent communication:
- Submit — Client submits a job via API
- Safety Check — Scheduler asks Safety Kernel: allow, deny, throttle, or require approval?
- Dispatch — Approved jobs route to the right worker pool via NATS
- Execute — Your agent runs the job (using MCP, LangChain, whatever)
- Result — Agent returns result; Cordum updates state and notifies client
```
Client ──▶ API ──▶ Scheduler ──▶ Safety Kernel ──▶ NATS ──▶ Agent Pool
                       │                                        │
                       ▼                                        ▼
                 [Redis State]                           [Your Agents]
```
Key design choices:
- Payloads stay off the bus — `context_ptr` and `result_ptr` reference Redis/S3, keeping the message bus lean
- Protocol-first — CAP is an independent spec; Cordum is the reference implementation
- Workers are external — Cordum is the control plane; your agents run wherever you want
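The pointer pattern above can be sketched in a few lines of Go. A real deployment would use Redis or S3; here a plain map stands in for the store, and the type names are illustrative, not CAP's actual message schema.

```go
package main

import "fmt"

// Store stands in for Redis/S3: large payloads live here, keyed by pointer.
type Store map[string][]byte

// JobMsg is what travels over the bus: metadata plus a pointer, never the payload.
type JobMsg struct {
	JobID      string
	Type       string
	ContextPtr string // key into the store, e.g. "ctx/job-123"
}

func main() {
	store := Store{}

	// Producer writes the large context blob to the store...
	store["ctx/job-123"] = []byte(`{"prompt":"summarize this large document..."}`)

	// ...and publishes only a small message carrying the pointer.
	msg := JobMsg{JobID: "job-123", Type: "job.summarize", ContextPtr: "ctx/job-123"}

	// Worker dereferences the pointer to fetch the payload out of band.
	payload := store[msg.ContextPtr]
	fmt.Printf("bus message carries a %d-byte pointer; payload is %d bytes\n",
		len(msg.ContextPtr), len(payload))
}
```

Because the bus only ever sees small fixed-shape messages, payload size has no effect on bus throughput; results flow back the same way via `result_ptr`.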
| Feature | Description |
|---|---|
| Safety Policies | Define rules for what agents can/can't do. Enforce before execution. |
| Output Safety | Evaluate completed job outputs and allow, redact, or quarantine unsafe results. |
| Human Approval | Flag sensitive jobs for manual review before they run. |
| Workflows | Multi-step DAGs with fan-out, retries, delays, and conditions. |
| Pool Routing | Route jobs by capability, region, or custom tags. |
| Heartbeats | Know which agents are alive, their capacity, and load. |
| Audit Trail | Every job, decision, and result logged and queryable. |
| Dashboard | Real-time UI for workflows, jobs, approvals, and policies. |
| Multi-tenant | API keys with RBAC for teams and environments. |
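Pool routing by capability, as described in the table above, reduces to a tag-matching problem. The sketch below is a simplified illustration under assumed types, not Cordum's scheduler; a real implementation would also weigh heartbeat-reported capacity and load.

```go
package main

import "fmt"

// Pool is a worker pool advertising tags (capabilities, region, custom labels).
type Pool struct {
	Name string
	Tags map[string]bool
}

// Route returns the first pool that advertises every tag the job requires.
func Route(required []string, pools []Pool) (string, bool) {
	for _, p := range pools {
		ok := true
		for _, tag := range required {
			if !p.Tags[tag] {
				ok = false
				break
			}
		}
		if ok {
			return p.Name, true
		}
	}
	return "", false
}

func main() {
	pools := []Pool{
		{Name: "gpu-us-east", Tags: map[string]bool{"gpu": true, "us-east": true}},
		{Name: "cpu-eu", Tags: map[string]bool{"cpu": true, "eu": true}},
	}
	name, ok := Route([]string{"gpu", "us-east"}, pools)
	fmt.Println(name, ok)
}
```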
cordum/
├── cmd/ # Service entrypoints + CLI
│ ├── cordum-api-gateway/ # API gateway (HTTP/WS + gRPC)
│ ├── cordum-scheduler/ # Scheduler + safety gating
│ ├── cordum-safety-kernel/ # Policy evaluation
│ ├── cordum-workflow-engine/ # Workflow orchestration
│ ├── cordum-context-engine/ # Optional context/memory service
│ └── cordumctl/ # CLI
├── core/ # Core libraries
│ ├── controlplane/ # Gateway, scheduler, safety kernel
│ ├── context/ # Context engine implementation
│ ├── infra/ # Config, storage, bus, metrics
│ ├── protocol/ # API protos + CAP aliases
│ └── workflow/ # Workflow engine
├── dashboard/ # React UI
├── sdk/ # SDK + worker runtime
├── cordum-helm/ # Helm chart
├── deploy/k8s/ # Kubernetes manifests
└── docs/ # Documentation
| Doc | Description |
|---|---|
| System Overview | Architecture and data flow |
| Core Reference | Deep technical details |
| Docker Guide | Running with Compose |
| Agent Protocol | CAP bus + pointer semantics |
| MCP Server | MCP stdio + HTTP/SSE integration |
| Pack Format | How to package agent capabilities |
| Local E2E | Full local walkthrough |
Cordum implements CAP (Cordum Agent Protocol) — an open protocol for distributed AI agent orchestration.
CAP vs MCP:
- MCP = tool-calling protocol for a single model
- CAP = job protocol for distributed agent clusters
They're complementary. Use CAP for orchestration, MCP inside your agents for tools.
Read more: MCP vs CAP: Why Your AI Agents Need Both Protocols
Cordum includes an MCP server framework with:
- Standalone stdio mode via `cmd/cordum-mcp` (for Claude Desktop/Code local integration)
- Gateway HTTP/SSE mode via `/mcp/message` and `/mcp/sse` (when `mcp.enabled=true`)

See docs/mcp-server.md for setup, auth headers, and client configuration examples.
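For the stdio mode, a Claude Desktop entry might look like the following. The binary name follows `cmd/cordum-mcp` from the repo layout; the install path and environment variable passing are assumptions, so treat docs/mcp-server.md as authoritative.

```json
{
  "mcpServers": {
    "cordum": {
      "command": "/path/to/cordum-mcp",
      "env": {
        "CORDUM_API_KEY": "<your-api-key>"
      }
    }
  }
}
```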
The Go SDK makes it easy to build CAP-compatible workers:
```go
package main

import (
	"log"

	"github.com/cordum/cordum/sdk/runtime"
)

type Input struct {
	Prompt string `json:"prompt"`
}

type Output struct {
	Summary string `json:"summary"`
}

func main() {
	agent := &runtime.Agent{Retries: 2}
	runtime.Register(agent, "job.summarize", func(ctx runtime.Context, input Input) (Output, error) {
		// Your agent logic here
		return Output{Summary: input.Prompt}, nil
	})
	if err := agent.Start(); err != nil {
		log.Fatal(err)
	}
	select {}
}
```

SDKs: Go (stable) | Python | Node
- Discord: Join the conversation
- GitHub Discussions: Ask questions
- Twitter/X: @coraboratedai
- Email: [email protected]
Cordum Enterprise adds:
- SSO/SAML integration
- Advanced RBAC
- SIEM export
- Priority support
Contact us for pricing.
Cordum follows a transparent governance model with a protocol stability pledge, maintainer structure, and clear decision-making process. See GOVERNANCE.md for details including:
- Protocol Stability: CAP v2 wire format frozen until February 2027
- Security: SECURITY.md for vulnerability reporting
- Versioning: Semantic versioning with deprecation policy
See ROADMAP.md for the full feature roadmap, completed milestones, and planned work.
See CHANGELOG.md for a detailed log of all changes by version.
We welcome contributions! See CONTRIBUTING.md for guidelines.
Licensed under Business Source License 1.1 (BUSL-1.1).
Free for self-hosted and internal use. Not permitted for competing hosted/managed offerings. See LICENSE for details and Change Date.
Ready to govern your AI agents?
cordum.io · CAP Protocol · Discord
⭐ Star this repo if Cordum helps you deploy agents safely