
Shannon
An open-source, enterprise-ready AI agent platform built with Rust for performance, Go for orchestration, Python for LLMs, and Solana for Web3 trust.
Stars: 106

Shannon is battle-tested infrastructure for AI agents that tackles the problems teams hit at scale: runaway costs, non-deterministic failures, and security concerns. It offers intelligent caching, deterministic replay of workflows, time-travel debugging, WASI sandboxing, and hot-swapping between LLM providers. Teams ship faster with zero-configuration multi-agent setup, multiple AI patterns, and hot configuration changes, and run in production with a WASI sandbox, token budget control, an OPA policy engine, and multi-tenancy. Shannon scales without breaking: it cuts costs, stays provider agnostic, is observable by default, and is designed for horizontal scaling with Temporal workflow orchestration.
README:
We're building a unified dashboard and a centralized documentation hub, with all features already implemented in the code. We're working hard to polish them, and we'd love your support! Please ⭐ star this repo to show your interest and stay updated as we refine these tools. Thanks for your patience and encouragement!
Stop burning money on AI tokens. Ship reliable agents that won't break in production.
Shannon is battle-tested infrastructure for AI agents that solves the problems you'll hit at scale: runaway costs, non-deterministic failures, and security nightmares. Built on Temporal workflows and WASI sandboxing, it's the platform we wished existed when our LLM bills hit $50k/month.
- "Our AI costs are out of control" → 70% token reduction via intelligent caching
- "We can't debug production issues" → Deterministic replay of any workflow
- "Agents keep breaking randomly" → Time-travel debugging with full state history
- "We're worried about prompt injection" → WASI sandbox + OPA policies for bulletproof security
- "Different teams need different models" → Hot-swap between 15+ LLM providers
- "We need audit trails for compliance" → Every decision logged and traceable
- Zero Configuration Multi-Agent - Just describe what you want: "Analyze data, then create report" → Shannon handles dependencies automatically
- Multiple AI Patterns - ReAct, Tree-of-Thoughts, Chain-of-Thought, Debate, and Reflection (configurable via cognitive_strategy)
- Time-Travel Debugging - Export and replay any workflow to reproduce exact agent behavior
- Hot Configuration - Change models, prompts, and policies without restarts
- WASI Sandbox - Full Python 3.11 support with bulletproof security (→ Guide)
- Token Budget Control - Hard limits per user/session with real-time tracking
- Policy Engine (OPA) - Define who can use which tools, models, and data
- Multi-Tenancy - Complete isolation between users, sessions, and organizations
- 70% Cost Reduction - Smart caching, session management, and token optimization
- Provider Agnostic - OpenAI, Anthropic, Google, Azure, Bedrock, DeepSeek, Groq, and more
- Observable by Default - Prometheus metrics, Grafana dashboards, OpenTelemetry tracing
- Distributed by Design - Horizontal scaling with Temporal workflow orchestration
Model pricing is centralized in config/models.yaml - all services load from this single source for consistent cost tracking.
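As an illustration, a service could read that shared file and compute per-request cost from it. This is a minimal sketch that assumes a hypothetical schema (per-1K-token input/output prices per model); the real config/models.yaml layout may differ:

```python
# Hypothetical sketch only: the "models -> <name> -> input_per_1k / output_per_1k"
# layout below is an assumed schema, not Shannon's documented one.
import yaml  # PyYAML

with open("config/models.yaml") as f:
    pricing = yaml.safe_load(f)

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from the shared pricing file."""
    entry = pricing["models"][model]
    return (input_tokens / 1000) * entry["input_per_1k"] + \
           (output_tokens / 1000) * entry["output_per_1k"]

print(estimate_cost("gpt-4o-mini", input_tokens=1200, output_tokens=300))
```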
| Challenge | Shannon | LangGraph | AutoGen | CrewAI |
|---|---|---|---|---|
| Multi-Agent Orchestration | ✅ DAG/Graph workflows | ✅ Stateful graphs | ✅ Group chat | ✅ Crew/roles |
| Agent Communication | ✅ Message passing | ✅ Tool calling | ✅ Conversations | ✅ Delegation |
| Memory & Context | ✅ Long/short-term, vector | ✅ Multiple types | ✅ Conversation history | ✅ Shared memory |
| Debugging Production Issues | ✅ Replay any workflow | ❌ Good luck | ❌ Printf debugging | ❌ |
| Token Cost Control | ✅ Hard budget limits | ❌ | ❌ | ❌ |
| Security Sandbox | ✅ WASI isolation | ❌ | ❌ | ❌ |
| Policy Control (OPA) | ✅ Fine-grained rules | ❌ | ❌ | ❌ |
| Deterministic Replay | ✅ Time-travel debugging | ❌ | ❌ | ❌ |
| Session Persistence | ✅ Redis-backed, durable | ❌ | | |
| Multi-Language | ✅ Go/Rust/Python | | | |
| Production Metrics | ✅ Prometheus/Grafana | ❌ | ❌ | |
- Docker and Docker Compose
- Make, curl, grpcurl
- An API key for at least one supported LLM provider
Docker Setup Instructions (click to expand)
macOS:
# Install Docker Desktop from https://www.docker.com/products/docker-desktop/
# Or using Homebrew:
brew install --cask docker
Linux (Ubuntu/Debian):
# Install Docker Engine
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
# Log out and back in for group changes to take effect
# Install Docker Compose
sudo apt-get update
sudo apt-get install docker-compose-plugin
docker --version
docker compose version
The make dev command starts all services:
- PostgreSQL: Database on port 5432
- Redis: Cache on port 6379
- Qdrant: Vector store on port 6333
- Temporal: Workflow engine on port 7233 (UI on 8088)
- Orchestrator: Go service on port 50052
- Agent Core: Rust service on port 50051
- LLM Service: Python service on port 8000
git clone https://github.com/Kocoro-lab/Shannon.git
cd Shannon
make setup-env
# Download Python WASI interpreter for secure code execution (20MB)
./scripts/setup_python_wasi.sh
Add at least one LLM API key to .env (for example):
echo "OPENAI_API_KEY=your-key-here" >> .env
Start the stack and run a smoke check:
make dev
make smoke
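If you want a scripted check in addition to make smoke, here is a minimal sketch that verifies each default port listed above is accepting connections (it only opens TCP sockets and does not call any service APIs):

```python
# Quick post-`make dev` check: confirm each default Shannon port is listening.
import socket

SERVICES = {
    "PostgreSQL": 5432,
    "Redis": 6379,
    "Qdrant": 6333,
    "Temporal": 7233,
    "Temporal UI": 8088,
    "Agent Core (Rust)": 50051,
    "Orchestrator (Go)": 50052,
    "LLM Service (Python)": 8000,
}

for name, port in SERVICES.items():
    with socket.socket() as sock:
        sock.settimeout(1)
        status = "up" if sock.connect_ex(("localhost", port)) == 0 else "DOWN"
    print(f"{name:<22} localhost:{port:<6} {status}")
```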
Shannon provides a simple REST API for easy integration and real-time streaming to monitor agent actions:
# For development (no auth required)
export GATEWAY_SKIP_AUTH=1
# Submit a task
curl -X POST http://localhost:8080/api/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"query": "Analyze the sentiment of: Shannon makes AI agents simple!",
"session_id": "demo-session-123"
}'
# Response includes workflow_id for tracking
# {"workflow_id":"task-dev-1234567890","status":"running"}
# Stream live events as your agent works (replace with your workflow_id)
curl -N http://localhost:8081/stream/sse?workflow_id=task-dev-1234567890
# You'll see human-readable events like:
# event: AGENT_THINKING
# data: {"message":"Analyzing sentiment: Shannon makes AI agents simple!"}
#
# event: TOOL_INVOKED
# data: {"message":"Processing natural language sentiment analysis"}
#
# event: AGENT_COMPLETED
# data: {"message":"Task completed successfully"}
# Check final status and result
curl http://localhost:8080/api/v1/tasks/task-dev-1234567890
# Response includes status, result, tokens used, and metadata
For production, use API keys instead of GATEWAY_SKIP_AUTH:
# Create an API key (one-time setup)
make seed-api-key # Creates test key: sk_test_123456
# Use in requests
curl -X POST http://localhost:8080/api/v1/tasks \
-H "X-API-Key: sk_test_123456" \
-H "Content-Type: application/json" \
-d '{"query":"Your task here"}'
Advanced Methods: Scripts, gRPC, and Command Line (click to expand)
# Submit a simple task
./scripts/submit_task.sh "Analyze the sentiment of: 'Shannon makes AI agents simple!'"
# Check session usage and token tracking (session ID is in SubmitTask response message)
grpcurl -plaintext \
-d '{"sessionId":"YOUR_SESSION_ID"}' \
localhost:50052 shannon.orchestrator.OrchestratorService/GetSessionContext
# Export and replay a workflow history (use the workflow ID from submit_task output)
./scripts/replay_workflow.sh <WORKFLOW_ID>
# Submit via gRPC
grpcurl -plaintext \
-d '{"query":"Analyze sentiment","sessionId":"test-session"}' \
localhost:50052 shannon.orchestrator.OrchestratorService/SubmitTask
# Stream events via gRPC
grpcurl -plaintext \
-d '{"workflowId":"task-dev-1234567890"}' \
localhost:50052 shannon.orchestrator.OrchestratorService/StreamEvents
# Connect to WebSocket for bidirectional streaming
wscat -c ws://localhost:8081/api/v1/stream/ws?workflow_id=task-dev-1234567890
# Access Temporal Web UI for visual workflow inspection
open http://localhost:8088
# Or navigate manually to see:
# - Workflow execution history and timeline
# - Task status, retries, and failures
# - Input/output data for each step
# - Real-time workflow progress
# - Search workflows by ID, type, or status
The Temporal UI provides a powerful visual interface to:
- Debug workflows - See exactly where and why workflows fail
- Monitor performance - Track execution times and bottlenecks
- Inspect state - View all workflow inputs, outputs, and intermediate data
- Search & filter - Find workflows by various criteria
- Replay workflows - Visual replay of historical executions
The REST API supports:
- Idempotency: Use the Idempotency-Key header for safe retries
- Rate Limiting: Per-API-key limits to prevent abuse
- Resume on Reconnect: SSE streams can resume from the last event using Last-Event-ID
- WebSocket: Available at /api/v1/stream/ws for bidirectional streaming
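A minimal sketch of how a client might use the two retry/resume headers together; only the header names come from the list above, while the retry policy and the id: lines in the stream are assumptions for illustration:

```python
# Safe retries via Idempotency-Key, and SSE resume via Last-Event-ID.
import time
import uuid
import requests

HEADERS = {"X-API-Key": "sk_test_123456",
           "Idempotency-Key": str(uuid.uuid4())}  # reuse the same key on every retry

# Retrying with the same Idempotency-Key should not create duplicate tasks.
resp = None
for attempt in range(3):
    try:
        resp = requests.post("http://localhost:8080/api/v1/tasks",
                             headers=HEADERS,
                             json={"query": "Generate the weekly report"},
                             timeout=10)
        resp.raise_for_status()
        break
    except requests.RequestException:
        time.sleep(2 ** attempt)

workflow_id = resp.json()["workflow_id"]

def stream_events(last_event_id=None):
    """Open (or re-open) the SSE stream, resuming from the last event seen."""
    headers = dict(HEADERS)
    if last_event_id:
        headers["Last-Event-ID"] = last_event_id
    r = requests.get("http://localhost:8081/stream/sse",
                     params={"workflow_id": workflow_id},
                     headers=headers, stream=True)
    for line in r.iter_lines(decode_unicode=True):
        if line and line.startswith("id:"):
            last_event_id = line.split(":", 1)[1].strip()
        elif line:
            print(line)
    return last_event_id  # pass back in on reconnect after a dropped connection

stream_events()
```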
| Metric | Before | After | Impact |
|---|---|---|---|
| Monthly LLM Costs | $50,000 | $15,000 | -70% |
| Debug Time (P1 issues) | 4-6 hours | 15 minutes | -95% |
| Agent Success Rate | 72% | 94% | +22% |
| Mean Time to Recovery | 45 min | 3 min | -93% |
| Security Incidents | 3/month | 0 | -100% |
# Set hard budget limits - agent stops before breaking the bank
{
"query": "Help me troubleshoot my deployment issue",
"session_id": "user-123-session",
"budget": {
"max_tokens": 10000, # Hard stop at 10k tokens
"alert_at": 8000, # Alert at 80% usage
"rate_limit": "100/hour" # Max 100 requests per hour
},
"policy": "customer_support.rego" # OPA policy for allowed actions
}
# Result: 70% cost reduction, zero runaway bills
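Conceptually, the max_tokens / alert_at semantics behave like the small tracker below. This is an illustrative sketch only; enforcement actually happens server-side in the orchestrator:

```python
# Illustrative only: mirrors the budget fields above (max_tokens, alert_at).
# Shannon enforces budgets in the Go orchestrator; this is not its implementation.
class TokenBudget:
    def __init__(self, max_tokens: int, alert_at: int):
        self.max_tokens = max_tokens
        self.alert_at = alert_at
        self.used = 0

    def record(self, tokens: int) -> None:
        if self.used + tokens > self.max_tokens:
            # Hard stop: the request is refused before the budget is blown.
            raise RuntimeError(f"budget exceeded: {self.used + tokens}/{self.max_tokens} tokens")
        self.used += tokens
        if self.used >= self.alert_at:
            print(f"warning: {self.used}/{self.max_tokens} tokens used")

budget = TokenBudget(max_tokens=10_000, alert_at=8_000)
budget.record(7_500)
budget.record(1_000)    # crosses alert_at -> prints a warning
# budget.record(2_000)  # would exceed max_tokens -> RuntimeError (hard stop)
```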
# Production agent failed at 3am? No problem.
# Export and replay the workflow in one command
./scripts/replay_workflow.sh task-prod-failure-123
# Or specify a particular run ID
./scripts/replay_workflow.sh task-prod-failure-123 abc-def-ghi
# Output shows step-by-step execution with token counts, decisions, and state changes
# Fix the issue, add a test case, never see it again
# teams/data-science/policy.rego
allow_model("gpt-4o") if team == "data-science"
allow_model("claude-4") if team == "data-science"
max_tokens(50000) if team == "data-science"
# teams/customer-support/policy.rego
allow_model("gpt-4o-mini") if team == "support"
max_tokens(5000) if team == "support"
deny_tool("database_write") if team == "support"
# Python code runs in isolated WASI sandbox with full standard library
./scripts/submit_task.sh "Execute Python: print('Hello from secure WASI!')"
# Even malicious code is safe
./scripts/submit_task.sh "Execute Python: import os; os.system('rm -rf /')"
# Result: OSError - system calls blocked by WASI sandbox
# Advanced: Session persistence for data analysis
./scripts/submit_task.sh "Execute Python with session 'analysis': data = [1,2,3,4,5]"
./scripts/submit_task.sh "Execute Python with session 'analysis': print(sum(data))"
# Output: 15
More Production Examples (click to expand)
- Incident Response Bot: Auto-triages alerts with budget limits
- Code Review Agent: Enforces security policies via OPA rules
- Data Pipeline Monitor: Replays failed workflows for debugging
- Compliance Auditor: Full trace of every decision and data access
- Multi-Tenant SaaS: Complete isolation between customer agents
See docs/production-examples/ for battle-tested implementations.
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Client    │────▶│ Orchestrator │────▶│ Agent Core  │
└─────────────┘     │     (Go)     │     │   (Rust)    │
                    └──────────────┘     └─────────────┘
                           │                    │
                           ▼                    ▼
                    ┌──────────────┐     ┌─────────────┐
                    │   Temporal   │     │ WASI Tools  │
                    │  Workflows   │     │   Sandbox   │
                    └──────────────┘     └─────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │ LLM Service  │
                    │   (Python)   │
                    └──────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                          CLIENT LAYER                            │
├─────────────┬─────────────┬─────────────┬───────────────────────┤
│    HTTP     │    gRPC     │     SSE     │   WebSocket (soon)    │
└─────────────┴─────────────┴─────────────┴───────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        ORCHESTRATOR (Go)                         │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌──────────┐   │
│  │   Router   │──│   Budget   │──│  Session   │──│   OPA    │   │
│  │            │  │  Manager   │  │   Store    │  │ Policies │   │
│  └────────────┘  └────────────┘  └────────────┘  └──────────┘   │
└─────────────────────────────────────────────────────────────────┘
        │                 │                 │                 │
        ▼                 ▼                 ▼                 ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Temporal   │  │    Redis     │  │  PostgreSQL  │  │    Qdrant    │
│  Workflows   │  │    Cache     │  │    State     │  │   Vectors    │
│              │  │   Sessions   │  │   History    │  │    Memory    │
└──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        AGENT CORE (Rust)                         │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌──────────┐   │
│  │    WASI    │──│   Policy   │──│    Tool    │──│  Agent   │   │
│  │  Sandbox   │  │  Enforcer  │  │  Registry  │  │  Comms   │   │
│  └────────────┘  └────────────┘  └────────────┘  └──────────┘   │
└─────────────────────────────────────────────────────────────────┘
                │                                  │
                ▼                                  ▼
┌────────────────────────────────┐  ┌─────────────────────────────────┐
│      LLM SERVICE (Python)      │  │       OBSERVABILITY LAYER       │
│ ┌────────────┐  ┌────────────┐ │  │ ┌────────────┐  ┌────────────┐  │
│ │  Provider  │  │    MCP     │ │  │ │ Prometheus │  │  OpenTel   │  │
│ │  Adapter   │  │   Tools    │ │  │ │  Metrics   │  │   Traces   │  │
│ └────────────┘  └────────────┘ │  │ └────────────┘  └────────────┘  │
└────────────────────────────────┘  └─────────────────────────────────┘
- Orchestrator (Go): Task routing, budget enforcement, session management, OPA policy evaluation
- Agent Core (Rust): WASI sandbox execution, policy enforcement, agent-to-agent communication
- LLM Service (Python): Provider abstraction (15+ LLMs), MCP tools, prompt optimization
- Data Layer: PostgreSQL (workflow state), Redis (session cache), Qdrant (vector memory)
- Observability: Prometheus metrics, OpenTelemetry tracing, Grafana dashboards
# Clone and configure
git clone https://github.com/Kocoro-lab/Shannon.git
cd Shannon
make setup-env
echo "OPENAI_API_KEY=sk-..." >> .env
# Start with budget limits
echo "DEFAULT_MAX_TOKENS=5000" >> .env
echo "DEFAULT_RATE_LIMIT=100/hour" >> .env
# Launch
make dev
# Create your first OPA policy
cat > config/policies/default.rego << EOF
package shannon
default allow = false
# Allow all for dev, restrict in prod
allow {
input.environment == "development"
}
# Production rules
allow {
input.environment == "production"
input.tokens_requested < 10000
input.model in ["gpt-4o-mini", "claude-4-haiku"]
}
EOF
# Hot reload - no restart needed!
# Something went wrong in production?
# 1. Find the workflow ID from logs
grep ERROR logs/orchestrator.log | tail -1
# 2. Export the workflow
./scripts/replay_workflow.sh export task-xxx-failed debug.json
# 3. Replay locally to see exactly what happened
./scripts/replay_workflow.sh replay debug.json
# 4. Fix, test, deploy with confidence
# config/teams.yaml
teams:
data-science:
models: ["gpt-4o", "claude-4-sonnet"]
max_tokens_per_day: 1000000
tools: ["*"]
customer-support:
models: ["gpt-4o-mini"]
max_tokens_per_day: 50000
tools: ["search", "respond", "escalate"]
engineering:
models: ["claude-4-sonnet", "gpt-4o"]
max_tokens_per_day: 500000
tools: ["code_*", "test_*", "deploy_*"]
- Environment Configuration
- Testing Guide
- TODO: Publish an open-source quickstart walkthrough
- Authentication & Multitenancy
- MCP Integration
- Web Search Configuration
- TODO: Add docs for budget controls & policy engine
- Platform Architecture Overview
- Multi-Agent Workflow Architecture
- Agent Core Architecture
- Pattern Selection Guide
- How vectors are generated: The Go orchestrator calls the Python LLM Service at /embeddings/, which by default uses OpenAI (model text-embedding-3-small).
- Graceful degradation: If no embedding provider is configured (e.g., no OPENAI_API_KEY) or the endpoint is unavailable, workflows still run. Vector features degrade gracefully:
  - No vectors are stored (vector upserts are skipped)
  - Session memory retrieval returns an empty list
  - Similar-query enrichment is skipped
- Enable vectors: Set OPENAI_API_KEY in .env, keep vector.enabled: true in config/shannon.yaml, and run Qdrant (port 6333)
- Disable vectors: Set vector.enabled: false in config/shannon.yaml (or set degradation.fallback_behaviors.vector_search: skip)
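For reference, calling the embeddings endpoint directly might look like the sketch below; the path and default model come from the notes above, while the request/response field names (texts, embeddings) are assumptions for illustration:

```python
# Sketch: hit the LLM Service embeddings endpoint on its default port (8000).
# Field names "texts" and "embeddings" are assumed, not documented here.
import requests

resp = requests.post("http://localhost:8000/embeddings/",
                     json={"texts": ["Shannon makes AI agents simple!"],
                           "model": "text-embedding-3-small"},
                     timeout=30)
resp.raise_for_status()
vectors = resp.json().get("embeddings", [])
print(f"{len(vectors)} vector(s) returned")  # 0 vectors => degraded mode / no provider
```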
# Run linters and formatters
make lint
make fmt
# Run smoke tests
make smoke
# View logs
make logs
# Check service status
make ps
# Run integration tests
make integration-tests
# Run specific integration test
make integration-single
# Test session management
make integration-session
# Run coverage reports
make coverage
We love contributions! Please see our Contributing Guide for details.
- Discord: Join our Discord
- Twitter/X: @ShannonAI
Q: How is this different from just using LangGraph? A: LangGraph is a library for building stateful agents. Shannon is production infrastructure. We handle the hard parts: deterministic replay for debugging, token budget enforcement, security sandboxing, and multi-tenancy. You can even use LangGraph within Shannon if you want.
Q: Can I migrate from my existing LangGraph/AutoGen setup? A: Yes. Most migrations take 1-2 days. We provide adapters and migration guides. Your agents get instant upgrades: 70% cost reduction, replay debugging, and production monitoring.
Q: What's the overhead? A: ~50ms latency, 100MB memory per agent. The tradeoff: your agents don't randomly fail at 3am, and you can actually debug when they do.
Q: Is it really enterprise-ready? A: We run 1M+ agent executions/day in production. Temporal (our workflow engine) powers Uber, Netflix, and Stripe. WASI (our sandbox) is a W3C standard. This isn't a weekend project.
Q: What about vendor lock-in? A: Zero lock-in. Standard protocols (gRPC, HTTP, SSE). Export your workflows anytime. Swap LLM providers with one line. MIT licensed forever.
- 1M+ workflows/day across 50+ organizations
- 99.95% uptime (excluding LLM provider outages)
- $2M+ saved in token costs across users
- Zero security incidents with WASI sandboxing
Now → v0.1 (Production Ready)
- ✅ Core platform stable - Go orchestrator, Rust agent-core, Python LLM service
- ✅ Deterministic replay debugging - Export and replay any workflow execution
- ✅ OPA policy enforcement - Fine-grained security and governance rules
- ✅ WebSocket streaming - Real-time agent communication with event filtering and replay
- ✅ SSE streaming - Server-sent events for browser-native streaming
- ✅ MCP integration - Model Context Protocol for standardized tool interfaces
- ✅ WASI sandbox - Secure code execution environment with resource limits
- ✅ Multi-agent orchestration - DAG workflows with parallel execution
- ✅ Vector memory - Qdrant-based semantic search and context retrieval
- ✅ Circuit breaker patterns - Automatic failure recovery and degradation
- ✅ Multi-provider LLM support - OpenAI, Anthropic, Google, DeepSeek, and more
- ✅ Token budget management - Hard limits with real-time tracking
- ✅ Session management - Durable state with Redis/PostgreSQL persistence
- 🚧 LangGraph adapter - Bridge to LangChain ecosystem (integration framework complete)
- 🚧 AutoGen adapter - Bridge to Microsoft AutoGen multi-agent conversations
v0.2
- [ ] Enterprise SSO - SAML/OAuth integration with existing identity providers
- [ ] Natural language policies - Human-readable policy definitions with AI assistance
- [ ] Enhanced monitoring - Custom dashboards and alerting rules
- [ ] Advanced caching - Multi-level caching with semantic deduplication
- [ ] Real-time collaboration - Multi-user agent sessions with shared context
- [ ] Plugin ecosystem - Third-party tool and integration marketplace
- [ ] Workflow marketplace - Community-contributed agent templates and patterns
- [ ] Edge deployment - WASM execution in browser environments
v0.3
- [ ] Autonomous agent swarms - Self-organizing multi-agent systems
- [ ] Cross-organization federation - Secure agent communication across tenants
- [ ] Predictive scaling - ML-based resource allocation and optimization
- [ ] Blockchain integration - Proof-of-execution and decentralized governance
- [ ] Advanced personalization - User-specific LoRA adapters and preferences
v0.4
- [ ] Continuous learning - Automated prompt and strategy optimization
- [ ] Multi-agent marketplaces - Economic incentives and reputation systems
- [ ] Advanced reasoning - Hybrid symbolic + neural approaches
- [ ] Global deployment - Multi-region, multi-cloud architecture
- [ ] Regulatory compliance - SOC 2, GDPR, HIPAA automation
- [ ] AI safety frameworks - Constitutional AI and alignment mechanisms
- Python Code Execution - Secure Python execution via WASI sandbox
- Multi-Agent Workflows - Orchestration patterns and best practices
- Pattern Usage Guide - ReAct, Tree-of-Thoughts, Debate patterns
- Streaming APIs - Real-time agent output streaming
- Policy Engine - Team-based access control with OPA
- Agent Core API - Rust service endpoints
- Orchestrator API - Workflow management
- LLM Service API - Provider abstraction
# You're 3 commands away from production-ready AI agents
git clone https://github.com/Kocoro-lab/Shannon.git
cd Shannon && make setup-env && make dev
# Join 1,000+ developers shipping reliable AI
- 🐛 Found a bug? Open an issue
- 💡 Have an idea? Start a discussion
- 💬 Need help? Join our Discord
- ⭐ Like the project? Give us a star!
We're building decentralized trust infrastructure with Solana blockchain:
- Cryptographic Verification: On-chain attestation of AI agent actions and results
- Immutable Audit Trail: Blockchain-based proof of task execution
- Smart Contract Interoperability: Enable AI agents to interact with DeFi and Web3 protocols
- Token-Gated Capabilities: Control agent permissions through blockchain tokens
- Decentralized Reputation: Build trust through verifiable on-chain agent performance
Stay tuned for our Web3 trust layer - bringing transparency and verifiability to AI systems!
MIT License - Use it anywhere, modify anything, zero restrictions. See LICENSE.
Stop debugging AI failures. Start shipping reliable agents.
Discord •
GitHub
If Shannon saves you time or money, let us know! We love success stories.
Twitter/X: @ShannonAgents

solo-server
Solo Server is a lightweight server designed for managing hardware-aware inference. It provides seamless setup through a simple CLI and HTTP servers, an open model registry for pulling models from platforms like Ollama and Hugging Face, cross-platform compatibility for effortless deployment of AI models on hardware, and a configurable framework that auto-detects hardware components (CPU, GPU, RAM) and sets optimal configurations.

aichildedu
AICHILDEDU is a microservice-based AI education platform for children that integrates LLMs, image generation, and speech synthesis to provide personalized storybook creation, intelligent conversational learning, and multimedia content generation. It offers features like personalized story generation, educational quiz creation, multimedia integration, age-appropriate content, multi-language support, user management, parental controls, and asynchronous processing. The platform follows a microservice architecture with components like API Gateway, User Service, Content Service, Learning Service, and AI Services. Technologies used include Python, FastAPI, PostgreSQL, MongoDB, Redis, LangChain, OpenAI GPT models, TensorFlow, PyTorch, Transformers, MinIO, Elasticsearch, Docker, Docker Compose, and JWT-based authentication.

mcp-memory-service
The MCP Memory Service is a universal memory service designed for AI assistants, providing semantic memory search and persistent storage. It works with various AI applications and offers fast local search using SQLite-vec and global distribution through Cloudflare. The service supports intelligent memory management, universal compatibility with AI tools, flexible storage options, and is production-ready with cross-platform support and secure connections. Users can store and recall memories, search by tags, check system health, and configure the service for Claude Desktop integration and environment variables.

hub
Hub is an open-source, high-performance LLM gateway written in Rust. It serves as a smart proxy for LLM applications, centralizing control and tracing of all LLM calls and traces. Built for efficiency, it provides a single API to connect to any LLM provider. The tool is designed to be fast, efficient, and completely open-source under the Apache 2.0 license.

aegra
Aegra is a self-hosted AI agent backend platform that provides LangGraph power without vendor lock-in. Built with FastAPI + PostgreSQL, it offers complete control over agent orchestration for teams looking to escape vendor lock-in, meet data sovereignty requirements, enable custom deployments, and optimize costs. Aegra is Agent Protocol compliant and perfect for teams seeking a free, self-hosted alternative to LangGraph Platform with zero lock-in, full control, and compatibility with existing LangGraph Client SDK.

Archon
Archon is an AI meta-agent designed to autonomously build, refine, and optimize other AI agents. It serves as a practical tool for developers and an educational framework showcasing the evolution of agentic systems. Through iterative development, Archon demonstrates the power of planning, feedback loops, and domain-specific knowledge in creating robust AI agents.

AutoAgents
AutoAgents is a cutting-edge multi-agent framework built in Rust that enables the creation of intelligent, autonomous agents powered by Large Language Models (LLMs) and Ractor. Designed for performance, safety, and scalability. AutoAgents provides a robust foundation for building complex AI systems that can reason, act, and collaborate. With AutoAgents you can create Cloud Native Agents, Edge Native Agents and Hybrid Models as well. It is so extensible that other ML Models can be used to create complex pipelines using Actor Framework.

shimmy
Shimmy is a 5.1MB single-binary local inference server providing OpenAI-compatible endpoints for GGUF models. It offers fast, reliable AI inference with sub-second responses, zero configuration, and automatic port management. Perfect for developers seeking privacy, cost-effectiveness, speed, and easy integration with popular tools like VSCode and Cursor. Shimmy is designed to be invisible infrastructure that simplifies local AI development and deployment.

DeepMCPAgent
DeepMCPAgent is a model-agnostic tool that enables the creation of LangChain/LangGraph agents powered by MCP tools over HTTP/SSE. It allows for dynamic discovery of tools, connection to remote MCP servers, and integration with any LangChain chat model instance. The tool provides a deep agent loop for enhanced functionality and supports typed tool arguments for validated calls. DeepMCPAgent emphasizes the importance of MCP-first approach, where agents dynamically discover and call tools rather than hardcoding them.

openai-forward
OpenAI-Forward is an efficient forwarding service implemented for large language models. Its core features include user request rate control, token rate limiting, intelligent prediction caching, log management, and API key management, aiming to provide efficient and convenient model forwarding services. Whether proxying local language models or cloud-based language models like LocalAI or OpenAI, OpenAI-Forward makes it easy. Thanks to support from libraries like uvicorn, aiohttp, and asyncio, OpenAI-Forward achieves excellent asynchronous performance.

AgentNeo
AgentNeo is an advanced, open-source Agentic AI Application Observability, Monitoring, and Evaluation Framework designed to provide deep insights into AI agents, Large Language Model (LLM) calls, and tool interactions. It offers robust logging, visualization, and evaluation capabilities to help debug and optimize AI applications with ease. With features like tracing LLM calls, monitoring agents and tools, tracking interactions, detailed metrics collection, flexible data storage, simple instrumentation, interactive dashboard, project management, execution graph visualization, and evaluation tools, AgentNeo empowers users to build efficient, cost-effective, and high-quality AI-driven solutions.

claude-007-agents
Claude Code Agents is an open-source AI agent system designed to enhance development workflows by providing specialized AI agents for orchestration, resilience engineering, and organizational memory. These agents offer specialized expertise across technologies, AI system with organizational memory, and an agent orchestration system. The system includes features such as engineering excellence by design, advanced orchestration system, Task Master integration, live MCP integrations, professional-grade workflows, and organizational intelligence. It is suitable for solo developers, small teams, enterprise teams, and open-source projects. The system requires a one-time bootstrap setup for each project to analyze the tech stack, select optimal agents, create configuration files, set up Task Master integration, and validate system readiness.

evi-run
evi-run is a powerful, production-ready multi-agent AI system built on Python using the OpenAI Agents SDK. It offers instant deployment, ultimate flexibility, built-in analytics, Telegram integration, and scalable architecture. The system features memory management, knowledge integration, task scheduling, multi-agent orchestration, custom agent creation, deep research, web intelligence, document processing, image generation, DEX analytics, and Solana token swap. It supports flexible usage modes like private, free, and pay mode, with upcoming features including NSFW mode, task scheduler, and automatic limit orders. The technology stack includes Python 3.11, OpenAI Agents SDK, Telegram Bot API, PostgreSQL, Redis, and Docker & Docker Compose for deployment.

Rankify
Rankify is a Python toolkit designed for unified retrieval, re-ranking, and retrieval-augmented generation (RAG) research. It integrates 40 pre-retrieved benchmark datasets and supports 7 retrieval techniques, 24 state-of-the-art re-ranking models, and multiple RAG methods. Rankify provides a modular and extensible framework, enabling seamless experimentation and benchmarking across retrieval pipelines. It offers comprehensive documentation, open-source implementation, and pre-built evaluation tools, making it a powerful resource for researchers and practitioners in the field.

finite-monkey-engine
FiniteMonkey is an advanced vulnerability mining engine powered purely by GPT, requiring no prior knowledge base or fine-tuning. Its effectiveness significantly surpasses most current related research approaches. The tool is task-driven, prompt-driven, and focuses on prompt design, leveraging 'deception' and hallucination as key mechanics. It has helped identify vulnerabilities worth over $60,000 in bounties. The tool requires PostgreSQL database, OpenAI API access, and Python environment for setup. It supports various languages like Solidity, Rust, Python, Move, Cairo, Tact, Func, Java, and Fake Solidity for scanning. FiniteMonkey is best suited for logic vulnerability mining in real projects, not recommended for academic vulnerability testing. GPT-4-turbo is recommended for optimal results with an average scan time of 2-3 hours for medium projects. The tool provides detailed scanning results guide and implementation tips for users.
For similar tasks

Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.

sorrentum
Sorrentum is an open-source project that aims to combine open-source development, startups, and brilliant students to build machine learning, AI, and Web3 / DeFi protocols geared towards finance and economics. The project provides opportunities for internships, research assistantships, and development grants, as well as the chance to work on cutting-edge problems, learn about startups, write academic papers, and get internships and full-time positions at companies working on Sorrentum applications.

tidb
TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

zep-python
Zep is an open-source platform for building and deploying large language model (LLM) applications. It provides a suite of tools and services that make it easy to integrate LLMs into your applications, including chat history memory, embedding, vector search, and data enrichment. Zep is designed to be scalable, reliable, and easy to use, making it a great choice for developers who want to build LLM-powered applications quickly and easily.

telemetry-airflow
This repository codifies the Airflow cluster that is deployed at workflow.telemetry.mozilla.org (behind SSO) and commonly referred to as "WTMO" or simply "Airflow". Some links relevant to users and developers of WTMO: * The `dags` directory in this repository contains some custom DAG definitions * Many of the DAGs registered with WTMO don't live in this repository, but are instead generated from ETL task definitions in bigquery-etl * The Data SRE team maintains a WTMO Developer Guide (behind SSO)

mojo
Mojo is a new programming language that bridges the gap between research and production by combining Python syntax and ecosystem with systems programming and metaprogramming features. Mojo is still young, but it is designed to become a superset of Python over time.

pandas-ai
PandasAI is a Python library that makes it easy to ask questions to your data in natural language. It helps you to explore, clean, and analyze your data using generative AI.

databend
Databend is an open-source cloud data warehouse that serves as a cost-effective alternative to Snowflake. With its focus on fast query execution and data ingestion, it's designed for complex analysis of the world's largest datasets.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.