
nexus
The AI router
Stars: 339

Nexus is a tool that acts as a unified gateway for multiple LLM providers and MCP servers. It allows users to aggregate, govern, and control their AI stack by connecting multiple servers and providers through a single endpoint. Nexus provides features like MCP Server Aggregation, LLM Provider Routing, Context-Aware Tool Search, Protocol Support, Flexible Configuration, Security features, Rate Limiting, and Docker readiness. It supports tool calling, tool discovery, and error handling for STDIO servers. Nexus also integrates with AI assistants, Cursor, Claude Code, and LangChain for seamless usage.
README:
Plug in all your MCP servers, APIs, and LLM providers. Route everything through a unified endpoint.
Aggregate, govern, and control your AI stack.
- MCP Server Aggregation: Connect multiple MCP servers (STDIO, SSE, HTTP) through a single endpoint
- LLM Provider Routing: Unified interface for OpenAI, Anthropic, Google, and AWS Bedrock LLM providers with full tool calling support
- Context-Aware Tool Search: Intelligent fuzzy search across all connected tools using natural language queries
- Protocol Support: Supports STDIO (subprocess), SSE (Server-Sent Events), and streamable HTTP MCP servers
- Flexible Configuration: TOML-based configuration with environment variable substitution
- Security: Built-in CORS, CSRF protection, OAuth2, and TLS support
- Rate Limiting: Multi-level rate limiting with in-memory or Redis backends
- Docker Ready: Available as a container image with minimal configuration needed
curl -fsSL https://nexusrouter.com/install | bash
Pull the latest image:
docker pull ghcr.io/grafbase/nexus:latest
Or use the stable version:
docker pull ghcr.io/grafbase/nexus:stable
Or use a specific version:
docker pull ghcr.io/grafbase/nexus:X.Y.Z
git clone https://github.com/grafbase/nexus
cd nexus
cargo build --release
nexus
docker run -p 8000:8000 -v /path/to/config:/etc/nexus.toml ghcr.io/grafbase/nexus:latest
services:
  nexus:
    image: ghcr.io/grafbase/nexus:latest
    ports:
      - "8000:8000"
    volumes:
      - ./nexus.toml:/etc/nexus.toml
    environment:
      - GITHUB_TOKEN=${GITHUB_TOKEN}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
Create a `nexus.toml` file to configure Nexus:
# LLM Provider configuration
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
forward_token = true
# Model configuration (at least one model required per provider)
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-3-5-turbo]
[llm.providers.anthropic]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"
[llm.providers.anthropic.models.claude-3-5-sonnet-20241022]
# MCP Server configuration
[mcp.servers.github]
url = "https://api.githubcopilot.com/mcp/"
auth.token = "{{ env.GITHUB_TOKEN }}"
[mcp.servers.filesystem]
cmd = ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/home/YOUR_USERNAME/Desktop"]
[mcp.servers.python_server]
cmd = ["python", "-m", "mcp_server"]
env = { PYTHONPATH = "/opt/mcp" }
cwd = "/workspace"
- `server.listen_address`: The address and port Nexus will listen on (default: `127.0.0.1:8000`)
- `server.health.enabled`: Enable the health endpoint (default: `true`)
- `server.health.path`: Health check endpoint path (default: `/health`)
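Put together, these keys form a small `[server]` block in `nexus.toml`. The values below are simply the documented defaults, shown as an illustrative sketch:

[server]
listen_address = "127.0.0.1:8000"

[server.health]
enabled = true
path = "/health"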
- `llm.enabled`: Enable LLM functionality (default: `true`)
- `llm.protocols.openai.enabled`: Enable the OpenAI protocol endpoint (default: `true`)
- `llm.protocols.openai.path`: OpenAI endpoint path (default: `/llm/openai`)
- `llm.protocols.anthropic.enabled`: Enable the Anthropic protocol endpoint (default: `false`)
- `llm.protocols.anthropic.path`: Anthropic endpoint path (default: `/llm/anthropic`)
For detailed LLM provider configuration, see the LLM Provider Configuration section below.
- `mcp.enabled`: Enable MCP functionality (default: `true`)
- `mcp.path`: MCP endpoint path (default: `/mcp`)
- `mcp.enable_structured_content`: Control the MCP search tool response format (default: `true`)
  - When `true`: uses the modern `structuredContent` field for better performance and type safety
  - When `false`: uses the legacy `content` field with `Content::json` objects for compatibility with older MCP clients
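Likewise, these MCP settings map to a `[mcp]` block; the values shown are the documented defaults (illustrative sketch only):

[mcp]
enabled = true
path = "/mcp"
enable_structured_content = true # set to false for older MCP clients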
STDIO Servers: Launch local processes that communicate via standard input/output
[mcp.servers.my_tool]
cmd = ["path/to/executable", "--arg1", "--arg2"]

# Optional: Set environment variables
env = { DEBUG = "1", API_KEY = "{{ env.MY_API_KEY }}" }

# Optional: Set working directory
cwd = "/path/to/working/directory"

# Optional: Configure stderr handling (default: "null")
stderr = "inherit" # Show in console
# or log to a file:
# stderr = { file = "/var/log/mcp/server.log" }
Note: STDIO servers must output valid JSON-RPC messages on stdout. The `cmd` array must have at least one element (the executable).
- SSE Servers: Connect to Server-Sent Events endpoints
[mcp.servers.my_sse_server]
protocol = "sse"
url = "http://example.com/sse"
message_url = "http://example.com/messages" # Optional
- HTTP Servers: Connect to streamable HTTP endpoints
[mcp.servers.my_http_server]
protocol = "streamable-http"
url = "https://api.example.com/mcp"
For remote MCP servers, if you omit the protocol Nexus will first try streamable HTTP and then SSE.
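For example, a remote server can be declared with only a URL and no `protocol` field, in which case Nexus negotiates the transport as described above (the server name here is illustrative):

[mcp.servers.remote_api]
url = "https://api.example.com/mcp"
# No protocol specified: Nexus tries streamable HTTP first, then falls back to SSE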
Add service token authentication to any server:
[mcp.servers.my_server.auth]
token = "your-token-here"
# Or use environment variables
token = "{{ env.MY_API_TOKEN }}"
If OAuth2 authentication is enabled on Nexus and your downstream servers all use the same authorization server, you can configure Nexus to forward the request's access token to the downstream server.
[mcp.servers.my_server.auth]
type = "forward"
Nexus supports inserting static headers when making requests to MCP servers. Headers can be configured globally (for all MCP servers) or per-server.
Note: MCP currently only supports header insertion with static values. Headers from incoming requests are not forwarded.
Configure headers that apply to all MCP servers:
# Global headers for all MCP servers
[[mcp.headers]]
rule = "insert"
name = "X-Application"
value = "nexus-router"
[[mcp.headers]]
rule = "insert"
name = "X-API-Version"
value = "v1"
Configure headers for individual HTTP-based MCP servers:
[mcp.servers.my_server]
url = "https://api.example.com/mcp"
# Insert headers for this specific server
[[mcp.servers.my_server.headers]]
rule = "insert"
name = "X-API-Key"
value = "{{ env.MY_API_KEY }}" # Environment variable substitution
[[mcp.servers.my_server.headers]]
rule = "insert"
name = "X-Service-Name"
value = "my-service"
- Static Values Only: Headers are set at client initialization time with static values
- Environment Variables: Use `{{ env.VAR_NAME }}` syntax for environment variable substitution
- HTTP Servers Only: Headers only apply to HTTP-based MCP servers (not STDIO servers)
- Insert Rule: Currently only the `insert` rule is supported for MCP
Restrict access to MCP servers and tools based on user groups:
# Server-level access control
[mcp.servers.premium_tools]
cmd = ["premium-server"]
allow = ["premium", "enterprise"] # Only these groups can access
deny = ["suspended"] # Block specific groups
# Tool-level override (more specific than server-level)
[mcp.servers.premium_tools.tools.expensive_feature]
allow = ["enterprise"] # Only enterprise can use this tool
[mcp.servers.premium_tools.tools.deprecated_tool]
allow = [] # Empty allow list blocks all access (no client ID needed)
Access control rules:
- If `allow` is set, only listed groups can access (requires client identification)
- If `deny` is set, listed groups are blocked (requires client identification)
- Empty `allow = []` blocks all access without requiring client identification
- Tool-level rules override server-level rules
- Deny takes priority over allow
Configure OAuth2 authentication to protect your Nexus endpoints:
[server.oauth]
url = "https://your-oauth-provider.com/.well-known/jwks.json"
poll_interval = "5m"
expected_issuer = "https://your-oauth-provider.com"
expected_audience = "your-service-audience"
[server.oauth.protected_resource]
resource = "https://your-nexus-instance.com"
authorization_servers = ["https://your-oauth-provider.com"]
OAuth2 configuration options:
- `url`: JWKs endpoint URL for token validation
- `poll_interval`: How often to refresh JWKs (optional, default: no polling)
- `expected_issuer`: Expected `iss` claim in JWT tokens (optional)
- `expected_audience`: Expected `aud` claim in JWT tokens (optional)
- `protected_resource.resource`: URL of this protected resource
- `protected_resource.authorization_servers`: List of authorization server URLs
When OAuth2 is enabled, all endpoints except `/health` and `/.well-known/oauth-protected-resource` require valid JWT tokens in the `Authorization: Bearer <token>` header.
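For illustration, assuming OAuth2 is enabled and Nexus listens on localhost:8000, a request to a protected endpoint such as the model listing carries the token like this (the token value is a placeholder):

# List models with a JWT when OAuth2 is enabled
curl -H "Authorization: Bearer $JWT_TOKEN" \
  http://localhost:8000/llm/models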
Nexus supports rate limiting to prevent abuse and ensure fair resource usage:
# Global rate limiting configuration
[server.rate_limits]
enabled = true
# Storage backend configuration
[server.rate_limits.storage]
type = "memory" # or "redis" for distributed rate limiting
# For Redis backend:
# url = "redis://localhost:6379"
# key_prefix = "nexus:rate_limit:"
# Global rate limit (applies to all requests)
[server.rate_limits.global]
limit = 1000
interval = "60s"
# Per-IP rate limit
[server.rate_limits.per_ip]
limit = 100
interval = "60s"
# Per-MCP server rate limits
[mcp.servers.my_server.rate_limits]
limit = 50
interval = "60s"
# Tool-specific rate limits (override server defaults)
[mcp.servers.my_server.rate_limits.tools]
expensive_tool = { limit = 10, interval = "60s" }
cheap_tool = { limit = 100, interval = "60s" }
Rate Limiting Features:
- Multiple levels: Global, per-IP, per-server, and per-tool limits
- Storage backends: In-memory (single instance) or Redis (distributed)
- Flexible intervals: Configure time windows for each limit
- Tool-specific overrides: Set different limits for expensive operations
Redis Backend Configuration:
[server.rate_limits.storage]
type = "redis"
url = "redis://localhost:6379"
key_prefix = "nexus:rate_limit:"
response_timeout = "1s"
connection_timeout = "5s"
# Connection pool settings
[server.rate_limits.storage.pool]
max_size = 16
min_idle = 0
timeout_create = "5s"
timeout_wait = "5s"
timeout_recycle = "300s"
# TLS configuration for Redis (optional)
[server.rate_limits.storage.tls]
enabled = true
ca_cert_path = "/path/to/ca.crt"
client_cert_path = "/path/to/client.crt" # For mutual TLS
client_key_path = "/path/to/client.key"
# insecure = true # WARNING: Only for development/testing, skips certificate validation
Note: When configuring tool-specific rate limits, Nexus will warn if you reference tools that don't exist.
Nexus provides token-based rate limiting for LLM providers to help control costs and prevent abuse. Unlike request-based rate limits, token rate limits count an estimate of actual tokens consumed.
IMPORTANT: LLM rate limiting requires client identification to be enabled:
[server.client_identification]
enabled = true
# Choose identification methods (at least one required)
client_id.jwt_claim = "sub" # Extract ID from JWT 'sub' claim
# or
client_id.http_header = "X-Client-ID" # Extract ID from HTTP header
# Optional: Limit groups per user (at most one allowed)
group_id.jwt_claim = "groups" # JWT claim containing user's group
# or
group_id.http_header = "X-Group-ID" # Extract ID from HTTP header
# You must provide a list of allowed groups
[server.client_identification.validation]
group_values = ["free", "pro", "max"]
Without client identification, rate limits cannot be enforced and requests will fail with a configuration error.
Token rate limits can be configured at four levels, from most to least specific:
- Model per user + group: Specific model for each user in a group
- Model per user: Specific model for each user
- Provider per user + group: All models from a provider for each user in a group
- Provider per user: All models from a provider for each user
The most specific applicable limit is always used.
# Provider-level default rate limit (applies to all models)
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 100000 # 100K input tokens
interval = "1m" # Per minute
# Model-specific rate limit (overrides provider default)
[llm.providers.openai.models.gpt-4.rate_limits.per_user]
input_token_limit = 50000 # More restrictive for expensive model
interval = "30s"
Configure different limits for user groups (requires `group_id` and `group_values` in client identification):
# Provider-level group limits
[llm.providers.openai.rate_limits.per_user.groups]
free = { input_token_limit = 10000, interval = "60s" }
pro = { input_token_limit = 100000, interval = "60s" }
enterprise = { input_token_limit = 1000000, interval = "60s" }
# Model-specific group limits (override provider groups)
[llm.providers.openai.models.gpt-4.rate_limits.per_user.groups]
free = { input_token_limit = 5000, interval = "60s" }
pro = { input_token_limit = 50000, interval = "60s" }
enterprise = { input_token_limit = 500000, interval = "60s" }
The limits are always per user, but you can define different limits for users in specific groups. If a user does not belong to any group, the base per-user limits apply.
# Client identification (REQUIRED for rate limiting)
[server.client_identification]
enabled = true
client_id.jwt_claim = "sub"
group_id.jwt_claim = "subscription_tier"
[server.client_identification.validation]
group_values = ["free", "pro", "enterprise"]
# OpenAI provider with rate limiting
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
# Provider-level defaults
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 100000
interval = "60s"
[llm.providers.openai.rate_limits.per_user.groups]
free = { input_token_limit = 10000, interval = "60s" }
pro = { input_token_limit = 100000, interval = "60s" }
# GPT-4 specific limits (more restrictive)
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-4.rate_limits.per_user]
input_token_limit = 50000
interval = "60s"
[llm.providers.openai.models.gpt-4.rate_limits.per_user.groups]
free = { input_token_limit = 5000, interval = "60s" }
pro = { input_token_limit = 50000, interval = "60s" }
# GPT-3.5 uses provider defaults
[llm.providers.openai.models.gpt-3-5-turbo]
- Input Tokens Only: Rate limiting is based solely on input tokens counted from the request's messages and system prompts
- No Output Buffering: Output tokens and the `max_tokens` parameter are NOT considered in rate limit calculations
- Pre-check: Input tokens are checked against rate limits before processing
- Token Accumulation: Uses a sliding window algorithm to track usage over time
Note: The rate limiting is designed to be predictable and based only on what the client sends, not on variable output sizes.
When rate limited, the server returns a 429 status code. No Retry-After headers are sent to maintain consistency with downstream LLM provider behavior.
When rate limits are exceeded:
{
"error": {
"message": "Rate limit exceeded: Token rate limit exceeded. Please try again later.",
"type": "rate_limit_error",
"code": 429
}
}
- Per-User Limits: All limits are per individual user/client ID
- No Shared Pools: Currently, there are no shared/global token pools
- Streaming Support: Token counting works with both regular and streaming responses
- Provider Agnostic: Works consistently across all LLM providers
- Validation: Configuration is validated at startup; invalid group names will cause errors
Configure TLS for downstream connections:
[mcp.servers.my_server.tls]
verify_certs = true
accept_invalid_hostnames = false
root_ca_cert_path = "/path/to/ca.pem"
client_cert_path = "/path/to/client.pem"
client_key_path = "/path/to/client.key"
Nexus provides a unified interface for multiple LLM providers, allowing you to route chat completions through various services with a consistent API.
[llm]
enabled = true # Enable LLM functionality (default: true)
# OpenAI protocol endpoint configuration
[llm.protocols.openai]
enabled = true # Enable OpenAI protocol (default: true)
path = "/llm" # Custom path (default: "/llm/openai")
# Anthropic protocol endpoint configuration
[llm.protocols.anthropic]
enabled = true # Enable Anthropic protocol (default: false)
path = "/claude" # Custom path (default: "/llm/anthropic")
Nexus currently supports four major LLM providers with full tool calling capabilities:
- OpenAI (including OpenAI-compatible APIs) - Full tool calling and parallel execution support
- Anthropic (Claude models) - Tool calling with function definitions and tool choice controls
- Google (Gemini models) - Function calling with parameter schemas and tool selection
- AWS Bedrock (Multiple model families via AWS) - Tool calling support across all supported model families
Configure one or more LLM providers in your `nexus.toml`:
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
# Optional: Use a custom base URL (for Azure OpenAI, proxies, or compatible APIs)
base_url = "https://api.openai.com/v1" # Default
# Model Configuration (REQUIRED - at least one model must be configured)
[llm.providers.openai.models.gpt-4]
# Optional: Rename the model for your users
# rename = "smart-model" # Users will see "openai/smart-model"
[llm.providers.openai.models.gpt-3-5-turbo]
# Models without rename use their original ID
[llm.providers.anthropic]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"
# Optional: Use a custom base URL
base_url = "https://api.anthropic.com/v1" # Default
# Model Configuration (REQUIRED - at least one model must be configured)
[llm.providers.anthropic.models.claude-3-opus-20240229]
[llm.providers.anthropic.models.claude-3-5-sonnet-20241022]
[llm.providers.google]
type = "google"
api_key = "{{ env.GOOGLE_API_KEY }}"
# Optional: Use a custom base URL
base_url = "https://generativelanguage.googleapis.com/v1beta" # Default
# Model Configuration (REQUIRED - at least one model must be configured)
# Note: Model IDs with dots must be quoted in TOML
[llm.providers.google.models."gemini-1.5-flash"]
[llm.providers.google.models.gemini-pro]
[llm.providers.bedrock]
type = "bedrock"
# Optional: AWS profile to use (defaults to environment settings)
profile = "{{ env.AWS_PROFILE }}"
# Optional: AWS region (defaults to environment or us-east-1)
region = "us-west-2"
# Model Configuration (REQUIRED - at least one model must be configured)
# Bedrock uses model IDs with dots, so they must be quoted
[llm.providers.bedrock.models."anthropic.claude-3-5-sonnet-20241022-v2:0"]
[llm.providers.bedrock.models."anthropic.claude-3-opus-20240229-v1:0"]
[llm.providers.bedrock.models."amazon.nova-micro-v1:0"]
[llm.providers.bedrock.models."meta.llama3-8b-instruct-v1:0"]
[llm.providers.bedrock.models."ai21.jamba-1.5-mini-v1:0"]
# Rename models for simpler access
[llm.providers.bedrock.models.claude-haiku]
rename = "anthropic.claude-3-5-haiku-20241022-v1:0" # Users will access as "bedrock/claude-haiku"
[llm.providers.bedrock.models.jamba-mini]
rename = "ai21.jamba-1.5-mini-v1:0" # Users will access as "bedrock/jamba-mini"
AWS Bedrock provides access to multiple foundation models through a single API. Key features:
- Unified Access: Use models from Anthropic, Amazon, Meta, Cohere, and more through one interface
- AWS Integration: Leverages AWS credentials and IAM for authentication
- Regional Availability: Models may vary by AWS region
- Native Streaming: Full streaming support for all compatible models
Authentication: Bedrock uses standard AWS credential chain:
- Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`)
- AWS profile (specified in config or via the `AWS_PROFILE` environment variable)
- IAM role (when running on EC2/ECS/Lambda)
- AWS SSO credentials
Supported Model Families:
- AI21 Jamba: Jamba 1.5 Mini and Large models with 256K context window
- Anthropic Claude: All Claude 3 models (Opus, Sonnet, Haiku) and Claude Instant
- Amazon Nova: Nova Micro, Lite, Pro models
- Amazon Titan: Titan Text and Embeddings models
- Meta Llama: Llama 2 and Llama 3 models
- Cohere Command: Command and Command R models
- DeepSeek: DeepSeek R1 reasoning models
- Mistral: Mistral 7B and Mixtral models
Model ID Format: Bedrock model IDs follow the pattern `provider.model-name-version:revision`, for example:
ai21.jamba-1.5-mini-v1:0
anthropic.claude-3-5-sonnet-20241022-v2:0
amazon.nova-micro-v1:0
meta.llama3-8b-instruct-v1:0
Each LLM provider requires explicit model configuration. This ensures that only the models you want to expose are available through Nexus.
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
# Each model you want to expose must be explicitly configured
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-3-5-turbo]
You can rename models to provide custom identifiers for your users:
[llm.providers.openai.models.gpt-4]
rename = "smart-model" # Users will access this as "openai/smart-model"
[llm.providers.openai.models.gpt-3-5-turbo]
rename = "fast-model" # Users will access this as "openai/fast-model"
This is useful for:
- Creating business-friendly model names
- Abstracting away provider-specific model names
- Providing consistent naming across different providers
Model IDs that contain dots must be quoted in TOML:
# Correct - dots in model IDs require quotes
[llm.providers.google.models."gemini-1.5-flash"]
[llm.providers.google.models."gemini-1.5-pro"]
# Also correct - no dots, no quotes needed
[llm.providers.google.models.gemini-pro]
You can configure multiple instances of the same provider type with different names:
# Primary OpenAI account
[llm.providers.openai_primary]
type = "openai"
api_key = "{{ env.OPENAI_PRIMARY_KEY }}"
[llm.providers.openai_primary.models.gpt-4]
[llm.providers.openai_primary.models.gpt-3-5-turbo]
# Secondary OpenAI account or Azure OpenAI
[llm.providers.openai_secondary]
type = "openai"
api_key = "{{ env.OPENAI_SECONDARY_KEY }}"
base_url = "https://my-azure-instance.openai.azure.com/v1"
[llm.providers.openai_secondary.models.gpt-4]
rename = "azure-gpt-4" # Distinguish from primary account
# Anthropic
[llm.providers.claude]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"
[llm.providers.claude.models.claude-3-opus-20240229]
# Google Gemini
[llm.providers.gemini]
type = "google"
api_key = "{{ env.GOOGLE_API_KEY }}"
[llm.providers.gemini.models."gemini-1.5-flash"]
Nexus supports token forwarding, allowing users to provide their own API keys at request time instead of using the configured keys. This feature is opt-in and disabled by default.
Enable token forwarding for any provider by setting `forward_token = true`:
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}" # Fallback key (optional with token forwarding)
forward_token = true # Enable token forwarding for this provider
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-3-5-turbo]
[llm.providers.anthropic]
type = "anthropic"
# No api_key required when token forwarding is enabled
forward_token = true
[llm.providers.anthropic.models.claude-3-5-sonnet-20241022]
[llm.providers.google]
type = "google"
api_key = "{{ env.GOOGLE_API_KEY }}"
forward_token = false # Explicitly disabled (default)
[llm.providers.google.models."gemini-1.5-flash"]
When token forwarding is enabled for a provider, users can pass their own API key using the `X-Provider-API-Key` header:
# Using your own OpenAI key
curl -X POST http://localhost:8000/llm/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Provider-API-Key: sk-your-openai-key" \
-d '{
"model": "openai/gpt-4",
"messages": [{"role": "user", "content": "Hello"}]
}'
# Using your own Anthropic key
curl -X POST http://localhost:8000/llm/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Provider-API-Key: sk-ant-your-anthropic-key" \
-d '{
"model": "anthropic/claude-3-opus-20240229",
"messages": [{"role": "user", "content": "Hello"}]
}'
- When token forwarding is enabled (`forward_token = true`):
  - User-provided keys (via header) take priority
  - Falls back to the configured key if no header is provided
  - Returns a 401 error if neither key is available
- When token forwarding is disabled (`forward_token = false`, default):
  - Always uses the configured API key
  - Ignores the `X-Provider-API-Key` header
  - Returns a 401 error if no configured key exists
- OAuth2 Integration: When OAuth2 is enabled, users must still authenticate with Nexus even when using token forwarding
- Key Validation: API keys are validated by the provider's API
- No Logging: User-provided keys are never logged
- HTTPS Recommended: Always use HTTPS in production to protect API keys in transit
- AWS Bedrock: Token forwarding is not supported for Bedrock providers. Bedrock uses AWS IAM credentials and request signing, which cannot be provided via simple API key headers. You must configure AWS credentials at the provider level (via environment variables, AWS profile, or explicit credentials in configuration).
Nexus supports header transformation for LLM providers, allowing you to forward, insert, remove, or rename headers when making requests to LLM APIs.
Configure header rules at the provider level:
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
# Forward headers from incoming requests
[[llm.providers.openai.headers]]
rule = "forward"
name = "X-Request-ID"
# Forward with a default value if not present
[[llm.providers.openai.headers]]
rule = "forward"
name = "X-Trace-ID"
default = "generated-trace-id"
# Forward and rename
[[llm.providers.openai.headers]]
rule = "forward"
name = "X-Custom-Header"
rename = "X-OpenAI-Custom"
# Insert static headers
[[llm.providers.openai.headers]]
rule = "insert"
name = "X-OpenAI-Beta"
value = "assistants=v2"
# Remove headers
[[llm.providers.openai.headers]]
rule = "remove"
name = "X-Internal-Secret"
# Pattern-based forwarding (regex)
[[llm.providers.openai.headers]]
rule = "forward"
pattern = "X-Debug-.*"
# Pattern-based removal
[[llm.providers.openai.headers]]
rule = "remove"
pattern = "X-Internal-.*"
# Rename and duplicate (keeps original and adds renamed copy)
[[llm.providers.openai.headers]]
rule = "rename_duplicate"
name = "X-User-ID"
rename = "X-OpenAI-User"
Configure headers for specific models (overrides provider-level rules):
[llm.providers.openai.models.gpt-4]
# Model-specific headers override provider headers
[[llm.providers.openai.models.gpt-4.headers]]
rule = "insert"
name = "X-Model-Config"
value = "premium"
[[llm.providers.openai.models.gpt-4.headers]]
rule = "forward"
pattern = "X-Premium-.*"
- forward: Pass headers from incoming requests to the LLM provider
  - `name`: Single header name to forward
  - `pattern`: Regex pattern to match multiple headers
  - `default`: Optional default value if the header is missing
  - `rename`: Optional new name for the forwarded header
- insert: Add static headers to requests
  - `name`: Header name
  - `value`: Static value (supports `{{ env.VAR }}` substitution)
- remove: Remove headers before sending to the provider
  - `name`: Single header name to remove
  - `pattern`: Regex pattern to match headers to remove
- rename_duplicate: Forward a header with both its original and new name
  - `name`: Original header name
  - `rename`: New header name for the duplicate
  - `default`: Optional default if the header is missing
- AWS Bedrock: Does not support custom headers due to SigV4 signing requirements
- Priority: Model-level rules override provider-level rules
- Token Forwarding: The `X-Provider-API-Key` header is handled separately for token forwarding
- Pattern Matching: Patterns are case-insensitive regex expressions
Once configured, you can interact with LLM providers through Nexus's unified API:
curl http://localhost:8000/llm/models
Response:
{
"object": "list",
"data": [
{
"id": "openai_primary/gpt-4-turbo",
"object": "model",
"created": 1677651200,
"owned_by": "openai"
},
{
"id": "claude/claude-3-5-sonnet-20241022",
"object": "model",
"created": 1709164800,
"owned_by": "anthropic"
},
{
"id": "gemini/gemini-1.5-pro",
"object": "model",
"created": 1710000000,
"owned_by": "google"
}
]
}
Nexus supports advanced tool calling across all LLM providers, allowing models to invoke external functions with structured parameters. All providers use the standardized OpenAI tool calling format for consistency.
Basic Tool Calling Example:
curl -X POST http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-3-5-sonnet-20241022",
"messages": [
{"role": "user", "content": "What'\''s the weather in San Francisco?"}
],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City and state"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
},
"required": ["location"]
}
}
}],
"tool_choice": "auto"
}'
Tool Choice Options:
- `"auto"` (default): Model decides whether to call tools
- `"none"`: Model won't call any tools
- `"required"`: Model must call at least one tool
- `{"type": "function", "function": {"name": "specific_function"}}`: Force a specific tool (see the sketch below)
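For example, forcing the `get_weather` tool from the request above could look like the following request body (a sketch in the OpenAI-compatible format; adjust the model and tool names to your configuration):

{
  "model": "openai/gpt-4",
  "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
  "tools": [/* tool definitions */],
  "tool_choice": {"type": "function", "function": {"name": "get_weather"}}
}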
Parallel Tool Calls:
When supported by the provider, models can call multiple tools simultaneously:
{
"model": "openai/gpt-4",
"messages": [{"role": "user", "content": "Get weather for NYC and LA"}],
"tools": [/* tool definitions */],
"parallel_tool_calls": true
}
Tool Conversation Flow:
Tool calling creates a multi-turn conversation:
- User message → asks a question requiring tool use
- Assistant message → responds with tool calls (no content)
- Tool messages → provide results from tool execution
- Assistant message → synthesizes final response
{
"messages": [
{"role": "user", "content": "What's the weather in Paris?"},
{
"role": "assistant",
"tool_calls": [{
"id": "call_123",
"type": "function",
"function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"}
}]
},
{
"role": "tool",
"tool_call_id": "call_123",
"content": "Weather in Paris: 22°C, sunny"
},
{
"role": "assistant",
"content": "The weather in Paris is currently 22°C and sunny!"
}
]
}
Provider-Specific Tool Support:
All providers support the standardized format, but have different capabilities:
- OpenAI: Full support including parallel calls and streaming tool calls
- Anthropic: Tool calling with robust function definitions and tool choice controls
- Google: Function calling with JSON schema validation and tool selection
- AWS Bedrock: Tool calling support varies by model family (Claude, Nova, etc.)
Streaming Tool Calls:
Tool calls can be streamed just like regular responses:
curl -X POST http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4",
"messages": [{"role": "user", "content": "Search for Python tutorials"}],
"tools": [/* tool definitions */],
"stream": true
}'
The stream will include tool call chunks as they're generated, followed by the tool execution results.
Send a chat completion request using the OpenAI-compatible format:
curl -X POST http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai_primary/gpt-4-turbo",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
],
"temperature": 0.7,
"max_tokens": 150
}'
The model name format is `<provider_name>/<model_id>`. Nexus automatically routes the request to the appropriate provider and transforms the request/response as needed.
Nexus supports streaming responses for all LLM providers using Server-Sent Events (SSE):
curl -X POST http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-3-5-sonnet-20241022",
"messages": [
{"role": "user", "content": "Write a short poem"}
],
"stream": true,
"max_tokens": 100
}'
When `stream: true` is set, the response will be streamed as Server-Sent Events with the following format:
data: {"id":"msg_123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3-5-sonnet-20241022","choices":[{"index":0,"delta":{"role":"assistant","content":"Here"}}]}
data: {"id":"msg_123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3-5-sonnet-20241022","choices":[{"index":0,"delta":{"content":" is"}}]}
data: {"id":"msg_123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3-5-sonnet-20241022","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":25,"total_tokens":35}}
data: [DONE]
Streaming is supported for all providers (OpenAI, Anthropic, Google) and provides:
- Real-time token delivery as they're generated
- Consistent chunk format across all providers
- Usage statistics in the final chunk
- Standard SSE format compatible with OpenAI SDKs
- Supports all standard OpenAI models (GPT-3.5, GPT-4, etc.)
- Compatible with Azure OpenAI endpoints
- Tool Calling: Full support including parallel tool calls and streaming tool calls
- Function Definitions: JSON schema support for parameters
- Tool Choice: Supports all tool choice modes including specific function forcing
- Supports streaming responses with Server-Sent Events (SSE)
- System messages are automatically extracted and placed in the `system` field
- Messages must alternate between user and assistant roles
- Requires an explicit `max_tokens` parameter (defaults to 4096 if not specified)
- Tool Calling: Robust tool calling with function definitions and tool choice controls
- Tool Use: Supports tool_use blocks with structured parameter validation
- Streaming Tools: Tool calls can be streamed incrementally
- Supports all Claude models (Opus, Sonnet, Haiku)
- Supports streaming responses with Server-Sent Events (SSE)
- Assistant role is automatically mapped to "model" role
- System messages are placed in the `systemInstruction` field
- Function Calling: Native function calling with JSON schema validation
- Tool Selection: Supports automatic tool selection and forced function calls
- Parameter Validation: Strict parameter schema enforcement
- Returns appropriate safety ratings when available
- Supports streaming responses with Server-Sent Events (SSE)
- Uses the Bedrock conversation API for seamless integration
- We have tested and verified Anthropic Claude, Amazon Nova, Meta Llama, Cohere Command, Mistral, and DeepSeek models
- Other models may work, but we have not tested them. Please report any issues or successes.
- Tool Calling: Support varies by model family:
  - Anthropic Claude: Full tool calling support through Bedrock
  - Amazon Nova: Native tool calling capabilities
  - Meta Llama: Function calling has issues; avoid using it
  - Cohere Command: Tool use support
  - Mistral: Tools work well with Bedrock
  - DeepSeek: Tools do not work with Bedrock
- Uses AWS SDK for authentication and request signing
- Supports all Bedrock features including streaming and model-specific parameters
- Regional endpoint selection based on configuration or AWS defaults
- Model availability varies by AWS region
Nexus provides an OpenAI-compatible API, making it easy to use with existing LLM SDKs and libraries. Simply point the SDK to your Nexus instance instead of the provider's API.
from openai import OpenAI
# Point to your Nexus instance
client = OpenAI(
base_url="http://localhost:8000/llm",
api_key="your-service-token" # Use a JWT token if OAuth2 is enabled, or any string if not
)
# Use any configured provider/model
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20241022",
messages=[
{"role": "user", "content": "Hello!"}
]
)
# Streaming works seamlessly
stream = client.chat.completions.create(
model="openai/gpt-4-turbo",
messages=[
{"role": "user", "content": "Write a poem"}
],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
import OpenAI from 'openai';
// Configure to use Nexus
const openai = new OpenAI({
baseURL: 'http://localhost:8000/llm',
apiKey: 'your-service-token', // Use a JWT token if OAuth2 is enabled, or any string if not
});
// Use any provider through Nexus
const response = await openai.chat.completions.create({
model: 'google/gemini-1.5-pro',
messages: [
{ role: 'user', content: 'Explain quantum computing' }
],
});
// Streaming with any provider
const stream = await openai.chat.completions.create({
model: 'anthropic/claude-3-opus-20240229',
messages: [
{ role: 'user', content: 'Write a story' }
],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
from langchain_openai import ChatOpenAI
# Use Nexus as the LLM provider
llm = ChatOpenAI(
base_url="http://localhost:8000/llm",
api_key="your-service-token", # Use a JWT token if OAuth2 is enabled
model="openai/gpt-4-turbo"
)
# Works with any configured provider
claude = ChatOpenAI(
base_url="http://localhost:8000/llm",
api_key="your-service-token", # Use a JWT token if OAuth2 is enabled
model="anthropic/claude-3-5-sonnet-20241022"
)
# Regular completion (with OAuth2 authentication if enabled)
curl -s http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-jwt-token" \
-d '{
"model": "openai/gpt-4",
"messages": [{"role": "user", "content": "Hello"}]
}' | jq -r '.choices[0].message.content'
# Streaming with SSE parsing
curl -s http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-jwt-token" \
-d '{
"model": "anthropic/claude-3-5-sonnet-20241022",
"messages": [{"role": "user", "content": "Write a haiku"}],
"stream": true
}' | grep "^data: " | sed 's/^data: //' | jq -r 'select(.choices != null) | .choices[0].delta.content // empty'
When OAuth2 is enabled in your Nexus configuration, you must provide a valid JWT token:
# With OAuth2 enabled
client = OpenAI(
base_url="http://localhost:8000/llm",
api_key="eyJhbGciOiJSUzI1NiIs..." # Your JWT token
)
Without OAuth2, the `api_key` field is still required by most SDKs but can be any non-empty string:
# Without OAuth2
client = OpenAI(
base_url="http://localhost:8000/llm",
api_key="dummy" # Any non-empty string works
)
Nexus provides consistent error responses across all providers:
- 400 Bad Request: Invalid request format or parameters
- 401 Unauthorized: Missing or invalid API key
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Provider API error or network issues
Example error response:
{
"error": {
"message": "Invalid model format: expected 'provider/model', got 'invalid-format'",
"type": "invalid_request_error",
"code": 400
}
}
Nexus provides comprehensive observability through OpenTelemetry metrics, distributed tracing, and logs export. All telemetry features have zero overhead when the `[telemetry]` section is absent from configuration.
Telemetry activation is controlled by exporters configuration:
- Metrics: Enabled when OTLP exporters are configured
- Tracing: Enabled when OTLP exporters are configured
- Logs: Enabled when OTLP exporters are configured
Each telemetry signal (metrics, tracing, logs) can override the global exporters configuration with their own specific exporters.
[telemetry]
service_name = "nexus-production" # Optional, defaults to "nexus"
# Resource attributes for all telemetry (optional)
[telemetry.resource_attributes]
environment = "production"
region = "us-east-1"
# OTLP exporter for metrics
[telemetry.exporters.otlp]
enabled = true # Default: false
endpoint = "http://localhost:4317" # Default: http://localhost:4317
protocol = "grpc" # Default: grpc (options: grpc, http)
timeout = "60s" # Default: 60s
# Protocol-specific configuration
# For gRPC protocol (headers must be lowercase, cannot start with "grpc-"):
[telemetry.exporters.otlp.grpc.headers]
authorization = "Bearer {{ env.OTLP_TOKEN }}"
x-nexus-shard = "primary"
# TLS configuration for secure gRPC connections (all fields optional):
[telemetry.exporters.otlp.grpc.tls]
domain_name = "custom_name" # Domain name for TLS verification (SNI)
key = "/path/to/key.pem" # Path to client private key PEM file
cert = "/path/to/cert.pem" # Path to client certificate PEM file
ca = "/path/to/ca.pem" # Path to CA certificate PEM file
# For HTTP protocol (standard HTTP header names):
[telemetry.exporters.otlp.http.headers]
Authorization = "Bearer {{ env.OTLP_TOKEN }}"
X-Nexus-Shard = "primary"
# Batch export configuration (all optional with defaults)
[telemetry.exporters.otlp.batch_export]
scheduled_delay = "5s" # Default: 5s
max_queue_size = 2048 # Default: 2048
max_export_batch_size = 512 # Default: 512
max_concurrent_exports = 1 # Default: 1
See Available Metrics for detailed metric definitions.
[telemetry.tracing]
sampling = 0.15 # Sample 15% of requests (default: 0.15)
parent_based_sampler = false # Respect parent trace sampling decision (default: false)
# Collection limits (per span)
[telemetry.tracing.collect]
max_events_per_span = 128
max_attributes_per_span = 128
max_links_per_span = 128
max_attributes_per_event = 128
max_attributes_per_link = 128
# Trace context propagation formats
[telemetry.tracing.propagation]
trace_context = true # W3C Trace Context
aws_xray = false # AWS X-Ray format
# Optional: Override global OTLP exporter for traces
[telemetry.tracing.exporters.otlp]
enabled = true
endpoint = "http://localhost:4317"
protocol = "grpc" # or "http"
timeout = "60s"
# Protocol-specific headers
[telemetry.tracing.exporters.otlp.grpc.headers]
authorization = "Bearer {{ env.OTLP_TOKEN }}"
# TLS configuration (same format as global OTLP):
# [telemetry.tracing.exporters.otlp.grpc.tls]
# domain_name = "custom_name"
# key = "/path/to/key.pem"
# cert = "/path/to/cert.pem"
# ca = "/path/to/ca.pem"
# Or for HTTP:
# [telemetry.tracing.exporters.otlp.http.headers]
# Authorization = "Bearer {{ env.OTLP_TOKEN }}"
See Distributed Tracing for span hierarchy and attributes.
Nexus can export application logs to OpenTelemetry collectors, automatically correlating them with distributed traces. Logs are enabled when OTLP exporters are configured (either globally or specifically for logs).
# Option 1: Use global OTLP exporter for all telemetry signals
# (logs will use the same endpoint as metrics and traces)
# Option 2: Configure a separate OTLP endpoint specifically for logs
[telemetry.logs.exporters.otlp]
enabled = true
endpoint = "http://logs-collector:4317" # Different endpoint for logs
protocol = "grpc" # or "http"
timeout = "30s"
# Protocol-specific headers
[telemetry.logs.exporters.otlp.grpc.headers]
authorization = "Bearer {{ env.OTLP_TOKEN }}"
# TLS configuration (same format as global OTLP):
# [telemetry.logs.exporters.otlp.grpc.tls]
# domain_name = "custom_name"
# key = "/path/to/key.pem"
# cert = "/path/to/cert.pem"
# ca = "/path/to/ca.pem"
# Or for HTTP:
# [telemetry.logs.exporters.otlp.http.headers]
# Authorization = "Bearer {{ env.OTLP_TOKEN }}"
# Logs are automatically correlated with active trace and span IDs
When logs export is enabled, logs include:
- Automatic trace and span ID correlation for distributed debugging
- Log severity levels mapped from Rust log levels
- Source code location attributes (file, line, module)
- All logs from Nexus and its dependencies
Note: OpenTelemetry's own internal logs are filtered out to prevent recursion.
All histograms also function as counters (count field tracks number of observations).
HTTP Server Metrics:
- `http.server.request.duration` (histogram)
  - Attributes: `http.route`, `http.request.method`, `http.response.status_code`
MCP Operation Metrics:
- `mcp.tool.call.duration` (histogram)
  - Attributes: `tool_name`, `tool_type` (builtin/downstream), `status` (success/error), `client.id`, `client.group`
  - Additional for search: `keyword_count`, `result_count`
  - Additional for execute: `server_name` (for downstream tools)
  - Additional for errors: `error.type`:
    - `parse_error` - Invalid JSON (-32700)
    - `invalid_request` - Not a valid request (-32600)
    - `method_not_found` - Method/tool does not exist (-32601)
    - `invalid_params` - Invalid method parameters (-32602)
    - `internal_error` - Internal server error (-32603)
    - `rate_limit_exceeded` - Rate limit hit (-32000)
    - `server_error` - Other server errors (-32001 to -32099)
    - `unknown` - Any other error code
- `mcp.tools.list.duration` (histogram)
  - Attributes: `method` (list_tools), `status`, `client.id`, `client.group`
- `mcp.prompt.request.duration` (histogram)
  - Attributes: `method` (list_prompts/get_prompt), `status`, `client.id`, `client.group`
  - Additional for errors: `error.type` (same values as above)
- `mcp.resource.request.duration` (histogram)
  - Attributes: `method` (list_resources/read_resource), `status`, `client.id`, `client.group`
  - Additional for errors: `error.type` (same values as above)
LLM Operation Metrics:
- `gen_ai.client.operation.duration` (histogram)
  - Tracks the total duration of LLM chat completion operations
  - Attributes:
    - `gen_ai.system` (always "nexus.llm")
    - `gen_ai.operation.name` (always "chat.completions")
    - `gen_ai.request.model` (e.g., "openai/gpt-4")
    - `gen_ai.response.finish_reason` (stop/length/tool_calls/content_filter)
    - `client.id` (from x-client-id header)
    - `client.group` (from x-client-group header)
    - `error.type` (for failed requests):
      - `invalid_request` - Malformed request
      - `authentication_failed` - Invalid API key
      - `insufficient_quota` - Quota exceeded
      - `model_not_found` - Unknown model
      - `rate_limit_exceeded` - Provider or token rate limit hit
      - `streaming_not_supported` - Streaming unavailable for the model
      - `invalid_model_format` - Incorrect model name format
      - `provider_not_found` - Unknown provider
      - `internal_error` - Server error
      - `provider_api_error` - Upstream provider error
      - `connection_error` - Network failure
- `gen_ai.client.time_to_first_token` (histogram)
  - Tracks time until the first token in streaming responses
  - Attributes: `gen_ai.system`, `gen_ai.operation.name`, `gen_ai.request.model`, `client.id`, `client.group`
- `gen_ai.client.input.token.usage` (counter)
  - Cumulative input token consumption
  - Attributes: `gen_ai.system`, `gen_ai.request.model`, `client.id`, `client.group`
- `gen_ai.client.output.token.usage` (counter)
  - Cumulative output token consumption
  - Attributes: `gen_ai.system`, `gen_ai.request.model`, `client.id`, `client.group`
- `gen_ai.client.total.token.usage` (counter)
  - Cumulative total token consumption (input + output)
  - Attributes: `gen_ai.system`, `gen_ai.request.model`, `client.id`, `client.group`
Redis Storage Metrics (when rate limiting with Redis is enabled):
- `redis.command.duration` (histogram)
  - Tracks Redis command execution times
  - Attributes:
    - `operation` - The Redis operation type:
      - `check_and_consume` - HTTP rate limit checking
      - `check_and_consume_tokens` - Token-based rate limit checking
    - `status` (success/error)
    - `tokens` (for token operations) - Number of tokens checked
- `redis.pool.connections.in_use` (gauge)
  - Current number of connections in use from the pool
  - No additional attributes
- `redis.pool.connections.available` (gauge)
  - Current number of available connections in the pool
  - No additional attributes
Configuration: See Tracing Configuration in the Telemetry section.
Distributed tracing provides detailed insights into request flows across all components, helping you understand latency, identify bottlenecks, and debug issues in production.
Nexus supports multiple trace context propagation formats for incoming requests:
- W3C Trace Context: Standard `traceparent` and `tracestate` headers (enabled by default)
- AWS X-Ray: `X-Amzn-Trace-Id` header for AWS services integration
When enabled, Nexus will:
- Extract trace context from incoming HTTP requests
- Continue the trace with the parent context or start a new trace
- Create spans for all internal operations
Note: Nexus is a terminal node for traces - it does not propagate trace context to downstream MCP servers or LLM providers.
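For example, with W3C Trace Context propagation enabled, a caller can attach an existing trace context to any request; the trace and span IDs below are placeholders:

# Continue an existing trace via the W3C traceparent header
curl http://localhost:8000/llm/models \
  -H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"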
Nexus creates a hierarchical span structure for each request:
HTTP Request (root span)
├── Redis Rate Limit Check (redis:check_and_consume:global/ip) - HTTP-level rate limits
├── MCP Operation
│ ├── Redis Rate Limit Check (redis:check_and_consume:server/tool) - MCP-specific rate limits
│ ├── Tool Search (index:search)
│ └── Tool Execution
│ └── Downstream Operations (downstream:execute, downstream:call_tool)
└── LLM Operation (llm:chat_completion or llm:chat_completion_stream)
└── Token Rate Limit Check (redis:check_and_consume_tokens) - Token-based rate limits
Each span includes semantic attributes following OpenTelemetry conventions:
HTTP Spans:
- `http.request.method` - HTTP method (GET, POST, etc.)
- `http.route` - Matched route pattern
- `http.response.status_code` - Response status code
- `url.scheme` - URL scheme (http/https)
- `server.address` - Host header value
- `url.full` - Full request URL
- `client.id` - Client identifier (if configured)
- `client.group` - Client group (if configured)
- `error` - Set to "true" if the request failed
MCP Spans:
- `mcp.method` - MCP method name (tools/list, tools/call, etc.)
- `mcp.tool.name` - Tool being called
- `mcp.tool.type` - Tool type (builtin/downstream)
- `mcp.transport` - Transport type (stdio/http)
- `mcp.auth_forwarded` - Whether auth was forwarded
- `client.id` - Client identifier (if configured)
- `client.group` - Client group (if configured)
- `mcp.search.keywords` - Search keywords (for the search tool)
- `mcp.search.keyword_count` - Number of search keywords
- `mcp.execute.target_tool` - Tool being executed
- `mcp.execute.target_server` - Server hosting the tool
- `mcp.error.code` - Error code if the operation failed
LLM Spans:
- `gen_ai.request.model` - Model identifier
- `gen_ai.request.max_tokens` - Max tokens requested
- `gen_ai.request.temperature` - Temperature parameter
- `gen_ai.request.has_tools` - Whether tools were provided
- `gen_ai.request.tool_count` - Number of tools provided
- `gen_ai.response.model` - Model used for the response
- `gen_ai.response.finish_reason` - Completion reason
- `gen_ai.usage.input_tokens` - Input token count
- `gen_ai.usage.output_tokens` - Output token count
- `gen_ai.usage.total_tokens` - Total token count
- `llm.stream` - Whether streaming was used
- `llm.auth_forwarded` - Whether auth was forwarded
- `client.id` - Client identifier (if configured)
- `client.group` - Client group (if configured)
- `error.type` - Error type if the operation failed
Redis Spans:
- `redis.operation` - Operation type (check_and_consume or check_and_consume_tokens)
- `rate_limit.scope` - Scope (global/ip/server/tool/token)
- `rate_limit.limit` - Request/token limit
- `rate_limit.interval_ms` - Time window in milliseconds
- `rate_limit.tokens` - Number of tokens (for token operations)
- `rate_limit.allowed` - Whether the request was allowed
- `rate_limit.retry_after_ms` - Retry delay if rate limited
- `redis.pool.size` - Connection pool size
- `redis.pool.available` - Available connections
- `redis.pool.in_use` - Connections in use
- `client.address_hash` - Hashed IP for privacy (per-IP limits)
- `llm.provider` - Provider name (token limits)
- `llm.model` - Model name (token limits)
- `mcp.server` - Server name (per-server limits)
- `mcp.tool` - Tool name (per-tool limits)
Nexus supports two sampling strategies:
Fixed-Rate Sampling (default): Randomly samples a percentage of traces based on the configured rate.
[telemetry.tracing]
sampling = 0.15 # Sample 15% of requests (0.0 to 1.0)
Parent-Based Sampling: When enabled, Nexus respects the sampling decision from parent traces in distributed systems.
[telemetry.tracing]
sampling = 0.15 # Default sampling rate for root traces
parent_based_sampler = true # Respect parent's sampling decision
With `parent_based_sampler = true`:
- If a parent trace is sampled (W3C traceparent flag=01 or X-Ray Sampled=1), the child trace will be sampled
- If a parent trace is not sampled (flag=00 or Sampled=0), the child trace will not be sampled
- If no parent exists, the `sampling` rate is used
This ensures consistent sampling decisions across distributed traces, preventing incomplete trace data where some spans are sampled while others are not.
When a parent trace context is provided in request headers (via W3C Trace Context or AWS X-Ray), Nexus will continue that trace.
- Zero Overhead When Disabled: When no OTLP exporters are configured, no spans are created
- Efficient Sampling: Unsampled traces have minimal overhead
- Batched Export: Traces are batched before export to reduce network overhead
- Configurable Limits: Adjust collection limits to balance detail vs. resource usage
Nexus traces can be exported to any OpenTelemetry-compatible APM tool:
- Jaeger: Full trace visualization and analysis
- Zipkin: Distributed tracing system
- AWS X-Ray: Native AWS integration with X-Ray propagation
- Datadog: APM and distributed tracing
- New Relic: Application performance monitoring
- Grafana Tempo: Cloud-native tracing backend
Example Jaeger configuration:
[telemetry.tracing.exporters.otlp]
enabled = true
endpoint = "http://jaeger:4317"
protocol = "grpc"
Add to your Cursor settings:
- Open Cursor Settings (Cmd+, on macOS)
- Search for "Model Context Protocol"
- Enable MCP support
- Add to the MCP server configuration:
{
"nexus": {
"transport": {
"type": "http",
"url": "http://localhost:8000/mcp"
}
}
}
Make sure Nexus is running on `localhost:8000` (or adjust the URL accordingly).
Add to your Claude Code configuration:
- Open Claude Code and run the command:
  claude mcp add --transport http nexus http://localhost:8000/mcp
- Or add it to your project's `.mcp.json` file:
  {
    "mcpServers": {
      "nexus": {
        "type": "http",
        "url": "http://localhost:8000/mcp"
      }
    }
  }
- Verify the connection:
  claude mcp list
Make sure Nexus is running before starting Claude Code.
Configure Codex CLI to talk to Nexus by editing `~/.codex/config.toml`:
Note: Nexus serves its OpenAI-compatible endpoint at `/llm/openai/`; Codex expects the `/v1` suffix. Make sure the `base_url` ends with `/v1`.
[model_providers.nexus]
name = "Nexus AI router"
base_url = "http://127.0.0.1:8000/llm/openai/v1"
wire_api = "chat"
query_params = {}
- `base_url` must point to the Nexus OpenAI-compatible endpoint and include the `/v1` suffix (adjust host/port if Nexus runs elsewhere).
- `query_params` can stay empty, but the table must exist to satisfy Codex's schema.
Start Codex with a Nexus-managed model:
codex -c model="anthropic/claude-3-5-haiku-latest" -c model_provider=nexus
Swap the model name for any provider/model pair that you have configured in Nexus.
Nexus provides two main tools to AI assistants:
- `search`: A context-aware tool search that uses fuzzy matching to find relevant tools across all connected MCP servers
- `execute`: Executes a specific tool with the provided parameters
When an AI assistant connects to Nexus, it can:
- Search for tools using natural language queries
- Discover tool names, descriptions, and required parameters
- Execute tools from any connected MCP server
All tools from downstream servers are namespaced with their server name (e.g., `github__search_code`, `filesystem__read_file`).
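As a rough sketch, an assistant invoking the `execute` tool over MCP might send a `tools/call` request like the one below; the exact argument shape and the `query` field are illustrative assumptions:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "execute",
    "arguments": {
      "name": "github__search_code",
      "arguments": { "query": "rate limiting" }
    }
  }
}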
Nexus acts as a unified gateway for multiple LLM providers:
- Model Discovery: Lists all available models from configured providers with consistent naming
- Request Routing: Automatically routes requests to the correct provider based on model name
- Format Translation: Converts between OpenAI's API format and provider-specific formats
- Response Normalization: Returns consistent response format regardless of provider
Models are namespaced with their provider name (e.g., `openai/gpt-4`, `anthropic/claude-3-opus-20240229`).
STDIO servers are spawned as child processes and communicate via JSON-RPC over standard input/output:
- Process Management: Nexus automatically manages the lifecycle of STDIO server processes
- Tool Discovery: Tools from STDIO servers are discovered dynamically and indexed for search
- Error Handling: If a STDIO process crashes or outputs invalid JSON, appropriate errors are returned
- Environment Isolation: Each STDIO server runs in its own process with configurable environment
Once configured, AI assistants can interact with Nexus like this:
- Search for tools:
  User: "I need to search for code on GitHub"
  Assistant: Let me search for GitHub-related tools...
  [Calls search with keywords: ["github", "code", "search"]]
- Execute tools:
  Assistant: I found the `github__search_code` tool. Let me search for your query...
  [Calls execute with name: "github__search_code" and appropriate arguments]
[mcp.servers.python_tools]
cmd = ["python", "-m", "my_mcp_server"]
env = { PYTHONPATH = "/opt/mcp", PYTHONUNBUFFERED = "1" }
stderr = "inherit" # See Python output during development
[mcp.servers.node_tools]
cmd = ["node", "mcp-server.js"]
cwd = "/path/to/project"
env = { NODE_ENV = "production" }
[mcp.servers.filesystem]
cmd = ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/home/user"]
- Check executable path: Ensure the command exists and is executable
- View stderr output: Set `stderr = "inherit"` temporarily to see error messages
- Verify JSON-RPC output: The server must output valid JSON-RPC on stdout
- Check working directory: Ensure the `cwd` path exists if specified
- Wait for initialization: STDIO servers may take a moment to start
- Use search: STDIO tools only appear in search results, not in the base tool list
- Check server logs: Enable stderr logging to see if the server is responding to tool list requests
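If you need to confirm that a server really emits valid JSON-RPC on stdout, you can drive it by hand. The sketch below is a rough, hypothetical smoke test (not part of Nexus): it spawns the same command Nexus would, sends one MCP `initialize` request, and tries to parse the first line of output. The `protocolVersion` and capability fields are assumptions and may need adjusting for your server's SDK.

```python
# Hypothetical smoke test: spawn the STDIO command from your nexus.toml entry,
# send a single MCP initialize request, and check that the reply parses as JSON.
import json
import subprocess

cmd = ["python", "-m", "my_mcp_server"]  # the same `cmd` you configured for Nexus

proc = subprocess.Popen(
    cmd,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=None,  # inherit stderr, like `stderr = "inherit"` in nexus.toml
    text=True,
)

initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",  # assumption: match your SDK's version
        "capabilities": {},
        "clientInfo": {"name": "stdio-smoke-test", "version": "0.0.1"},
    },
}

proc.stdin.write(json.dumps(initialize) + "\n")
proc.stdin.flush()

first_line = proc.stdout.readline().strip()
print("raw output:", first_line)
print("parsed:", json.loads(first_line))  # raises ValueError if it is not JSON
proc.terminate()
```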
- Always use environment variables for sensitive tokens
- Enable TLS verification for production deployments
- Use CORS configuration to restrict access
- Configure OAuth2 authentication for production deployments
- Ensure JWKs URLs use HTTPS in production
- Validate JWT token issuer and audience claims
- Keep your MCP servers and Nexus updated
- Be cautious when running STDIO servers with elevated privileges
- Validate and sanitize any user input passed to STDIO server commands
When using OAuth2 authentication:
- Use HTTPS: Always use HTTPS for JWKs URLs and protected resources in production
- Validate Claims: Configure `expected_issuer` and `expected_audience` to validate JWT claims
- Metadata Endpoint: The `/.well-known/oauth-protected-resource` endpoint provides OAuth2 metadata and is publicly accessible
- Health Checks: The `/health` endpoint bypasses OAuth2 authentication for monitoring systems
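To sanity-check this behaviour in a running deployment, you can probe both public endpoints without a token. A rough sketch, assuming the default listen address and the `requests` package (use your HTTPS URL in production):

```python
# Rough sketch: both requests are sent without a bearer token. The OAuth2
# metadata endpoint and /health should still respond when OAuth2 is enabled;
# protected endpoints such as /mcp should reject the same unauthenticated call.
import requests

BASE = "http://127.0.0.1:8000"  # adjust to your deployment

metadata = requests.get(f"{BASE}/.well-known/oauth-protected-resource", timeout=10)
health = requests.get(f"{BASE}/health", timeout=10)

print("oauth metadata:", metadata.status_code, metadata.json())
print("health:", health.status_code)
```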
Nexus is licensed under the Mozilla Public License 2.0 (MPL-2.0). See the LICENSE file for details.
See CONTRIBUTING.md for guidelines on how to contribute to Nexus.
- Documentation: https://nexusrouter.com/docs
- Issues: https://github.com/grafbase/nexus/issues
- Discord: Grafbase Discord