nullclaw
Fastest, smallest, and fully autonomous AI assistant infrastructure written in Zig
Stars: 1066
NullClaw is the smallest fully autonomous AI assistant infrastructure: a static Zig binary that fits on any $5 board, boots in milliseconds, and requires nothing but libc. It ships as an impossibly small 678 KB static binary with no runtime or framework overhead, near-zero memory usage, instant startup, true portability across CPU architectures, and a feature-complete stack with 22+ providers, 11 channels, and 18+ tools. It is lean by default, secure by design, and fully swappable, with core systems exposed as vtable interfaces; OpenAI-compatible provider support and pluggable custom endpoints mean no lock-in.
README:
Null overhead. Null compromise. 100% Zig. 100% Agnostic.
678 KB binary. ~1 MB RAM. Boots in <2 ms. Runs on anything with a CPU.
The smallest fully autonomous AI assistant infrastructure — a static Zig binary that fits on any $5 board, boots in milliseconds, and requires nothing but libc.
678 KB binary · <2 ms startup · 2,843 tests · 22+ providers · 13 channels · Pluggable everything
- Impossibly Small: 678 KB static binary — no runtime, no VM, no framework overhead.
- Near-Zero Memory: ~1 MB peak RSS. Runs comfortably on the cheapest ARM SBCs and microcontrollers.
- Instant Startup: <2 ms on Apple Silicon, <8 ms on a 0.8 GHz edge core.
- True Portability: Single self-contained binary across ARM, x86, and RISC-V. Drop it anywhere, it just runs.
- Feature-Complete: 22+ providers, 11 channels, 18+ tools, hybrid vector+FTS5 memory, multi-layer sandbox, tunnels, hardware peripherals, MCP, subagents, streaming, voice — the full stack.
- Lean by default: Zig compiles to a tiny static binary. No allocator overhead, no garbage collector, no runtime.
- Secure by design: pairing, strict sandboxing (landlock, firejail, bubblewrap, docker), explicit allowlists, workspace scoping, encrypted secrets.
- Fully swappable: core systems are vtable interfaces (providers, channels, tools, memory, tunnels, peripherals, observers, runtimes).
- No lock-in: OpenAI-compatible provider support + pluggable custom endpoints.
Local machine benchmark (macOS arm64, Feb 2026), normalized for 0.8 GHz edge hardware.
| | OpenClaw | NanoBot | PicoClaw | ZeroClaw | 🦞 NullClaw |
|---|---|---|---|---|---|
| Language | TypeScript | Python | Go | Rust | Zig |
| RAM | > 1 GB | > 100 MB | < 10 MB | < 5 MB | ~1 MB |
| Startup (0.8 GHz) | > 500 s | > 30 s | < 1 s | < 10 ms | < 8 ms |
| Binary Size | ~28 MB (dist) | N/A (Scripts) | ~8 MB | 3.4 MB | 678 KB |
| Tests | — | — | — | 1,017 | 2,843 |
| Source Files | ~400+ | — | — | ~120 | ~110 |
| Cost | Mac Mini $599 | Linux SBC ~$50 | Linux Board $10 | Any $10 hardware | Any $5 hardware |
Measured with /usr/bin/time -l on ReleaseSmall builds. nullclaw is a static binary with zero runtime dependencies.
Reproduce locally:
zig build -Doptimize=ReleaseSmall
ls -lh zig-out/bin/nullclaw
/usr/bin/time -l zig-out/bin/nullclaw --help
/usr/bin/time -l zig-out/bin/nullclaw status

Build from source:
git clone https://github.com/nullclaw/nullclaw.git
cd nullclaw
zig build -Doptimize=ReleaseSmall
# Quick setup
nullclaw onboard --api-key sk-... --provider openrouter
# Or interactive wizard
nullclaw onboard --interactive
# Chat
nullclaw agent -m "Hello, nullclaw!"
# Interactive mode
nullclaw agent
# Start the gateway (webhook server)
nullclaw gateway # default: 127.0.0.1:3000
nullclaw gateway --port 8080 # custom port
# Start full autonomous runtime
nullclaw daemon
# Check status
nullclaw status
# Run system diagnostics
nullclaw doctor
# Check channel health
nullclaw channel doctor
# Manage background service
nullclaw service install
nullclaw service status
# Migrate memory from OpenClaw
nullclaw migrate openclaw --dry-run
nullclaw migrate openclaw

Dev fallback (no global install): prefix commands with `zig-out/bin/` (example: `zig-out/bin/nullclaw status`).
Every subsystem is a vtable interface — swap implementations with a config change, zero code changes. (A minimal sketch of the pattern follows the table below.)
| Subsystem | Interface | Ships with | Extend |
|---|---|---|---|
| AI Models | `Provider` | 22+ providers (OpenRouter, Anthropic, OpenAI, Ollama, Venice, Groq, Mistral, xAI, DeepSeek, Together, Fireworks, Perplexity, Cohere, Bedrock, etc.) | `custom:https://your-api.com` — any OpenAI-compatible API |
| Channels | `Channel` | CLI, Telegram, Discord, Slack, iMessage, Matrix, WhatsApp, Webhook, IRC, Lark/Feishu, DingTalk, QQ, MaixCam | Any messaging API |
| Memory | `Memory` | SQLite with hybrid search (FTS5 + vector cosine similarity), Markdown | Any persistence backend |
| Tools | `Tool` | shell, file_read, file_write, file_edit, memory_store, memory_recall, memory_forget, browser_open, screenshot, composio, http_request, hardware_info, hardware_memory, and more | Any capability |
| Observability | `Observer` | Noop, Log, File, Multi | Prometheus, OTel |
| Runtime | `RuntimeAdapter` | Native, Docker (sandboxed), WASM (wasmtime) | Any runtime |
| Security | `Sandbox` | Landlock, Firejail, Bubblewrap, Docker, auto-detect | Any sandbox backend |
| Identity | `IdentityConfig` | OpenClaw (markdown), AIEOS v1.1 (JSON) | Any identity format |
| Tunnel | `Tunnel` | None, Cloudflare, Tailscale, ngrok, Custom | Any tunnel binary |
| Heartbeat | `Engine` | HEARTBEAT.md periodic tasks | — |
| Skills | `Loader` | TOML manifests + SKILL.md instructions | Community skill packs |
| Peripherals | `Peripheral` | Serial, Arduino, Raspberry Pi GPIO, STM32/Nucleo | Any hardware interface |
| Cron | `Scheduler` | Cron expressions + one-shot timers with JSON persistence | — |
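As promised above, a minimal sketch of the vtable pattern in Zig. The `Tool` shape and `EchoTool` below are hypothetical, for illustration only; nullclaw's actual interfaces live under src/ and may differ.

```zig
const std = @import("std");

// Hypothetical vtable-style interface (illustration only). Callers hold a
// type-erased pointer plus a table of function pointers, so concrete
// implementations can be swapped without touching call sites.
pub const Tool = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    pub const VTable = struct {
        run: *const fn (ptr: *anyopaque, input: []const u8) anyerror![]const u8,
    };

    pub fn run(self: Tool, input: []const u8) anyerror![]const u8 {
        return self.vtable.run(self.ptr, input);
    }
};

// One concrete implementation; callers only ever see `Tool`.
pub const EchoTool = struct {
    calls: usize = 0,

    fn runImpl(ptr: *anyopaque, input: []const u8) anyerror![]const u8 {
        const self: *EchoTool = @ptrCast(@alignCast(ptr));
        self.calls += 1; // recover the concrete type behind the erased pointer
        return input;
    }

    const vtable = Tool.VTable{ .run = runImpl };

    pub fn tool(self: *EchoTool) Tool {
        return .{ .ptr = self, .vtable = &vtable };
    }
};

test "dispatch through the interface" {
    var echo = EchoTool{};
    const t = echo.tool();
    try std.testing.expectEqualStrings("ping", try t.run("ping"));
}
```

Because the interface is chosen at construction time, a config change can select a different implementation with zero changes at the call sites.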
All custom, zero external dependencies:
| Layer | Implementation |
|---|---|
| Vector DB | Embeddings stored as BLOB in SQLite, cosine similarity search |
| Keyword Search | FTS5 virtual tables with BM25 scoring |
| Hybrid Merge | Weighted merge (configurable vector/keyword weights) |
| Embeddings | `EmbeddingProvider` vtable — OpenAI, custom URL, or noop |
| Hygiene | Automatic archival + purge of stale memories |
| Snapshots | Export/import full memory state for migration |
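To make the hybrid merge concrete, here is a hedged sketch of the weighted scoring the table describes. The function names are hypothetical; this is not nullclaw's actual code.

```zig
const std = @import("std");

// Cosine similarity between two embedding vectors (stored as BLOBs in SQLite).
fn cosineSimilarity(a: []const f32, b: []const f32) f64 {
    std.debug.assert(a.len == b.len);
    var dot: f64 = 0;
    var norm_a: f64 = 0;
    var norm_b: f64 = 0;
    for (a, b) |x, y| {
        dot += @as(f64, x) * y;
        norm_a += @as(f64, x) * x;
        norm_b += @as(f64, y) * y;
    }
    if (norm_a == 0 or norm_b == 0) return 0;
    return dot / (@sqrt(norm_a) * @sqrt(norm_b));
}

// Weighted hybrid merge: blend vector similarity with a normalized BM25
// keyword score using the configurable weights.
fn hybridScore(cosine: f64, bm25_normalized: f64, vector_weight: f64, keyword_weight: f64) f64 {
    return vector_weight * cosine + keyword_weight * bm25_normalized;
}

test "weights blend the two signals" {
    // With vector_weight 0.7 and keyword_weight 0.3, as in the config below:
    try std.testing.expectApproxEqAbs(
        @as(f64, 0.69),
        hybridScore(0.9, 0.2, 0.7, 0.3),
        1e-9,
    );
}
```

The memory block of the config supplies these weights: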
{
"memory": {
"backend": "sqlite",
"auto_save": true,
"embedding_provider": "openai",
"vector_weight": 0.7,
"keyword_weight": 0.3,
"hygiene_enabled": true,
"snapshot_enabled": false
}
}

nullclaw enforces security at every layer.
| # | Item | Status | How |
|---|---|---|---|
| 1 | Gateway not publicly exposed | Done | Binds 127.0.0.1 by default. Refuses 0.0.0.0 without tunnel or explicit allow_public_bind. |
| 2 | Pairing required | Done | 6-digit one-time code on startup. Exchange via POST /pair for bearer token. |
| 3 | Filesystem scoped | Done | `workspace_only = true` by default. Null byte injection blocked. Symlink escape detection. |
| 4 | Access via tunnel only | Done | Gateway refuses public bind without active tunnel. Supports Tailscale, Cloudflare, ngrok, or custom. |
| 5 | Sandbox isolation | Done | Auto-detects best backend: Landlock, Firejail, Bubblewrap, or Docker. |
| 6 | Encrypted secrets | Done | API keys encrypted with ChaCha20-Poly1305 using local key file. |
| 7 | Resource limits | Done | Configurable memory, CPU, disk, and subprocess limits. |
| 8 | Audit logging | Done | Signed event trail with configurable retention. |
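For item 6, a hedged sketch of what ChaCha20-Poly1305 secret sealing looks like with Zig's standard library. The helper and key handling here are hypothetical, not nullclaw's actual code.

```zig
const std = @import("std");
const ChaCha20Poly1305 = std.crypto.aead.chacha_poly.ChaCha20Poly1305;

/// Seal a secret (e.g. an API key) with a 32-byte key loaded from a local
/// key file. Hypothetical helper: the nonce must be unique per encryption
/// and stored alongside the ciphertext and tag.
fn sealSecret(
    key: [ChaCha20Poly1305.key_length]u8,
    plaintext: []const u8,
    ciphertext: []u8, // caller-provided, must be plaintext.len bytes
) struct {
    nonce: [ChaCha20Poly1305.nonce_length]u8,
    tag: [ChaCha20Poly1305.tag_length]u8,
} {
    var nonce: [ChaCha20Poly1305.nonce_length]u8 = undefined;
    std.crypto.random.bytes(&nonce); // fresh random nonce per secret
    var tag: [ChaCha20Poly1305.tag_length]u8 = undefined;
    ChaCha20Poly1305.encrypt(ciphertext, &tag, plaintext, "", nonce, key);
    return .{ .nonce = nonce, .tag = tag };
}
```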
Channel allowlists gate inbound messages:
- Empty allowlist = deny all inbound messages
- "*" = allow all (explicit opt-in)
- Otherwise = exact-match allowlist
Config: ~/.nullclaw/config.json (created by onboard)
OpenClaw compatible: nullclaw uses the same config structure as OpenClaw (snake_case). Providers live under `models.providers`, the default model under `agents.defaults.model.primary`, and channels use `accounts` wrappers.
{
"default_provider": "openrouter",
"default_temperature": 0.7,
"models": {
"providers": {
"openrouter": { "api_key": "sk-or-..." },
"groq": { "api_key": "gsk_..." },
"anthropic": { "api_key": "sk-ant-...", "base_url": "https://api.anthropic.com" }
}
},
"agents": {
"defaults": {
"model": { "primary": "anthropic/claude-sonnet-4" },
"heartbeat": { "every": "30m" }
},
"list": [
{ "id": "researcher", "model": { "primary": "anthropic/claude-opus-4" }, "system_prompt": "..." }
]
},
"channels": {
"telegram": {
"accounts": {
"main": {
"bot_token": "123:ABC",
"allow_from": ["user1"],
"reply_in_private": true,
"proxy": "socks5://..."
}
}
},
"discord": {
"accounts": {
"main": {
"token": "disc-token",
"guild_id": "12345",
"allow_from": ["user1"],
"allow_bots": false
}
}
},
"irc": {
"accounts": {
"main": {
"host": "irc.libera.chat",
"port": 6697,
"nick": "nullclaw",
"channel": "#nullclaw",
"tls": true,
"allow_from": ["user1"]
}
}
},
"slack": {
"accounts": {
"main": {
"bot_token": "xoxb-...",
"app_token": "xapp-...",
"allow_from": ["user1"]
}
}
}
},
"tools": {
"media": {
"audio": {
"enabled": true,
"language": "ru",
"models": [{ "provider": "groq", "model": "whisper-large-v3" }]
}
}
},
"mcp_servers": {
"filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem"] }
},
"memory": {
"backend": "sqlite",
"auto_save": true,
"embedding_provider": "openai",
"vector_weight": 0.7,
"keyword_weight": 0.3
},
"gateway": {
"port": 3000,
"require_pairing": true,
"allow_public_bind": false
},
"autonomy": {
"level": "supervised",
"workspace_only": true,
"max_actions_per_hour": 20,
"max_cost_per_day_cents": 500
},
"runtime": {
"kind": "native",
"docker": {
"image": "alpine:3.20",
"network": "none",
"memory_limit_mb": 512,
"read_only_rootfs": true
}
},
"tunnel": { "provider": "none" },
"secrets": { "encrypt": true },
"identity": { "format": "openclaw" },
"security": {
"sandbox": { "backend": "auto" },
"resources": { "max_memory_mb": 512, "max_cpu_percent": 80 },
"audit": { "enabled": true, "retention_days": 90 }
}
}

| Endpoint | Method | Auth | Description |
|---|---|---|---|
| `/health` | GET | None | Health check (always public) |
| `/pair` | POST | `X-Pairing-Code` header | Exchange one-time code for bearer token |
| `/webhook` | POST | `Authorization: Bearer <token>` | Send message: `{"message": "your prompt"}` |
| `/whatsapp` | GET | Query params | Meta webhook verification |
| `/whatsapp` | POST | None (Meta signature) | WhatsApp incoming message webhook |
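The pairing flow behind POST /pair can be pictured as a tiny state machine. A hedged sketch with hypothetical types, not the actual gateway code:

```zig
const std = @import("std");

// One-time pairing: a 6-digit code is printed on startup; presenting it
// once via POST /pair yields a bearer token for subsequent /webhook calls.
const Pairing = struct {
    code: u32,
    used: bool = false,

    fn init() Pairing {
        return .{ .code = std.crypto.random.intRangeAtMost(u32, 0, 999_999) };
    }

    /// Returns a hex bearer token on the first correct code, null otherwise.
    fn exchange(self: *Pairing, presented: u32) ?[64]u8 {
        if (self.used or presented != self.code) return null;
        self.used = true; // the code is single-use
        var raw: [32]u8 = undefined;
        std.crypto.random.bytes(&raw);
        return std.fmt.bytesToHex(raw, .lower);
    }
};
```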
| Command | Description |
|---|---|
| `onboard --api-key sk-... --provider openrouter` | Quick setup with API key and provider |
| `onboard --interactive` | Full interactive wizard |
| `onboard --channels-only` | Reconfigure channels/allowlists only |
| `agent -m "..."` | Single message mode |
| `agent` | Interactive chat mode |
| `gateway` | Start webhook server (default: 127.0.0.1:3000) |
| `daemon` | Start long-running autonomous runtime |
| `service install\|start\|stop\|status\|uninstall` | Manage background service |
| `doctor` | Diagnose system health |
| `status` | Show full system status |
| `channel doctor` | Run channel health checks |
| `cron list\|add\|remove\|pause\|resume\|run` | Manage scheduled tasks |
| `skills list\|install\|remove\|info` | Manage skill packs |
| `hardware scan\|flash\|monitor` | Hardware device management |
| `models list\|info\|benchmark` | Model catalog |
| `migrate openclaw [--dry-run] [--source PATH]` | Import memory from OpenClaw workspace |
zig build # Dev build
zig build -Doptimize=ReleaseSmall # Release build (678 KB)
zig build test --summary all # 2,843 tests

Language: Zig 0.15
Source files: ~110
Lines of code: ~45,000
Tests: 2,843
Binary: 678 KB (ReleaseSmall)
Peak RSS: ~1 MB
Startup: <2 ms (Apple Silicon)
Dependencies: 0 (besides libc + optional SQLite)
src/
main.zig CLI entry point + argument parsing
root.zig Module hierarchy (public API)
config.zig JSON config loader + 30 sub-config structs
agent.zig Agent loop, auto-compaction, tool dispatch
daemon.zig Daemon supervisor with exponential backoff
gateway.zig HTTP gateway (rate limiting, idempotency, pairing)
channels/ 11 channel implementations (telegram, discord, slack, ...)
providers/ 22+ AI provider implementations
memory/ SQLite backend, embeddings, vector search, hygiene, snapshots
tools/ 18 tool implementations
security/ Secrets (ChaCha20), sandbox backends (landlock, firejail, ...)
cron.zig Cron scheduler with JSON persistence
health.zig Component health registry
tunnel.zig Tunnel vtable (cloudflare, ngrok, tailscale, custom)
peripherals.zig Hardware peripheral vtable (serial, Arduino, RPi, Nucleo)
runtime.zig Runtime vtable (native, docker, WASM)
skillforge.zig Skill discovery (GitHub), evaluation, integration
...
Implement a vtable interface, submit a PR:
- New `Provider` -> src/providers/
- New `Channel` -> src/channels/
- New `Tool` -> src/tools/
- New `Memory` backend -> src/memory/
- New `Tunnel` -> src/tunnel.zig
- New `Sandbox` backend -> src/security/
- New `Peripheral` -> src/peripherals.zig
- New `Skill` -> ~/.nullclaw/workspace/skills/<name>/
nullclaw is a pure open-source software project. It has no token, no cryptocurrency, no blockchain component, and no financial instrument of any kind. This project is not affiliated with any token or financial product.
MIT — see LICENSE
nullclaw — Null overhead. Null compromise. Deploy anywhere. Swap anything.
Similar Open Source Tools
headroom
Headroom is a tool designed to optimize the context layer for Large Language Models (LLMs) applications by compressing redundant boilerplate outputs. It intercepts context from tool outputs, logs, search results, and intermediate agent steps, stabilizes dynamic content like timestamps and UUIDs, removes low-signal content, and preserves original data for retrieval only when needed by the LLM. It ensures provider caching works efficiently by aligning prompts for cache hits. The tool works as a transparent proxy with zero code changes, offering significant savings in token count and enabling reversible compression for various types of content like code, logs, JSON, and images. Headroom integrates seamlessly with frameworks like LangChain, Agno, and MCP, supporting features like memory, retrievers, agents, and more.
shodh-memory
Shodh-Memory is a cognitive memory system designed for AI agents to persist memory across sessions, learn from experience, and run entirely offline. It features Hebbian learning, activation decay, and semantic consolidation, packed into a single ~17MB binary. Users can deploy it on cloud, edge devices, or air-gapped systems to enhance the memory capabilities of AI agents.
AnyCrawl
AnyCrawl is a high-performance crawling and scraping toolkit designed for SERP crawling, web scraping, site crawling, and batch tasks. It offers multi-threading and multi-process capabilities for high performance. The tool also provides AI extraction for structured data extraction from pages, making it LLM-friendly and easy to integrate and use.
sparrow
Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It seamlessly handles forms, invoices, receipts, and other unstructured data sources. Sparrow stands out with its modular architecture, offering independent services and pipelines all optimized for robust performance. One of the critical functionalities of Sparrow - pluggable architecture. You can easily integrate and run data extraction pipelines using tools and frameworks like LlamaIndex, Haystack, or Unstructured. Sparrow enables local LLM data extraction pipelines through Ollama or Apple MLX. With Sparrow solution you get API, which helps to process and transform your data into structured output, ready to be integrated with custom workflows. Sparrow Agents - with Sparrow you can build independent LLM agents, and use API to invoke them from your system. **List of available agents:** * **llamaindex** - RAG pipeline with LlamaIndex for PDF processing * **vllamaindex** - RAG pipeline with LLamaIndex multimodal for image processing * **vprocessor** - RAG pipeline with OCR and LlamaIndex for image processing * **haystack** - RAG pipeline with Haystack for PDF processing * **fcall** - Function call pipeline * **unstructured-light** - RAG pipeline with Unstructured and LangChain, supports PDF and image processing * **unstructured** - RAG pipeline with Weaviate vector DB query, Unstructured and LangChain, supports PDF and image processing * **instructor** - RAG pipeline with Unstructured and Instructor libraries, supports PDF and image processing. Works great for JSON response generation
zeroclaw
ZeroClaw is a fast, small, and fully autonomous AI assistant infrastructure built with Rust. It features a lean runtime, cost-efficient deployment, fast cold starts, and a portable architecture. It is secure by design, fully swappable, and supports OpenAI-compatible provider support. The tool is designed for low-cost boards and small cloud instances, with a memory footprint of less than 5MB. It is suitable for tasks like deploying AI assistants, swapping providers/channels/tools, and pluggable everything.
Conduit
Conduit is a unified Swift 6.2 SDK for local and cloud LLM inference, providing a single Swift-native API that can target Anthropic, OpenRouter, Ollama, MLX, HuggingFace, and Apple’s Foundation Models without rewriting your prompt pipeline. It allows switching between local, cloud, and system providers with minimal code changes, supports downloading models from HuggingFace Hub for local MLX inference, generates Swift types directly from LLM responses, offers privacy-first options for on-device running, and is built with Swift 6.2 concurrency features like actors, Sendable types, and AsyncSequence.
vlmrun-hub
VLMRun Hub is a curated collection of pre-defined Pydantic schemas for structured data extraction with Vision Language Models (VLMs), covering common document types such as invoices, receipts, and identity documents. It makes VLM outputs type-safe and easy to integrate into downstream pipelines.
PraisonAI
Praison AI is a low-code, centralised framework that simplifies the creation and orchestration of multi-agent systems for various LLM applications. It emphasizes ease of use, customization, and human-agent interaction. The tool leverages AutoGen and CrewAI frameworks to facilitate the development of AI-generated scripts and movie concepts. Users can easily create, run, test, and deploy agents for scriptwriting and movie concept development. Praison AI also provides options for full automatic mode and integration with OpenAI models for enhanced AI capabilities.
ruby_llm-agents
RubyLLM::Agents is a production-ready Rails engine for building, managing, and monitoring LLM-powered AI agents. It seamlessly integrates with Rails apps, providing features like automatic execution tracking, cost analytics, budget controls, and a real-time dashboard. Users can build intelligent AI agents in Ruby using a clean DSL and support various LLM providers like OpenAI GPT-4, Anthropic Claude, and Google Gemini. The engine offers features such as agent DSL configuration, execution tracking, cost analytics, reliability with retries and fallbacks, budget controls, multi-tenancy support, async execution with Ruby fibers, real-time dashboard, streaming, conversation history, image operations, alerts, and more.
tools
Strands Agents Tools is a community-driven project that provides a powerful set of tools for your agents to use. It bridges the gap between large language models and practical applications by offering ready-to-use tools for file operations, system execution, API interactions, mathematical operations, and more. The tools cover a wide range of functionalities including file operations, shell integration, memory storage, web infrastructure, HTTP client, Slack client, Python execution, mathematical tools, AWS integration, image and video processing, audio output, environment management, task scheduling, advanced reasoning, swarm intelligence, dynamic MCP client, parallel tool execution, browser automation, diagram creation, RSS feed management, and computer automation.
api-for-open-llm
This project provides a unified backend interface for open large language models (LLMs), offering a consistent experience with OpenAI's ChatGPT API. It supports various open-source LLMs, enabling developers to seamlessly integrate them into their applications. The interface features streaming responses, text embedding capabilities, and support for LangChain, a tool for developing LLM-based applications. By modifying environment variables, developers can easily use open-source models as alternatives to ChatGPT, providing a cost-effective and customizable solution for various use cases.
ai-coders-context
The @ai-coders/context repository provides the Ultimate MCP for AI Agent Orchestration, Context Engineering, and Spec-Driven Development. It simplifies context engineering for AI by offering a universal process called PREVC, which consists of Planning, Review, Execution, Validation, and Confirmation steps. The tool aims to address the problem of context fragmentation by introducing a single `.context/` directory that works universally across different tools. It enables users to create structured documentation, generate agent playbooks, manage workflows, provide on-demand expertise, and sync across various AI tools. The tool follows a structured, spec-driven development approach to improve AI output quality and ensure reproducible results across projects.
augustus
Augustus is a Go-based LLM vulnerability scanner designed for security professionals to test large language models against a wide range of adversarial attacks. It integrates with 28 LLM providers, covers 210+ adversarial attacks including prompt injection, jailbreaks, encoding exploits, and data extraction, and produces actionable vulnerability reports. The tool is built for production security testing with features like concurrent scanning, rate limiting, retry logic, and timeout handling out of the box.
pipelock
Pipelock is an all-in-one security harness designed for AI agents, offering control over network egress, detection of credential exfiltration, scanning for prompt injection, and monitoring workspace integrity. It utilizes capability separation to restrict the agent process with secrets and employs a separate fetch proxy for web browsing. The tool runs a 7-layer scanner pipeline on every request to ensure security. Pipelock is suitable for users running AI agents like Claude Code, OpenHands, or any AI agent with shell access and API keys.
gollama
Gollama is a tool designed for managing Ollama models through a Text User Interface (TUI). Users can list, inspect, delete, copy, and push Ollama models, as well as link them to LM Studio. The application offers interactive model selection, sorting by various criteria, and actions using hotkeys. It provides features like sorting and filtering capabilities, displaying model metadata, model linking, copying, pushing, and more. Gollama aims to be user-friendly and useful for managing models, especially for cleaning up old models.
For similar tasks
explain-openclaw
Explain OpenClaw is a comprehensive documentation repository for the OpenClaw framework, a self-hosted AI assistant platform. It covers various aspects such as plain English explanations, technical architecture, deployment scenarios, privacy and safety measures, security audits, worst-case security scenarios, optimizations, and AI model comparisons. The repository serves as a living knowledge base with beginner-friendly explanations and detailed technical insights for contributors.
motorhead
Motorhead is a memory and information retrieval server for LLMs. It provides three simple APIs to assist with memory handling in chat applications using LLMs. The first API, GET /sessions/:id/memory, returns messages up to a maximum window size. The second API, POST /sessions/:id/memory, allows you to send an array of messages to Motorhead for storage. The third API, DELETE /sessions/:id/memory, deletes the session's message list. Motorhead also features incremental summarization, where it processes half of the maximum window size of messages and summarizes them when the maximum is reached. Additionally, it supports searching by text query using vector search. Motorhead is configurable through environment variables, including the maximum window size, whether to enable long-term memory, the model used for incremental summarization, the server port, your OpenAI API key, and the Redis URL.
MemGPT
MemGPT is a system that intelligently manages different memory tiers in LLMs in order to effectively provide extended context within the LLM's limited context window. For example, MemGPT knows when to push critical information to a vector database and when to retrieve it later in the chat, enabling perpetual conversations. MemGPT can be used to create perpetual chatbots with self-editing memory, chat with your data by talking to your local files or SQL database, and more.
polyfire-js
Polyfire is an all-in-one managed backend for AI apps that allows users to build AI applications directly from the frontend, eliminating the need for a separate backend. It simplifies the process by providing most backend services in just a few lines of code. With Polyfire, users can easily create chatbots, transcribe audio files, generate simple text, manage long-term memory, and generate images. The tool also offers starter guides and tutorials to help users get started quickly and efficiently.
GPTSwarm
GPTSwarm is a graph-based framework for LLM-based agents that enables the creation of LLM-based agents from graphs and facilitates the customized and automatic self-organization of agent swarms with self-improvement capabilities. The library includes components for domain-specific operations, graph-related functions, LLM backend selection, memory management, and optimization algorithms to enhance agent performance and swarm efficiency. Users can quickly run predefined swarms or utilize tools like the file analyzer. GPTSwarm supports local LM inference via LM Studio, allowing users to run with a local LLM model. The framework has been accepted by ICML2024 and offers advanced features for experimentation and customization.
DistServe
DistServe improves the performance of large language models serving by disaggregating the prefill and decoding computation. It allows setting parallelism configs and scheduling strategies for the two phases independently, handling KV-Cache communication and memory management automatically. Utilizes a high-performance C++ Transformer inference library SwiftTransformer with features like model/pipeline parallelism, FlashAttention, Continuous Batching, and PagedAttention. Supports GPT-2, OPT, and LLaMA2 models.
Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of papers on LLMs inference and serving.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.
