nullclaw
Fastest, smallest, and fully autonomous AI assistant infrastructure written in Zig
Stars: 1066
NullClaw is the smallest fully autonomous AI assistant infrastructure: a static Zig binary that fits on any $5 board, boots in milliseconds, and requires nothing but libc. It ships as an impossibly small 678 KB static binary with no runtime or framework overhead, near-zero memory usage, instant startup, true portability across CPU architectures, and a feature-complete stack with 22+ providers, 11 channels, and 18+ tools. It is lean by default, secure by design, and fully swappable, with core systems exposed as vtable interfaces; OpenAI-compatible provider support and pluggable custom endpoints mean no lock-in.
README:
Null overhead. Null compromise. 100% Zig. 100% Agnostic.
678 KB binary. ~1 MB RAM. Boots in <2 ms. Runs on anything with a CPU.
The smallest fully autonomous AI assistant infrastructure — a static Zig binary that fits on any $5 board, boots in milliseconds, and requires nothing but libc.
678 KB binary · <2 ms startup · 2,843 tests · 22+ providers · 13 channels · Pluggable everything
- Impossibly Small: 678 KB static binary — no runtime, no VM, no framework overhead.
- Near-Zero Memory: ~1 MB peak RSS. Runs comfortably on the cheapest ARM SBCs and microcontrollers.
- Instant Startup: <2 ms on Apple Silicon, <8 ms on a 0.8 GHz edge core.
- True Portability: Single self-contained binary across ARM, x86, and RISC-V. Drop it anywhere, it just runs.
- Feature-Complete: 22+ providers, 11 channels, 18+ tools, hybrid vector+FTS5 memory, multi-layer sandbox, tunnels, hardware peripherals, MCP, subagents, streaming, voice — the full stack.
- Lean by default: Zig compiles to a tiny static binary. No allocator overhead, no garbage collector, no runtime.
- Secure by design: pairing, strict sandboxing (landlock, firejail, bubblewrap, docker), explicit allowlists, workspace scoping, encrypted secrets.
- Fully swappable: core systems are vtable interfaces (providers, channels, tools, memory, tunnels, peripherals, observers, runtimes).
- No lock-in: OpenAI-compatible provider support + pluggable custom endpoints.
Local machine benchmark (macOS arm64, Feb 2026), normalized for 0.8 GHz edge hardware.
| | OpenClaw | NanoBot | PicoClaw | ZeroClaw | 🦞 NullClaw |
|---|---|---|---|---|---|
| Language | TypeScript | Python | Go | Rust | Zig |
| RAM | > 1 GB | > 100 MB | < 10 MB | < 5 MB | ~1 MB |
| Startup (0.8 GHz) | > 500 s | > 30 s | < 1 s | < 10 ms | < 8 ms |
| Binary Size | ~28 MB (dist) | N/A (Scripts) | ~8 MB | 3.4 MB | 678 KB |
| Tests | — | — | — | 1,017 | 2,843 |
| Source Files | ~400+ | — | — | ~120 | ~110 |
| Cost | Mac Mini $599 | Linux SBC ~$50 | Linux Board $10 | Any $10 hardware | Any $5 hardware |
Measured with /usr/bin/time -l on ReleaseSmall builds. nullclaw is a static binary with zero runtime dependencies.
Reproduce locally:
zig build -Doptimize=ReleaseSmall
ls -lh zig-out/bin/nullclaw
/usr/bin/time -l zig-out/bin/nullclaw --help
/usr/bin/time -l zig-out/bin/nullclaw status

Build from source:
git clone https://github.com/nullclaw/nullclaw.git
cd nullclaw
zig build -Doptimize=ReleaseSmall
# Quick setup
nullclaw onboard --api-key sk-... --provider openrouter
# Or interactive wizard
nullclaw onboard --interactive
# Chat
nullclaw agent -m "Hello, nullclaw!"
# Interactive mode
nullclaw agent
# Start the gateway (webhook server)
nullclaw gateway # default: 127.0.0.1:3000
nullclaw gateway --port 8080 # custom port
# Start full autonomous runtime
nullclaw daemon
# Check status
nullclaw status
# Run system diagnostics
nullclaw doctor
# Check channel health
nullclaw channel doctor
# Manage background service
nullclaw service install
nullclaw service status
# Migrate memory from OpenClaw
nullclaw migrate openclaw --dry-run
nullclaw migrate openclaw

Dev fallback (no global install): prefix commands with `zig-out/bin/` (example: `zig-out/bin/nullclaw status`).
Every subsystem is a vtable interface — swap implementations with a config change, zero code changes. (A minimal sketch of the pattern follows the table below.)
| Subsystem | Interface | Ships with | Extend |
|---|---|---|---|
| AI Models | `Provider` | 22+ providers (OpenRouter, Anthropic, OpenAI, Ollama, Venice, Groq, Mistral, xAI, DeepSeek, Together, Fireworks, Perplexity, Cohere, Bedrock, etc.) | `custom:https://your-api.com` — any OpenAI-compatible API |
| Channels | `Channel` | CLI, Telegram, Discord, Slack, iMessage, Matrix, WhatsApp, Webhook, IRC, Lark/Feishu, DingTalk, QQ, MaixCam | Any messaging API |
| Memory | `Memory` | SQLite with hybrid search (FTS5 + vector cosine similarity), Markdown | Any persistence backend |
| Tools | `Tool` | shell, file_read, file_write, file_edit, memory_store, memory_recall, memory_forget, browser_open, screenshot, composio, http_request, hardware_info, hardware_memory, and more | Any capability |
| Observability | `Observer` | Noop, Log, File, Multi | Prometheus, OTel |
| Runtime | `RuntimeAdapter` | Native, Docker (sandboxed), WASM (wasmtime) | Any runtime |
| Security | `Sandbox` | Landlock, Firejail, Bubblewrap, Docker, auto-detect | Any sandbox backend |
| Identity | `IdentityConfig` | OpenClaw (markdown), AIEOS v1.1 (JSON) | Any identity format |
| Tunnel | `Tunnel` | None, Cloudflare, Tailscale, ngrok, Custom | Any tunnel binary |
| Heartbeat | `Engine` | HEARTBEAT.md periodic tasks | — |
| Skills | `Loader` | TOML manifests + SKILL.md instructions | Community skill packs |
| Peripherals | `Peripheral` | Serial, Arduino, Raspberry Pi GPIO, STM32/Nucleo | Any hardware interface |
| Cron | `Scheduler` | Cron expressions + one-shot timers with JSON persistence | — |
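As promised above, a minimal sketch of the vtable pattern in Zig. The `Tool` shape and `EchoTool` below are hypothetical, for illustration only; nullclaw's actual interfaces live under src/ and may differ.

```zig
const std = @import("std");

// Hypothetical vtable-style interface (illustration only). Callers hold a
// type-erased pointer plus a table of function pointers, so concrete
// implementations can be swapped without touching call sites.
pub const Tool = struct {
    ptr: *anyopaque,
    vtable: *const VTable,

    pub const VTable = struct {
        run: *const fn (ptr: *anyopaque, input: []const u8) anyerror![]const u8,
    };

    pub fn run(self: Tool, input: []const u8) anyerror![]const u8 {
        return self.vtable.run(self.ptr, input);
    }
};

// One concrete implementation; callers only ever see `Tool`.
pub const EchoTool = struct {
    calls: usize = 0,

    fn runImpl(ptr: *anyopaque, input: []const u8) anyerror![]const u8 {
        const self: *EchoTool = @ptrCast(@alignCast(ptr));
        self.calls += 1; // recover the concrete type behind the erased pointer
        return input;
    }

    const vtable = Tool.VTable{ .run = runImpl };

    pub fn tool(self: *EchoTool) Tool {
        return .{ .ptr = self, .vtable = &vtable };
    }
};

test "dispatch through the interface" {
    var echo = EchoTool{};
    const t = echo.tool();
    try std.testing.expectEqualStrings("ping", try t.run("ping"));
}
```

Because the interface is chosen at construction time, a config change can select a different implementation with zero changes at the call sites.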
All custom, zero external dependencies:
| Layer | Implementation |
|---|---|
| Vector DB | Embeddings stored as BLOB in SQLite, cosine similarity search |
| Keyword Search | FTS5 virtual tables with BM25 scoring |
| Hybrid Merge | Weighted merge (configurable vector/keyword weights) |
| Embeddings | `EmbeddingProvider` vtable — OpenAI, custom URL, or noop |
| Hygiene | Automatic archival + purge of stale memories |
| Snapshots | Export/import full memory state for migration |
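To make the hybrid merge concrete, here is a hedged sketch of the weighted scoring the table describes. The function names are hypothetical; this is not nullclaw's actual code.

```zig
const std = @import("std");

// Cosine similarity between two embedding vectors (stored as BLOBs in SQLite).
fn cosineSimilarity(a: []const f32, b: []const f32) f64 {
    std.debug.assert(a.len == b.len);
    var dot: f64 = 0;
    var norm_a: f64 = 0;
    var norm_b: f64 = 0;
    for (a, b) |x, y| {
        dot += @as(f64, x) * y;
        norm_a += @as(f64, x) * x;
        norm_b += @as(f64, y) * y;
    }
    if (norm_a == 0 or norm_b == 0) return 0;
    return dot / (@sqrt(norm_a) * @sqrt(norm_b));
}

// Weighted hybrid merge: blend vector similarity with a normalized BM25
// keyword score using the configurable weights.
fn hybridScore(cosine: f64, bm25_normalized: f64, vector_weight: f64, keyword_weight: f64) f64 {
    return vector_weight * cosine + keyword_weight * bm25_normalized;
}

test "weights blend the two signals" {
    // With vector_weight 0.7 and keyword_weight 0.3, as in the config below:
    try std.testing.expectApproxEqAbs(
        @as(f64, 0.69),
        hybridScore(0.9, 0.2, 0.7, 0.3),
        1e-9,
    );
}
```

The memory block of the config supplies these weights: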
{
"memory": {
"backend": "sqlite",
"auto_save": true,
"embedding_provider": "openai",
"vector_weight": 0.7,
"keyword_weight": 0.3,
"hygiene_enabled": true,
"snapshot_enabled": false
}
}

nullclaw enforces security at every layer.
| # | Item | Status | How |
|---|---|---|---|
| 1 | Gateway not publicly exposed | Done | Binds 127.0.0.1 by default. Refuses 0.0.0.0 without tunnel or explicit allow_public_bind. |
| 2 | Pairing required | Done | 6-digit one-time code on startup. Exchange via POST /pair for bearer token. |
| 3 | Filesystem scoped | Done | `workspace_only = true` by default. Null byte injection blocked. Symlink escape detection. |
| 4 | Access via tunnel only | Done | Gateway refuses public bind without active tunnel. Supports Tailscale, Cloudflare, ngrok, or custom. |
| 5 | Sandbox isolation | Done | Auto-detects best backend: Landlock, Firejail, Bubblewrap, or Docker. |
| 6 | Encrypted secrets | Done | API keys encrypted with ChaCha20-Poly1305 using local key file. |
| 7 | Resource limits | Done | Configurable memory, CPU, disk, and subprocess limits. |
| 8 | Audit logging | Done | Signed event trail with configurable retention. |
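For item 6, a hedged sketch of what ChaCha20-Poly1305 secret sealing looks like with Zig's standard library. The helper and key handling here are hypothetical, not nullclaw's actual code.

```zig
const std = @import("std");
const ChaCha20Poly1305 = std.crypto.aead.chacha_poly.ChaCha20Poly1305;

/// Seal a secret (e.g. an API key) with a 32-byte key loaded from a local
/// key file. Hypothetical helper: the nonce must be unique per encryption
/// and stored alongside the ciphertext and tag.
fn sealSecret(
    key: [ChaCha20Poly1305.key_length]u8,
    plaintext: []const u8,
    ciphertext: []u8, // caller-provided, must be plaintext.len bytes
) struct {
    nonce: [ChaCha20Poly1305.nonce_length]u8,
    tag: [ChaCha20Poly1305.tag_length]u8,
} {
    var nonce: [ChaCha20Poly1305.nonce_length]u8 = undefined;
    std.crypto.random.bytes(&nonce); // fresh random nonce per secret
    var tag: [ChaCha20Poly1305.tag_length]u8 = undefined;
    ChaCha20Poly1305.encrypt(ciphertext, &tag, plaintext, "", nonce, key);
    return .{ .nonce = nonce, .tag = tag };
}
```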
Channel allowlists gate inbound messages:
- Empty allowlist = deny all inbound messages
- "*" = allow all (explicit opt-in)
- Otherwise = exact-match allowlist
Config: ~/.nullclaw/config.json (created by onboard)
OpenClaw compatible: nullclaw uses the same config structure as OpenClaw (snake_case). Providers live under `models.providers`, the default model under `agents.defaults.model.primary`, and channels use `accounts` wrappers.
{
"default_provider": "openrouter",
"default_temperature": 0.7,
"models": {
"providers": {
"openrouter": { "api_key": "sk-or-..." },
"groq": { "api_key": "gsk_..." },
"anthropic": { "api_key": "sk-ant-...", "base_url": "https://api.anthropic.com" }
}
},
"agents": {
"defaults": {
"model": { "primary": "anthropic/claude-sonnet-4" },
"heartbeat": { "every": "30m" }
},
"list": [
{ "id": "researcher", "model": { "primary": "anthropic/claude-opus-4" }, "system_prompt": "..." }
]
},
"channels": {
"telegram": {
"accounts": {
"main": {
"bot_token": "123:ABC",
"allow_from": ["user1"],
"reply_in_private": true,
"proxy": "socks5://..."
}
}
},
"discord": {
"accounts": {
"main": {
"token": "disc-token",
"guild_id": "12345",
"allow_from": ["user1"],
"allow_bots": false
}
}
},
"irc": {
"accounts": {
"main": {
"host": "irc.libera.chat",
"port": 6697,
"nick": "nullclaw",
"channel": "#nullclaw",
"tls": true,
"allow_from": ["user1"]
}
}
},
"slack": {
"accounts": {
"main": {
"bot_token": "xoxb-...",
"app_token": "xapp-...",
"allow_from": ["user1"]
}
}
}
},
"tools": {
"media": {
"audio": {
"enabled": true,
"language": "ru",
"models": [{ "provider": "groq", "model": "whisper-large-v3" }]
}
}
},
"mcp_servers": {
"filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem"] }
},
"memory": {
"backend": "sqlite",
"auto_save": true,
"embedding_provider": "openai",
"vector_weight": 0.7,
"keyword_weight": 0.3
},
"gateway": {
"port": 3000,
"require_pairing": true,
"allow_public_bind": false
},
"autonomy": {
"level": "supervised",
"workspace_only": true,
"max_actions_per_hour": 20,
"max_cost_per_day_cents": 500
},
"runtime": {
"kind": "native",
"docker": {
"image": "alpine:3.20",
"network": "none",
"memory_limit_mb": 512,
"read_only_rootfs": true
}
},
"tunnel": { "provider": "none" },
"secrets": { "encrypt": true },
"identity": { "format": "openclaw" },
"security": {
"sandbox": { "backend": "auto" },
"resources": { "max_memory_mb": 512, "max_cpu_percent": 80 },
"audit": { "enabled": true, "retention_days": 90 }
}
}

| Endpoint | Method | Auth | Description |
|---|---|---|---|
| `/health` | GET | None | Health check (always public) |
| `/pair` | POST | `X-Pairing-Code` header | Exchange one-time code for bearer token |
| `/webhook` | POST | `Authorization: Bearer <token>` | Send message: `{"message": "your prompt"}` |
| `/whatsapp` | GET | Query params | Meta webhook verification |
| `/whatsapp` | POST | None (Meta signature) | WhatsApp incoming message webhook |
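The pairing flow behind POST /pair can be pictured as a tiny state machine. A hedged sketch with hypothetical types, not the actual gateway code:

```zig
const std = @import("std");

// One-time pairing: a 6-digit code is printed on startup; presenting it
// once via POST /pair yields a bearer token for subsequent /webhook calls.
const Pairing = struct {
    code: u32,
    used: bool = false,

    fn init() Pairing {
        return .{ .code = std.crypto.random.intRangeAtMost(u32, 0, 999_999) };
    }

    /// Returns a hex bearer token on the first correct code, null otherwise.
    fn exchange(self: *Pairing, presented: u32) ?[64]u8 {
        if (self.used or presented != self.code) return null;
        self.used = true; // the code is single-use
        var raw: [32]u8 = undefined;
        std.crypto.random.bytes(&raw);
        return std.fmt.bytesToHex(raw, .lower);
    }
};
```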
| Command | Description |
|---|---|
| `onboard --api-key sk-... --provider openrouter` | Quick setup with API key and provider |
| `onboard --interactive` | Full interactive wizard |
| `onboard --channels-only` | Reconfigure channels/allowlists only |
| `agent -m "..."` | Single message mode |
| `agent` | Interactive chat mode |
| `gateway` | Start webhook server (default: 127.0.0.1:3000) |
| `daemon` | Start long-running autonomous runtime |
| `service install\|start\|stop\|status\|uninstall` | Manage background service |
| `doctor` | Diagnose system health |
| `status` | Show full system status |
| `channel doctor` | Run channel health checks |
| `cron list\|add\|remove\|pause\|resume\|run` | Manage scheduled tasks |
| `skills list\|install\|remove\|info` | Manage skill packs |
| `hardware scan\|flash\|monitor` | Hardware device management |
| `models list\|info\|benchmark` | Model catalog |
| `migrate openclaw [--dry-run] [--source PATH]` | Import memory from OpenClaw workspace |
zig build # Dev build
zig build -Doptimize=ReleaseSmall # Release build (678 KB)
zig build test --summary all # 2,843 tests

Language: Zig 0.15
Source files: ~110
Lines of code: ~45,000
Tests: 2,843
Binary: 678 KB (ReleaseSmall)
Peak RSS: ~1 MB
Startup: <2 ms (Apple Silicon)
Dependencies: 0 (besides libc + optional SQLite)
src/
main.zig CLI entry point + argument parsing
root.zig Module hierarchy (public API)
config.zig JSON config loader + 30 sub-config structs
agent.zig Agent loop, auto-compaction, tool dispatch
daemon.zig Daemon supervisor with exponential backoff
gateway.zig HTTP gateway (rate limiting, idempotency, pairing)
channels/ 11 channel implementations (telegram, discord, slack, ...)
providers/ 22+ AI provider implementations
memory/ SQLite backend, embeddings, vector search, hygiene, snapshots
tools/ 18 tool implementations
security/ Secrets (ChaCha20), sandbox backends (landlock, firejail, ...)
cron.zig Cron scheduler with JSON persistence
health.zig Component health registry
tunnel.zig Tunnel vtable (cloudflare, ngrok, tailscale, custom)
peripherals.zig Hardware peripheral vtable (serial, Arduino, RPi, Nucleo)
runtime.zig Runtime vtable (native, docker, WASM)
skillforge.zig Skill discovery (GitHub), evaluation, integration
...
Implement a vtable interface, submit a PR:
- New `Provider` -> src/providers/
- New `Channel` -> src/channels/
- New `Tool` -> src/tools/
- New `Memory` backend -> src/memory/
- New `Tunnel` -> src/tunnel.zig
- New `Sandbox` backend -> src/security/
- New `Peripheral` -> src/peripherals.zig
- New `Skill` -> ~/.nullclaw/workspace/skills/<name>/
nullclaw is a pure open-source software project. It has no token, no cryptocurrency, no blockchain component, and no financial instrument of any kind. This project is not affiliated with any token or financial product.
MIT — see LICENSE
nullclaw — Null overhead. Null compromise. Deploy anywhere. Swap anything.
Similar Open Source Tools
headroom
Headroom is a tool designed to optimize the context layer for Large Language Models (LLMs) applications by compressing redundant boilerplate outputs. It intercepts context from tool outputs, logs, search results, and intermediate agent steps, stabilizes dynamic content like timestamps and UUIDs, removes low-signal content, and preserves original data for retrieval only when needed by the LLM. It ensures provider caching works efficiently by aligning prompts for cache hits. The tool works as a transparent proxy with zero code changes, offering significant savings in token count and enabling reversible compression for various types of content like code, logs, JSON, and images. Headroom integrates seamlessly with frameworks like LangChain, Agno, and MCP, supporting features like memory, retrievers, agents, and more.
shodh-memory
Shodh-Memory is a cognitive memory system designed for AI agents to persist memory across sessions, learn from experience, and run entirely offline. It features Hebbian learning, activation decay, and semantic consolidation, packed into a single ~17MB binary. Users can deploy it on cloud, edge devices, or air-gapped systems to enhance the memory capabilities of AI agents.
AnyCrawl
AnyCrawl is a high-performance crawling and scraping toolkit designed for SERP crawling, web scraping, site crawling, and batch tasks. It offers multi-threading and multi-process capabilities for high performance. The tool also provides AI extraction for structured data extraction from pages, making it LLM-friendly and easy to integrate and use.
sparrow
Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It seamlessly handles forms, invoices, receipts, and other unstructured data sources. Sparrow stands out with its modular architecture, offering independent services and pipelines all optimized for robust performance. One of the critical functionalities of Sparrow - pluggable architecture. You can easily integrate and run data extraction pipelines using tools and frameworks like LlamaIndex, Haystack, or Unstructured. Sparrow enables local LLM data extraction pipelines through Ollama or Apple MLX. With Sparrow solution you get API, which helps to process and transform your data into structured output, ready to be integrated with custom workflows. Sparrow Agents - with Sparrow you can build independent LLM agents, and use API to invoke them from your system. **List of available agents:** * **llamaindex** - RAG pipeline with LlamaIndex for PDF processing * **vllamaindex** - RAG pipeline with LLamaIndex multimodal for image processing * **vprocessor** - RAG pipeline with OCR and LlamaIndex for image processing * **haystack** - RAG pipeline with Haystack for PDF processing * **fcall** - Function call pipeline * **unstructured-light** - RAG pipeline with Unstructured and LangChain, supports PDF and image processing * **unstructured** - RAG pipeline with Weaviate vector DB query, Unstructured and LangChain, supports PDF and image processing * **instructor** - RAG pipeline with Unstructured and Instructor libraries, supports PDF and image processing. Works great for JSON response generation
zeroclaw
ZeroClaw is a fast, small, and fully autonomous AI assistant infrastructure built with Rust. It features a lean runtime, cost-efficient deployment, fast cold starts, and a portable architecture. It is secure by design, fully swappable, and supports OpenAI-compatible provider support. The tool is designed for low-cost boards and small cloud instances, with a memory footprint of less than 5MB. It is suitable for tasks like deploying AI assistants, swapping providers/channels/tools, and pluggable everything.
Conduit
Conduit is a unified Swift 6.2 SDK for local and cloud LLM inference, providing a single Swift-native API that can target Anthropic, OpenRouter, Ollama, MLX, HuggingFace, and Apple’s Foundation Models without rewriting your prompt pipeline. It allows switching between local, cloud, and system providers with minimal code changes, supports downloading models from HuggingFace Hub for local MLX inference, generates Swift types directly from LLM responses, offers privacy-first options for on-device running, and is built with Swift 6.2 concurrency features like actors, Sendable types, and AsyncSequence.
vlmrun-hub
VLMRun Hub is a curated collection of pre-defined Pydantic schemas for structured data extraction with Vision Language Models (VLMs), covering common document types such as invoices, receipts, and identity documents. It makes VLM outputs type-safe and easy to integrate into downstream pipelines.
PraisonAI
Praison AI is a low-code, centralised framework that simplifies the creation and orchestration of multi-agent systems for various LLM applications. It emphasizes ease of use, customization, and human-agent interaction. The tool leverages AutoGen and CrewAI frameworks to facilitate the development of AI-generated scripts and movie concepts. Users can easily create, run, test, and deploy agents for scriptwriting and movie concept development. Praison AI also provides options for full automatic mode and integration with OpenAI models for enhanced AI capabilities.
ruby_llm-agents
RubyLLM::Agents is a production-ready Rails engine for building, managing, and monitoring LLM-powered AI agents. It seamlessly integrates with Rails apps, providing features like automatic execution tracking, cost analytics, budget controls, and a real-time dashboard. Users can build intelligent AI agents in Ruby using a clean DSL and support various LLM providers like OpenAI GPT-4, Anthropic Claude, and Google Gemini. The engine offers features such as agent DSL configuration, execution tracking, cost analytics, reliability with retries and fallbacks, budget controls, multi-tenancy support, async execution with Ruby fibers, real-time dashboard, streaming, conversation history, image operations, alerts, and more.
tools
Strands Agents Tools is a community-driven project that provides a powerful set of tools for your agents to use. It bridges the gap between large language models and practical applications by offering ready-to-use tools for file operations, system execution, API interactions, mathematical operations, and more. The tools cover a wide range of functionalities including file operations, shell integration, memory storage, web infrastructure, HTTP client, Slack client, Python execution, mathematical tools, AWS integration, image and video processing, audio output, environment management, task scheduling, advanced reasoning, swarm intelligence, dynamic MCP client, parallel tool execution, browser automation, diagram creation, RSS feed management, and computer automation.
api-for-open-llm
This project provides a unified backend interface for open large language models (LLMs), offering a consistent experience with OpenAI's ChatGPT API. It supports various open-source LLMs, enabling developers to seamlessly integrate them into their applications. The interface features streaming responses, text embedding capabilities, and support for LangChain, a tool for developing LLM-based applications. By modifying environment variables, developers can easily use open-source models as alternatives to ChatGPT, providing a cost-effective and customizable solution for various use cases.
ai-coders-context
The @ai-coders/context repository provides the Ultimate MCP for AI Agent Orchestration, Context Engineering, and Spec-Driven Development. It simplifies context engineering for AI by offering a universal process called PREVC, which consists of Planning, Review, Execution, Validation, and Confirmation steps. The tool aims to address the problem of context fragmentation by introducing a single `.context/` directory that works universally across different tools. It enables users to create structured documentation, generate agent playbooks, manage workflows, provide on-demand expertise, and sync across various AI tools. The tool follows a structured, spec-driven development approach to improve AI output quality and ensure reproducible results across projects.
augustus
Augustus is a Go-based LLM vulnerability scanner designed for security professionals to test large language models against a wide range of adversarial attacks. It integrates with 28 LLM providers, covers 210+ adversarial attacks including prompt injection, jailbreaks, encoding exploits, and data extraction, and produces actionable vulnerability reports. The tool is built for production security testing with features like concurrent scanning, rate limiting, retry logic, and timeout handling out of the box.
pipelock
Pipelock is an all-in-one security harness designed for AI agents, offering control over network egress, detection of credential exfiltration, scanning for prompt injection, and monitoring workspace integrity. It utilizes capability separation to restrict the agent process with secrets and employs a separate fetch proxy for web browsing. The tool runs a 7-layer scanner pipeline on every request to ensure security. Pipelock is suitable for users running AI agents like Claude Code, OpenHands, or any AI agent with shell access and API keys.
gollama
Gollama is a tool designed for managing Ollama models through a Text User Interface (TUI). Users can list, inspect, delete, copy, and push Ollama models, as well as link them to LM Studio. The application offers interactive model selection, sorting by various criteria, and actions using hotkeys. It provides features like sorting and filtering capabilities, displaying model metadata, model linking, copying, pushing, and more. Gollama aims to be user-friendly and useful for managing models, especially for cleaning up old models.
For similar tasks
explain-openclaw
Explain OpenClaw is a comprehensive documentation repository for the OpenClaw framework, a self-hosted AI assistant platform. It covers various aspects such as plain English explanations, technical architecture, deployment scenarios, privacy and safety measures, security audits, worst-case security scenarios, optimizations, and AI model comparisons. The repository serves as a living knowledge base with beginner-friendly explanations and detailed technical insights for contributors.
motorhead
Motorhead is a memory and information retrieval server for LLMs. It provides three simple APIs to assist with memory handling in chat applications using LLMs. The first API, GET /sessions/:id/memory, returns messages up to a maximum window size. The second API, POST /sessions/:id/memory, allows you to send an array of messages to Motorhead for storage. The third API, DELETE /sessions/:id/memory, deletes the session's message list. Motorhead also features incremental summarization, where it processes half of the maximum window size of messages and summarizes them when the maximum is reached. Additionally, it supports searching by text query using vector search. Motorhead is configurable through environment variables, including the maximum window size, whether to enable long-term memory, the model used for incremental summarization, the server port, your OpenAI API key, and the Redis URL.
MemGPT
MemGPT is a system that intelligently manages different memory tiers in LLMs in order to effectively provide extended context within the LLM's limited context window. For example, MemGPT knows when to push critical information to a vector database and when to retrieve it later in the chat, enabling perpetual conversations. MemGPT can be used to create perpetual chatbots with self-editing memory, chat with your data by talking to your local files or SQL database, and more.
polyfire-js
Polyfire is an all-in-one managed backend for AI apps that allows users to build AI applications directly from the frontend, eliminating the need for a separate backend. It simplifies the process by providing most backend services in just a few lines of code. With Polyfire, users can easily create chatbots, transcribe audio files, generate simple text, manage long-term memory, and generate images. The tool also offers starter guides and tutorials to help users get started quickly and efficiently.
GPTSwarm
GPTSwarm is a graph-based framework for LLM-based agents that enables the creation of LLM-based agents from graphs and facilitates the customized and automatic self-organization of agent swarms with self-improvement capabilities. The library includes components for domain-specific operations, graph-related functions, LLM backend selection, memory management, and optimization algorithms to enhance agent performance and swarm efficiency. Users can quickly run predefined swarms or utilize tools like the file analyzer. GPTSwarm supports local LM inference via LM Studio, allowing users to run with a local LLM model. The framework has been accepted by ICML2024 and offers advanced features for experimentation and customization.
DistServe
DistServe improves the performance of large language models serving by disaggregating the prefill and decoding computation. It allows setting parallelism configs and scheduling strategies for the two phases independently, handling KV-Cache communication and memory management automatically. Utilizes a high-performance C++ Transformer inference library SwiftTransformer with features like model/pipeline parallelism, FlashAttention, Continuous Batching, and PagedAttention. Supports GPT-2, OPT, and LLaMA2 models.
Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of papers on LLMs inference and serving.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.
