learn-claude-code
Bash is all you need: write a nano Claude-Code-like agent from 0 to 1
Learn Claude Code is an educational project by shareAI Lab that aims to help users understand how modern AI agents work by building one from scratch. The repository provides original educational material on various topics such as the agent loop, tool design, explicit planning, context management, knowledge injection, task systems, parallel execution, team messaging, and autonomous teams. Users can follow a learning path through different versions of the project, each introducing new concepts and mechanisms. The repository also includes technical tutorials, articles, and example skills for users to explore and learn from. The project emphasizes the philosophy that the model is crucial in agent development, with code playing a supporting role.
Disclaimer: This is an independent educational project by shareAI Lab. It is not affiliated with, endorsed by, or sponsored by Anthropic. "Claude Code" is a trademark of Anthropic.
Learn how modern AI agents work by building one from scratch.
We created this repository out of admiration for Claude Code - what we believe to be the most capable AI coding agent in the world. Initially, we attempted to reverse-engineer its design through behavioral observation and speculation. The analysis we published was riddled with inaccuracies, unfounded guesses, and technical errors. We deeply apologize to the Claude Code team and anyone who was misled by that content.
Over the past six months, through building and iterating on real agent systems, our understanding of "what makes a true AI agent" has been fundamentally reshaped. We'd like to share these insights with you. All previous speculative content has been removed and replaced with original educational material.
Works with Kode CLI, Claude Code, Cursor, and any agent supporting the Agent Skills Spec.
After completing this tutorial, you will understand:
- The Agent Loop - The surprisingly simple pattern behind all AI coding agents
- Tool Design - How to give AI models the ability to interact with the real world
- Explicit Planning - Using constraints to make AI behavior predictable
- Context Management - Keeping agent memory clean through subagent isolation
- Knowledge Injection - Loading domain expertise on-demand without retraining
- Context Compression - How agents work beyond their context window limits
- Task Systems - From personal notes to team project boards
- Parallel Execution - Background tasks and notification-driven workflows
- Team Messaging - Persistent teammates communicating through inboxes
- Autonomous Teams - Self-organizing agents that find and claim their own work
```
Start Here
    |
    v
[v0: Bash Agent] ------------> "One tool is enough"
    |   16-196 lines
    v
[v1: Basic Agent] -----------> "The complete agent pattern"
    |   4 tools, ~417 lines
    v
[v2: Todo Agent] ------------> "Make plans explicit"
    |   +TodoManager, ~531 lines
    v
[v3: Subagent] --------------> "Divide and conquer"
    |   +Task tool, ~623 lines
    v
[v4: Skills Agent] ----------> "Domain expertise on-demand"
    |   +Skill tool, ~783 lines
    v
[v5: Compression Agent] -----> "Never forget, work forever"
    |   +ContextManager, ~896 lines
    v
[v6: Tasks Agent] -----------> "From sticky notes to kanban"
    |   +TaskManager, ~1075 lines
    v
[v7: Background Agent] ------> "Don't wait, keep working"
    |   +BackgroundManager, ~1142 lines
    v
[v8: Team Agent] ------------> "Teammates that communicate"
    |   +TeammateManager, ~1553 lines
    v
[v9: Autonomous Agent] ------> "A self-organizing team"
        +Idle cycle, ~1657 lines
```
Recommended approach:
1. Read and run v0 first - understand the core loop
2. Compare v0 and v1 - see how tools evolve
3. Study v2 for planning patterns
4. Explore v3 for complex task decomposition
5. Master v4 for building extensible agents
6. Study v5 for context management and compression
7. Explore v6 for persistent task tracking
8. Understand v7 for parallel background execution
9. Study v8 for team lifecycle and messaging:
   a. Start with TeammateManager (creation, deletion, config)
   b. Understand the message protocol (5 types, JSONL inbox)
   c. Study the teammate loop (simplified: work -> check inbox -> exit)
   d. Trace a full lifecycle: TeamCreate -> spawn -> message -> TeamDelete
10. Master v9 for autonomous multi-agent collaboration
Note: v7 to v8 is the largest version jump (+411 lines, 36% increase). v8 introduces team lifecycle, message protocol, and inbox architecture all at once. The sub-step approach above (9a-9d) is strongly recommended.
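The per-teammate JSONL inbox (step 9b) is the heart of v8's messaging. As an illustrative sketch only, not the repository's actual implementation, a file-based inbox with append-on-send and drain-on-read might look like this (the directory name and message fields are assumptions):

```python
import json
from pathlib import Path

INBOX_DIR = Path("inboxes")  # assumption: one <name>.jsonl file per teammate


def send_message(to: str, sender: str, msg_type: str, body: str) -> None:
    """Append one JSON message per line to the recipient's inbox file."""
    INBOX_DIR.mkdir(exist_ok=True)
    record = {"from": sender, "type": msg_type, "body": body}
    with open(INBOX_DIR / f"{to}.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")


def check_inbox(name: str) -> list[dict]:
    """Read and clear all pending messages for a teammate."""
    path = INBOX_DIR / f"{name}.jsonl"
    if not path.exists():
        return []
    messages = [json.loads(line) for line in path.read_text().splitlines() if line]
    path.unlink()  # drain: each message is consumed exactly once
    return messages
```

With this shape, a teammate's loop (step 9c) reduces to: do work, call check_inbox, handle anything found, exit or idle.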
```
v0(196) -> v1(417)  -> v2(531)   -> v3(623)  -> v4(783)
   |          |           |            |           |
 Bash      4 Tools     Planning    Subagent     Skills

-> v5(896) -> v6(1075) -> v7(1142)  -> v8(1553) -> v9(1657)
      |           |           |            |            |
  Compress      Tasks     Background     Teams     Autonomous
```
```
# Clone the repository
git clone https://github.com/shareAI-lab/learn-claude-code
cd learn-claude-code

# Install dependencies
pip install -r requirements.txt

# Configure API key
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY

# Run any version
python v0_bash_agent.py          # Minimal (start here!)
python v1_basic_agent.py         # Core agent loop
python v2_todo_agent.py          # + Todo planning
python v3_subagent.py            # + Subagents
python v4_skills_agent.py        # + Skills
python v5_compression_agent.py   # + Context compression
python v6_tasks_agent.py         # + Task system
python v7_background_agent.py    # + Background tasks
python v8_team_agent.py          # + Team messaging
python v9_autonomous_agent.py    # + Autonomous teams
```

```
# Run full test suite
python tests/run_all.py

# Run unit tests only
python tests/test_unit.py

# Run tests for a specific version
python -m pytest tests/test_v8.py -v
```

Every coding agent is just this loop:

```python
while True:
    response = model(messages, tools)
    if response.stop_reason != "tool_use":
        return response.text
    results = execute(response.tool_calls)
    messages.append(results)
```

That's it. The model calls tools until done. Everything else is refinement.
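The loop above is pseudocode. Wired to a stubbed model it runs end to end; the `fake_model` and `Response` class here are illustrative stand-ins for a real API client and its response object, not the repository's code:

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class Response:
    """Stand-in for an API response: a stop reason, text, and tool calls."""
    stop_reason: str
    text: str = ""
    tool_calls: list = field(default_factory=list)


def fake_model(messages, tools):
    """Stand-in for an LLM call: asks for one bash command, then finishes."""
    if not any(m.get("role") == "tool" for m in messages):
        return Response("tool_use",
                        tool_calls=[{"name": "bash", "input": {"cmd": "echo hello"}}])
    return Response("end_turn", text="done: " + messages[-1]["content"].strip())


def execute(tool_calls):
    """Run each requested bash command and wrap its stdout as a tool message."""
    outputs = []
    for call in tool_calls:
        result = subprocess.run(call["input"]["cmd"], shell=True,
                                capture_output=True, text=True)
        outputs.append({"role": "tool", "content": result.stdout})
    return outputs


def agent_loop(messages, tools=("bash",), model=fake_model):
    while True:
        response = model(messages, tools)
        if response.stop_reason != "tool_use":
            return response.text
        for result in execute(response.tool_calls):
            messages.append(result)
```

Swap `fake_model` for a real API call and this is essentially v0: the loop itself never changes, only the model and the tools behind it.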
| Version | Lines | Tools | Core Addition | Key Insight |
|---|---|---|---|---|
| v0 | ~196 | bash | Recursive subagents | One tool is enough |
| v1 | ~417 | bash, read, write, edit | Core loop | Model as Agent |
| v2 | ~531 | +TodoWrite | Explicit planning | Constraints enable complexity |
| v3 | ~623 | +Task | Context isolation | Clean context = better results |
| v4 | ~783 | +Skill | Knowledge loading | Expertise without retraining |
| v5 | ~896 | +ContextManager | 3-layer compression | Forgetting enables infinite work |
| v6 | ~1075 | +TaskCreate/Get/Update/List | Persistent tasks | Sticky notes to kanban |
| v7 | ~1142 | +TaskOutput/TaskStop | Background execution | Serial to parallel |
| v8 | ~1553 | +TeamCreate/SendMessage/TeamDelete | Team messaging | Command to collaboration |
| v9 | ~1657 | +Idle cycle/auto-claim | Autonomous teams | Collaboration to self-organization |
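Tool dispatch, the mechanism v1 adds around the loop, is essentially a name-to-function mapping. A minimal sketch under assumed names (the handler functions below are hypothetical; v1's real tools are bash, read, write, and edit):

```python
import subprocess
from pathlib import Path


# Hypothetical handlers standing in for v1's real tool implementations.
def run_bash(cmd: str) -> str:
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout


def read_file(path: str) -> str:
    return Path(path).read_text()


def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} bytes to {path}"


TOOL_HANDLERS = {"bash": run_bash, "read": read_file, "write": write_file}


def process_tool_call(name: str, tool_input: dict) -> str:
    """Map a tool_use block's name to a Python function and call it."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"error: unknown tool {name!r}"
    try:
        return handler(**tool_input)
    except Exception as e:  # errors go back to the model as plain text
        return f"error: {e}"
```

Note that failures are returned as strings rather than raised: the model sees the error in the next turn and can recover, which is what keeps the outer loop simple.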
Each version introduces one core class, but the real learning is in the sub-mechanisms. This map helps you find specific concepts:
| Sub-Mechanism | Version | Key Code | What to Look For |
|---|---|---|---|
| Agent loop | v0-v1 | agent_loop() | The while tool_use loop pattern |
| Tool dispatch | v1 | process_tool_call() | How tool_use blocks map to functions |
| Explicit planning | v2 | TodoManager | Single in_progress constraint, system reminders |
| Context isolation | v3 | run_subagent() | Fresh message list per subagent |
| Tool filtering | v3 | AGENT_TYPES | Explore agents get read-only tools |
| Skill injection | v4 | SkillLoader | Content prepended to system prompt |
| Microcompact | v5 | ContextManager.microcompact() | Old tool outputs replaced with placeholders |
| Auto-compact | v5 | ContextManager.auto_compact() | 93% threshold triggers API summarization |
| Large output handling | v5 | ContextManager.handle_large_output() | >40K tokens saved to disk, preview returned |
| Transcript persistence | v5 | ContextManager.save_transcript() | Full history appended to .jsonl |
| Task CRUD | v6 | TaskManager | create/get/update/list with JSON persistence |
| Dependency graph | v6 | addBlocks/addBlockedBy | Completion auto-unblocks dependents |
| Background execution | v7 | BackgroundManager.run_in_background() | Thread-based, immediate task_id return |
| ID prefix convention | v7 | _PREFIXES | b=bash, a=agent (v8 adds t=teammate) |
| Notification bus | v7 | drain_notifications() | Queue drained before each API call |
| Notification injection | v7 | `<task-notification>` XML | Injected into last user message |
| Teammate lifecycle | v8 | _teammate_loop() | Work -> check inbox -> exit pattern |
| File-based inbox | v8 | send_message()/check_inbox() | JSONL format, per-teammate files |
| Message protocol | v8 | MESSAGE_TYPES | 5 types: message, broadcast, shutdown_req/resp, plan_approval |
| Tool scoping | v8 | TEAMMATE_TOOLS | Teammates get 9 tools (no TeamCreate/Delete/Task/Skill) |
| Idle cycle | v9 | _teammate_loop() | active -> idle -> poll inbox -> wake -> active |
| Task claiming | v9 | _teammate_loop() | Idle teammates auto-claim unclaimed tasks |
| Identity preservation | v9 | auto_compact + identity | Teammate name/role re-injected after compression |
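Of these sub-mechanisms, microcompact is perhaps the simplest to illustrate: replace the content of all but the most recent tool outputs with short placeholders. This is a hedged sketch of the idea; the real ContextManager.microcompact() will differ in detail:

```python
PLACEHOLDER = "[old tool output removed to save context]"


def microcompact(messages: list[dict], keep_last: int = 3) -> list[dict]:
    """Replace all but the last `keep_last` tool outputs with placeholders.

    Sketch only: the repository's ContextManager works on API message
    structures, not this simplified role/content shape.
    """
    tool_indices = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    to_compact = set(tool_indices[:-keep_last]) if keep_last else set(tool_indices)
    return [
        {**m, "content": PLACEHOLDER} if i in to_compact else m
        for i, m in enumerate(messages)
    ]
```

After a long session with dozens of bash outputs, only the most recent few stay verbatim; the rest collapse to one-line placeholders, freeing the context window for new work while keeping the conversation's shape intact.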
```
learn-claude-code/
|-- v0_bash_agent.py           # ~196 lines: 1 tool, recursive subagents
|-- v0_bash_agent_mini.py      # ~16 lines: extreme compression
|-- v1_basic_agent.py          # ~417 lines: 4 tools, core loop
|-- v2_todo_agent.py           # ~531 lines: + TodoManager
|-- v3_subagent.py             # ~623 lines: + Task tool, agent registry
|-- v4_skills_agent.py         # ~783 lines: + Skill tool, SkillLoader
|-- v5_compression_agent.py    # ~896 lines: + ContextManager, 3-layer compression
|-- v6_tasks_agent.py          # ~1075 lines: + TaskManager, CRUD with dependencies
|-- v7_background_agent.py     # ~1142 lines: + BackgroundManager, parallel execution
|-- v8_team_agent.py           # ~1553 lines: + TeammateManager, team messaging
|-- v9_autonomous_agent.py     # ~1657 lines: + Idle cycle, auto-claim, identity preservation
|-- skills/                    # Example skills (pdf, code-review, mcp-builder, agent-builder)
|-- docs/                      # Technical documentation (EN + ZH + JA)
|-- articles/                  # Blog-style articles (ZH)
+-- tests/                     # Unit, feature, and integration tests
```
- v0: Bash is All You Need
- v1: Model as Agent
- v2: Structured Planning
- v3: Subagent Mechanism
- v4: Skills Mechanism
- v5: Context Compression
- v6: Tasks System
- v7: Background Tasks
- v8: Team Messaging
- v9: Autonomous Teams
See articles/ for blog-style explanations.
| Skill | Purpose |
|---|---|
| agent-builder | Meta-skill: how to build agents |
| code-review | Systematic code review methodology |
| pdf | PDF manipulation patterns |
| mcp-builder | MCP server development |
```
# Use the agent-builder skill to create a new project
python skills/agent-builder/scripts/init_agent.py my-agent

# Specify complexity level
python skills/agent-builder/scripts/init_agent.py my-agent --level 0   # Minimal
python skills/agent-builder/scripts/init_agent.py my-agent --level 1   # 4 tools
```

```
# Kode CLI (recommended)
kode plugins install https://github.com/shareAI-lab/shareAI-skills

# Claude Code
claude plugins install https://github.com/shareAI-lab/shareAI-skills
```

```
# .env file options
ANTHROPIC_API_KEY=sk-ant-xxx          # Required: Your API key
ANTHROPIC_BASE_URL=https://...        # Optional: For API proxies
MODEL_ID=claude-sonnet-4-5-20250929   # Optional: Model selection
```

| Repository | Description |
|---|---|
| Kode | Production-ready open source agent CLI |
| shareAI-skills | Production skills collection |
| Agent Skills Spec | Official specification |
The model is 80%. Code is 20%.
Modern agents like Kode and Claude Code work not because of clever engineering, but because the model is trained to be an agent. Our job is to give it tools and stay out of the way.
Contributions are welcome! Please feel free to submit issues and pull requests.
- Add new example skills in skills/
- Improve documentation in docs/
- Report bugs or suggest features via Issues
MIT
Model as Agent. That's the whole secret.