handit.ai

🧠 Open-source optimization engine for LLM agents. Track logs, evaluate behavior, generate insights, and improve agent performance through manual versioning and analysis. Built to make AI actually work.

Stars: 177


Handit.ai is an autonomous engineer for AI applications. It monitors them around the clock, detects failures, generates fixes, tests those fixes against real data, and ships them as pull requests, all automatically. You can write your agents in JavaScript, TypeScript, Python, and more; work that used to require manual debugging and firefighting is automated.

README:

🔥 The Autonomous Engineer That Fixes Your AI 24/7 🔥

Handit catches failures, writes fixes, tests them, and ships PRs, automatically. Like having an on-call engineer dedicated to your AI, except it works 24/7.

🚀 Quick Start • 📋 Core Features • 📚 Docs • 📅 Schedule a Call


🎯 What is handit.ai?

handit.ai solves AI reliability.

Modern AI applications are fragile – they hallucinate, break schemas, leak PII, and fail silently. When your AI fails at 2am, customers complain, and you're debugging blind. Did the model change? Is a tool broken? Is there a logic error? Without visibility, you're playing whack-a-mole with quality issues.

handit.ai is your autonomous engineer that monitors your AI 24/7, detects issues, generates fixes, tests them against real data, and ships them as pull requests, all automatically.

Write JavaScript, TypeScript, Python, and more. What used to take manual debugging and firefighting now happens automatically with handit.ai.



🚀 Quick Start

Get your autonomous engineer up and running in under 5 minutes:

1. Install the handit CLI

npm install -g @handit.ai/cli

2. Start the Setup Process

Navigate to your AI project directory and run:

handit-cli setup

The CLI will guide you through connecting your autonomous engineer:

  • 🔧 Connect your handit.ai account
  • 📱 Install the handit SDK in your project
  • 🔑 Configure your API key for monitoring
  • 🧠 Connect evaluation models (OpenAI, Together AI, etc.)
  • 🔗 Connect your GitHub repository for automated PRs

3. Verify Your Setup

✅ Check your dashboard: Go to dashboard.handit.ai - you should see:

  • Tracing data flowing in real-time
  • Quality scores for evaluated interactions
  • Agent Performance showing baseline metrics

✅ Confirm GitHub integration: Check your repository - you should see:

  • handit app installed in repository settings
  • Ready for PRs - your autonomous engineer can now create pull requests

That's it! Your autonomous engineer is now monitoring your AI, evaluating quality, and ready to create pull requests with fixes whenever issues are detected.

Manual Setup (Advanced)

Need custom control? Add monitoring decorators manually to your agent functions:

1. Install the SDK

# Python
pip install handit-ai

# JavaScript/TypeScript  
npm install @handit.ai/handit-ai

2. Add monitoring to your main agent function

Python:

from handit_ai import configure, tracing
import os

configure(HANDIT_API_KEY=os.getenv("HANDIT_API_KEY"))

@tracing(agent="customer-service")
async def process_customer_request(message):
    # Your existing code here - unchanged
    intent = await classify_intent(message)
    response = await generate_response(intent)
    return response

JavaScript:

import { configure, startTracing, endTracing } from '@handit.ai/handit-ai';

configure({ HANDIT_API_KEY: process.env.HANDIT_API_KEY });

const processCustomerRequest = async (message) => {
  startTracing({ agent: "customer-service" });
  try {
    // Your existing code here - unchanged
    const intent = await classifyIntent(message);
    const response = await generateResponse(intent);
    return response;
  } finally {
    endTracing();
  }
};

That's it! Check dashboard.handit.ai to see your traces.
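
To make the idea behind a tracing decorator concrete, here is a minimal, self-contained sketch. It is not the handit SDK's implementation: the `trace` decorator and the in-memory `TRACES` list are hypothetical names used only for illustration, assuming that tracing means recording the input, output, error, and latency of each call.

```python
import asyncio
import functools
import time

# Hypothetical in-memory sink (an assumption for this sketch);
# a real SDK would ship records to a monitoring backend instead.
TRACES = []

def trace(agent):
    """Illustrative decorator: record input, output, error, and latency per call."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            record = {"agent": agent, "function": fn.__name__,
                      "args": args, "kwargs": kwargs,
                      "output": None, "error": None}
            start = time.perf_counter()
            try:
                record["output"] = await fn(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                # Keep the error on the record, then re-raise unchanged.
                record["error"] = repr(exc)
                raise
            finally:
                record["latency_s"] = time.perf_counter() - start
                TRACES.append(record)
        return wrapper
    return decorator

@trace(agent="customer-service")
async def process_customer_request(message):
    # Stand-in for the real agent logic.
    return f"echo: {message}"

result = asyncio.run(process_customer_request("hello"))
```

The wrapped function's body is untouched, which is the point of the decorator approach: monitoring is added around the agent rather than inside it.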


🎯 How It Works

🔍 Detect - Real-Time Failure Detection

On-Call 24/7: Monitors every request, catches failures in real-time before customers complain.

  • Hallucinations and incorrect responses
  • Schema breaks and validation errors
  • PII leaks and security issues
  • Performance degradation and timeouts
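
As a rough illustration of three of the failure types listed above, the sketch below runs simple rule-based checks on a single response. It is not how handit detects failures; `check_response`, the regular expressions, and the 5-second threshold are assumptions made for this example. Hallucinations are not covered here because they generally need a model-based evaluator rather than a rule.

```python
import re

# Deliberately simple patterns (assumptions for this sketch, not exhaustive PII detection).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def check_response(response, required_fields, latency_s, max_latency_s=5.0):
    """Return a list of issue labels found in one agent response (a dict)."""
    issues = []
    # Schema break: a required field is absent from the response.
    missing = [f for f in required_fields if f not in response]
    if missing:
        issues.append("schema_break: missing " + ", ".join(missing))
    # Possible PII leak: an email address or phone number appears in any value.
    text = " ".join(str(v) for v in response.values())
    if EMAIL_RE.search(text) or PHONE_RE.search(text):
        issues.append("possible_pii")
    # Performance degradation: the call took longer than the threshold.
    if latency_s > max_latency_s:
        issues.append("slow_response")
    return issues

issues = check_response(
    {"answer": "Contact jane@example.com for a refund."},
    required_fields=["answer", "intent"],
    latency_s=7.2,
)
print(issues)
```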

🧠 Diagnose & Fix - Automated Fix Generation

Insights: Analyzes root causes, generates fixes and tests solutions on actual failure cases in production.

  • Prompt improvements and optimizations
  • Configuration changes and guardrails
  • Code fixes for logic errors
  • Model parameter adjustments
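
Testing a fix "on actual failure cases" can be pictured as replaying recorded failures through the candidate and counting how many now pass. The sketch below assumes each failure case stores an input and an expected output; `replay_failures`, `candidate_classifier`, and the sample cases are hypothetical and not part of handit.

```python
def replay_failures(failure_cases, candidate, is_correct):
    """Run a candidate fix over recorded failures; return the fraction it resolves."""
    if not failure_cases:
        return 0.0
    fixed = sum(
        1 for case in failure_cases
        if is_correct(candidate(case["input"]), case["expected"])
    )
    return fixed / len(failure_cases)

# Hypothetical recorded failures: inputs the old version got wrong.
failure_cases = [
    {"input": "  Refund please ", "expected": "refund"},
    {"input": "CANCEL my order", "expected": "cancel"},
    {"input": "where is my package", "expected": "tracking"},
]

def candidate_classifier(text):
    # Stand-in for a fixed prompt or code path: normalize, then match keywords.
    words = text.strip().lower().split()
    for label in ("refund", "cancel"):
        if label in words:
            return label
    return "other"

resolved = replay_failures(
    failure_cases, candidate_classifier, lambda got, want: got == want
)
print(f"Candidate resolves {resolved:.0%} of recorded failures")
```

In this sample the candidate resolves two of the three recorded failures, which is the kind of number a reviewer would want to see attached to a proposed fix.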

Ship - GitHub-Native Deployment

Opens PRs with proven fixes: You review and merge, or auto-deploy with guardrails.

  • Tested fixes with real performance data
  • Detailed explanations of changes
  • A/B testing results and metrics
  • Rollback capabilities

🎯 Examples

Self-improving AI agent that automatically converts messy, unstructured documents into clean, structured data and CSV tables. Perfect for processing invoices, purchase orders, contracts, medical reports, and any other document types. But here's the kicker - it actually gets better at its job over time.

Source Code →

Key Features: ✨

  • Schema Inference 🔍: AI analyzes documents and creates optimal JSON structure
  • Data Extraction 📊: Maps document fields to schema with confidence scoring
  • CSV Generation 📋: Automatically creates organized tables for data visualization
  • Multimodal Support 🖼️: Handles images, PDFs, and text files
  • Session Management 🗂️: Isolated processing for different document batches
  • Self-improvement 🧠: Handit observes every agent interaction, and if a failure is detected, it automatically fixes it
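
The schema inference and CSV generation steps can be sketched without any LLM, assuming the extraction step has already produced one dict per document. This is a simplified stand-in, not the example project's code: `infer_schema`, `to_csv`, and the sample records are hypothetical.

```python
import csv
import io

def infer_schema(records):
    """Map each field name to the set of Python type names seen for it."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema

def to_csv(records, schema):
    """Write records as CSV text, one column per inferred field."""
    buffer = io.StringIO()
    # Fields missing from a record are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=sorted(schema), restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Hypothetical extraction output for two invoices.
records = [
    {"invoice_id": "INV-1", "total": 120.5},
    {"invoice_id": "INV-2", "total": 80, "vendor": "Acme"},
]
schema = infer_schema(records)
csv_text = to_csv(records, schema)
print(csv_text)
```

A field that shows more than one type in the inferred schema (here `total` is seen as both `float` and `int`) is a natural place to attach a lower confidence score or a normalization step.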

Technologies: 🛠️ Python, LangGraph, LangChain, OpenAI, FastAPI, Pandas, Handit.ai


🌐 Language Support

Write your AI agents in your preferred language:

| Language | Status | SDK Package |
|---|---|---|
| Python | ✅ Stable | handit-ai>=0.0.62 |
| JavaScript | ✅ Stable | @handit.ai/handit-ai |
| TypeScript | ✅ Stable | @handit.ai/handit-ai |
| Go | ✅ Available | HTTP API integration |
| Any Stack/Framework | ✅ Available | HTTP API integration (n8n, Zapier, etc.) |
| Java, C#, Ruby, PHP | ✅ Available | REST API integration |
| LangChain & LangGraph | ✅ Available | Python/JS SDK |
| LlamaIndex, AutoGen | ✅ Available | Python/JS SDK + HTTP API |
| CrewAI, Swarm | ✅ Available | Python SDK + HTTP API |

🎯 Real Results

See how teams eliminated their AI firefighting with handit.ai:

ASPE.ai

ASPE.ai was running a high-stakes agent that was silently failing every time. Within 48 hours of connecting handit, the system identified the issue, tested fixes, and deployed the new prompts.

  • +62.3% Accuracy improvement
  • +36% Response relevance
  • +97.8% Success rate

XBuild

XBuild's AI was suffering from prompt drift that tanked performance across key models. handit stepped in, ran automatic A/B tests, and deployed the top-performing versions.

  • +34.6% Accuracy improvement
  • +19.1% Success rate
  • +6600 Automatic evaluations

⚑ Features: Everything Your Autonomous Engineer Does

Handit isn't just another tool: it's an autonomous team member handling your AI reliability 24/7.

🔍 Real-Time Failure Detection

Never Miss a Failure: Catches hallucinations, schema breaks, PII leaks, and performance issues as they happen. No more finding out from angry customers.

🤖 Automated Fix Generation

Writes Production-Ready Code: Generates prompt improvements, config changes, and guardrails. Tests each fix against real failures before shipping.

📊 A/B Testing & Validation

Data-Driven Decisions: Every fix is tested on live data. See exact accuracy improvements, latency impacts, and success rates before deploying.
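
A comparison of this kind boils down to computing the same metrics for the current version and the candidate, then reporting the differences. The sketch below assumes each evaluated interaction is recorded with a correctness flag and a latency; `summarize`, `compare`, and the sample data are hypothetical, and a real A/B test would also need enough samples to rule out noise.

```python
from statistics import mean

def summarize(results):
    """results: list of dicts with 'correct' (bool) and 'latency_s' (float)."""
    return {
        "accuracy": mean([1.0 if r["correct"] else 0.0 for r in results]),
        "latency_s": mean([r["latency_s"] for r in results]),
    }

def compare(baseline, candidate):
    """Return candidate-minus-baseline deltas for accuracy and latency."""
    a, b = summarize(baseline), summarize(candidate)
    return {
        "accuracy_delta": b["accuracy"] - a["accuracy"],
        "latency_delta_s": b["latency_s"] - a["latency_s"],
    }

# Hypothetical evaluation results on the same four inputs.
baseline = [
    {"correct": True, "latency_s": 1.0},
    {"correct": False, "latency_s": 1.0},
    {"correct": False, "latency_s": 1.0},
    {"correct": True, "latency_s": 1.0},
]
candidate = [
    {"correct": True, "latency_s": 1.5},
    {"correct": True, "latency_s": 1.5},
    {"correct": False, "latency_s": 1.5},
    {"correct": True, "latency_s": 1.5},
]
report = compare(baseline, candidate)
print(report)
```

Here the candidate is more accurate but slower, which shows why accuracy and latency are reported side by side before a fix is deployed.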

🧠 Fix Registry & Memory

Gets Smarter Over Time: Remembers every failure and successful fix. Instantly applies proven solutions to recurring issues. Your engineer's growing expertise.
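
One way to picture a fix registry is as a lookup table keyed by a normalized failure signature, so a recurring failure maps straight to the fix that worked before. The sketch below is an assumption about the general technique, not handit's data model; `FixRegistry` and its fields are hypothetical.

```python
import hashlib

class FixRegistry:
    """Illustrative in-memory registry mapping failure signatures to known fixes."""

    def __init__(self):
        self._fixes = {}

    @staticmethod
    def signature(agent, failure_type, detail):
        # Normalize so trivially different reports of the same failure collide.
        normalized = f"{agent}|{failure_type}|{detail.strip().lower()}"
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]

    def record(self, agent, failure_type, detail, fix):
        """Remember a fix that resolved this failure."""
        self._fixes[self.signature(agent, failure_type, detail)] = fix

    def lookup(self, agent, failure_type, detail):
        """Return the remembered fix, or None if this failure is new."""
        return self._fixes.get(self.signature(agent, failure_type, detail))

registry = FixRegistry()
registry.record(
    "customer-service", "schema_break", "Missing field: intent",
    fix="Add 'intent' to the required output fields in the prompt",
)
known = registry.lookup("customer-service", "schema_break", "  missing field: INTENT ")
unknown = registry.lookup("customer-service", "possible_pii", "email in answer")
```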


🎯 How We Do It: Your Autonomous Engineer in Action

From failure to fix in production: fully automated, fully auditable, fully open-source.

🔍 Detect

On-Call 24/7
Monitors every request, catches failures in real-time before customers complain.

🧠 Diagnose & Fix

Insights
Analyzes root causes, generates fixes and tests solutions on actual failure cases in production.

🚀 Ship

GitHub-Native
Opens PRs with proven fixes. You review and merge, or auto-deploy with guardrails.


🛠️ Advanced: Manual Setup

Advanced users only. If you need custom control over your autonomous engineer setup, you can manually add monitoring code instead of using the CLI.

When to use manual setup:

  • Custom deployment environments
  • Complex agent architectures
  • Need granular control over monitoring

Quick manual setup: follow the Manual Setup (Advanced) steps in the Quick Start section above.

Troubleshooting

❌ CLI command not found?

  • Solution: Make sure Node.js is installed: node --version should show v16 or later
  • If still failing: npm uninstall -g @handit.ai/cli && npm install -g @handit.ai/cli

❌ "Authentication failed" during setup?

  • Solution: Check your Handit.ai account credentials at dashboard.handit.ai
  • If still failing: Try logging out and back in to your Handit account

❌ No traces appearing in dashboard?

  • Solution: Run handit-cli setup again to regenerate configuration
  • Check: Your generated code is actually being executed (not just imported)
  • Verify: API key was set correctly: echo $HANDIT_API_KEY
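
If you would rather check the key from inside the Python process that runs your agent (where the environment can differ from your shell), a small sketch like the following works. The `api_key_status` helper is hypothetical; only the `HANDIT_API_KEY` variable name comes from the setup steps above.

```python
import os

def api_key_status(env=None):
    """Report whether HANDIT_API_KEY is usable, without printing the key itself."""
    env = os.environ if env is None else env
    key = env.get("HANDIT_API_KEY", "")
    if not key:
        return "missing"
    if key != key.strip():
        # Stray spaces or newlines often come from copy-pasting into .env files.
        return "has surrounding whitespace"
    return "set"

print(f"HANDIT_API_KEY is {api_key_status()}")
```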

❌ Evaluations not running?

  • Solution: Re-run handit-cli evaluators-setup to verify model connections
  • Check: Model tokens have sufficient credits in your provider dashboard
  • Verify: Your AI is receiving traffic (evaluations only run on active agents)

❌ GitHub app installation failed?

  • Solution: Ensure you have admin access to the repository
  • Try: Run handit-cli github again to reinstall the app
  • Check: Repository permissions in GitHub Settings → Applications


🎯 Examples

πŸ† ChessArena.ai - Full-Featured Production App

A complete chess platform benchmarking LLM performance with real-time evaluation.

Live Website → | Source Code →

Built from scratch to production deployment, featuring:

🔐 Authentication & user management
🤖 Multi-agent LLM evaluation (OpenAI, Claude, Gemini, Grok)
🐍 Python engine integration (Stockfish chess evaluation)
📊 Real-time streaming with live move updates and scoring
🎨 Modern React UI with interactive chess boards
🔄 Event-driven workflows connecting TypeScript APIs to Python processors
📈 Live leaderboards with move-by-move quality scoring
🚀 Production deployment on Handit Cloud

📚 More Examples

| Example | Description |
|---|---|
| AI Research Agent | Web research with iterative analysis |
| Streaming Chatbot | Real-time AI responses |
| Gmail Automation | Smart email processing |
| GitHub PR Manager | Automated PR workflows |
| Finance Agent | Real-time market analysis |

Features demonstrated: Multi-language workflows • Real-time streaming • AI integration • Production deployment

View all 20+ examples →


πŸ† Trusted by Teams Who Ship Production AI

Open source because you need to trust what pushes to prod.

Stop Being Your AI's On-Call Engineer
Let Handit handle the 2am failures while you focus on building features. Open source. GitHub-native. Starts working in minutes!


🚀 Roadmap

We're building Handit in the open, and we'd love for you to be a part of the journey.

| Week | Focus | Status |
|---|---|---|
| 1 | Backend foundation + infrastructure | ✔️ Done |
| 2 | Prompt versioning | ✔️ Done |
| 3 | Auto-evaluation + insight generation | ✔️ Done |
| 4 | Deployment setup + UI + public release | ✔️ Done |

🤝 Contributing

We welcome contributions! Whether it's:

  • 🐛 Bug fixes and improvements
  • ✨ New features
  • 📚 Documentation and examples
  • 🌍 Language support additions
  • 🎨 Dashboard UI enhancements

👥 Contributors

Thanks to everyone helping bring Handit to life:

Want to appear here? Star the repo, follow along, and make your first PR 🙌


🌟 Ready to auto-improve your AI?

🚀 Get Started Now • 📖 Read the Docs • 💬 Join Discord • 📅 Schedule a Call


Built with ❤️ by the Handit team • Star us if you find Handit useful! ⭐


🚧 Roadmap

We have a public roadmap for handit.ai. You can view it here.

Feel free to add comments to the issues, or create a new issue if you have a feature request.

| Feature | Status | Link | Description |
|---|---|---|---|
| Advanced Prompt Optimization | Planned | #485 | Multi-model prompt optimization |
| Custom Evaluation Metrics | Planned | #495 | User-defined evaluation criteria |
| Real-time Dashboard | Planned | #497 | Live monitoring interface |
| Auto-deployment | Planned | #476 | Automated deployment with guardrails |
| Multi-agent Support | Planned | #477 | Complex agent orchestration |
| Custom Integrations | Planned | #480 | Third-party tool integrations |

📚 Resources

  • 📖 Documentation - Complete guides and API reference
  • 💬 Discord - Community support and discussions
  • 🐛 GitHub Issues - Bug reports and feature requests
  • 🗺️ Roadmap - Upcoming features and progress
  • 🎥 Demo - See handit in action

🤝 Contributing

We welcome contributions! Check our Contributing Guide to get started.

Development Setup

# Clone the repository
git clone https://github.com/handit-ai/autonom.git
cd autonom

# Install dependencies
npm install

# Start development environment
npm run dev

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


