
hound
Language-agnostic AI code security analysis that replicates the cognitive processes of expert auditors

Autonomous agents for code security auditing
Overview • Configuration • Workflow • Chatbot • Contributing
Hound is a security audit automation pipeline for AI-assisted code review that mirrors how expert auditors think, learn, and collaborate.
- Graph-driven analysis: adaptive architecture/access control/value-flow graphs with code-grounded annotations
- Senior/Junior loop: Strategist plans investigations; Scout executes targeted code reads
- Precise evidence: findings reference exact files, functions, and code spans
- Sessionized audits: resumable runs with coverage metrics and token usage
- Provider‑agnostic models: OpenAI, Anthropic, Google, xAI, plus a mock provider for offline use
Hound’s analysis loop is organized around graphs, beliefs, and focused investigations:
- Relational Knowledge Graphs (adaptive, language‑agnostic)
  - A model‑driven graph builder constructs and refines interconnected graphs of the system (architecture, access control, value flows, state). It adapts to the target scope and programming language without relying on brittle parsers or CFG generators.
  - The graph evolves as the audit progresses: nodes accrue observations and assumptions; relationships are added or revised as new code is inspected.
  - The agent reasons at a high level, then “zooms in” on a subgraph to pull only the precise code slices needed at that moment. Rather than pure embedding search, the graph provides structured context for targeted retrieval and reasoning.
- Belief System and Hypotheses
  - Instead of one‑shot judgments, Hound maintains beliefs (hypotheses) that evolve as evidence accumulates. Confidence adjusts up or down when new observations support or contradict an assumption (see the sketch after this list).
  - This lets the agent keep weak but plausible leads “alive” without overcommitting, then promote or prune them as the audit uncovers more code and context.
  - The result is steadier calibration over longer runs: fewer premature rejections, better recall of subtle issues, and a transparent trail from initial hunch to conclusion.
- Precise Code Grounding
  - Every graph element and annotation links to exact files, functions, and call sites. Investigations retrieve only the relevant code spans, maintaining attention on concrete implementation details rather than broad semantic similarity.
- Adaptive Planning
  - Planning reacts to discoveries: finding one issue seeds targeted searches for related classes of bugs (e.g., privilege checks, reentrancy surfaces, value transfer patterns).
  - Coverage tracking ensures systematic exploration while allowing strategic pivots toward the most promising areas.
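As a concrete illustration of the belief mechanics, here is a minimal sketch assuming a simple additive update rule and illustrative field names (this README does not specify Hound's actual update logic):

```python
# Minimal sketch of the belief system described above -- not Hound's actual
# implementation. Field names and the additive update rule are assumptions.
from dataclasses import dataclass

@dataclass
class Belief:
    description: str
    confidence: float  # 0.0-1.0, matching Hound's hypothesis confidence scores

    def observe(self, supports: bool, weight: float = 0.1) -> None:
        """Nudge confidence up on supporting evidence, down on contradiction."""
        delta = weight if supports else -weight
        self.confidence = min(1.0, max(0.0, self.confidence + delta))

lead = Belief("withdraw() may be reentrant", confidence=0.4)
lead.observe(supports=True)   # newly inspected code supports the hunch
print(lead.confidence)        # 0.5 -- the lead stays alive, not yet confirmed
```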
The system employs a senior/junior auditor pattern: the Scout (junior) actively navigates the codebase and annotates the knowledge graphs as it explores, while the Strategist (senior) handles high‑level planning and vulnerability analysis, directing and refocusing the Scout as needed. This mirrors real audit teams where seniors guide and juniors investigate.
Codebase size considerations: While Hound is language-agnostic and can analyze any codebase, it's optimized for small-to-medium sized projects like typical smart contract applications. Large enterprise codebases may exceed context limits and require selective analysis of specific subsystems.
Install the dependencies:
pip install -r requirements.txt
Set up your API keys, e.g.:
export OPENAI_API_KEY=your_key_here
Copy the example configuration and edit as needed:
cp hound/config.yaml.example hound/config.yaml
# then edit hound/config.yaml to select providers/models and options
Notes:
- Defaults work out-of-the-box; you can override many options via CLI flags.
- Keep API keys out of the repo; API_KEYS.txt is gitignored and can be sourced.
Note: Audit quality scales with time and model capability. Use longer runs and advanced models for more complete results.
Projects organize your audits and store all analysis data:
# Create a project from local code
./hound.py project create myaudit /path/to/code
# List all projects
./hound.py project ls
# View project details and coverage
./hound.py project info myaudit
Hound analyzes your codebase and builds aspect‑oriented knowledge graphs that serve as the foundation for all subsequent analysis.
Recommended (one‑liner):
# Auto-generate a default set of graphs (up to 5) and refine
# Strongly recommended: pass a whitelist of files (comma-separated)
./hound.py graph build myaudit --auto \
--files "src/A.sol,src/B.sol,src/utils/Lib.sol"
# View generated graphs
./hound.py graph ls myaudit
Alternative (manual guidance):
# 1) Initialize the baseline SystemArchitecture graph
./hound.py graph build myaudit --init \
--files "src/A.sol,src/B.sol,src/utils/Lib.sol"
# 2) Add a specific graph with your own description (exactly one graph)
./hound.py graph custom myaudit \
"Call graph focusing on function call relationships across modules" \
--iterations 2 \
--files "src/A.sol,src/B.sol,src/utils/Lib.sol"
# (Repeat 'graph custom' for additional targeted graphs as needed)
Operational notes:
- --auto always includes the SystemArchitecture graph as the first graph. You do not need to run --init in addition to --auto.
- If --init is used and a SystemArchitecture graph already exists, initialization is skipped. Use --auto to add more graphs, or remove existing graphs first if you want a clean re‑init.
- When running --auto and graphs already exist, Hound asks for confirmation before updating/overwriting graphs (including SystemArchitecture). To clear graphs:
  ./hound.py graph rm myaudit --all                      # remove all graphs
  ./hound.py graph rm myaudit --name SystemArchitecture  # remove one graph
- For large repos, you can constrain scope with --files (comma‑separated whitelist) alongside either approach.
Whitelists (strongly recommended):
- Always pass a whitelist of input files via --files. For the best results, the selected files should fit in the model’s available context window; whitelisting keeps the graph builder focused and avoids token overflows.
- If you do not pass --files, Hound will consider all files in the repository. On large codebases this triggers sampling and may degrade coverage/quality.
- --files expects a comma‑separated list of paths relative to the repo root.
Examples:
# Manual whitelist (small projects)
./hound.py graph build myaudit --auto \
--files "src/A.sol,src/B.sol,src/utils/Lib.sol"
# Generate a whitelist automatically (recommended for larger projects)
python whitelist_builder.py \
--input /path/to/repo \
--limit-loc 20000 \
--output whitelists/myaudit
# Use the generated list (newline-separated) as a comma list for --files
./hound.py graph build myaudit --auto \
--files "$(tr '\n' ',' < whitelists/myaudit | sed 's/,$//')"
Refine existing graphs (resume building): you can resume/refine an existing graph without creating new ones using graph refine. This skips discovery and saves updates incrementally.
# Refine a single graph by name (internal or display)
./hound.py graph refine myaudit SystemArchitecture \
--iterations 2 \
--files "src/A.sol,src/B.sol,src/utils/Lib.sol"
# Refine all existing graphs
./hound.py graph refine myaudit --all --iterations 2 \
--files "src/A.sol,src/B.sol,src/utils/Lib.sol"
Notes on refinement:
- Argument order is graph refine <project> [NAME]. Example: ./hound.py graph refine fider AuthorizationMap. If you put the name first, it will be treated as the project.
- Refinement uses the stored whitelist from the initial ingestion by default. Passing a new --files list will rebuild ingestion for that run with the new whitelist.
- Refinement prioritizes connecting and improving existing structure. It minimizes new node creation and, when refining a single graph, only accepts new nodes that immediately connect to existing nodes (kept to a small number). For broader expansion, prefer graph build --auto.
What happens: Hound inspects the codebase and creates specialized graphs for different aspects (e.g., access control, value flows, state management). Each graph contains:
- Nodes: Key concepts, functions, and state variables
- Edges: Relationships between components
- Annotations: Observations and assumptions tied to specific code locations
- Code cards: Extracted code snippets linked to graph elements
These graphs enable Hound to reason about high-level patterns while maintaining precise code grounding.
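For illustration only, a graph combining these four element types might look like the sketch below; the actual on-disk schema is not documented here, so every key name is an assumption:

```python
# Illustrative shape of a knowledge graph built from the element types listed
# above (nodes, edges, annotations, code cards). All key names are assumptions;
# the real schema may differ.
graph = {
    "name": "AccessControl",
    "nodes": [
        {"id": "Vault.withdraw", "kind": "function",
         "annotations": [  # observations/assumptions tied to code locations
             {"type": "assumption", "text": "only owner can call",
              "source": "src/Vault.sol:88"},
         ]},
    ],
    "edges": [
        {"from": "Vault.withdraw", "to": "Vault.balances", "kind": "writes"},
    ],
    "cards": [  # extracted code snippets linked to graph elements
        {"node": "Vault.withdraw", "file": "src/Vault.sol", "span": [80, 104]},
    ],
}
```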
The audit phase uses the senior/junior pattern with planning and investigation:
# Run a full audit with strategic planning (new session)
./hound.py agent audit myaudit
# Set time limit (in minutes)
./hound.py agent audit myaudit --time-limit 30
# Start with telemetry (connect the Chatbot UI to steer)
./hound.py agent audit myaudit --telemetry --time-limit 30
# Enable debug logging (captures all prompts/responses)
./hound.py agent audit myaudit --debug
# Attach to an existing session and continue where you left off
./hound.py agent audit myaudit --session <session_id>
Tip: When started with --telemetry, you can connect the Chatbot UI and steer the audit interactively (see the Chatbot section below).
Key parameters:
- --time-limit: Stop after N minutes (useful for incremental audits)
- --plan-n: Number of investigations per planning batch
- --session: Resume a specific session (continues coverage/planning)
- --debug: Save all LLM interactions to .hound_debug/
Audit duration and depth: Hound is designed to deliver increasingly complete results with longer audits. The analyze step can range from:
- Quick scan: 1 hour with fast models (gpt-4o-mini) for initial findings
- Standard audit: 4-8 hours with balanced models for comprehensive coverage
- Deep audit: Multiple days with advanced models (GPT-5) for exhaustive analysis
The quality and duration depend heavily on the models used. Faster models provide quick results but may miss subtle issues, while advanced reasoning models find deeper vulnerabilities but require more time.
What happens during an audit:
The audit is a dynamic, iterative process with continuous interaction between Strategist and Scout:
- Initial Planning (Strategist)
  - Reviews all knowledge graphs and annotations
  - Identifies contradictions between assumptions and observations
  - Creates a batch of prioritized investigations (default: 5)
  - Focus areas: access control violations, value transfer risks, state inconsistencies
- Investigation Loop (Scout + Strategist collaboration)
  For each investigation in the batch:
  - Scout explores: loads relevant graph nodes, analyzes code
  - Scout escalates: when deep analysis is needed, calls the Strategist via deep_think
  - Strategist analyzes: reviews the Scout's collected context, forms vulnerability hypotheses
  - Hypotheses form: findings are added to the global store
  - Coverage updates: tracks visited nodes and analyzed code
- Adaptive Replanning
  After completing a batch:
  - Strategist reviews new findings and updated coverage
  - Reorganizes priorities based on discoveries
  - If a vulnerability is found, searches for related issues
  - Plans the next batch of investigations
  - Continues until coverage goals are met or no promising leads remain
- Session Management
  - A unique session ID tracks the entire audit lifecycle
  - Coverage metrics show exploration progress
  - All findings accumulate in the hypothesis store
  - Token usage is tracked per model and investigation
Example output:
Planning Next Investigations...
1. [P10] Investigate role management bypass vulnerabilities
2. [P9] Check for reentrancy in value transfer functions
3. [P8] Analyze emergency function privilege escalation
Coverage Statistics:
Nodes visited: 23/45 (51.1%)
Cards analyzed: 12/30 (40.0%)
Hypotheses Status:
Total: 15
High confidence: 8
Confirmed: 3
Check audit progress and findings at any time during the audit. If you started the agent with --telemetry, you can also monitor and steer via the Chatbot UI:
- Open http://127.0.0.1:5280 and attach to the running instance
- Watch live Activity, Plan, and Findings
- Use the Steer form to guide the next investigations
# View current hypotheses (findings)
./hound.py project ls-hypotheses myaudit
# See detailed hypothesis information
./hound.py project hypotheses myaudit --details
# List hypotheses with confidence ratings
./hound.py project hypotheses myaudit
# Check coverage statistics
./hound.py project coverage myaudit
# View session details
./hound.py project sessions myaudit --list
Understanding hypotheses: Each hypothesis represents a potential vulnerability with:
- Confidence score: 0.0-1.0 indicating likelihood of being a real issue
- Status: proposed (initial), investigating, confirmed, rejected
- Severity: critical, high, medium, low
- Type: reentrancy, access control, logic error, etc.
- Annotations: Exact code locations and evidence
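Put together, a hypothesis record carrying the fields above might look like this sketch (the stored JSON layout is an assumption; hyp_12345 mirrors the IDs used in the CLI examples):

```python
# Sketch of a hypothesis record using the documented fields; the stored JSON
# layout is an assumption.
hypothesis = {
    "id": "hyp_12345",
    "title": "Reentrancy in withdraw()",
    "confidence": 0.85,      # 0.0-1.0 likelihood of being a real issue
    "status": "proposed",    # proposed | investigating | confirmed | rejected
    "severity": "high",      # critical | high | medium | low
    "type": "reentrancy",
    "annotations": [
        {"file": "src/Vault.sol", "function": "withdraw", "lines": [88, 97]},
    ],
}
```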
For specific concerns, run focused investigations without full planning:
# Investigate a specific concern
./hound.py agent investigate "Check for reentrancy in withdraw function" myaudit
# Quick investigation with fewer iterations
./hound.py agent investigate "Analyze access control in admin functions" myaudit \
--iterations 5
# Use specific models for investigation
./hound.py agent investigate "Review emergency functions" myaudit \
--model gpt-4o \
--strategist-model gpt-5
When to use targeted investigations:
- Following up on specific concerns after initial audit
- Testing a hypothesis about a particular vulnerability
- Quick checks before full audit
- Investigating areas not covered by automatic planning
Note: These investigations still update the hypothesis store and coverage tracking.
A reasoning model reviews all hypotheses and updates their status based on evidence:
# Run finalization with quality review
./hound.py finalize myaudit
# Customize confidence threshold
./hound.py finalize myaudit -t 0.7 --model gpt-4o
# Include all findings (not just confirmed)
# (Use on the report command, not finalize)
./hound.py report myaudit --include-all
What happens during finalization:
- A reasoning model (default: GPT-5) reviews each hypothesis
- Evaluates the evidence and code context
- Updates status to confirmed or rejected based on analysis
- Adjusts confidence scores based on evidence strength
- Prepares findings for report generation
Important: By default, only confirmed findings appear in the final report. Use --include-all to include all hypotheses regardless of status.
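The default filtering rule is easy to state in code. A minimal sketch, assuming hypotheses are dicts with the status field described earlier (select_findings is an illustrative name, not a Hound API):

```python
# Sketch of the report-time filtering described above: only confirmed
# findings are kept unless --include-all is passed.
def select_findings(hypotheses, include_all=False):
    if include_all:
        return hypotheses
    return [h for h in hypotheses if h["status"] == "confirmed"]

print(select_findings([{"status": "confirmed"}, {"status": "rejected"}]))
# -> [{'status': 'confirmed'}]
```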
Create and manage proof-of-concept exploits for confirmed vulnerabilities:
# Generate PoC prompts for confirmed vulnerabilities
./hound.py poc make-prompt myaudit
# Generate for a specific hypothesis
./hound.py poc make-prompt myaudit --hypothesis hyp_12345
# Import existing PoC files
./hound.py poc import myaudit hyp_12345 exploit.sol test.js \
--description "Demonstrates reentrancy exploit"
# List all imported PoCs
./hound.py poc list myaudit
The PoC workflow:
- make-prompt: Generates detailed prompts for coding agents (like Claude Code)
  - Includes vulnerable file paths (project-relative)
  - Specifies exact functions to target
  - Provides clear exploit requirements
  - Saves prompts to the poc_prompts/ directory
- import: Links PoC files to specific vulnerabilities
  - Files stored in poc/[hypothesis-id]/
  - Metadata tracks descriptions and timestamps
  - Multiple files per vulnerability supported
- Automatic inclusion: Imported PoCs appear in reports with syntax highlighting
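The metadata format is not documented here, but based on the description above (descriptions and timestamps per imported file), an entry might plausibly look like:

```python
# Plausible shape of a PoC metadata entry, inferred from the description
# above; the actual format is not documented in this README.
poc_entry = {
    "hypothesis": "hyp_12345",
    "files": ["poc/hyp_12345/exploit.sol", "poc/hyp_12345/test.js"],
    "description": "Demonstrates reentrancy exploit",
    "imported_at": "2025-01-15T12:00:00Z",
}
```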
Produce comprehensive audit reports with all findings and PoCs:
# Generate HTML report (includes imported PoCs)
./hound.py report myaudit
# Include all hypotheses, not just confirmed
./hound.py report myaudit --include-all
# Export report to specific location
./hound.py report myaudit --output /path/to/report.html
Report contents:
- Executive summary: High-level overview and risk assessment
- System architecture: Understanding of the codebase structure
- Findings: Detailed vulnerability descriptions (only confirmed by default)
- Code snippets: Relevant vulnerable code with line numbers
- Proof-of-concepts: Any imported PoCs with syntax highlighting
- Severity distribution: Visual breakdown of finding severities
- Recommendations: Suggested fixes and improvements
Note: The report uses a professional dark theme and includes all imported PoCs automatically.
Each audit run operates under a session with comprehensive tracking and per-session planning:
- Planning is stored in a per-session PlanStore with statuses: planned, in_progress, done, dropped, superseded.
- Existing planned items are executed first; the Strategist only tops up new items to reach your --plan-n.
- On resume, any stale in_progress items are reset to planned; completed items remain done and are not duplicated.
- Completed investigations, coverage, and hypotheses are fed back into planning to avoid repeats and guide prioritization.
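A minimal sketch of these resume/top-up rules, with assumed names (the PlanStore API itself is not shown in this README):

```python
# Sketch of the per-session planning rules described above. Names are
# assumptions; only the behavior (reset stale items, top up to --plan-n)
# comes from this README.
def resume_plan(items, plan_n):
    for item in items:
        if item["status"] == "in_progress":   # stale from an interrupted run
            item["status"] = "planned"
    pending = [i for i in items if i["status"] == "planned"]
    return pending, max(0, plan_n - len(pending))  # Strategist tops up the rest

items = [{"id": 1, "status": "done"}, {"id": 2, "status": "in_progress"}]
pending, to_generate = resume_plan(items, plan_n=5)
print(len(pending), to_generate)  # 1 pending item kept, 4 new ones to plan
```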
# List and inspect sessions
./hound.py project sessions myaudit --list
./hound.py project sessions myaudit <session_id>
# Show planned investigations for a session (Strategist PlanStore)
./hound.py project plan myaudit <session_id>
# Session data includes:
# - Coverage statistics (nodes/cards visited)
# - Investigation history
# - Token usage by model
# - Planning decisions
# - Hypothesis formation
Sessions are stored in ~/.hound/projects/myaudit/sessions/ and contain:
- session_id: Unique identifier
- coverage: Visited nodes and analyzed code
- investigations: All executed investigations
- planning_history: Strategic decisions made
- token_usage: Detailed API usage metrics
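Assuming the session files are JSON documents with the keys above (the filename pattern is a guess), you can inspect them directly:

```python
# Inspect stored sessions directly; assumes JSON files with the keys listed
# above (the *.json filename pattern is a guess).
import json
from pathlib import Path

sessions_dir = Path.home() / ".hound/projects/myaudit/sessions"
for path in sorted(sessions_dir.glob("*.json")):
    data = json.loads(path.read_text())
    print(data["session_id"], data.get("coverage"), data.get("token_usage"))
```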
Resume/attach to an existing session during an audit run by passing the session ID:
# Attach to a specific session and continue auditing under it
./hound.py agent audit myaudit --session <session_id>
When you attach to a session, its status is set to active while the audit runs and is finalized on completion (completed, or interrupted if a time limit was hit). Any in_progress plan items are reset to planned so you can continue cleanly.
# Start an audit (creates a session automatically)
./hound.py agent audit myaudit
# List sessions to get the session id
./hound.py project sessions myaudit --list
# Show planned investigations for that session
./hound.py project plan myaudit <session_id>
# Attach later and continue planning/execution under the same session
./hound.py agent audit myaudit --session <session_id>
Hound ships with a lightweight web UI for steering and monitoring a running audit session. It discovers local runs via a simple telemetry registry and streams status/decisions live.
Prerequisites:
- Set API keys (at least OPENAI_API_KEY): source ../API_KEYS.txt or export manually
- Install Python deps in this submodule: pip install -r requirements.txt
- Start the agent with telemetry enabled
# From the hound/ directory
./hound.py agent audit myaudit --telemetry --debug
# Notes
# - The --telemetry flag exposes a local SSE/control endpoint and registers the run
# - Optional: ensure the registry dir matches the chatbot by setting:
# export HOUND_REGISTRY_DIR="$HOME/.local/state/hound/instances"
- Launch the chatbot server
# From the hound/ directory
python chatbot/run.py
# Optional: customize host/port
HOST=0.0.0.0 PORT=5280 python chatbot/run.py
Open the UI: http://127.0.0.1:5280
- Select the running instance and stream activity
  - The input next to “Start” lists detected instances as project_path | instance_id.
  - Click “Start” to attach; the UI auto‑connects the realtime channel and begins streaming decisions/results.
  - The lower panel has tabs:
    - Activity: live status/decisions
    - Plan: current strategist plan (✓ done, ▶ active, • pending)
    - Findings: hypotheses with confidence; you can Confirm/Reject manually
- Steer the audit
  - Use the “Steer” form (e.g., “Investigate reentrancy across the whole app next”).
  - Steering is queued at <project>/.hound/steering.jsonl and consumed exactly once when applied.
  - Broad, global instructions may preempt the current investigation and trigger immediate replanning.
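For reference, a steering entry could also be appended by hand from the project root; the line schema below is an assumption, and the Steer form is the supported path:

```python
# Hand-append a steering instruction (run from the project root). The line
# schema is an assumption; the chatbot's Steer form is the supported path.
import json

entry = {"instruction": "Investigate reentrancy across the whole app next"}
with open(".hound/steering.jsonl", "a") as f:
    f.write(json.dumps(entry) + "\n")
```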
Troubleshooting:
- No instances in dropdown: ensure you started the agent with --telemetry.
- Wrong or stale project shown: clear the input; the UI defaults to the most recent alive instance.
- Registry mismatch: confirm both processes print the same “Using registry dir:” line, or set HOUND_REGISTRY_DIR for both.
- Raw API: open /api/instances in the browser to inspect entries (includes an alive flag and the registry path).
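A quick scripted check of the raw API (default host/port from this README; response fields other than alive are assumptions):

```python
# Scripted check of the registry endpoint (default host/port from this
# README). Response fields other than "alive" are assumptions.
import requests

resp = requests.get("http://127.0.0.1:5280/api/instances", timeout=5)
for inst in resp.json():
    print(inst.get("instance_id"), inst.get("alive"))
```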
Hypotheses are the core findings that accumulate across sessions:
# List hypotheses with confidence scores
./hound.py project hypotheses myaudit
# View with full details
./hound.py project hypotheses myaudit --details
# Update hypothesis status
./hound.py project set-hypothesis-status myaudit hyp_12345 confirmed
# Reset hypotheses (creates backup)
./hound.py project reset-hypotheses myaudit
# Force reset without confirmation
./hound.py project reset-hypotheses myaudit --force
Hypothesis statuses:
- proposed: Initial finding, needs review
- investigating: Under active investigation
- confirmed: Verified vulnerability
- rejected: False positive
- resolved: Fixed in code
Override default models per component:
# Use different models for each role (first pair: Scout; second pair: Strategist)
./hound.py agent audit myaudit \
  --platform openai --model gpt-4o-mini \
  --strategist-platform anthropic --strategist-model claude-3-opus
Capture all LLM interactions for analysis:
# Enable debug logging
./hound.py agent audit myaudit --debug
# Debug logs saved to .hound_debug/
# Includes HTML reports with all prompts and responses
Monitor audit progress and completeness:
# View coverage statistics
./hound.py project coverage myaudit
# Coverage shows:
# - Graph nodes visited vs total
# - Code cards analyzed vs total
# - Percentage completion
See CONTRIBUTING.md for development setup and guidelines.
Apache 2.0 with additional terms:
You may use Hound however you want, except selling it as an online service or as an appliance - that requires written permission from the author.
- See LICENSE for details.