
evolving-agents
A toolkit for agent autonomy, evolution, and governance. Create agents that can understand requirements, evolve through experience, communicate effectively, and build new agents and tools - all while operating within governance guardrails.
Stars: 403

A toolkit for agent autonomy, evolution, and governance enabling agents to learn from experience, collaborate, communicate, and build new tools within governance guardrails. It focuses on autonomous evolution, agent self-discovery, governance firmware, self-building systems, and agent-centric architecture. The toolkit leverages existing frameworks to enable agent autonomy and self-governance, moving towards truly autonomous AI systems.
README:
Build intelligent AI agent ecosystems that orchestrate complex tasks based on high-level goals.
This toolkit provides a robust, production-grade framework for building autonomous AI agents and multi-agent systems. It uniquely focuses on enabling agents to understand requirements, design solutions (potentially using specialized design agents like `ArchitectZero`), discover capabilities, evolve components, and orchestrate complex task execution based on high-level goals, all while operating within defined governance boundaries.
- Goal-Oriented Orchestration: A central `SystemAgent` acts as the primary entry point, taking high-level goals and autonomously determining the steps needed, orchestrating component creation, communication, and execution.
- Intelligent Solution Design (Optional): Agents like `ArchitectZero` can analyze requirements and design detailed multi-component solutions, providing blueprints for the `SystemAgent`.
- Internal Workflow Management: For complex tasks, the `SystemAgent` can internally generate, process, and execute multi-step workflow plans using specialized tools (`GenerateWorkflowTool`, `ProcessWorkflowTool`), abstracting this complexity from the caller.
- Semantic Capability Discovery: The `SmartAgentBus` allows agents to find and utilize capabilities based on natural language descriptions, enabling dynamic service discovery and routing via a logical "Data Bus".
- Ecosystem Management: The `SmartAgentBus` also provides a logical "System Bus" for managing agent registration, health, and discovery.
- Intelligent Component Management: The `SmartLibrary` provides persistent storage, semantic search (via vector embeddings), versioning, and evolution capabilities for agents and tools.
- Adaptive Evolution: Components can be evolved based on requirements, feedback, or performance data using various strategies (standard, conservative, aggressive, domain adaptation), orchestrated by the `SystemAgent`.
- Multi-Framework Support: Seamlessly integrate agents built with different frameworks (e.g., BeeAI, OpenAI Agents SDK) through a flexible provider architecture.
- Governance & Safety: Built-in `Firmware` injects rules, and guardrails (like the `OpenAIGuardrailsAdapter`) ensure safe and compliant operation.
- Self-Building Potential: The architecture allows agents (like `ArchitectZero` and `SystemAgent`) to collaboratively design and implement new agent systems based on user needs.
While many frameworks focus on building individual agents, the Evolving Agents Toolkit focuses on creating intelligent, self-improving agent ecosystems capable of handling complex tasks autonomously. Key differentiators include:
- High-Level Goal Execution: Interact with the system via goals given to the `SystemAgent`, which then handles the "how" (planning, component management, execution).
- Internal Orchestration: Complex workflows (design -> generate -> process -> execute) are managed internally by the `SystemAgent`, abstracting the mechanics.
- Semantic Capability Network: The `SmartAgentBus` creates a dynamic network where agents discover and interact based on function, not fixed names, via capability requests.
- Deep Component Lifecycle Management: Beyond creation, the `SmartLibrary` and evolution tools support searching, reusing, versioning, and adapting components intelligently.
- Agent-Driven Ecosystem: The `SystemAgent` isn't just a script runner; it's a `ReActAgent` using its own tools to manage the entire process, including complex workflow execution when needed.
- True Multi-Framework Integration: Provides abstractions (`Providers`, `AgentFactory`) to treat agents from different SDKs as first-class citizens.
The `SystemAgent` acts as the central nervous system and primary entry point. It's a `ReActAgent` equipped with specialized tools to manage the entire ecosystem. It receives high-level goals and autonomously plans and executes the necessary steps.
Example: Prompting the `SystemAgent` with a high-level goal:
```python
# Define the high-level task for the System Agent
invoice_content = "..."  # Load or define the invoice text here

high_level_prompt = f"""
**Goal:** Accurately process the provided invoice document and return structured, verified data.

**Functional Requirements:**
- Extract key fields: Invoice #, Date, Vendor, Bill To, Line Items (Description, Quantity, Unit Price, Item Total), Subtotal, Tax Amount, Shipping (if present), Total Due, Payment Terms, Due Date.
- Verify calculations: The sum of line item totals should match the Subtotal. The sum of Subtotal, Tax Amount, and Shipping (if present) must match the Total Due. Report any discrepancies.

**Non-Functional Requirements:**
- High accuracy is critical.
- Output must be a single, valid JSON object containing the extracted data and a 'verification' section (status: 'ok'/'failed', discrepancies: list).

**Input Data:**
{invoice_content}

**Action:** Achieve this goal using the best approach available. Create, evolve, or reuse components as needed. Return ONLY the final JSON result.
"""

# Execute the task via the SystemAgent
final_result_obj = await system_agent.run(high_level_prompt)

# Process the final result (assuming extract_json_from_response is defined elsewhere)
# final_json_result = extract_json_from_response(final_result_obj.result.text)
# print(final_json_result)
```
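The snippet above assumes an `extract_json_from_response` helper defined elsewhere. A minimal sketch of such a helper (this exact implementation is hypothetical, not part of the toolkit):

```python
import json

def extract_json_from_response(text: str) -> dict:
    """Pull the first top-level JSON object out of an agent's free-form reply.

    Agents often wrap JSON in markdown fences or surrounding prose, so this
    sketch simply takes the span from the first '{' to the last '}' and parses it.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in response")
    return json.loads(text[start : end + 1])
```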
SystemAgent Internal Process (Conceptual): When the `SystemAgent` receives the `high_level_prompt`, its internal ReAct loop orchestrates the following:
1. Receives the high-level goal and input data.
2. Analyzes the goal using its reasoning capabilities.
3. Uses tools (`SearchComponentTool`, `DiscoverAgentTool`) to find existing capabilities suitable for the task.
4. Decides if a workflow is needed: if the task is complex or no single component suffices, it determines that a multi-step plan is necessary.
   - (Optional) It might internally request a detailed design blueprint from `ArchitectZero` using `RequestAgentTool`.
   - It uses `GenerateWorkflowTool` internally to create an executable YAML workflow based on the design or its analysis.
   - It uses `ProcessWorkflowTool` internally to parse the YAML into a step-by-step execution plan.
5. Executes the plan, iterating through it and using the appropriate tool for each step:
   - `CreateComponentTool` or `EvolveComponentTool` for `DEFINE` steps.
   - Internal `AgentFactory` calls during component creation.
   - `RequestAgentTool` for `EXECUTE` steps, invoking other agents/tools via the `SmartAgentBus`.
6. Returns the result: the final output specified by the plan's `RETURN` step (or the result of a direct action if no complex workflow was needed).
(This internal complexity is hidden from the user interacting with the `SystemAgent`.)
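To make the middle steps concrete, below is an example of the kind of YAML workflow `GenerateWorkflowTool` might produce for the invoice goal, shown as the Python string the tool would return. The step types (`DEFINE`, `CREATE`, `EXECUTE`, `RETURN`) are the ones this README names; the surrounding schema and field names are assumptions for illustration:

```python
# Hypothetical output of GenerateWorkflowTool -- the field names below are
# illustrative assumptions, not the toolkit's actual workflow schema.
workflow_yaml = """
scenario_name: invoice_processing
steps:
  - type: DEFINE          # define (create or evolve) a component
    name: InvoiceExtractorAgent
    description: Extracts structured fields from invoice text
  - type: CREATE          # instantiate the defined component
    name: InvoiceExtractorAgent
  - type: EXECUTE         # run it, routed via the SmartAgentBus
    name: InvoiceExtractorAgent
    input: "{invoice_content}"
    output_var: extracted_data
  - type: RETURN
    value: "{extracted_data}"
"""
```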
`ArchitectZero` is a specialized agent, typically invoked by the `SystemAgent` via the Agent Bus when a complex task requires a detailed plan before execution.
```python
# Conceptual: SystemAgent requesting a design from ArchitectZero via the Agent Bus.
# This happens INTERNALLY within the SystemAgent's ReAct loop if needed.
design_request_prompt = f"""
Use RequestAgentTool to ask ArchitectZero to design an automated customer support system.
Requirements: Handle FAQs, escalate complex issues, use sentiment analysis.
Input for ArchitectZero: {{ "requirements": "Design customer support system..." }}
"""
# system_agent_internal_response = await system_agent.run(design_request_prompt)
# solution_design_json = extract_json_from_response(...)
# >> The SystemAgent now has the design to proceed internally.
```
Note: End users typically interact with the `SystemAgent` directly with their goal, not `ArchitectZero`.
The `SmartLibrary` stores agents, tools, and firmware definitions, and enables semantic search and lifecycle management.
```python
# Semantic component discovery (often used internally by the SystemAgent)
similar_tools = await smart_library.semantic_search(
    query="Tool that can validate financial calculations in documents",
    record_type="TOOL",
    domain="finance",
    threshold=0.6,
)

# Evolve an existing component (often invoked by the SystemAgent's EvolveComponentTool)
evolved_record = await smart_library.evolve_record(
    parent_id="tool_id_abc",
    new_code_snippet="# New improved Python code...",
    description="Enhanced version with better error handling",
)
```
The `SmartAgentBus` enables dynamic, capability-based communication (the "Data Bus") and provides system management functions (the "System Bus").
```python
# Register a component (System Bus operation, often via a SystemAgent tool)
await agent_bus.register_agent(
    name="SentimentAnalyzerTool_v2",
    agent_type="TOOL",
    description="Analyzes text sentiment with high accuracy",
    capabilities=[{"id": "sentiment_analysis", "name": "Sentiment Analysis", ...}],
)

# Request a service based on capability (Data Bus operation, often via the SystemAgent's RequestAgentTool)
result_payload = await agent_bus.request_capability(
    capability="sentiment_analysis",
    content={"text": "This service is amazing!"},
    min_confidence=0.8,
)
# >> result_payload might contain:
# {'agent_id': '...', 'agent_name': 'SentimentAnalyzerTool_v2',
#  'content': {'sentiment': 'positive', 'score': 0.95}, ...}
```
Complex tasks are handled via a structured workflow lifecycle orchestrated internally by the `SystemAgent`:
1. Goal Intake: The `SystemAgent` receives a high-level goal from the user/caller.
2. Analysis & Planning (Internal): The `SystemAgent` analyzes the goal and checks whether existing components suffice.
3. Design Query (Optional/Internal): If needed, the `SystemAgent` requests a solution design (JSON) from `ArchitectZero` via the Agent Bus.
4. Workflow Generation (Internal): If a multi-step plan is required, the `SystemAgent` uses its `GenerateWorkflowTool` to translate the design (or its own analysis) into an executable YAML workflow string.
5. Plan Processing (Internal): The `SystemAgent` uses its `ProcessWorkflowTool` to parse the YAML, validate it, substitute parameters, and produce a structured execution plan.
6. Plan Execution (Internal): The `SystemAgent`'s ReAct loop iterates through the plan, using its other tools (`CreateComponentTool`, `EvolveComponentTool`, `RequestAgentTool`, etc.) to perform the action defined in each step (`DEFINE`, `CREATE`, `EXECUTE`).
7. Result Return: The `SystemAgent` returns the final result to the caller.
The external caller interacts only at Step 1 and receives the result at Step 7, unaware of the internal workflow mechanics.
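For intuition, here is a heavily simplified sketch of how steps 4-6 might look inside the `SystemAgent`'s loop. The tool names come from this README, but the call signatures, the `tools` mapping, and the plan structure are assumptions for illustration:

```python
# Conceptual sketch only -- every signature below is assumed, not the
# toolkit's actual API.
async def run_internal_workflow(system_agent, design_json: dict):
    # Step 4: translate the design into an executable YAML workflow string.
    yaml_workflow = await system_agent.tools["GenerateWorkflowTool"].run(
        design=design_json
    )
    # Step 5: parse and validate the YAML into a structured execution plan.
    plan = await system_agent.tools["ProcessWorkflowTool"].run(
        workflow=yaml_workflow
    )
    # Step 6: execute each plan step with the matching tool.
    result = None
    for step in plan.steps:
        if step.type in ("DEFINE", "CREATE"):
            result = await system_agent.tools["CreateComponentTool"].run(step=step)
        elif step.type == "EXECUTE":
            result = await system_agent.tools["RequestAgentTool"].run(step=step)
        elif step.type == "RETURN":
            return result
    return result
```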
Existing components can be adapted or improved using the `EvolveComponentTool` (typically invoked internally by the `SystemAgent`).
```python
# Conceptual example (within the SystemAgent's internal operation)
evolve_prompt = """
Use EvolveComponentTool to enhance agent 'id_123'.
Changes needed: Add support for processing PDF files directly.
Strategy: standard
"""
# evolve_result = await system_agent.run(evolve_prompt)
# >> evolve_result indicates success and provides the ID of the new evolved agent version.
```
Integrate agents/tools from different SDKs via `Providers` managed by the `AgentFactory` (used internally by tools like `CreateComponentTool`).
```python
# Example: Creating agents from different frameworks via the AgentFactory.
# (The AgentFactory is usually used internally by tools like CreateComponentTool.)
# bee_record = await smart_library.find_record_by_name("BeeAgentName")
# openai_record = await smart_library.find_record_by_name("OpenAIAgentName")

# if bee_record:
#     bee_agent_instance = await agent_factory.create_agent(bee_record)
#     # >> Uses the BeeAIProvider internally
# if openai_record:
#     openai_agent_instance = await agent_factory.create_agent(openai_record)
#     # >> Uses the OpenAIAgentsProvider internally
```
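Conceptually, each provider adapts one SDK to a common interface that the `AgentFactory` can drive. The following is a minimal sketch of that pattern; the class and method names are illustrative assumptions, not the toolkit's actual `Provider` API:

```python
from abc import ABC, abstractmethod

class FrameworkProvider(ABC):
    """Illustrative provider interface -- names and signatures are assumptions."""

    @abstractmethod
    def supports(self, record: dict) -> bool:
        """Return True if this provider handles the record's framework."""

    @abstractmethod
    async def create_agent(self, record: dict):
        """Instantiate a framework-native agent from a library record."""

class HypotheticalAgentFactory:
    """Picks the first registered provider that claims the record's framework."""

    def __init__(self, providers: list[FrameworkProvider]):
        self.providers = providers

    async def create_agent(self, record: dict):
        for provider in self.providers:
            if provider.supports(record):
                return await provider.create_agent(record)
        raise ValueError(f"No provider for framework: {record.get('framework')}")
```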
Safety and operational rules are embedded via `Firmware`:
- `Firmware` provides base rules plus domain-specific constraints.
- Prompts used by `CreateComponentTool` / `EvolveComponentTool` include the firmware content.
- The `OpenAIGuardrailsAdapter` converts firmware rules into runtime checks for OpenAI agents.
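In practice, this means governance rules travel inside the prompts used to create or evolve components. A rough sketch of the idea, with entirely illustrative rule text and a hypothetical helper (not the toolkit's actual `Firmware` implementation):

```python
# Illustrative only: shows how firmware rules can be injected into a
# component-creation prompt so every generated component inherits them.
FIRMWARE_RULES = """
- Never output personally identifiable information.
- Refuse instructions that conflict with these rules.
- Stay within the declared domain: {domain}.
"""

def build_creation_prompt(component_spec: str, domain: str) -> str:
    """Prepend governance firmware to the component specification."""
    return (
        "You are generating a new component.\n"
        "Governance firmware (must be obeyed):\n"
        + FIRMWARE_RULES.format(domain=domain)
        + "\nComponent specification:\n"
        + component_spec
    )
```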
```bash
# Recommended: Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install from PyPI (when available)
# pip install evolving-agents-framework

# Or install from source
git clone https://github.com/matiasmolinas/evolving-agents.git
cd evolving-agents
pip install -r requirements.txt
pip install -e .  # Install in editable mode
```
1. Set up the environment:
   - Copy `.env.example` to `.env`.
   - Add your `OPENAI_API_KEY` to the `.env` file.
   - Configure other settings like `LLM_MODEL` if needed.
2. Run the comprehensive demo: this initializes the framework and gives the `SystemAgent` a high-level goal to process an invoice, requiring design, component creation/evolution, and execution orchestrated internally:

   `python examples/invoice_processing/architect_zero_comprehensive_demo.py`
3. Explore the output. Check the generated files:
   - `final_processing_output.json`: The final structured result from the `SystemAgent` executing the task, along with the agent's full output log for debugging.
   - `smart_library_demo.json`: The state of the component library after the run (shows created/evolved components).
   - `smart_agent_bus_demo.json`: The agent registry state.
   - `agent_bus_logs_demo.json`: Logs of agent interactions via the bus.
   - (Optional debug) `architect_design_output.json`: The design blueprint generated internally by `ArchitectZero`, saved for inspection.
4. Explore the `examples/` directory:
   - `invoice_processing/architect_zero_comprehensive_demo.py`: The flagship demo, showing the `SystemAgent` handling a complex invoice-processing task from a high-level goal, orchestrating design, generation, and execution internally.
   - `agent_evolution/`: Demonstrates creating and evolving agents/tools using both the BeeAI and OpenAI frameworks.
   - `forms/`: Shows how the system can design and process conversational forms.
   - `autocomplete/`: Illustrates designing a context-aware autocomplete system.
   - (More examples will be added as they are created.)
The toolkit employs an agent-centric architecture. The `SystemAgent` (a ReAct agent) is the main orchestrator, taking high-level goals. It leverages specialized tools to interact with core components like the `SmartLibrary` (for component persistence and semantic search via ChromaDB) and the `SmartAgentBus` (for capability-based routing and system management). For complex tasks, it internally manages the full workflow lifecycle, potentially requesting designs from agents like `ArchitectZero` and using internal tools to generate, process, and execute plans. Multi-framework support is achieved through `Providers` and `Adapters`. Dependencies are managed via a `DependencyContainer`.
For a detailed breakdown, see docs/ARCHITECTURE.md.
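As a rough mental model of that wiring, here is a minimal sketch. `DependencyContainer`, `SmartLibrary`, `SmartAgentBus`, and `SystemAgent` are the component names used in this README, but every constructor and method signature below is an assumption for illustration, not the toolkit's actual API:

```python
# Conceptual wiring sketch -- signatures are assumed for illustration.
container = DependencyContainer()
container.register("smart_library", SmartLibrary(container=container))
container.register("agent_bus", SmartAgentBus(container=container))
container.register("system_agent", SystemAgent(container=container))

# Callers then hand the SystemAgent high-level goals and nothing else.
system_agent = container.get("system_agent")
result = await system_agent.run("Process this invoice and return verified JSON...")
```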
- LLM Caching: Reduces API costs during development by caching completions and embeddings (`.llm_cache_demo/`).
- Vector Search: Integrated ChromaDB for powerful semantic discovery of components.
- Modular Design: Core components are decoupled, facilitating extension and testing.
- Dependency Injection: Simplifies component wiring and initialization.
- Clear Logging: Provides insights into agent thinking and component interactions via bus logs and standard logging.
This project is licensed under the Apache License Version 2.0.
- BeeAI Framework: Used for the core `ReActAgent` implementation and tool structures.
- OpenAI Agents SDK: Integrated via providers for multi-framework support.
- ChromaDB: Powers semantic search capabilities in the `SmartLibrary` and `SmartAgentBus`.
- Original concept contributors: Matias Molinas and Ismael Faro.
Similar Open Source Tools


LLMBox
LLMBox is a comprehensive library designed for implementing Large Language Models (LLMs) with a focus on a unified training pipeline and comprehensive model evaluation. It serves as a one-stop solution for training and utilizing LLMs, offering flexibility and efficiency in both training and utilization stages. The library supports diverse training strategies, comprehensive datasets, tokenizer vocabulary merging, data construction strategies, parameter efficient fine-tuning, and efficient training methods. For utilization, LLMBox provides comprehensive evaluation on various datasets, in-context learning strategies, chain-of-thought evaluation, evaluation methods, prefix caching for faster inference, support for specific LLM models like vLLM and Flash Attention, and quantization options. The tool is suitable for researchers and developers working with LLMs for natural language processing tasks.

generative-fusion-decoding
Generative Fusion Decoding (GFD) is a novel shallow fusion framework that integrates Large Language Models (LLMs) into multi-modal text recognition systems such as automatic speech recognition (ASR) and optical character recognition (OCR). GFD operates across mismatched token spaces of different models by mapping text token space to byte token space, enabling seamless fusion during the decoding process. It simplifies the complexity of aligning different model sample spaces, allows LLMs to correct errors in tandem with the recognition model, increases robustness in long-form speech recognition, and enables fusing recognition models deficient in Chinese text recognition with LLMs extensively trained on Chinese. GFD significantly improves performance in ASR and OCR tasks, offering a unified solution for leveraging existing pre-trained models through step-by-step fusion.

datadreamer
DataDreamer is an advanced toolkit designed to facilitate the development of edge AI models by enabling synthetic data generation, knowledge extraction from pre-trained models, and creation of efficient and potent models. It eliminates the need for extensive datasets by generating synthetic datasets, leverages latent knowledge from pre-trained models, and focuses on creating compact models suitable for integration into any device and performance for specialized tasks. The toolkit offers features like prompt generation, image generation, dataset annotation, and tools for training small-scale neural networks for edge deployment. It provides hardware requirements, usage instructions, available models, and limitations to consider while using the library.

upgini
Upgini is an intelligent data search engine with a Python library that helps users find and add relevant features to their ML pipeline from various public, community, and premium external data sources. It automates the optimization of connected data sources by generating an optimal set of machine learning features using large language models, GraphNNs, and recurrent neural networks. The tool aims to simplify feature search and enrichment for external data to make it a standard approach in machine learning pipelines. It democratizes access to data sources for the data science community.

llm-consortium
LLM Consortium is a plugin for the `llm` package that implements a model consortium system with iterative refinement and response synthesis. It orchestrates multiple large language models to collaboratively solve complex problems through structured dialogue, evaluation, and arbitration. The tool supports multi-model orchestration, iterative refinement, advanced arbitration, database logging, configurable parameters, hundreds of models, and the ability to save and load consortium configurations.

DeepPavlov
DeepPavlov is an open-source conversational AI library built on PyTorch. It is designed for the development of production-ready chatbots and complex conversational systems, as well as for research in the area of NLP and dialog systems. The library offers a wide range of models for tasks such as Named Entity Recognition, Intent/Sentence Classification, Question Answering, Sentence Similarity/Ranking, Syntactic Parsing, and more. DeepPavlov also provides embeddings like BERT, ELMo, and FastText for various languages, along with AutoML capabilities and integrations with REST API, Socket API, and Amazon AWS.

RA.Aid
RA.Aid is an AI software development agent powered by `aider` and advanced reasoning models like `o1`. It combines `aider`'s code editing capabilities with LangChain's agent-based task execution framework to provide an intelligent assistant for research, planning, and implementation of multi-step development tasks. It handles complex programming tasks by breaking them down into manageable steps, running shell commands automatically, and leveraging expert reasoning models like OpenAI's o1. RA.Aid is designed for everyday software development, offering features such as multi-step task planning, automated command execution, and the ability to handle complex programming tasks beyond single-shot code edits.

llm-functions
LLM Functions is a project that enables the enhancement of large language models (LLMs) with custom tools and agents developed in bash, javascript, and python. Users can create tools for their LLM to execute system commands, access web APIs, or perform other complex tasks triggered by natural language prompts. The project provides a framework for building tools and agents, with tools being functions written in the user's preferred language and automatically generating JSON declarations based on comments. Agents combine prompts, function callings, and knowledge (RAG) to create conversational AI agents. The project is designed to be user-friendly and allows users to easily extend the capabilities of their language models.

hayhooks
Hayhooks is a tool that simplifies the deployment and serving of Haystack pipelines as REST APIs. It allows users to wrap their pipelines with custom logic and expose them via HTTP endpoints, including OpenAI-compatible chat completion endpoints. With Hayhooks, users can easily convert their Haystack pipelines into API services with minimal boilerplate code.

agents-starter
A starter template for building AI-powered chat agents using Cloudflare's Agent platform, powered by agents-sdk. It provides a foundation for creating interactive chat experiences with AI, complete with a modern UI and tool integration capabilities. Features include interactive chat interface with AI, built-in tool system with human-in-the-loop confirmation, advanced task scheduling, dark/light theme support, real-time streaming responses, state management, and chat history. Prerequisites include a Cloudflare account and OpenAI API key. The project structure includes components for chat UI implementation, chat agent logic, tool definitions, and helper functions. Customization guide covers adding new tools, modifying the UI, and example use cases for customer support, development assistant, data analysis assistant, personal productivity assistant, and scheduling assistant.

videodb-python
VideoDB Python SDK allows you to interact with the VideoDB serverless database. Manage videos as intelligent data, not files. It's scalable, cost-efficient & optimized for AI applications and LLM integration. The SDK provides functionalities for uploading videos, viewing videos, streaming specific sections of videos, searching inside a video, searching inside multiple videos in a collection, adding subtitles to a video, generating thumbnails, and more. It also offers features like indexing videos by spoken words, semantic indexing, and future indexing options for scenes, faces, and specific domains like sports. The SDK aims to simplify video management and enhance AI applications with video data.

code2prompt
Code2Prompt is a powerful command-line tool that generates comprehensive prompts from codebases, designed to streamline interactions between developers and Large Language Models (LLMs) for code analysis, documentation, and improvement tasks. It bridges the gap between codebases and LLMs by converting projects into AI-friendly prompts, enabling users to leverage AI for various software development tasks. The tool offers features like holistic codebase representation, intelligent source tree generation, customizable prompt templates, smart token management, Gitignore integration, flexible file handling, clipboard-ready output, multiple output options, and enhanced code readability.

openedai-speech
OpenedAI Speech is a free, private text-to-speech server compatible with the OpenAI audio/speech API. It offers custom voice cloning and supports various models like tts-1 and tts-1-hd. Users can map their own piper voices and create custom cloned voices. The server provides multilingual support with XTTS voices and allows fixing incorrect sounds with regex. Recent changes include bug fixes, improved error handling, and updates for multilingual support. Installation can be done via Docker or manual setup, with usage instructions provided. Custom voices can be created using Piper or Coqui XTTS v2, with guidelines for preparing audio files. The tool is suitable for tasks like generating speech from text, creating custom voices, and multilingual text-to-speech applications.

aicsimageio
AICSImageIO is a Python tool for Image Reading, Metadata Conversion, and Image Writing for Microscopy Images. It supports various file formats like OME-TIFF, TIFF, ND2, DV, CZI, LIF, PNG, GIF, and Bio-Formats. Users can read and write metadata and imaging data, work with different file systems like local paths, HTTP URLs, s3fs, and gcsfs. The tool provides functionalities for full image reading, delayed image reading, mosaic image reading, metadata reading, xarray coordinate plane attachment, cloud IO support, and saving to OME-TIFF. It also offers benchmarking and developer resources.
For similar tasks

document-ai-samples
The Google Cloud Document AI Samples repository contains code samples and Community Samples demonstrating how to analyze, classify, and search documents using Google Cloud Document AI. It includes various projects showcasing different functionalities such as integrating with Google Drive, processing documents using Python, content moderation with Dialogflow CX, fraud detection, language extraction, paper summarization, tax processing pipeline, and more. The repository also provides access to test document files stored in a publicly-accessible Google Cloud Storage Bucket. Additionally, there are codelabs available for optical character recognition (OCR), form parsing, specialized processors, and managing Document AI processors. Community samples, like the PDF Annotator Sample, are also included. Contributions are welcome, and users can seek help or report issues through the repository's issues page. Please note that this repository is not an officially supported Google product and is intended for demonstrative purposes only.

step-free-api
The StepChat Free service provides high-speed streaming output, multi-turn dialogue support, online search support, long document interpretation, and image parsing. It offers zero-configuration deployment, multi-token support, and automatic session trace cleaning. It is fully compatible with the ChatGPT interface. Additionally, it provides seven other free APIs for various services. The repository includes a disclaimer about using reverse APIs and encourages users to avoid commercial use to prevent service pressure on the official platform. It offers online testing links, showcases different demos, and provides deployment guides for Docker, Docker-compose, Render, Vercel, and native deployments. The repository also includes information on using multiple accounts, optimizing Nginx reverse proxy, and checking the liveliness of refresh tokens.

unilm
The 'unilm' repository is a collection of tools, models, and architectures for Foundation Models and General AI, focusing on tasks such as NLP, MT, Speech, Document AI, and Multimodal AI. It includes various pre-trained models, such as UniLM, InfoXLM, DeltaLM, MiniLM, AdaLM, BEiT, LayoutLM, WavLM, VALL-E, and more, designed for tasks like language understanding, generation, translation, vision, speech, and multimodal processing. The repository also features toolkits like s2s-ft for sequence-to-sequence fine-tuning and Aggressive Decoding for efficient sequence-to-sequence decoding. Additionally, it offers applications like TrOCR for OCR, LayoutReader for reading order detection, and XLM-T for multilingual NMT.

searchGPT
searchGPT is an open-source project that aims to build a search engine based on Large Language Model (LLM) technology to provide natural language answers. It supports web search with real-time results, file content search, and semantic search from sources like the Internet. The tool integrates LLM technologies such as OpenAI and GooseAI, and offers an easy-to-use frontend user interface. The project is designed to provide grounded answers by referencing real-time factual information, addressing the limitations of LLM's training data. Contributions, especially from frontend developers, are welcome under the MIT License.

LLMs-at-DoD
This repository contains tutorials for using Large Language Models (LLMs) in the U.S. Department of Defense. The tutorials utilize open-source frameworks and LLMs, allowing users to run them in their own cloud environments. The repository is maintained by the Defense Digital Service and welcomes contributions from users.

LARS
LARS is an application that enables users to run Large Language Models (LLMs) locally on their devices, upload their own documents, and engage in conversations where the LLM grounds its responses with the uploaded content. The application focuses on Retrieval Augmented Generation (RAG) to increase accuracy and reduce AI-generated inaccuracies. LARS provides advanced citations, supports various file formats, allows follow-up questions, provides full chat history, and offers customization options for LLM settings. Users can force enable or disable RAG, change system prompts, and tweak advanced LLM settings. The application also supports GPU-accelerated inferencing, multiple embedding models, and text extraction methods. LARS is open-source and aims to be the ultimate RAG-centric LLM application.

EAGLE
Eagle is a family of Vision-Centric High-Resolution Multimodal LLMs that enhance multimodal LLM perception using a mix of vision encoders and various input resolutions. The model features a channel-concatenation-based fusion for vision experts with different architectures and knowledge, supporting up to over 1K input resolution. It excels in resolution-sensitive tasks like optical character recognition and document understanding.

erag
ERAG is an advanced system that combines lexical, semantic, text, and knowledge graph searches with conversation context to provide accurate and contextually relevant responses. This tool processes various document types, creates embeddings, builds knowledge graphs, and uses this information to answer user queries intelligently. It includes modules for interacting with web content, GitHub repositories, and performing exploratory data analysis using various language models.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.