azure-agentic-infraops
Agentic InfraOps transforms Azure deployments for IT Pros. Using GitHub Copilot and AI agents, it converts requirements into diagrams, validated designs, and deploy-ready Bicep templates—aligned with Azure best practices. Includes real-time pricing, compliance, and automation.
Stars: 60
Agentic InfraOps is a multi-agent orchestration system for Azure infrastructure development that transforms how you build Azure infrastructure with AI agents. It provides a structured 7-step workflow that coordinates specialized AI agents through a complete infrastructure development cycle: Requirements → Architecture → Design → Plan → Code → Deploy → Documentation. The system enforces Azure Well-Architected Framework (WAF) alignment and Azure Verified Modules (AVM) at every phase, combining the speed of AI coding with best practices in cloud engineering.
A multi-agent orchestration system for Azure infrastructure development
Requirements → Architecture → Design → Plan → Code → Deploy → Documentation
What is Agentic InfraOps?
Agentic InfraOps transforms how you build Azure infrastructure with AI agents.
Instead of context-switching between requirements gathering, architecture decisions, Bicep authoring, and documentation, Agentic InfraOps provides a structured 7-step workflow that coordinates specialized AI agents through a complete infrastructure development cycle: Requirements → Architecture → Design → Plan → Code → Deploy → Documentation.
The system solves a critical challenge in AI-assisted infrastructure development: maintaining quality and compliance while moving quickly. By enforcing Azure Well-Architected Framework (WAF) alignment and Azure Verified Modules (AVM) at every phase, you get the speed of AI coding combined with best practices in cloud engineering.
Built upon patterns from copilot-orchestra and Copilot-Atlas, adapted for Azure infrastructure workflows.
Key Features
The InfraOps Conductor orchestrates 7 specialized agents, each optimized for their specific role in the infrastructure development lifecycle.
Every architecture decision is evaluated against the 5 pillars of the Azure Well-Architected Framework: Security, Reliability, Performance Efficiency, Cost Optimization, and Operational Excellence.
3 validation subagents (lint, what-if, review) provide quality gates before deployment—catching issues early when they're cheap to fix.
Built-in pause points for plan approval, pre-deployment review, and post-deployment verification keep you in control of the infrastructure development process.
Comprehensive artifacts at each phase (01-07) create an audit trail for reviewing all work completed and decisions made.
Most of the work runs in dedicated subagents, each with its own context window and its own prompt. Keeping each context small reduces the hallucinations that tend to appear as a single shared context fills up.
sequenceDiagram
participant User
participant Conductor
participant Requirements
participant Architect
participant Bicep
participant Deploy
User->>Conductor: Describe infrastructure project
Conductor->>Requirements: Gather requirements
Requirements-->>Conductor: Return 01-requirements.md
Conductor->>User: Present requirements
User-->>Conductor: Approve requirements
Conductor->>Architect: Assess architecture (WAF)
Architect-->>Conductor: Return 02-assessment.md + cost estimate
Conductor->>User: Present architecture
User-->>Conductor: Approve architecture
Note over Conductor: Step 3 (optional): Design diagrams & ADRs
Conductor->>Architect: Create implementation plan
Architect-->>Conductor: Return 04-plan.md + governance
Conductor->>User: Present plan
User-->>Conductor: Approve plan
Conductor->>Bicep: Generate Bicep templates
Bicep-->>Conductor: Return infra/bicep/{project}/
alt Validation passes
Conductor->>User: Present templates for deployment
User-->>Conductor: Approve for deployment
else Validation fails
Conductor->>Bicep: Revise with feedback
end
Conductor->>Deploy: Execute deployment (what-if first)
Deploy-->>Conductor: Return 06-deployment-summary.md
alt Deployment succeeds
Conductor->>User: Present deployment summary
User-->>Conductor: Verify deployment
else Deployment fails
Conductor->>User: Request guidance
end
Conductor->>User: Workflow complete + 07-* documentation suite
Architecture Overview
The Agentic InfraOps system consists of specialized agents organized into three tiers:
| Agent | Persona | Role | Model |
|---|---|---|---|
| InfraOps Conductor | 🎼 Maestro | Master orchestrator managing the complete 7-step workflow | Claude Opus 4.6 |
- Coordinates all specialized agents through handoffs
- Manages 5 mandatory approval gates
- Handles user interactions and pause points
- Enforces the Requirements → Deploy → Docs cycle
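The gate behavior described above can be pictured as a simple loop. This is a minimal sketch, not the Conductor's actual implementation: the `Step` class, the `WORKFLOW` list, and `run_workflow` are hypothetical names, and the placement of the five gates (requirements, architecture, plan, pre-deploy, post-deploy) is an assumption read off the sequence diagram.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    gated: bool  # True when the user must approve before the workflow continues


# Step names come from the README; which steps are gated is an assumption
# based on the five user approvals shown in the sequence diagram.
WORKFLOW = [
    Step(1, "Requirements", True),
    Step(2, "Architecture", True),
    Step(3, "Design", False),
    Step(4, "Plan", True),
    Step(5, "Code", True),
    Step(6, "Deploy", True),
    Step(7, "Documentation", False),
]


def run_workflow(approve: Callable[[Step], bool]) -> List[str]:
    """Run the steps in order, pausing at each gate; stop at the first rejection."""
    log = []
    for step in WORKFLOW:
        log.append(f"ran step {step.number}: {step.name}")
        if step.gated and not approve(step):
            log.append(f"stopped at gate after step {step.number}")
            break
    return log


if __name__ == "__main__":
    # Approve everything except the plan, to show the workflow halting at that gate.
    for line in run_workflow(lambda step: step.name != "Plan"):
        print(line)
```

Passing the approval decision in as a callable keeps the loop independent of how the user is asked, which matches the idea of pause points handled by the orchestrator rather than by the individual agents.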
| Step | Agent | Persona | Role | Model |
|---|---|---|---|---|
| 1 | requirements | 📜 Scribe | Captures infrastructure requirements | Claude Opus 4.6 |
| 2 | architect | 🏛️ Oracle | WAF assessment and design decisions | Claude Opus 4.6 |
| 3 | design | 🎨 Artisan | Diagrams and Architecture Decision Records | Claude Sonnet 4.5 |
| 4 | bicep-plan | 📐 Strategist | Implementation planning with governance | Claude Opus 4.6 |
| 5 | bicep-code | ⚒️ Forge | Generates AVM-first Bicep templates | Claude Sonnet 4.5 |
| 6 | deploy | 🚀 Envoy | Azure resource provisioning | Claude Sonnet 4.5 |
| 7 | - | 📚 - | As-built documentation (via skills) | - |
| Subagent | Role | When Invoked |
|---|---|---|
| bicep-lint-subagent | Syntax validation (bicep lint, bicep build) | Pre-deployment |
| bicep-whatif-subagent | Deployment preview (az deployment what-if) | Pre-deployment |
| bicep-review-subagent | Code review (AVM standards, security, naming) | Pre-deployment |
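The lint and what-if checks map onto ordinary CLI calls. The sketch below only builds the argument lists and does not run anything. It is an illustration, not the subagents' actual code: `preflight_commands` is a hypothetical helper, the template path and resource group name are placeholders, and it assumes a resource-group-scoped deployment (`az deployment group what-if`).

```python
from typing import Dict, List


def preflight_commands(template: str, resource_group: str) -> Dict[str, List[str]]:
    """Build (but do not run) the CLI calls behind the lint and what-if checks."""
    return {
        # Bicep CLI: report linter findings for the template.
        "lint": ["bicep", "lint", template],
        # Bicep CLI: compile the template to an ARM JSON template.
        "build": ["bicep", "build", template],
        # Azure CLI: preview the changes a deployment would make.
        # Resource-group scope is an assumption made for this sketch.
        "what-if": [
            "az", "deployment", "group", "what-if",
            "--resource-group", resource_group,
            "--template-file", template,
        ],
    }


if __name__ == "__main__":
    # Hypothetical project path and resource group, for illustration only.
    commands = preflight_commands("infra/bicep/demo/main.bicep", "rg-demo")
    for name, argv in commands.items():
        print(f"{name}: {' '.join(argv)}")
```

Each list can be handed to `subprocess.run(argv, check=True)` so that a non-zero exit code stops the run before deployment.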
| Agent | Persona | Role |
|---|---|---|
| diagnose | 🔍 Sentinel | Resource health assessment and troubleshooting |
How It Works
The Conductor agent follows a strict 7-step cycle for every infrastructure project:
- User Request: You describe the Azure infrastructure you want to build
- Capture Requirements: requirements agent gathers functional, non-functional, and compliance requirements
- Output: agent-output/{project}/01-requirements.md
- WAF Assessment: architect agent evaluates requirements against the Well-Architected Framework
- Cost Estimation: Azure Pricing MCP provides real-time SKU pricing
- Output: agent-output/{project}/02-architecture-assessment.md
- Architecture Diagrams: azure-diagrams skill generates Python-based diagrams
- Decision Records: azure-adr skill creates Architecture Decision Records
- Output: agent-output/{project}/03-des-*.md/.py/.png
- Governance Discovery: Discovers Azure Policy constraints in the target subscription
- Implementation Plan: bicep-plan agent creates a detailed, phased implementation plan
- GATE: Plan Approval: User reviews and approves before implementation
- Output: agent-output/{project}/04-implementation-plan.md
- Bicep Generation: bicep-code agent creates AVM-first Bicep templates
- Preflight Validation: Lint, what-if, and review subagents validate code
- GATE: Pre-Deploy: User reviews validation results
- Output: infra/bicep/{project}/ with 05-implementation-reference.md
- Azure Provisioning: deploy agent executes deployment with what-if preview
- GATE: Post-Deploy: User verifies deployed resources
- Output: agent-output/{project}/06-deployment-summary.md
- As-Built Suite: azure-artifacts skill generates comprehensive documentation
- Output: agent-output/{project}/07-*.md (design doc, runbook, DR plan, inventory)
⚡ Quick Start
| Requirement | Details |
|---|---|
| 🐳 Docker Desktop | Or Podman, Colima, Rancher Desktop |
| 💻 VS Code | With Dev Containers extension |
| 🤖 GitHub Copilot | Active subscription with Chat extension |
| ☁️ Azure subscription | Optional for learning, required for deployment |
git clone https://github.com/jonathan-vella/azure-agentic-infraops.git
cd azure-agentic-infraops
code .
Press F1 → Dev Containers: Reopen in Container
⏱️ First build takes 2-3 minutes. All tools are pre-installed.
⚠️ Required Setting: In your VS Code User Settings (Ctrl+,), enable:
{ "chat.customAgentInSubagent.enabled": true }
Without this, the Conductor cannot delegate to specialized agents.
Press Ctrl+Shift+I → Select InfraOps Conductor from the agent dropdown
Create a web app with Azure App Service, Key Vault, and SQL Database
The Conductor will guide you through all 7 steps with approval gates. Say yes to continue, or provide feedback to refine.
Usage Examples
User: Create an e-commerce platform with AKS, Cosmos DB, and Redis caching
Conductor:
├─ @requirements → 01-requirements.md (functional, NFRs, compliance)
├─ @architect → 02-architecture-assessment.md (WAF analysis, cost estimate)
│ └─ Azure Pricing MCP (real-time SKU pricing)
├─ azure-diagrams skill → 03-des-diagram.py/.png
├─ @bicep-plan → 04-implementation-plan.md (governance constraints)
│
│ [GATE 1: User approves plan]
│
├─ @bicep-code → infra/bicep/ecommerce/
│ ├─ @bicep-lint-subagent → Syntax validation ✓
│ ├─ @bicep-whatif-subagent → What-if preview ✓
│ └─ @bicep-review-subagent → AVM compliance ✓
│
│ [GATE 2: User approves pre-deployment]
│
├─ @deploy → 06-deployment-summary.md
│
│ [GATE 3: User verifies deployment]
│
└─ azure-artifacts skill → 07-*.md (design doc, runbook, DR plan)
You can also invoke agents directly for specific tasks:
# Gather requirements only
Ctrl+Shift+A → requirements → "Capture requirements for a static web app"
# WAF assessment only
Ctrl+Shift+A → architect → "Assess the requirements in 01-requirements.md"
# Diagnose existing resources
Ctrl+Shift+A → diagnose → "Check health of my App Service apps"
Skills (Reusable Capabilities)
8 skills provide reusable capabilities across agents:
| Skill | Purpose | Output |
|---|---|---|
| azure-adr | Architecture Decision Records | 03-des-adr-*.md |
| azure-artifacts | Template H2 structures, styling, generation rules | 01-07 artifacts |
| azure-defaults | Azure conventions, naming, AVM, WAF, pricing, tags | - |
| azure-diagrams | Architecture diagrams (700+ Azure icons) | .py + .png |
| docs-writer | Repo-aware documentation maintenance | - |
| git-commit | Conventional commit messages | - |
| github-operations | GitHub issues, PRs, CLI, Actions, releases | - |
| make-skill-template | Create new skills from template | - |
Generated Artifacts
| Phase | Artifact | Description |
|---|---|---|
| 1 | 01-requirements.md | Functional, non-functional, compliance requirements |
| 2 | 02-architecture-assessment.md | WAF analysis, SKU recommendations, cost estimate |
| 3 | 03-des-*.md/.py/.png | Diagrams, ADRs, cost estimates |
| 4 | 04-implementation-plan.md | Phased implementation plan with governance |
| 4 | 04-governance-constraints.md | Azure Policy discovery results |
| 5 | 05-implementation-reference.md | Bicep module inventory and validation status |
| 6 | 06-deployment-summary.md | Deployed resources and verification |
| 7 | 07-design-document.md | Technical design documentation |
| 7 | 07-operations-runbook.md | Day-2 operations procedures |
| 7 | 07-backup-dr-plan.md | Backup and disaster recovery plan |
| 7 | 07-resource-inventory.md | Complete resource inventory |
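Because the artifact names are fixed, the audit trail for a project can be checked with a few lines of Python. This is a rough sketch rather than anything shipped with the repo: `EXPECTED_ARTIFACTS` and `missing_artifacts` are hypothetical names, and it assumes every fixed-name artifact lives directly in agent-output/{project}/. The optional, pattern-named 03-des-* files are left out.

```python
from pathlib import Path
from typing import List

# Fixed-name artifacts from the table above. The 03-des-* design files are
# optional and pattern-named, so this check skips them.
EXPECTED_ARTIFACTS = [
    "01-requirements.md",
    "02-architecture-assessment.md",
    "04-implementation-plan.md",
    "04-governance-constraints.md",
    "05-implementation-reference.md",
    "06-deployment-summary.md",
    "07-design-document.md",
    "07-operations-runbook.md",
    "07-backup-dr-plan.md",
    "07-resource-inventory.md",
]


def missing_artifacts(project_dir: Path) -> List[str]:
    """Return the expected artifact file names not present in project_dir."""
    return [name for name in EXPECTED_ARTIFACTS if not (project_dir / name).is_file()]


if __name__ == "__main__":
    # "static-webapp" is one of the sample projects named in this README.
    for name in missing_artifacts(Path("agent-output") / "static-webapp"):
        print(f"missing: {name}")
```

An empty result means every phase left its artifact behind, and the first missing name shows where a run stopped.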
Explore complete workflow outputs in agent-output/:
| Project | Description | Highlights |
|---|---|---|
| e2e-conductor-test | End-to-end Conductor validation | Full 7-step workflow |
| static-webapp | Static Web App with Functions | Production-ready pattern |
🧩 MCP Integration
The core enabler behind "agents with real Azure context":
| Feature | Description |
|---|---|
| RBAC-Aware | Tools operate within your existing Azure permissions |
| Broad Coverage | 40+ Azure service areas: platform, monitoring, governance |
| Day-0 to Day-2 | Discovery, validation, and troubleshooting workflows |
The Azure Pricing MCP add-on supplies real-time Azure retail pricing for cost-aware SKU decisions. It is pre-configured in this repo.
📁 Project Structure
├── 📁 .github/
│   ├── 📁 agents/                        # 8 main agents + 3 validation subagents
│   │   ├── infraops-conductor.agent.md   # 🎼 Maestro - Master orchestrator
│   │   ├── requirements.agent.md         # 📜 Scribe - Requirements capture
│   │   ├── architect.agent.md            # 🏛️ Oracle - WAF assessment
│   │   ├── design.agent.md               # 🎨 Artisan - Diagrams/ADRs
│   │   ├── bicep-plan.agent.md           # 📐 Strategist - Planning
│   │   ├── bicep-code.agent.md           # ⚒️ Forge - Bicep generation
│   │   ├── deploy.agent.md               # 🚀 Envoy - Deployment
│   │   ├── diagnose.agent.md             # 🔍 Sentinel - Diagnostics
│   │   └── 📁 _subagents/                # Validation subagents
│   ├── 📁 instructions/                  # Guardrails and coding standards
│   └── 📁 skills/                        # 8 reusable skills
├── 📁 agent-output/                      # Generated artifacts per project
├── 📁 docs/                              # Documentation and guides
├── 📁 infra/bicep/                       # Generated Bicep templates
├── 📁 mcp/azure-pricing-mcp/             # 💰 Pricing MCP add-on
└── 📁 scenarios/                         # 9 hands-on learning scenarios
Configuration
Required (in devcontainer.json or User Settings):
{
  "chat.customAgentInSubagent.enabled": true,
  "chat.agentFilesLocations": {
    ".github/agents": true,
    ".github/agents/_subagents": true
  },
  "chat.agentSkillsLocations": {
    ".github/skills": true
  }
}
Recommended (in .vscode/settings.json):
{
  "github.copilot.chat.responsesApiReasoningEffort": "high",
  "chat.thinking.style": "detailed"
}
Each agent is defined in a .agent.md file that you can modify:
- Adjust AI Model: Change the model field in frontmatter
- Modify Instructions: Edit the main section to change behavior
- Add Tools: Extend the tools array for additional capabilities
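When delegation to subagents does not work, the required settings listed under Configuration are the first thing to check. The sketch below compares an already-parsed settings dictionary against those values. `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names. The function takes a dict rather than a file path because VS Code settings files may contain comments, which plain `json.loads` would reject.

```python
from typing import Any, Dict, List

# Values copied from the "Required" block in the Configuration section.
REQUIRED_SETTINGS: Dict[str, Any] = {
    "chat.customAgentInSubagent.enabled": True,
    "chat.agentFilesLocations": {
        ".github/agents": True,
        ".github/agents/_subagents": True,
    },
    "chat.agentSkillsLocations": {".github/skills": True},
}


def missing_settings(settings: Dict[str, Any]) -> List[str]:
    """Return the required keys that are absent or set to a different value."""
    problems: List[str] = []
    for key, expected in REQUIRED_SETTINGS.items():
        actual = settings.get(key)
        if isinstance(expected, dict):
            # Location maps may carry extra entries; only the required ones are checked.
            if not isinstance(actual, dict):
                problems.append(key)
                continue
            problems.extend(
                f"{key}[{path}]"
                for path, flag in expected.items()
                if actual.get(path) != flag
            )
        elif actual != expected:
            problems.append(key)
    return problems


if __name__ == "__main__":
    # A settings dict that enables subagents but omits both location maps.
    print(missing_settings({"chat.customAgentInSubagent.enabled": True}))
```

An empty list means all three required settings are in place.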
Best Practices
- Use the Conductor for complete workflows — Let it orchestrate the full 7-step cycle
- Review artifacts at each gate — The approval points are designed for human oversight
- Leverage preflight validation — Let the subagents catch issues before deployment
- Trust the WAF process — The architect agent enforces best practices
- Commit frequently — After each approved phase, commit the artifacts
- Delegate appropriately — Use direct agent invocation for focused tasks
🎯 Scenarios
9 hands-on scenarios from beginner to advanced (15-45 min each):
| Level | Scenarios |
|---|---|
| Beginner | Bicep baseline, diagrams as code |
| Intermediate | Documentation generation, service validation, troubleshooting, SBOM |
| Advanced | Full agentic workflow, async coding agent, orchestration test |
📋 Requirements
| Requirement | Details |
|---|---|
| VS Code | With GitHub Copilot |
| Dev Container | Docker Desktop or Codespaces |
| Azure subscription | For deployments (optional for learning) |
Included in Dev Container:
- ✅ Azure CLI with Bicep extension
- ✅ PowerShell 7+ and Python 3.10+
- ✅ All required VS Code extensions
- ✅ Pricing MCP add-on (auto-configured)
- ✅ Python diagrams library (auto-installed)
🤝 Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
Acknowledgments
This project builds upon the excellent work of:
- copilot-orchestra by ShepAlderson — Foundation for multi-agent orchestration patterns
- Github-Copilot-Atlas by bigguy345 — Inspiration for context conservation and parallel execution