ai-on-eks
AI on EKS - Tested AI/ML blueprints for Amazon Elastic Kubernetes Service
Stars: 162
AI on Amazon EKS (AIoEKS) is a repository offering optimized solutions for AI and ML workloads on Amazon EKS. It provides Terraform Blueprints with best practices for deploying robust solutions, advanced logging, and observability. Users can leverage the Ray ecosystem for distributed computing, NVIDIA Triton Server, vLLM, and TensorRT-LLM for model inference and optimization. The repository also supports high-performance NVIDIA GPUs, AWS Trainium for model training, and AWS Inferentia for cost-effective model inference at scale. AIoEKS simplifies the deployment and management of AI/ML workloads on Kubernetes, offering end-to-end logging and observability.
README:
(Pronounced: "AI on EKS")
Optimized Solutions for AI and ML on EKS
⚠️ This repository is under active development as we support the new infrastructure format. Please raise any issues you may encounter.
Build, Scale, and Optimize AI/ML Platforms on Amazon EKS
Welcome to AI on EKS, your gateway to scaling AI and ML workloads on Amazon EKS. Unlock the potential of AI with a rich collection of Terraform Blueprints featuring best practices for deploying robust solutions with advanced logging and observability.
Explore practical patterns for running AI/ML workloads on EKS, leveraging the power of the Ray ecosystem for distributed computing. Utilize advanced serving solutions like NVIDIA Triton Server, vLLM for efficient and scalable model inference, and TensorRT-LLM for optimizing deep learning models.
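Serving stacks such as vLLM expose an OpenAI-compatible HTTP API once deployed. As a minimal sketch (the endpoint URL and model name below are illustrative placeholders, not values from this repository's blueprints), a client inside the cluster can build and send a standard chat-completions request:

```python
import json
import urllib.request

# Placeholder values: substitute the in-cluster service URL and the model
# name exposed by your own vLLM deployment on EKS.
ENDPOINT = "http://vllm-service.default.svc.cluster.local:8000/v1/chat/completions"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"


def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }


def send(payload: dict) -> dict:
    """POST the payload to the serving endpoint and decode the JSON reply."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires a running deployment reachable at ENDPOINT):
#   reply = send(build_chat_request("What is Amazon EKS?"))
```

Because the API shape follows the OpenAI specification, the same client code works unchanged whether the backend is vLLM, Triton's OpenAI frontend, or a managed endpoint.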
Take advantage of high-performance NVIDIA GPUs for intensive computational tasks and leverage AWS's specialized hardware, including AWS Trainium for efficient model training and AWS Inferentia for cost-effective model inference at scale.
Note: AIoEKS is in active development. For upcoming features and enhancements, check out the issues section.
In this repository, you'll find a variety of deployment blueprints for creating AI/ML platforms with Amazon EKS clusters. These examples are just a small selection of the available blueprints - visit the AIoEKS website for the complete list of options.
Inference-Ready Cluster - This solution supports multiple inference patterns on EKS
Inference Charts - These charts support deploying various models on EKS
JARK Stack on EKS - This blueprint deploys the JARK stack for AI workloads with NVIDIA GPUs
Generative AI on EKS - A collection of generative AI training and inference LLM deployment patterns
Envoy AI Gateway - Intelligent routing and management for AI/ML workloads with multi-model routing and rate limiting
Multi-Model Routing - Route requests to different AI models based on request headers
Rate Limiting - Usage-based rate limiting with automatic tracking
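To make the header-based routing pattern concrete, here is a hedged sketch of what a multi-model route can look like with Envoy AI Gateway. The resource names and backend references are illustrative assumptions, not taken from this repository's blueprints; consult the actual blueprint for the exact resources and header key it configures.

```yaml
# Illustrative only: route chat requests to different model backends
# based on a model-selection header. Names below are placeholders.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: multi-model-route        # placeholder name
spec:
  rules:
    - matches:
        - headers:
            - name: x-ai-eg-model    # model-selection header
              value: llama-3-8b
      backendRefs:
        - name: vllm-llama-backend   # placeholder backend
    - matches:
        - headers:
            - name: x-ai-eg-model
              value: mistral-7b
      backendRefs:
        - name: vllm-mistral-backend # placeholder backend
```

The same route resource is typically paired with a traffic policy that enforces the usage-based rate limits described above.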
For instructions on how to deploy AI on EKS patterns and run sample tests, visit the AIoEKS website.
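At a high level, deploying one of the Terraform blueprints follows the usual Terraform workflow. The sketch below is a rough outline, not the repository's exact procedure: the blueprint directory is a placeholder, and the AIoEKS website documents the real path, variables, and any wrapper scripts for each pattern.

```shell
# Clone the repository and move into a blueprint directory.
# "infra/jark-stack" is a placeholder path; check the AIoEKS website
# for the actual directory of the pattern you want.
git clone https://github.com/awslabs/ai-on-eks.git
cd ai-on-eks/infra/jark-stack

# Provision the EKS cluster and add-ons (requires AWS credentials and
# sufficient service quotas for the chosen instance types).
terraform init
terraform plan
terraform apply

# Point kubectl at the new cluster and verify the nodes are ready.
aws eks update-kubeconfig --name <cluster-name> --region <region>
kubectl get nodes
```

Tearing the environment down with `terraform destroy` (or the blueprint's cleanup script) when finished avoids ongoing charges for GPU or accelerator instances.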
Kubernetes is a widely adopted system for orchestrating containerized software at scale. As more users migrate their AI and machine learning workloads to Kubernetes, they often face the complexity of managing the Kubernetes ecosystem and selecting the right tools and configurations for their specific needs.
At AWS, we understand the challenges users encounter when deploying and scaling AI/ML workloads on Kubernetes. To simplify the process and enable users to quickly conduct proof-of-concepts and build clusters, we have developed AI on EKS (AIoEKS). AIoEKS offers opinionated open-source blueprints that provide end-to-end logging and observability, making it easier for users to deploy and manage Ray, vLLM, Kubeflow, MLFlow, Jupyter and other AI/ML workloads. With AIoEKS, users can confidently leverage the power of Kubernetes for their AI and machine learning needs without getting overwhelmed by its complexity.
AIoEKS is maintained by AWS Solution Architects and is not an AWS service. Support is provided on a best-effort basis by the AI on EKS community. If you have feedback, feature ideas, or wish to report bugs, please use the Issues section of this GitHub repository.
See CONTRIBUTING for more information.
This library is licensed under the Apache 2.0 License.
We welcome all individuals who are enthusiastic about AI on Kubernetes to become a part of this open source community. Your contributions and participation are invaluable to the success of this project.
Built with ❤️ at AWS.
Similar Open Source Tools
llm-d
llm-d is a Kubernetes-native distributed inference serving framework for large language models. Built on vLLM, it provides well-lit paths for deploying and scaling LLM inference on Kubernetes, with features such as inference-aware scheduling, disaggregated prefill/decode serving, and KV-cache-aware routing. It targets operators who need high-performance, cost-efficient LLM serving across clusters of accelerators.
metaflow
Metaflow is a user-friendly library designed to assist scientists and engineers in developing and managing real-world data science projects. Initially created at Netflix, Metaflow aimed to enhance the productivity of data scientists working on diverse projects ranging from traditional statistics to cutting-edge deep learning. For further information, refer to Metaflow's website and documentation.
awesome-openvino
Awesome OpenVINO is a curated list of AI projects based on the OpenVINO toolkit, offering a rich assortment of projects, libraries, and tutorials covering various topics like model optimization, deployment, and real-world applications across industries. It serves as a valuable resource continuously updated to maximize the potential of OpenVINO in projects, featuring projects like Stable Diffusion web UI, Visioncom, FastSD CPU, OpenVINO AI Plugins for GIMP, and more.
FedML
FedML is a unified and scalable machine learning library for running training and deployment anywhere at any scale. It is highly integrated with FEDML Nexus AI, a next-gen cloud service for LLMs & Generative AI. FEDML Nexus AI provides holistic support of three interconnected AI infrastructure layers: user-friendly MLOps, a well-managed scheduler, and high-performance ML libraries for running any AI jobs across GPU Clouds.
ServerlessLLM
ServerlessLLM is a fast, affordable, and easy-to-use library designed for multi-LLM serving, optimized for environments with limited GPU resources. It supports loading various leading LLM inference libraries, achieving fast load times, and reducing model switching overhead. The library facilitates easy deployment via Ray Cluster and Kubernetes, integrates with the OpenAI Query API, and is actively maintained by contributors.
SuperKnowa
SuperKnowa is a fast framework to build Enterprise RAG (Retriever Augmented Generation) Pipelines at Scale, powered by watsonx. It accelerates Enterprise Generative AI applications to get prod-ready solutions quickly on private data. The framework provides pluggable components for tackling various Generative AI use cases using Large Language Models (LLMs), allowing users to assemble building blocks to address challenges in AI-driven text generation. SuperKnowa is battle-tested on private knowledge bases ranging from 1M to 200M in size and has scaled to billions of retriever tokens.
llmariner
LLMariner is an extensible open source platform built on Kubernetes to simplify the management of generative AI workloads. It enables efficient handling of training and inference data within clusters, with OpenAI-compatible APIs for seamless integration with a wide range of AI-driven applications.
spring-ai-alibaba
Spring AI Alibaba is an AI application framework for Java developers that seamlessly integrates with Alibaba Cloud QWen LLM services and cloud-native infrastructures. It provides features like support for various AI models, high-level AI agent abstraction, function calling, and RAG support. The framework aims to simplify the development, evaluation, deployment, and observability of AI native Java applications. It offers open-source framework and ecosystem integrations to support features like prompt template management, event-driven AI applications, and more.
csghub
CSGHub is an open source platform for managing large model assets, including datasets, model files, and codes. It offers functionalities similar to a privatized Huggingface, managing assets in a manner akin to how OpenStack Glance manages virtual machine images. Users can perform operations such as uploading, downloading, storing, verifying, and distributing assets through various interfaces. The platform provides microservice submodules and standardized OpenAPIs for easy integration with users' systems. CSGHub is designed for large models and can be deployed On-Premise for offline operation.
cube
Cube is a semantic layer for building data applications, helping data engineers and application developers access data from modern data stores, organize it into consistent definitions, and deliver it to every application. It works with SQL-enabled data sources, providing sub-second latency and high concurrency for API requests. Cube addresses SQL code organization, performance, and access control issues in data applications, enabling efficient data modeling, access control, and performance optimizations for various tools like embedded analytics, dashboarding, reporting, and data notebooks.
InferenceMAX
InferenceMAX™ is an open-source benchmarking tool designed to track real-time performance improvements in popular open-source inference frameworks and models. It runs a suite of benchmarks every night to capture progress in near real-time, providing a live indicator of inference performance. The tool addresses the challenge of rapidly evolving software ecosystems by benchmarking the latest software packages, ensuring that benchmarks do not go stale. InferenceMAX™ is supported by industry leaders and contributors, providing transparent and reproducible benchmarks that help the ML community make informed decisions about hardware and software performance.
oci-data-science-ai-samples
The Oracle Cloud Infrastructure Data Science and AI services Examples repository provides demos, tutorials, and code examples showcasing various features of the OCI Data Science service and AI services. It offers tools for data scientists to develop and deploy machine learning models efficiently, with features like Accelerated Data Science SDK, distributed training, batch processing, and machine learning pipelines. Whether you're a beginner or an experienced practitioner, OCI Data Science Services provide the resources needed to build, train, and deploy models easily.
ianvs
Ianvs is a distributed synergy AI benchmarking project incubated in KubeEdge SIG AI. It aims to test the performance of distributed synergy AI solutions following recognized standards, providing end-to-end benchmark toolkits, test environment management tools, test case control tools, and benchmark presentation tools. It also collaborates with other organizations to establish comprehensive benchmarks and related applications. The architecture includes critical components like Test Environment Manager, Test Case Controller, Generation Assistant, Simulation Controller, and Story Manager. Ianvs documentation covers quick start, guides, dataset descriptions, algorithms, user interfaces, stories, and roadmap.
CSGHub
CSGHub is an open source, trustworthy large model asset management platform that can assist users in governing the assets involved in the lifecycle of LLM and LLM applications (datasets, model files, codes, etc). With CSGHub, users can perform operations on LLM assets, including uploading, downloading, storing, verifying, and distributing, through Web interface, Git command line, or natural language Chatbot. Meanwhile, the platform provides microservice submodules and standardized OpenAPIs, which could be easily integrated with users' own systems. CSGHub is committed to bringing users an asset management platform that is natively designed for large models and can be deployed On-Premise for fully offline operation. CSGHub offers functionalities similar to a privatized Huggingface (on-premise Huggingface), managing LLM assets in a manner akin to how OpenStack Glance manages virtual machine images, Harbor manages container images, and Sonatype Nexus manages artifacts.
CodeFuse-muAgent
CodeFuse-muAgent is a Multi-Agent framework designed to streamline Standard Operating Procedure (SOP) orchestration for agents. It integrates toolkits, code libraries, knowledge bases, and sandbox environments for rapid construction of complex Multi-Agent interactive applications. The framework enables efficient execution and handling of multi-layered and multi-dimensional tasks.
For similar tasks
topsha
LocalTopSH is an AI Agent Framework designed for companies and developers who require 100% on-premise AI agents with data privacy. It supports various OpenAI-compatible LLM backends and offers production-ready security features. The framework allows simple deployment using Docker compose and ensures that data stays within the user's network, providing full control and compliance. With cost-effective scaling options and compatibility in regions with restrictions, LocalTopSH is a versatile solution for deploying AI agents on self-hosted infrastructure.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.
