vast-python
Vast.ai python and cli api client
Stars: 106
This repository contains the open source python command line interface for vast.ai. The CLI has all the main functionality of the vast.ai website GUI and uses the same underlying REST API. The main functionality is self-contained in the script file vast.py, with additional invoice generating commands in vast_pdf.py. Users can interact with the vast.ai platform through the CLI to manage instances, create templates, manage teams, and perform various cloud-related tasks.
README:
This repository contains the open source python command line interface for vast.ai.
This CLI has all of the main functionality of the vast.ai website GUI and uses the
same underlying REST API. Most of the functionality is self-contained in the single
script file vast.py
, although the invoice generating commands
require installing an additional second script called vast_pdf.py
.
You should probably create a subdirectory in which to put this script and related files if you
haven't already. You can call it whatever you like but I'll refer to it as "vid" for "Vast Install Directory".
So just enter mkdir vid
to create the directory. Once you've created the directory just change your working directory to it with cd vid
. After you've
done that the quickest way to get started is to download the vast.py
script using the wget
command.
wget https://raw.githubusercontent.com/vast-ai/vast-python/master/vast.py; chmod +x vast.py;
You can verify that the script is working by doing ./vast.py --help
. You should see a list of the available
commands. In order to proceed further you will need to login to the vast.ai website and get your api-key.
Go to https://vast.ai/console/cli/. Copy the command under
the heading "Login / Set API Key" and run it. The command will be something like
./vast.py set api-key xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
where the xxxx...
is your api-key (a long hexadecimal number). Note that if the script is
named "vast" in this command on the website and your installed script is named "vast.py"
you will need to change the name of the script in the command you run. The set api-key
command saves your api-key in a hidden file in your home directory. Do not share your
api-key with anyone as it authenticates your other vast commands to your account.
To see how the API works you can use it to find machines for rent. vast.py search offers
. In this
form the command will show all available offers. To get more specific results try narrowing the search.
There is a large online help page on how to do this. Bring up the help by doing vast.py search offers --help
.
There are many parameters that can be used to filter the results. The search command supports
all of the filters and sort options that the website GUI uses. To find Turing GPU instances
(compute capability 7.0 or higher):
./vast.py search offers 'compute_cap > 700 '
To find instances with a reliability score >= 0.99 and at least 4 gpus, ordering by num of gpus descending:
./vast.py search offers 'reliability > 0.99 num_gpus>=4' -o 'num_gpus-'
The output of this command at the time of this writing is
ID CUDA Num Model PCIE_BW vCPUs RAM Storage $/hr DLPerf DLP/$ Nvidia Driver Version Net_up Net_down R Max_Days machine_id
1596177 11.4 10x GTX_1080 5.5 48.0 257.9 4628 2.0000 73.0 36.5 470.63.01 653.3 854.5 99.5 - 638
2459430 11.5 8x RTX_A5000 9.1 128.0 515.8 3094 4.0000 209.4 52.3 495.46 1844.2 2669.6 99.7 12.0 4384
2459380 11.4 8x RTX_3070 6.3 12.0 64.0 710 1.4200 67.2 47.3 470.86 0.0 0.0 99.8 - 4102
2456624 11.4 8x RTX_2080_Ti 10.7 32.0 257.9 1653 2.8000 126.4 45.2 470.82.00 14.6 214.2 99.8 28.7 3047
2456622 11.4 8x RTX_2080_Ti 10.8 32.0 128.9 1651 2.8000 127.1 45.4 470.82.00 14.9 214.7 99.1 28.7 1569
2456600 11.5 8x RTX_2080_Ti 10.9 48.0 256.6 1704 2.4000 125.5 52.3 495.29.05 169.0 169.8 99.7 25.7 4058
2455617 11.2 8x RTX_3090 21.7 64.0 515.8 6165 6.4000 261.1 40.8 460.67 477.6 707.2 99.8 28.7 2980
2454397 11.2 8x A100_SXM4 22.4 128.0 2064.1 21568 13.2000 300.1 22.7 460.106.00 708.7 1119.8 99.2 - 4762
2405590 11.4 8x RTX_2080_Ti 11.2 48.0 257.9 1629 3.8000 125.5 33.0 470.82.00 389.4 608.8 100.0 1.8 2776
2364579 11.4 8x A100_PCIE 18.5 128.0 515.8 4813 14.8000 278.8 18.8 470.74 472.4 699.0 99.9 28.7 3459
2281839 11.2 8x Tesla_V100 11.8 72.0 483.1 1171 5.6000 193.6 34.6 460.67 493.0 697.8 100.0 28.7 2744
2281832 11.2 8x A100_PCIE 17.7 64.0 515.9 5821 14.8000 276.7 18.7 460.91.03 478.2 655.5 99.9 28.7 2901
2452630 11.4 7x RTX_3090 6.3 28.0 64.0 61 3.5000 165.5 47.3 470.86 84.6 84.4 99.3 3.8 4420
2342561 11.4 7x RTX_3090 6.1 96.0 257.6 1664 4.5500 149.2 32.8 470.82.00 476.9 671.7 99.4 1.7 4202
2237983 11.4 7x RTX_3090 12.5 32.0 257.6 3228 3.1500 204.5 64.9 470.86 194.4 183.8 99.1 - 4207
2459511 11.4 6x RTX_3090 6.2 - 128.8 812 2.8200 150.2 53.2 470.94 374.4 271.4 99.0 6.7 3129
2448342 11.5 6x RTX_A6000 12.4 64.0 515.7 6695 3.6000 169.8 47.2 495.29.05 668.6 1082.6 99.6 - 3624
2437565 11.4 6x RTX_3090 23.0 16.0 128.8 1676 5.4000 196.8 36.5 470.94 34.1 131.5 99.4 - 4238
2332973 11.2 6x RTX_3090 11.9 48.0 193.3 1671 3.3000 180.3 54.6 460.84 582.1 737.6 99.9 25.6 3552
2459459 11.5 4x RTX_3090 23.1 32.0 257.8 1363 2.0000 131.2 65.6 495.46 1954.7 2725.8 99.6 12.0 3059
2459428 11.5 4x RTX_A5000 24.6 64.0 515.8 1547 2.0000 104.9 52.4 495.46 1844.2 2669.6 99.7 12.0 4384
2459368 11.4 4x RTX_3090 25.3 48.0 64.2 133 1.3967 130.5 93.4 470.86 0.0 0.0 99.4 - 4637
2458968 11.6 4x RTX_3090 11.7 16.0 128.5 752 1.4000 79.8 57.0 510.39.01 797.8 842.7 99.9 4.0 2555
2458878 11.6 4x RTX_3090 11.6 36.0 128.5 1531 1.4000 81.9 58.5 510.39.01 757.1 807.6 99.9 4.0 3646
2458845 11.6 4x RTX_3090 3.1 12.0 128.5 369 1.4000 92.4 66.0 510.39.01 725.7 852.2 99.8 4.0 700
2458838 11.6 4x RTX_3090 5.7 48.0 128.9 624 1.4000 85.3 60.9 510.39.01 574.9 731.7 99.8 4.0 2217
2454395 11.2 4x A100_SXM4 22.9 64.0 2064.1 10784 6.6000 150.0 22.7 460.106.00 708.7 1119.8 99.2 - 4762
2452632 11.4 4x RTX_3090 6.3 16.0 64.0 35 2.0000 123.5 61.8 470.86 84.6 84.4 99.3 3.8 4420
2450275 11.4 4x RTX_3080_Ti 12.5 32.0 128.7 817 1.8000 128.8 71.6 470.82.00 278.3 350.4 99.7 - 4260
2449210 11.5 4x RTX_3090 11.2 48.0 128.9 324 2.0000 89.7 44.9 495.29.05 688.3 775.4 99.8 - 2764
2445175 11.4 4x RTX_3090 11.9 32.0 257.6 1530 2.0000 135.4 67.7 470.86 868.6 887.1 99.7 25.9 3055
2444916 11.4 4x RTX_3090 11.9 16.0 128.7 1576 1.4000 131.8 94.2 470.82.00 39.4 402.3 99.9 - 3759
2437188 11.4 4x Tesla_P100 11.7 24.0 95.2 2945 0.7200 44.8 62.2 470.82.00 10.9 76.2 99.5 0.1 3969
2437179 11.4 4x Tesla_P100 11.7 32.0 192.1 3070 0.7200 44.8 62.3 470.82.00 11.1 66.0 99.2 0.0 4159
2431606 11.4 4x RTX_3090 17.9 32.0 110.7 330 1.8400 134.3 73.0 470.82.01 584.6 813.4 99.7 4.4 4079
2419191 11.4 4x RTX_2080_Ti 6.3 32.0 64.4 837 2.0000 64.7 32.4 470.63.01 40.5 205.9 99.7 - 162
2405589 11.4 4x RTX_2080_Ti 10.8 24.0 257.9 815 1.9000 62.8 33.0 470.82.00 389.4 608.8 100.0 1.8 2776
2392087 11.4 4x RTX_A6000 10.8 32.0 515.9 1247 1.8000 64.5 35.8 470.94 669.9 705.4 99.1 10.9 4782
2377227 11.2 4x RTX_3090 6.3 24.0 64.3 1638 2.0000 128.3 64.1 460.32.03 37.8 145.0 99.7 3.0 2672
2349173 11.4 4x RTX_3090 23.2 48.0 128.7 1475 2.0000 107.4 53.7 470.86 33.2 84.2 99.8 47.3 3949
2338635 11.4 4x RTX_3090 23.0 32.0 128.5 3151 1.6000 108.8 68.0 470.86 33.8 86.4 99.6 47.4 3948
2303959 11.2 4x RTX_3090 11.7 28.0 128.8 791 2.1200 131.3 61.9 460.32.03 519.7 570.7 99.5 - 3042
2281830 11.2 4x A100_PCIE 18.1 32.0 515.9 2910 7.4000 143.6 19.4 460.91.03 478.2 655.5 99.9 28.7 2901
2193726 11.4 4x RTX_3090 12.4 32.0 128.8 1646 3.6000 153.9 42.8 470.82.01 33.3 137.5 99.5 - 3434
1737692 11.2 4x RTX_3070 6.3 28.0 128.5 656 2.8000 37.5 13.4 460.91.03 452.6 703.2 99.6 - 3510
To create an instance of type 2459368 (using an ID from the search) with the vastai/tensorflow image and 32 GB of disk storage
./vast.py create instance 2459368 --image vastai/tensorflow --disk 32
If you followed the instructions in Quickstart you have already installed the script that contains
most of the CLI functionality. If you wish to print PDF format invoices you will need a few other
things. First, you'll need the vast_pdf.py script. This can be found in this repository in the main
directory at vast_pdf.py. This script should be present in the same directory as the
vast.py
script. It makes use of a third party library called Borb to create the PDF invoices. To install
this run the command pip3 install borb
The CLI API is all contained in a python script called vast.py
.
This script can be called with various commands as arguments. Commands follow
a simple "verb-object" pattern. As an example, consider "show machines". To run this
command we type ./vast.py show machines
usage: vast.py [-h] [--url URL] [--retry RETRY] [--raw] [--explain] [--api-key API_KEY] command ...
positional arguments:
command command to run. one of:
help print this help message
attach ssh Attach an ssh key to an instance. This will allow you to connect to the instance with the ssh key.
cancel copy Cancel a remote copy in progress, specified by DST id
cancel sync Cancel a remote copy in progress, specified by DST id
change bid Change the bid price for a spot/interruptible instance
copy Copy directories between instances and/or local
cloud copy Copy files/folders to and from cloud providers
create api-key Create a new api-key with restricted permissions. Can be sent to other users and teammates
create ssh-key Create a new ssh-key
create autoscaler Create a new autoscale group
create endpoint Create a new endpoint group
create instance Create a new instance
create subaccount Create a subaccount
create team Create a new team
create team-role Add a new role to your
create template Create a new template
delete api-key Remove an api-key
delete ssh-key Remove an ssh-key
delete autoscaler Delete an autoscaler group
delete endpoint Delete an endpoint group
destroy instance Destroy an instance (irreversible, deletes data)
destroy instances Destroy a list of instances (irreversible, deletes data)
destroy team Destroy your team
detach ssh Detach an ssh key from an instance
execute Execute a (constrained) remote command on a machine
invite team-member Invite a team member
label instance Assign a string label to an instance
logs Get the logs for an instance
prepay instance Deposit credits into reserved instance.
reboot instance Reboot (stop/start) an instance
recycle instance Recycle (destroy/create) an instance
remove team-member Remove a team member
remove team-role Remove a role from your team
reports Get the user reports for a given machine
reset api-key Reset your api-key (get new key from website).
start instance Start a stopped instance
start instances Start a list of instances
stop instance Stop a running instance
stop instances Stop a list of instances
search benchmarks Search for benchmark results using custom query
search invoices Search for benchmark results using custom query
search offers Search for instance types using custom query
search templates Search for template results using custom query
set api-key Set api-key (get your api-key from the console/CLI)
set user Update user data from json file
ssh-url ssh url helper
scp-url scp url helper
show api-key Show an api-key
show api-keys List your api-keys associated with your account
show ssh-keys List your ssh keys associated with your account
show autoscalers Display user's current autoscaler groups
show endpoints Display user's current endpoint groups
show connections Displays user's cloud connections
show deposit Display reserve deposit info for an instance
show earnings Get machine earning history reports
show invoices Get billing history reports
show instance Display user's current instances
show instances Display user's current instances
show ipaddrs Display user's history of ip addresses
show user Get current user data
show subaccounts Get current subaccounts
show team-members Show your team members
show team-role Show your team role
show team-roles Show roles for a team
transfer credit Transfer credits to another account
update autoscaler Update an existing autoscale group
update endpoint Update an existing endpoint group
update team-role Update an existing team role
update ssh-key Update an existing ssh key
generate pdf-invoices
cleanup machine [Host] Remove all expired storage instances from the machine, freeing up space.
list machine [Host] list a machine for rent
list machines [Host] list machines for rent
remove defjob [Host] Delete default jobs
set defjob [Host] Create default jobs for a machine
set min-bid [Host] Set the minimum bid/rental price for a machine
schedule maint [Host] Schedule upcoming maint window
cancel maint [Host] Cancel maint window
show machines [Host] Show hosted machines
unlist machine [Host] Unlist a listed machine
launch instance Launch the top instance from the search offers based on the given parameters
options:
-h, --help show this help message and exit
--url URL server REST api url
--retry RETRY retry limit
--raw output machine-readable json
--explain output verbose explanation of mapping of CLI calls to HTTPS API endpoints
--api-key API_KEY api key. defaults to using the one stored in ~/.vast_api_key
Use 'vast COMMAND --help' for more info about a command
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for vast-python
Similar Open Source Tools
vast-python
This repository contains the open source python command line interface for vast.ai. The CLI has all the main functionality of the vast.ai website GUI and uses the same underlying REST API. The main functionality is self-contained in the script file vast.py, with additional invoice generating commands in vast_pdf.py. Users can interact with the vast.ai platform through the CLI to manage instances, create templates, manage teams, and perform various cloud-related tasks.
aws-lex-web-ui
The AWS Lex Web UI is a sample Amazon Lex web interface that provides a chatbot UI component for integration into websites. It supports voice and text interactions, Lex response cards, and programmable configuration using JavaScript. The interface can be used as a full-page chatbot UI or embedded as a widget. It offers mobile-ready responsive UI, seamless voice-text switching, and interactive messaging support. The project includes CloudFormation templates for easy deployment and customization. Users can modify configurations, integrate the UI into existing sites, and deploy using various methods like CloudFormation, pre-built libraries, or npm installation.
AppAgent
AppAgent is a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps. Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications.
quick-start-connectors
Cohere's Build-Your-Own-Connector framework allows integration of Cohere's Command LLM via the Chat API endpoint to any datastore/software holding text information with a search endpoint. Enables user queries grounded in proprietary information. Use-cases include question/answering, knowledge working, comms summary, and research. Repository provides code for popular datastores and a template connector. Requires Python 3.11+ and Poetry. Connectors can be built and deployed using Docker. Environment variables set authorization values. Pre-commits for linting. Connectors tailored to integrate with Cohere's Chat API for creating chatbots. Connectors return documents as JSON objects for Cohere's API to generate answers with citations.
langgraph-studio
LangGraph Studio is a specialized agent IDE that enables visualization, interaction, and debugging of complex agentic applications. It offers visual graphs and state editing to better understand agent workflows and iterate faster. Users can collaborate with teammates using LangSmith to debug failure modes. The tool integrates with LangSmith and requires Docker installed. Users can create and edit threads, configure graph runs, add interrupts, and support human-in-the-loop workflows. LangGraph Studio allows interactive modification of project config and graph code, with live sync to the interactive graph for easier iteration on long-running agents.
nanoPerplexityAI
nanoPerplexityAI is an open-source implementation of a large language model service that fetches information from Google. It involves a simple architecture where the user query is checked by the language model, reformulated for Google search, and an answer is generated and saved in a markdown file. The tool requires minimal setup and is designed for easy visualization of answers.
cluster-toolkit
Cluster Toolkit is an open-source software by Google Cloud for deploying AI/ML and HPC environments on Google Cloud. It allows easy deployment following best practices, with high customization and extensibility. The toolkit includes tutorials, examples, and documentation for various modules designed for AI/ML and HPC use cases.
LLM_Web_search
LLM_Web_search project gives local LLMs the ability to search the web by outputting a specific command. It uses regular expressions to extract search queries from model output and then utilizes duckduckgo-search to search the web. LangChain's Contextual compression and Okapi BM25 or SPLADE are used to extract relevant parts of web pages in search results. The extracted results are appended to the model's output.
chronon
Chronon is a platform that simplifies and improves ML workflows by providing a central place to define features, ensuring point-in-time correctness for backfills, simplifying orchestration for batch and streaming pipelines, offering easy endpoints for feature fetching, and guaranteeing and measuring consistency. It offers benefits over other approaches by enabling the use of a broad set of data for training, handling large aggregations and other computationally intensive transformations, and abstracting away the infrastructure complexity of data plumbing.
lumigator
Lumigator is an open-source platform developed by Mozilla.ai to help users select the most suitable language model for their specific needs. It supports the evaluation of summarization tasks using sequence-to-sequence models such as BART and BERT, as well as causal models like GPT and Mistral. The platform aims to make model selection transparent, efficient, and empowering by providing a framework for comparing LLMs using task-specific metrics to evaluate how well a model fits a project's needs. Lumigator is in the early stages of development and plans to expand support to additional machine learning tasks and use cases in the future.
mahilo
Mahilo is a flexible framework for creating multi-agent systems that can interact with humans while sharing context internally. It allows developers to set up complex agent networks for various applications, from customer service to emergency response simulations. Agents can communicate with each other and with humans, making the system efficient by handling context from multiple agents and helping humans stay focused on specific problems. The system supports Realtime API for voice interactions, WebSocket-based communication, flexible communication patterns, session management, and easy agent definition.
vector-vein
VectorVein is a no-code AI workflow software inspired by LangChain and langflow, aiming to combine the powerful capabilities of large language models and enable users to achieve intelligent and automated daily workflows through simple drag-and-drop actions. Users can create powerful workflows without the need for programming, automating all tasks with ease. The software allows users to define inputs, outputs, and processing methods to create customized workflow processes for various tasks such as translation, mind mapping, summarizing web articles, and automatic categorization of customer reviews.
Powerpointer-For-Local-LLMs
PowerPointer For Local LLMs is a PowerPoint generator that uses python-pptx and local llm's via the Oobabooga Text Generation WebUI api to create beautiful and informative presentations. It runs locally on your computer, eliminating privacy concerns. The tool allows users to select from 7 designs, make placeholders for images, and easily customize presentations within PowerPoint. Users provide information for the PowerPoint, which is then used to generate text using optimized prompts and the text generation webui api. The generated text is converted into a PowerPoint presentation using the python-pptx library.
dataherald
Dataherald is a natural language-to-SQL engine built for enterprise-level question answering over structured data. It allows you to set up an API from your database that can answer questions in plain English. You can use Dataherald to: * Allow business users to get insights from the data warehouse without going through a data analyst * Enable Q+A from your production DBs inside your SaaS application * Create a ChatGPT plug-in from your proprietary data
llama-on-lambda
This project provides a proof of concept for deploying a scalable, serverless LLM Generative AI inference engine on AWS Lambda. It leverages the llama.cpp project to enable the usage of more accessible CPU and RAM configurations instead of limited and expensive GPU capabilities. By deploying a container with the llama.cpp converted models onto AWS Lambda, this project offers the advantages of scale, minimizing cost, and maximizing compute availability. The project includes AWS CDK code to create and deploy a Lambda function leveraging your model of choice, with a FastAPI frontend accessible from a Lambda URL. It is important to note that you will need ggml quantized versions of your model and model sizes under 6GB, as your inference RAM requirements cannot exceed 9GB or your Lambda function will fail.
GlaDOS
This project aims to create a real-life version of GLaDOS, an aware, interactive, and embodied AI entity. It involves training a voice generator, developing a 'Personality Core,' implementing a memory system, providing vision capabilities, creating 3D-printable parts, and designing an animatronics system. The software architecture focuses on low-latency voice interactions, utilizing a circular buffer for data recording, text streaming for quick transcription, and a text-to-speech system. The project also emphasizes minimal dependencies for running on constrained hardware. The hardware system includes servo- and stepper-motors, 3D-printable parts for GLaDOS's body, animations for expression, and a vision system for tracking and interaction. Installation instructions cover setting up the TTS engine, required Python packages, compiling llama.cpp, installing an inference backend, and voice recognition setup. GLaDOS can be run using 'python glados.py' and tested using 'demo.ipynb'.
For similar tasks
vast-python
This repository contains the open source python command line interface for vast.ai. The CLI has all the main functionality of the vast.ai website GUI and uses the same underlying REST API. The main functionality is self-contained in the script file vast.py, with additional invoice generating commands in vast_pdf.py. Users can interact with the vast.ai platform through the CLI to manage instances, create templates, manage teams, and perform various cloud-related tasks.
hof
Hof is a CLI tool that unifies data models, schemas, code generation, and a task engine. It allows users to augment data, config, and schemas with CUE to improve consistency, generate multiple Yaml and JSON files, explore data or config with a TUI, and run workflows with automatic task dependency inference. The tool uses CUE to power the DX and implementation, providing a language for specifying schemas, configuration, and writing declarative code. Hof offers core features like code generation, data model management, task engine, CUE cmds, creators, modules, TUI, and chat for better, scalable results.
obsidian-systemsculpt-ai
SystemSculpt AI is a comprehensive AI-powered plugin for Obsidian, integrating advanced AI capabilities into note-taking, task management, knowledge organization, and content creation. It offers modules for brain integration, chat conversations, audio recording and transcription, note templates, and task generation and management. Users can customize settings, utilize AI services like OpenAI and Groq, and access documentation for detailed guidance. The plugin prioritizes data privacy by storing sensitive information locally and offering the option to use local AI models for enhanced privacy.
For similar jobs
minio
MinIO is a High Performance Object Storage released under GNU Affero General Public License v3.0. It is API compatible with Amazon S3 cloud storage service. Use MinIO to build high performance infrastructure for machine learning, analytics and application data workloads.
ai-on-gke
This repository contains assets related to AI/ML workloads on Google Kubernetes Engine (GKE). Run optimized AI/ML workloads with Google Kubernetes Engine (GKE) platform orchestration capabilities. A robust AI/ML platform considers the following layers: Infrastructure orchestration that support GPUs and TPUs for training and serving workloads at scale Flexible integration with distributed computing and data processing frameworks Support for multiple teams on the same infrastructure to maximize utilization of resources
kong
Kong, or Kong API Gateway, is a cloud-native, platform-agnostic, scalable API Gateway distinguished for its high performance and extensibility via plugins. It also provides advanced AI capabilities with multi-LLM support. By providing functionality for proxying, routing, load balancing, health checking, authentication (and more), Kong serves as the central layer for orchestrating microservices or conventional API traffic with ease. Kong runs natively on Kubernetes thanks to its official Kubernetes Ingress Controller.
AI-in-a-Box
AI-in-a-Box is a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction, while maintaining the highest standards of quality and efficiency. It provides essential guidance on the responsible use of AI and LLM technologies, specific security guidance for Generative AI (GenAI) applications, and best practices for scaling OpenAI applications within Azure. The available accelerators include: Azure ML Operationalization in-a-box, Edge AI in-a-box, Doc Intelligence in-a-box, Image and Video Analysis in-a-box, Cognitive Services Landing Zone in-a-box, Semantic Kernel Bot in-a-box, NLP to SQL in-a-box, Assistants API in-a-box, and Assistants API Bot in-a-box.
awsome-distributed-training
This repository contains reference architectures and test cases for distributed model training with Amazon SageMaker Hyperpod, AWS ParallelCluster, AWS Batch, and Amazon EKS. The test cases cover different types and sizes of models as well as different frameworks and parallel optimizations (Pytorch DDP/FSDP, MegatronLM, NemoMegatron...).
generative-ai-cdk-constructs
The AWS Generative AI Constructs Library is an open-source extension of the AWS Cloud Development Kit (AWS CDK) that provides multi-service, well-architected patterns for quickly defining solutions in code to create predictable and repeatable infrastructure, called constructs. The goal of AWS Generative AI CDK Constructs is to help developers build generative AI solutions using pattern-based definitions for their architecture. The patterns defined in AWS Generative AI CDK Constructs are high level, multi-service abstractions of AWS CDK constructs that have default configurations based on well-architected best practices. The library is organized into logical modules using object-oriented techniques to create each architectural pattern model.
model_server
OpenVINO™ Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures, the model server uses the same architecture and API as TensorFlow Serving and KServe while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.
dify-helm
Deploy langgenius/dify, an LLM based chat bot app on kubernetes with helm chart.