
Automodel
DTensor-native pretraining and fine-tuning for LLMs/VLMs with day-0 Hugging Face support, GPU acceleration, and memory efficiency.
Stars: 62

NeMo AutoModel is NVIDIA's library for GPU-accelerated, PyTorch-native training of Hugging Face models. It supports fine-tuning and pretraining of LLMs and VLMs directly from the Hugging Face Hub on Day-0, with no checkpoint conversion, and scales from a single GPU to multi-node clusters using FSDP2 and related parallelisms. Models keep their original checkpoint format, so fine-tuned weights remain usable across the Hugging Face ecosystem. The library targets data scientists and ML engineers who want efficient, memory-conscious training recipes without bespoke infrastructure work.
README:
NeMo Framework is NVIDIA's GPU-accelerated, end-to-end training framework for large language models (LLMs), multi-modal models, and speech models. It enables seamless scaling of training (both pretraining and post-training) workloads from a single GPU to thousand-node clusters for both 🤗 Hugging Face/PyTorch and Megatron models, and includes a suite of libraries and recipe collections to help users train models end to end. The AutoModel library ("NeMo AutoModel") provides GPU-accelerated PyTorch training for 🤗 Hugging Face models on Day-0. Users can start training and fine-tuning models instantly without conversion delays, and scale effortlessly with PyTorch-native parallelisms, optimized custom kernels, and memory-efficient recipes, all while preserving the original checkpoint format for seamless use across the Hugging Face ecosystem.
⚠️ Note: NeMo AutoModel is under active development. New features, improvements, and documentation updates are released regularly. We are working toward a stable release, so expect the interface to solidify over time. Your feedback and contributions are welcome, and we encourage you to follow along as new updates roll out.
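For a sense of the Day-0 workflow, here is a minimal sketch. It assumes the package is installed and exposes a `NeMoAutoModelForCausalLM` class (inferred as the causal-LM counterpart of the `NeMoAutoModelForImageTextToText` class used in the configs below); treat the exact import path as an assumption rather than a guarantee.

```python
# Minimal Day-0 sketch: load a Hugging Face checkpoint through NeMo AutoModel
# and run a forward pass. NeMoAutoModelForCausalLM and its import path are
# assumptions inferred from the NeMoAutoModelForImageTextToText config below.
from transformers import AutoTokenizer
from nemo_automodel import NeMoAutoModelForCausalLM

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = NeMoAutoModelForCausalLM.from_pretrained(model_id)  # no conversion step

inputs = tokenizer("NeMo AutoModel fine-tunes HF checkpoints", return_tensors="pt")
logits = model(**inputs).logits  # standard Hugging Face forward pass
print(logits.shape)
```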
✅ Available now | 🔜 Coming in 25.09

- ✅ Hugging Face Integration - Works with 1-70B models (Qwen, Llama).
- ✅ Distributed Training - Fully Sharded Data Parallel (FSDP2) support.
- ✅ Environment Support - Support for SLURM and interactive training.
- ✅ Learning Algorithms - SFT (Supervised Fine-Tuning) and PEFT (Parameter-Efficient Fine-Tuning).
- ✅ Large Model Support - Native PyTorch support for models up to 70B parameters.
- ✅ Advanced Parallelism - PyTorch-native FSDP2, TP, CP, and SP for efficient training.
- ✅ Sequence Packing - Sequence packing in both DTensor and MCore for large training performance gains (illustrated in the sketch after this list).
- ✅ DCP - Distributed Checkpoint support with SafeTensors output.
- ✅ HSDP - Hybrid Sharded Data Parallelism based on FSDP2.
- 🔜 Pipeline Support - Torch-native pipeline parallelism composable with FSDP2 and DTensor (3D parallelism).
- 🔜 Pre-training - Support for model pre-training, including DeepSeek-V3, GPT-OSS, and Qwen3 (Coder-480B-A35B, etc.).
- 🔜 Knowledge Distillation - Support for knowledge distillation with LLMs; VLM support will be added post 25.09.
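To make the sequence-packing feature concrete, here is a toy sketch (plain PyTorch, not the library's implementation) of the core idea: variable-length samples are concatenated into one padding-free buffer, and cumulative boundaries are recorded so attention kernels can stay within each sample.

```python
import torch

# Toy sequence packing: concatenate samples and record cu_seqlens boundaries
# so an attention kernel can mask each sample independently (illustrative only).
samples = [torch.tensor(s) for s in ([1, 2, 3], [4, 5], [6, 7, 8, 9])]
packed = torch.cat(samples)  # one buffer, no padding tokens wasted

lengths = torch.tensor([len(s) for s in samples])
cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.long), lengths.cumsum(0)])

print(packed)      # tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(cu_seqlens)  # tensor([0, 3, 5, 9]) -> per-sample attention boundaries
```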
NeMo AutoModel provides native support for a wide range of models available on the Hugging Face Hub, enabling efficient fine-tuning for various domains. Below is a comprehensive list of all supported models with their available recipes:
To get started quickly, NeMo AutoModel provides a collection of ready-to-use recipes for common LLM and VLM fine-tuning tasks. Simply select the recipe that matches your model and training setup (e.g., single-GPU, multi-GPU, or multi-node).
| Domain | Model Family | Model ID | Recipes |
|---|---|---|---|
| LLM | LLaMA | meta-llama/Llama-3.2-1B | SFT, PEFT |
| | | meta-llama/Llama-3.2-3B-Instruct | SFT, PEFT |
| | | meta-llama/Llama-3.1-8B | FP8 |
| LLM | Mistral | mistralai/Mistral-7B-v0.1 | SFT, PEFT, FP8 |
| | | mistralai/Mistral-Nemo-Base-2407 | SFT, PEFT, FP8 |
| | | mistralai/Mixtral-8x7B-Instruct-v0.1 | PEFT |
| LLM | Qwen | Qwen/Qwen2.5-7B | SFT, PEFT, FP8 |
| | | Qwen/Qwen3-0.6B | SFT, PEFT |
| | | Qwen/QwQ-32B | SFT, PEFT |
| LLM | Gemma | google/gemma-3-270m | SFT, PEFT |
| | | google/gemma-2-9b-it | SFT, PEFT, FP8 |
| | | google/gemma-7b | SFT, PEFT |
| LLM | Phi | microsoft/phi-2 | SFT, PEFT |
| | | microsoft/Phi-3-mini-4k-instruct | SFT, PEFT |
| | | microsoft/phi-4 | SFT, PEFT, FP8 |
| LLM | Seed | ByteDance-Seed/Seed-Coder-8B-Instruct | SFT, PEFT, FP8 |
| | | ByteDance-Seed/Seed-OSS-36B-Instruct | SFT, PEFT |
| LLM | Baichuan | baichuan-inc/Baichuan2-7B-Chat | SFT, PEFT, FP8 |
| VLM | Gemma | google/gemma-3-4b-it | SFT, PEFT |
| | | google/gemma-3n-e4b-it | SFT, PEFT |
And more: Check out more LLM and VLM examples! Any causal LM on Hugging Face Hub can be used with the base recipe template!
To run a NeMo AutoModel recipe, you need a recipe script (e.g., LLM, VLM) and a YAML config file (e.g., LLM, VLM):
```bash
# Command invocation format:
uv run <recipe_script_path> --config <yaml_config_path>

# LLM example: multi-GPU with FSDP2
uv run torchrun --nproc-per-node=8 recipes/llm_finetune/finetune.py --config recipes/llm_finetune/llama/llama3_2_1b_hellaswag.yaml

# VLM example: single-GPU fine-tuning (Gemma-3-VL) with LoRA
uv run recipes/vlm_finetune/finetune.py --config recipes/vlm_finetune/gemma3/gemma3_vl_3b_cord_v2_peft.yaml
```
- Day-0 Hugging Face Support: Instantly fine-tune any model from the Hugging Face Hub
- Lightning-Fast Performance: Custom CUDA kernels and memory optimizations deliver 2-5× speedups
- Large-Scale Distributed Training: Built-in FSDP2 and nvFSDP for seamless multi-node scaling
- Vision-Language Model Ready: Native support for VLMs (Qwen2-VL, Gemma-3-VL, etc.)
- Advanced PEFT Methods: LoRA and an extensible PEFT system out of the box
- Seamless HF Ecosystem: Fine-tuned models work directly with Transformers pipelines, vLLM, etc.
- Robust Infrastructure: Distributed checkpointing with integrated logging and monitoring
- Optimized Recipes: Pre-built configurations for common models and datasets
- Flexible Configuration: YAML-based configuration system for reproducible experiments
- FP8 Precision: Native FP8 training and inference for higher throughput and lower memory use (see the sketch after this list)
- INT4/INT8 Quantization: Turn-key quantization workflows for ultra-compact, low-memory training
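As a concrete illustration of what FP8 storage means (framework-agnostic, not the library's FP8 recipe), the sketch below round-trips a tensor through PyTorch's `float8_e4m3fn` dtype with per-tensor scaling; real FP8 training additionally fuses these casts into matmuls and manages scales dynamically. Requires PyTorch 2.1+.

```python
import torch

# Illustrative FP8 (e4m3) round trip with per-tensor scaling.
x = torch.randn(4, 4)
scale = x.abs().max() / 448.0  # 448 = largest representable e4m3 value
x_fp8 = (x / scale).to(torch.float8_e4m3fn)  # 1 byte per element
x_restored = x_fp8.to(torch.float32) * scale
print("max abs error:", (x - x_restored).abs().max().item())
```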
NeMo AutoModel is offered both as a standard Python package installable via pip and as a ready-to-run NeMo Framework Docker container.
```bash
# We use `uv` for package management and environment isolation.
pip3 install uv
# If you cannot install at the system level, you can install for your user with
# pip3 install --user uv
```

Run every command with `uv run`. It auto-installs the virtual environment from the lock file and keeps it up to date, so you never need to activate a venv manually. Example: `uv run recipes/llm_finetune/finetune.py`. If you prefer to install NeMo AutoModel explicitly, follow the instructions below.
```bash
# Install the latest stable release from PyPI
# First initialize the virtual environment using uv
uv venv
uv pip install nemo_automodel  # or: uv pip install --upgrade nemo_automodel

# Install the latest NeMo AutoModel from the GitHub repo (best for development).
# First initialize the virtual environment using uv
uv venv
# We can now install from source
uv pip install git+https://github.com/NVIDIA-NeMo/Automodel.git

# Verify the installation
uv run python -c "import nemo_automodel; print('✅ NeMo AutoModel ready')"
```
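The YAML snippets below, drawn from the recipe configs, show how distributed training (nvFSDP), LoRA-based PEFT, model and processor loading, and distributed checkpointing are wired together: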
```yaml
distributed:
  _target_: nemo_automodel.distributed.nvfsdp.NVFSDPManager
  dp_size: 8
  tp_size: 1
  cp_size: 1

peft:
  peft_fn: nemo_automodel._peft.lora.apply_lora_to_linear_modules
  match_all_linear: True
  dim: 8
  alpha: 32
  use_triton: True

model:
  _target_: nemo_automodel._transformers.NeMoAutoModelForImageTextToText.from_pretrained
  pretrained_model_name_or_path: Qwen/Qwen2.5-VL-3B-Instruct

processor:
  _target_: transformers.AutoProcessor.from_pretrained
  pretrained_model_name_or_path: Qwen/Qwen2.5-VL-3B-Instruct
  min_pixels: 200704
  max_pixels: 1003520

checkpoint:
  enabled: true
  checkpoint_dir: ./checkpoints
  save_consolidated: true  # HF-compatible safetensors
  model_save_format: safetensors
```
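To make the `peft` block above concrete, here is a self-contained sketch of what LoRA on a linear layer does (plain PyTorch, not the `apply_lora_to_linear_modules` implementation); `dim` is the adapter rank and `alpha` scales the low-rank update.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update (illustrative)."""
    def __init__(self, base: nn.Linear, dim: int = 8, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze pretrained weights
        self.lora_a = nn.Linear(base.in_features, dim, bias=False)
        self.lora_b = nn.Linear(dim, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)          # adapter starts as a no-op
        self.scaling = alpha / dim

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(16, 16), dim=8, alpha=32)
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 16])
```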
```
NeMo-Automodel/
├── nemo_automodel/          # Core library
│   ├── _peft/               # PEFT implementations (LoRA)
│   ├── _transformers/       # HF model integrations
│   ├── checkpoint/          # Distributed checkpointing
│   ├── datasets/            # Dataset loaders
│   │   ├── llm/             # LLM datasets (HellaSwag, SQuAD, etc.)
│   │   └── vlm/             # VLM datasets (CORD-v2, rdr, etc.)
│   ├── distributed/         # FSDP2, nvFSDP, parallelization
│   ├── loss/                # Optimized loss functions
│   └── training/            # Training recipes and utilities
├── recipes/                 # Ready-to-use training recipes
│   ├── llm/                 # LLM fine-tuning recipes
│   └── vlm/                 # VLM fine-tuning recipes
└── tests/                   # Comprehensive test suite
```
We welcome contributions! Please see our Contributing Guide for details.
NVIDIA NeMo AutoModel is licensed under the Apache License 2.0.
- Documentation: https://docs.nvidia.com/nemo-framework/user-guide/latest/automodel/index.html
- Hugging Face Hub: https://huggingface.co/models
- Issues: https://github.com/NVIDIA-NeMo/Automodel/issues
- Discussions: https://github.com/NVIDIA-NeMo/Automodel/discussions
Made with ❤️ by NVIDIA
Accelerating AI for everyone
Alternative AI tools for Automodel
Similar Open Source Tools
deeppowers
Deeppowers is a powerful Python library for deep learning applications. It provides a wide range of tools and utilities to simplify the process of building and training deep neural networks. With Deeppowers, users can easily create complex neural network architectures, perform efficient training and optimization, and deploy models for various tasks. The library is designed to be user-friendly and flexible, making it suitable for both beginners and experienced deep learning practitioners.
AI_Spectrum
AI_Spectrum is a versatile machine learning library that provides a wide range of tools and algorithms for building and deploying AI models. It offers a user-friendly interface for data preprocessing, model training, and evaluation. With AI_Spectrum, users can easily experiment with different machine learning techniques and optimize their models for various tasks. The library is designed to be flexible and scalable, making it suitable for both beginners and experienced data scientists.
WorkflowAI
WorkflowAI is a powerful tool designed to streamline and automate various tasks within the workflow process. It provides a user-friendly interface for creating custom workflows, automating repetitive tasks, and optimizing efficiency. With WorkflowAI, users can easily design, execute, and monitor workflows, allowing for seamless integration of different tools and systems. The tool offers advanced features such as conditional logic, task dependencies, and error handling to ensure smooth workflow execution. Whether you are managing project tasks, processing data, or coordinating team activities, WorkflowAI simplifies the workflow management process and enhances productivity.
DocsGPT
DocsGPT is an open-source documentation assistant powered by GPT models. It simplifies the process of searching for information in project documentation by allowing developers to ask questions and receive accurate answers. With DocsGPT, users can say goodbye to manual searches and quickly find the information they need. The tool aims to revolutionize project documentation experiences and offers features like live previews, Discord community, guides, and contribution opportunities. It consists of a Flask app, Chrome extension, similarity search index creation script, and a frontend built with Vite and React. Users can quickly get started with DocsGPT by following the provided setup instructions and can contribute to its development by following the guidelines in the CONTRIBUTING.md file. The project follows a Code of Conduct to ensure a harassment-free community environment for all participants. DocsGPT is licensed under MIT and is built with LangChain.
trubrics-sdk
Trubrics-sdk is a software development kit designed to facilitate the integration of analytics features into applications. It provides a set of tools and functionalities that enable developers to easily incorporate analytics capabilities, such as data collection, analysis, and reporting, into their software products. The SDK streamlines the process of implementing analytics solutions, allowing developers to focus on building and enhancing their applications' functionality and user experience. By leveraging trubrics-sdk, developers can quickly and efficiently integrate robust analytics features, gaining valuable insights into user behavior and application performance.
God-Level-AI
A drill of scientific methods, processes, algorithms, and systems to build stories & models. An in-depth learning resource for humans. This repository is designed for individuals aiming to excel in the field of Data and AI, providing video sessions and text content for learning. It caters to those in leadership positions, professionals, and students, emphasizing the need for dedicated effort to achieve excellence in the tech field. The content covers various topics with a focus on practical application.
chatmcp
ChatMCP is a cross-platform AI chat client built around the Model Context Protocol (MCP). It lets users converse with LLMs while connecting them to MCP servers, so chats can invoke external tools and data sources through a unified protocol.
sciml.ai
SciML.ai is an open source software organization dedicated to unifying packages for scientific machine learning. It focuses on developing modular scientific simulation support software, including differential equation solvers, inverse problems methodologies, and automated model discovery. The organization aims to provide a diverse set of tools with a common interface, creating a modular, easily-extendable, and highly performant ecosystem for scientific simulations. The website serves as a platform to showcase SciML organization's packages and share news within the ecosystem. Pull requests are encouraged for contributions.
terminal-bench
Terminal-Bench is a benchmark for testing AI agents in real terminal environments. Each task pairs a natural-language instruction with a sandboxed environment and a verification script, measuring whether an agent can complete end-to-end command-line work such as building code, configuring systems, or processing data.
memori
Memori is a lightweight and user-friendly memory management tool for developers. It helps in tracking memory usage, detecting memory leaks, and optimizing memory allocation in software projects. With Memori, developers can easily monitor and analyze memory consumption to improve the performance and stability of their applications. The tool provides detailed insights into memory usage patterns and helps in identifying areas for optimization. Memori is designed to be easy to integrate into existing projects and offers a simple yet powerful interface for managing memory resources effectively.
verl-tool
verl-tool extends the veRL reinforcement learning framework with tool use for agentic LLM training. During RL rollouts, models can call external tools such as code interpreters or search, enabling the training of agents that interleave reasoning with tool calls.
trae-agent
Trae Agent is an LLM-based agent for general-purpose software engineering tasks. It provides a command-line interface that takes natural-language instructions and executes complex workflows, such as editing code and running commands, with support for multiple LLM providers.
RD-Agent
RD-Agent is a tool designed to automate critical aspects of industrial R&D processes, focusing on data-driven scenarios to streamline model and data development. It aims to propose new ideas ('R') and implement them ('D') automatically, leading to solutions of significant industrial value. The tool supports scenarios like Automated Quantitative Trading, Data Mining Agent, Research Copilot, and more, with a framework to push the boundaries of research in data science. Users can create a Conda environment, install the RDAgent package from PyPI, configure GPT model, and run various applications for tasks like quantitative trading, model evolution, medical prediction, and more. The tool is intended to enhance R&D processes and boost productivity in industrial settings.
nmed2024
Nmed2024 is a GitHub repository that contains code for a neural network model designed for medical image analysis. The repository includes scripts for training the model, as well as pre-trained weights for quick deployment. The model is specifically tailored for detecting abnormalities in medical images, such as tumors or fractures. It utilizes deep learning techniques to achieve high accuracy and can be easily integrated into existing medical imaging systems. Researchers and developers in the healthcare industry can leverage this tool to enhance the efficiency and accuracy of medical image analysis tasks.
PulsarRPAPro
PulsarRPAPro is a powerful robotic process automation (RPA) tool designed to automate repetitive tasks and streamline business processes. It offers a user-friendly interface for creating and managing automation workflows, allowing users to easily automate tasks without the need for extensive programming knowledge. With features such as task scheduling, data extraction, and integration with various applications, PulsarRPAPro helps organizations improve efficiency and productivity by reducing manual work and human errors. Whether you are a small business looking to automate simple tasks or a large enterprise seeking to optimize complex processes, PulsarRPAPro provides the flexibility and scalability to meet your automation needs.
For similar tasks
autogen
AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.
tracecat
Tracecat is an open-source automation platform for security teams. It's designed to be simple but powerful, with a focus on AI features and a practitioner-obsessed UI/UX. Tracecat can be used to automate a variety of tasks, including phishing email investigation, evidence collection, and remediation plan generation.
ciso-assistant-community
CISO Assistant is a tool that helps organizations manage their cybersecurity posture and compliance. It provides a centralized platform for managing security controls, threats, and risks. CISO Assistant also includes a library of pre-built frameworks and tools to help organizations quickly and easily implement best practices.
ck
Collective Mind (CM) is a collection of portable, extensible, technology-agnostic and ready-to-use automation recipes with a human-friendly interface (aka CM scripts) to unify and automate all the manual steps required to compose, run, benchmark and optimize complex ML/AI applications on any platform with any software and hardware: see online catalog and source code. CM scripts require Python 3.7+ with minimal dependencies and are continuously extended by the community and MLCommons members to run natively on Ubuntu, MacOS, Windows, RHEL, Debian, Amazon Linux and any other operating system, in a cloud or inside automatically generated containers while keeping backward compatibility - please don't hesitate to report encountered issues here and contact us via public Discord Server to help this collaborative engineering effort! CM scripts were originally developed based on the following requirements from the MLCommons members to help them automatically compose and optimize complex MLPerf benchmarks, applications and systems across diverse and continuously changing models, data sets, software and hardware from Nvidia, Intel, AMD, Google, Qualcomm, Amazon and other vendors: * must work out of the box with the default options and without the need to edit some paths, environment variables and configuration files; * must be non-intrusive, easy to debug and must reuse existing user scripts and automation tools (such as cmake, make, ML workflows, python poetry and containers) rather than substituting them; * must have a very simple and human-friendly command line with a Python API and minimal dependencies; * must require minimal or zero learning curve by using plain Python, native scripts, environment variables and simple JSON/YAML descriptions instead of inventing new workflow languages; * must have the same interface to run all automations natively, in a cloud or inside containers. CM scripts were successfully validated by MLCommons to modularize MLPerf inference benchmarks and help the community automate more than 95% of all performance and power submissions in the v3.1 round across more than 120 system configurations (models, frameworks, hardware) while reducing development and maintenance costs.
zenml
ZenML is an extensible, open-source MLOps framework for creating portable, production-ready machine learning pipelines. By decoupling infrastructure from code, ZenML enables developers across your organization to collaborate more effectively as they develop to production.
clearml
ClearML is a suite of tools designed to streamline the machine learning workflow. It includes an experiment manager, MLOps/LLMOps, data management, and model serving capabilities. ClearML is open-source and offers a free tier hosting option. It supports various ML/DL frameworks and integrates with Jupyter Notebook and PyCharm. ClearML provides extensive logging capabilities, including source control info, execution environment, hyper-parameters, and experiment outputs. It also offers automation features, such as remote job execution and pipeline creation. ClearML is designed to be easy to integrate, requiring only two lines of code to add to existing scripts. It aims to improve collaboration, visibility, and data transparency within ML teams.
devchat
DevChat is an open-source workflow engine that enables developers to create intelligent, automated workflows for engaging with users through a chat panel within their IDEs. It combines script writing flexibility, latest AI models, and an intuitive chat GUI to enhance user experience and productivity. DevChat simplifies the integration of AI in software development, unlocking new possibilities for developers.
LLM-Finetuning-Toolkit
LLM Finetuning toolkit is a config-based CLI tool for launching a series of LLM fine-tuning experiments on your data and gathering their results. It allows users to control all elements of a typical experimentation pipeline - prompts, open-source LLMs, optimization strategy, and LLM testing - through a single YAML configuration file. The toolkit supports basic, intermediate, and advanced usage scenarios, enabling users to run custom experiments, conduct ablation studies, and automate fine-tuning workflows. It provides features for data ingestion, model definition, training, inference, quality assurance, and artifact outputs, making it a comprehensive tool for fine-tuning large language models.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.