SwanLab
⚡️SwanLab: your ML experiment notebook. Log and visualize the entire AI training workflow.
Stars: 698
SwanLab is an open-source, lightweight AI experiment tracking tool that provides a platform for tracking, comparing, and collaborating on experiments, aiming to accelerate the research and development efficiency of AI teams by 100 times. It offers a friendly API and a beautiful interface, combining hyperparameter tracking, metric recording, online collaboration, experiment link sharing, real-time message notifications, and more. With SwanLab, researchers can document their training experiences, seamlessly communicate and collaborate with collaborators, and machine learning engineers can develop models for production faster.
README:
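An open-source, modern-design tool for tracking and visualizing deep learning training
Supports both cloud and offline use, works with 30+ mainstream frameworks, and integrates easily with your experiment code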
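- 2025.01.17: Added the sync_wandb feature (see the documentation), which supports syncing with Weights & Biases experiment tracking; greatly improved log rendering performance.
- 2025.01.11: The cloud version greatly improved the performance of the project table and now supports interactions such as drag-and-drop, sorting, and filtering.
- 2025.01.01: Added persistent smoothing and drag-to-resize for line charts, improving the chart browsing experience.
- 2024.12.22: We completed a joint integration with LLaMA Factory; you can now use SwanLab in LLaMA Factory to track and visualize LLM fine-tuning experiments (see the usage guide).
- 2024.12.15: Hardware monitoring (0.4.0) launched, supporting system-level information recording and monitoring for CPU, NPU (Ascend), and GPU (Nvidia).
- 2024.11.26: The hardware section of the Environment tab now recognizes Huawei Ascend NPUs and Kunpeng CPUs; the cloud vendor section now recognizes QingCloud's 基石智算 platform.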
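SwanLab is an open-source, lightweight tool for tracking and visualizing AI model training. It provides a platform for tracking, recording, comparing, and collaborating on experiments.
SwanLab is built for AI researchers. It offers a friendly Python API and a clean UI, with features including training visualization, automatic logging, hyperparameter recording, experiment comparison, and multi-user collaboration. With SwanLab, researchers can spot training problems from intuitive charts, compare multiple experiments to find research ideas, and break down team communication barriers through shareable web pages and organization-based collaborative training, improving the organization's training efficiency.
Its core features are listed below: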
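1. 📊 Experiment metric and hyperparameter tracking: embed minimal code in your machine learning pipeline to track and record key training metrics
- Supports cloud use (similar to Weights & Biases), so you can check training progress anytime, anywhere (see the guide to viewing experiments on a phone)
- Supports hyperparameter recording and table display
- Supported metadata types: scalar metrics, images, audio, text, ...
- Supported chart types: line charts, media charts (images, audio, text), ...
- Automatic background recording: logging output, hardware environment, Git repository, Python environment, list of Python libraries, project run directory
2. ⚡️ Comprehensive framework integration: 30+ frameworks, including PyTorch, 🤗HuggingFace Transformers, PyTorch Lightning, 🦙LLaMA Factory, MMDetection, Ultralytics, PaddleDetection, LightGBM, XGBoost, Keras, Tensorboard, Weights&Biases, OpenAI, Swift, XTuner, Stable Baselines3, and Hydra
3. 💻 Hardware monitoring: records and monitors system-level hardware metrics in real time for CPU, NPU (Ascend), GPU (Nvidia), and memory
4. 📦 Experiment management: a centralized dashboard designed for training scenarios gives you an overall view at a glance and lets you quickly manage multiple projects and experiments
5. 🆚 Result comparison: compare the hyperparameters and results of different experiments through online tables and comparison charts to find ideas for the next iteration
6. 👥 Online collaboration: train collaboratively with your team. Experiments can be synced in real time under one project, and you can view your team's training records online and comment on the results
7. ✉️ Result sharing: copy and send a persistent URL to share any experiment, send it to colleagues, or embed it in online notes
8. 💻 Self-hosting support: works in offline environments; the self-hosted community edition can also view the dashboard and manage experiments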
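As an illustration of the hyperparameter tracking and comparison features above, here is a minimal sketch of building one configuration dict per experiment. The helper name `make_configs` and the `batch-size` key are hypothetical; the only SwanLab calls assumed are `swanlab.init(project=..., config=...)` and `swanlab.log(...)`, as used in the quick-start snippet in this README.

```python
from itertools import product


def make_configs(learning_rates, batch_sizes):
    """Build one hyperparameter dict per (learning rate, batch size) pair."""
    return [
        {"learning-rate": lr, "batch-size": bs}
        for lr, bs in product(learning_rates, batch_sizes)
    ]


configs = make_configs([0.003, 0.001], [16, 32])
print(len(configs), configs[0])
# Each dict would be passed to its own run, for example:
#   swanlab.init(project="my-first-ml", config=configs[0])
```

Running each configuration as a separate experiment under the same project is what would make the runs show up side by side for comparison.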
[!IMPORTANT]
Star the project to receive all release notifications from GitHub without delay ⭐️
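Check out SwanLab's online demos:
ResNet50 Cats vs. Dogs Classification | Yolov8-COCO128 Object Detection |
---|---|
Tracks an image classification task that trains a simple ResNet50 model on a cats-vs-dogs dataset. | Uses Yolov8 for object detection on the COCO128 dataset, tracking training hyperparameters and metrics. |
Qwen2 Instruction Fine-tuning | LSTM Google Stock Prediction |
---|---|
Tracks instruction fine-tuning of the Qwen2 large language model to achieve simple instruction following. | Trains a simple LSTM model on a Google stock price dataset to predict future stock prices. |
ResNeXt101 Audio Classification | Qwen2-VL COCO Dataset Fine-tuning |
---|---|
A progressive series of experiments from ResNet to ResNeXt on an audio classification task. | LoRA fine-tuning of the Qwen2-VL multimodal large model on the COCO2014 dataset. |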
pip install swanlab
swanlab login
When prompted, enter your API Key and press Enter to complete the login.
import swanlab
# Initialize a new swanlab experiment
swanlab.init(
    project="my-first-ml",
    config={'learning-rate': 0.003},
)
# Log metrics
for i in range(10):
    swanlab.log({"loss": i, "acc": i})
All done! Go to SwanLab to view your first SwanLab experiment.
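The loop above logs the step index as a stand-in for real metrics. As a slightly more realistic sketch, the following separates metric computation from logging so the loop can be dry-run without SwanLab installed. The helpers `simulated_metrics`, `train`, and `main` are hypothetical, and the metric values are simulated rather than produced by a real model.

```python
import math


def simulated_metrics(step, total_steps):
    """Return fake loss/accuracy values that improve as training proceeds."""
    progress = (step + 1) / total_steps
    loss = math.exp(-3.0 * progress)  # decays toward 0
    acc = 1.0 - loss                  # rises toward 1
    return {"loss": loss, "acc": acc}


def train(log_fn, total_steps=10):
    """Run the loop, sending one metrics dict per step to log_fn."""
    for step in range(total_steps):
        log_fn(simulated_metrics(step, total_steps))


def main():
    """Track the simulated run with SwanLab (needs swanlab installed and logged in)."""
    import swanlab  # imported lazily so the dry run below works without it

    swanlab.init(project="my-first-ml", config={"learning-rate": 0.003})
    train(swanlab.log)


# Dry run without SwanLab: collect the metrics in a list instead.
history = []
train(history.append)
print(len(history))
```

Calling `main()` instead of the dry run would send the same metrics to SwanLab through `swanlab.log`.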
The self-hosted community edition supports viewing the SwanLab dashboard offline.
Set the logdir and mode parameters in swanlab.init to track experiments offline:
...
swanlab.init(
    logdir='./logs',
    mode='local',
)
...
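- Setting the mode parameter to local turns off syncing the experiment to the cloud.
- The logdir parameter is optional; it specifies where SwanLab log files are saved (by default, in the swanlog folder).
  - Log files are created and updated while the experiment is being tracked, and the offline dashboard is also launched from these log files.
Everything else is exactly the same as cloud usage.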
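One way to switch between cloud and offline tracking without editing the training script is to build the swanlab.init arguments from a flag. This is a minimal sketch: the helper names and the `TRACK_OFFLINE` environment variable are hypothetical conventions, not part of SwanLab; only the `mode='local'` and `logdir` parameters come from the snippet above.

```python
import os


def init_kwargs(project, offline, logdir=None):
    """Build keyword arguments for swanlab.init.

    When offline is False, no mode is set, so SwanLab's default applies.
    """
    kwargs = {"project": project}
    if offline:
        kwargs["mode"] = "local"          # turn off syncing to the cloud
        if logdir is not None:
            kwargs["logdir"] = logdir     # optional log file location
    return kwargs


def offline_from_env(environ=os.environ):
    """Hypothetical convention: TRACK_OFFLINE=1 selects offline tracking."""
    return environ.get("TRACK_OFFLINE", "0") == "1"


offline = offline_from_env({"TRACK_OFFLINE": "1"})
print(init_kwargs("my-first-ml", offline, logdir="./logs"))
# In a training script: swanlab.init(**init_kwargs("my-first-ml", offline_from_env()))
```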
Open a terminal and run the following command to start a SwanLab dashboard:
swanlab watch ./logs
Once it is running, SwanLab gives you a local URL (http://127.0.0.1:5092 by default).
Open that link to view your experiments in the offline dashboard in your browser.
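If you prefer to start the dashboard from a Python script rather than a terminal, the same command can be launched as a child process. This is only a sketch using the standard library: `watch_command` and `launch_dashboard` are hypothetical helpers, and `launch_dashboard` assumes the swanlab executable is on your PATH.

```python
import shlex
import subprocess


def watch_command(logdir="./logs"):
    """Return the argument list for the `swanlab watch` command shown above."""
    return ["swanlab", "watch", logdir]


def launch_dashboard(logdir="./logs"):
    """Start the dashboard as a child process (requires swanlab on PATH)."""
    return subprocess.Popen(watch_command(logdir))


# Show the command that would be run, without starting anything.
print(shlex.join(watch_command()))
```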
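Use your favorite framework together with SwanLab!
Below is the list of frameworks we have integrated. You are welcome to open an Issue to tell us which framework you would like to see integrated.
Basic frameworks
Specialized / fine-tuning frameworks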
- PyTorch Lightning
- HuggingFace Transformers
- OpenMind
- LLaMA Factory
- Modelscope Swift
- Sentence Transformers
- Torchtune
- XTuner
- MMEngine
- FastAI
- LightGBM
- XGBoost
Computer vision
Reinforcement learning
Other frameworks:
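- ☁️ Online support: with SwanLab, training experiments can be synced and saved to the cloud, making it easy to check training progress remotely, manage past projects, share experiment links, send real-time notifications, and view experiments on multiple devices. Tensorboard, by contrast, is an offline experiment tracking tool.
- 👥 Multi-user collaboration: for machine learning work across multiple people or teams, SwanLab makes it easy to manage everyone's training projects, share experiment links, and hold discussions across workspaces. Tensorboard is designed mainly for individuals, which makes collaboration and experiment sharing difficult.
- 💻 Persistent, centralized dashboard: wherever you train your models, whether on a local machine, a lab cluster, or a GPU instance in a public cloud, your results are recorded in the same centralized dashboard. With TensorBoard, you have to spend time copying and managing TFEvent files from different machines.
- 💪 More powerful tables: SwanLab tables let you view, search, and filter results from different experiments, so you can browse thousands of model versions and find the best-performing model for each task. TensorBoard is not suited to large projects.
- Weights and Biases is a closed-source MLOps platform that must be used online.
- SwanLab supports online use and also offers an open-source, free, self-hosted version.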
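- GitHub Issues: errors and problems encountered when using SwanLab
- Email support: feedback on problems with using SwanLab
- WeChat group: discuss problems with using SwanLab and share the latest AI technology
If you enjoy using SwanLab in your work, please add the SwanLab badge to your README: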
[![swanlab](https://img.shields.io/badge/powered%20by-SwanLab-438440)](https://github.com/swanhubx/swanlab)
If you find SwanLab helpful in your research journey, please consider citing it in the following format:
@software{Zeyilin_SwanLab_2023,
author = {Zeyi Lin, Shaohong Chen, Kang Li, Qiushan Jiang, Zirui Cai, Kaifang Ji and {The SwanLab team}},
doi = {10.5281/zenodo.11100550},
license = {Apache-2.0},
title = {{SwanLab}},
url = {https://github.com/swanhubx/swanlab},
year = {2023}
}
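Thinking about contributing to SwanLab? First, please take a moment to read the contribution guide.
We also warmly welcome support for SwanLab through sharing on social media, at events, and at conferences. Thank you!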
Contributors
This repository is released under the Apache 2.0 License.
Similar Open Source Tools
99AI
99AI is a commercializable AI web application based on NineAI 2.4.2 (no authorization, no backdoors, no piracy, integrated front-end and back-end integration packages, supports Docker rapid deployment). The uncompiled source code is temporarily closed. Compared with the stable version, the development version is faster.
Tianji
Tianji is a free, non-commercial artificial intelligence system developed by SocialAI for tasks involving worldly wisdom, such as etiquette, hospitality, gifting, wishes, communication, awkwardness resolution, and conflict handling. It includes four main technical routes: pure prompt, Agent architecture, knowledge base, and model training. Users can find corresponding source code for these routes in the tianji directory to replicate their own vertical domain AI applications. The project aims to accelerate the penetration of AI into various fields and enhance AI's core competencies.
MoneyPrinterPlus
MoneyPrinterPlus is a project designed to help users easily make money in the era of short videos. It leverages AI big model technology to batch generate various short videos, perform video editing, and automatically publish videos to popular platforms like Douyin, Kuaishou, Xiaohongshu, and Video Number. The tool covers a wide range of functionalities including integrating with major AI big model tools, supporting various voice types, offering video transition effects, enabling customization of subtitles, and more. It aims to simplify the process of creating and sharing videos to monetize traffic.
higress
Higress is an open-source cloud-native API gateway built on the core of Istio and Envoy, based on Alibaba's internal practice of Envoy Gateway. It is designed for AI-native API gateway, serving AI businesses such as Tongyi Qianwen APP, Bailian Big Model API, and Machine Learning PAI platform. Higress provides capabilities to interface with LLM model vendors, AI observability, multi-model load balancing/fallback, AI token flow control, and AI caching. It offers features for AI gateway, Kubernetes Ingress gateway, microservices gateway, and security protection gateway, with advantages in production-level scalability, stream processing, extensibility, and ease of use.
generative-ai-use-cases-jp
Generative AI brings revolutionary potential to transform businesses. This repository demonstrates business use cases leveraging Generative AI.
simpletransformers
Simple Transformers is a library based on the Transformers library by HuggingFace, allowing users to quickly train and evaluate Transformer models with only 3 lines of code. It supports various tasks such as Information Retrieval, Language Models, Encoder Model Training, Sequence Classification, Token Classification, Question Answering, Language Generation, T5 Model, Seq2Seq Tasks, Multi-Modal Classification, and Conversational AI.
HivisionIDPhotos
HivisionIDPhoto is a practical algorithm for intelligent ID photo creation. It utilizes a comprehensive model workflow to recognize, cut out, and generate ID photos for various user photo scenarios. The tool offers lightweight cutting, standard ID photo generation based on different size specifications, six-inch layout photo generation, beauty enhancement (waiting), and intelligent outfit swapping (waiting). It aims to solve emergency ID photo creation issues.
LLM-Dojo
LLM-Dojo is an open-source platform for learning and practicing large models, providing a framework for building custom large model training processes, implementing various tricks and principles in the llm_tricks module, and mainstream model chat templates. The project includes an open-source large model training framework, detailed explanations and usage of the latest LLM tricks, and a collection of mainstream model chat templates. The term 'Dojo' symbolizes a place dedicated to learning and practice, borrowing its meaning from martial arts training.
FisherAI
FisherAI is a Chrome extension designed to improve learning efficiency. It supports automatic summarization, web and video translation, multi-turn dialogue, and various large language models such as gpt/azure/gemini/deepseek/mistral/groq/yi/moonshot. Users can enjoy flexible and powerful AI tools with FisherAI.
duix.ai
Duix is a silicon-based digital human SDK for intelligent interaction, providing users with instant virtual human interaction experience on devices like Android and iOS. The SDK offers intuitive effect display and supports user customization through open documentation. It is fully open-source, allowing developers to understand its workings, optimize, and innovate further.
easyAi
EasyAi is a lightweight, beginner-friendly Java artificial intelligence algorithm framework. It can be seamlessly integrated into Java projects with Maven, requiring no additional environment configuration or dependencies. The framework provides pre-packaged modules for image object detection and AI customer service, as well as various low-level algorithm tools for deep learning, machine learning, reinforcement learning, heuristic learning, and matrix operations. Developers can easily develop custom micro-models tailored to their business needs.
Chenyme-AAVT
Chenyme-AAVT is a user-friendly tool that provides automatic video and audio recognition and translation. It uses Whisper, a speech recognition model, to identify speech in video and audio files. The recognized speech is then translated using ChatGPT or KIMI. With Chenyme-AAVT, you can quickly generate subtitle files and merge them with the original video. The tool supports various languages, allowing you to translate video and audio into your desired language. Additionally, Chenyme-AAVT offers features such as VAD (Voice Activity Detection) to enhance recognition accuracy, GPU acceleration for faster processing, and support for multiple subtitle formats. It is aimed at content creators, translators, and anyone looking to make video translation more efficient.
HaE
HaE is a framework project in the field of network security (data security) that combines artificial intelligence (AI) large models to achieve highlighting and information extraction of HTTP messages (including WebSocket). It aims to reduce testing time, focus on valuable and meaningful messages, and improve vulnerability discovery efficiency. The project provides a clear and visual interface design, simple interface interaction, and centralized data panel for querying and extracting information. It also features built-in color upgrade algorithm, one-click export/import of data, and integration of AI large models API for optimized data processing.
AIMedia
AIMedia is a fully automated AI media software that automatically fetches hot news, generates news, and publishes on various platforms. It supports hot news fetching from platforms like Douyin, NetEase News, Weibo, The Paper, China Daily, and Sohu News. Additionally, it enables AI-generated images for text-only news to enhance originality and reading experience. The tool is currently commercialized with plans to support video auto-generation for platform publishing in the future. It requires a minimum CPU of 4 cores or above, 8GB RAM, and supports Windows 10 or above. Users can deploy the tool by cloning the repository, modifying the configuration file, creating a virtual environment using Conda, and starting the web interface. Feedback and suggestions can be submitted through issues or pull requests.
how-to-optim-algorithm-in-cuda
This repository documents how to optimize common algorithms based on CUDA. It includes subdirectories with code implementations for specific optimizations. The optimizations cover topics such as compiling PyTorch from source, NVIDIA's reduce optimization, OneFlow's elementwise template, fast atomic add for half data types, upsample nearest2d optimization in OneFlow, optimized indexing in PyTorch, OneFlow's softmax kernel, linear attention optimization, and more. The repository also includes learning resources related to deep learning frameworks, compilers, and optimization techniques.
For similar tasks
neptune-client
Neptune is a scalable experiment tracker for teams training foundation models. Log millions of runs, effortlessly monitor and visualize model training, and deploy on your infrastructure. Track 100% of metadata to accelerate AI breakthroughs. Log and display any framework and metadata type from any ML pipeline. Organize experiments with nested structures and custom dashboards. Compare results, visualize training, and optimize models quicker. Version models, review stages, and access production-ready models. Share results, manage users, and projects. Integrate with 25+ frameworks. Trusted by great companies to improve workflow.
aisuite
Aisuite is a simple, unified interface to multiple Generative AI providers. It allows developers to easily interact with various Language Model (LLM) providers like OpenAI, Anthropic, Azure, Google, AWS, and more through a standardized interface. The library focuses on chat completions and provides a thin wrapper around python client libraries, enabling creators to test responses from different LLM providers without changing their code. Aisuite maximizes stability by using HTTP endpoints or SDKs for making calls to the providers. Users can install the base package or specific provider packages, set up API keys, and utilize the library to generate chat completion responses from different models.
AI-Shortcuts
AI Shortcuts is a browser extension designed to enhance the efficiency of using AI websites. It allows users to quickly send messages, open frequently used AI sites, compare generation results from multiple sites, and access AI content without the need for registration or membership. Users can configure their most frequently used AI sites and easily query selected text on webpages. The extension also features a tab mode for comparing results across multiple AI sites.
metaflow
Metaflow is a user-friendly library designed to assist scientists and engineers in developing and managing real-world data science projects. Initially created at Netflix, Metaflow aimed to enhance the productivity of data scientists working on diverse projects ranging from traditional statistics to cutting-edge deep learning. For further information, refer to Metaflow's website and documentation.
mlflow
MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc), wherever you currently run ML code (e.g. in notebooks, standalone applications or the cloud). MLflow's current components are:
* `MLflow Tracking
fasttrackml
FastTrackML is an experiment tracking server focused on speed and scalability, fully compatible with MLFlow. It provides a user-friendly interface to track and visualize your machine learning experiments, making it easy to compare different models and identify the best performing ones. FastTrackML is open source and can be easily installed and run with pip or Docker. It is also compatible with the MLFlow Python package, making it easy to integrate with your existing MLFlow workflows.
zenml
ZenML is an extensible, open-source MLOps framework for creating portable, production-ready machine learning pipelines. By decoupling infrastructure from code, ZenML enables developers across your organization to collaborate more effectively as they develop to production.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.