
albumentations
Fast image augmentation library and an easy-to-use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
Stars: 13557

Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to increase the quality of trained models. The purpose of image augmentation is to create new training samples from the existing data.
README:
Docs | Discord | Twitter | LinkedIn
Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to increase the quality of trained models. The purpose of image augmentation is to create new training samples from the existing data.
Here is an example of how you can apply some pixel-level augmentations from Albumentations to create new images from the original one:
- Albumentations supports all common computer vision tasks such as classification, semantic segmentation, instance segmentation, object detection, and pose estimation.
- The library provides a simple unified API to work with all data types: images (RBG-images, grayscale images, multispectral images), segmentation masks, bounding boxes, and keypoints.
- The library contains more than 70 different augmentations to generate new training samples from the existing data.
- Albumentations is fast. We benchmark each new release to ensure that augmentations provide maximum speed.
- It works with popular deep learning frameworks such as PyTorch and TensorFlow. By the way, Albumentations is a part of the PyTorch ecosystem.
- Written by experts. The authors have experience both working on production computer vision systems and participating in competitive machine learning. Many core team members are Kaggle Masters and Grandmasters.
- The library is widely used in industry, deep learning research, machine learning competitions, and open source projects.
- Albumentations
Vladimir I. Iglovikov | Kaggle Grandmaster
Mikhail Druzhinin | Kaggle Expert
Alexander Buslaev — Computer Vision Engineer at Mapbox | Kaggle Master
Evegene Khvedchenya — Computer Vision Research Engineer at Piñata Farms | Kaggle Grandmaster
Albumentations requires Python 3.8 or higher. To install the latest version from PyPI:
pip install -U albumentations
Other installation options are described in the documentation.
The full documentation is available at https://albumentations.ai/docs/.
import albumentations as A
import cv2
# Declare an augmentation pipeline
transform = A.Compose([
A.RandomCrop(width=256, height=256),
A.HorizontalFlip(p=0.5),
A.RandomBrightnessContrast(p=0.2),
])
# Read an image with OpenCV and convert it to the RGB colorspace
image = cv2.imread("image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Augment an image
transformed = transform(image=image)
transformed_image = transformed["image"]
Please start with the introduction articles about why image augmentation is important and how it helps to build better models.
If you want to use Albumentations for a specific task such as classification, segmentation, or object detection, refer to the set of articles that has an in-depth description of this task. We also have a list of examples on applying Albumentations for different use cases.
We have examples of using Albumentations along with PyTorch and TensorFlow.
Check the online demo of the library. With it, you can apply augmentations to different images and see the result. Also, we have a list of all available augmentations and their targets.
- A list of papers that cite Albumentations.
- A list of teams that were using Albumentations and took high places in machine learning competitions.
- Open source projects that use Albumentations.
Pixel-level transforms will change just an input image and will leave any additional targets such as masks, bounding boxes, and keypoints unchanged. The list of pixel-level transforms:
- AdvancedBlur
- Blur
- CLAHE
- ChannelDropout
- ChannelShuffle
- ChromaticAberration
- ColorJitter
- Defocus
- Downscale
- Emboss
- Equalize
- FDA
- FancyPCA
- FromFloat
- GaussNoise
- GaussianBlur
- GlassBlur
- HistogramMatching
- HueSaturationValue
- ISONoise
- ImageCompression
- InvertImg
- MedianBlur
- MotionBlur
- MultiplicativeNoise
- Normalize
- PixelDistributionAdaptation
- Posterize
- RGBShift
- RandomBrightnessContrast
- RandomFog
- RandomGamma
- RandomGravel
- RandomRain
- RandomShadow
- RandomSnow
- RandomSunFlare
- RandomToneCurve
- RingingOvershoot
- Sharpen
- Solarize
- Spatter
- Superpixels
- TemplateTransform
- ToFloat
- ToGray
- ToRGB
- ToSepia
- UnsharpMask
- ZoomBlur
Spatial-level transforms will simultaneously change both an input image as well as additional targets such as masks, bounding boxes, and keypoints. The following table shows which additional targets are supported by each transform.
Transform | Image | Mask | BBoxes | Keypoints |
---|---|---|---|---|
Affine | ✓ | ✓ | ✓ | ✓ |
BBoxSafeRandomCrop | ✓ | ✓ | ✓ | |
CenterCrop | ✓ | ✓ | ✓ | ✓ |
CoarseDropout | ✓ | ✓ | ✓ | |
Crop | ✓ | ✓ | ✓ | ✓ |
CropAndPad | ✓ | ✓ | ✓ | ✓ |
CropNonEmptyMaskIfExists | ✓ | ✓ | ✓ | ✓ |
D4 | ✓ | ✓ | ✓ | ✓ |
ElasticTransform | ✓ | ✓ | ✓ | |
Flip | ✓ | ✓ | ✓ | ✓ |
GridDistortion | ✓ | ✓ | ✓ | |
GridDropout | ✓ | ✓ | ||
HorizontalFlip | ✓ | ✓ | ✓ | ✓ |
Lambda | ✓ | ✓ | ✓ | ✓ |
LongestMaxSize | ✓ | ✓ | ✓ | ✓ |
MaskDropout | ✓ | ✓ | ||
Morphological | ✓ | ✓ | ||
NoOp | ✓ | ✓ | ✓ | ✓ |
OpticalDistortion | ✓ | ✓ | ✓ | |
PadIfNeeded | ✓ | ✓ | ✓ | ✓ |
Perspective | ✓ | ✓ | ✓ | ✓ |
PiecewiseAffine | ✓ | ✓ | ✓ | ✓ |
PixelDropout | ✓ | ✓ | ||
RandomCrop | ✓ | ✓ | ✓ | ✓ |
RandomCropFromBorders | ✓ | ✓ | ✓ | ✓ |
RandomGridShuffle | ✓ | ✓ | ✓ | |
RandomResizedCrop | ✓ | ✓ | ✓ | ✓ |
RandomRotate90 | ✓ | ✓ | ✓ | ✓ |
RandomScale | ✓ | ✓ | ✓ | ✓ |
RandomSizedBBoxSafeCrop | ✓ | ✓ | ✓ | |
RandomSizedCrop | ✓ | ✓ | ✓ | ✓ |
Resize | ✓ | ✓ | ✓ | ✓ |
Rotate | ✓ | ✓ | ✓ | ✓ |
SafeRotate | ✓ | ✓ | ✓ | ✓ |
ShiftScaleRotate | ✓ | ✓ | ✓ | ✓ |
SmallestMaxSize | ✓ | ✓ | ✓ | ✓ |
Transpose | ✓ | ✓ | ✓ | ✓ |
VerticalFlip | ✓ | ✓ | ✓ | ✓ |
XYMasking | ✓ | ✓ | ✓ |
Transforms that mix several images into one
Transform | Image | Mask | BBoxes | Keypoints | Global Label |
---|---|---|---|---|---|
MixUp | ✓ | ✓ | ✓ |
To run the benchmark yourself, follow the instructions in benchmark/README.md
Results for running the benchmark on the first 2000 images from the ImageNet validation set using an AMD Ryzen Threadripper 3970X CPU. The table shows how many images per second can be processed on a single core; higher is better.
Library | Version |
---|---|
Python | 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] |
albumentations | 1.4.1 |
imgaug | 0.4.0 |
torchvision | 0.17.1+rocm5.7 |
numpy | 1.26.4 |
opencv-python-headless | 4.9.0.80 |
scikit-image | 0.22.0 |
scipy | 1.12.0 |
pillow | 10.2.0 |
kornia | 0.7.2 |
augly | 1.0.0 |
albumentations 1.4.0 |
torchvision 0.17.1+rocm5.7 |
kornia 0.7.2 |
augly 1.0.0 |
imgaug 0.4.0 |
|
---|---|---|---|---|---|
HorizontalFlip | 9843 ± 2135 | 2436 ± 29 | 1014 ± 3 | 3663 ± 18 | 4884 ± 51 |
VerticalFlip | 9898 ± 18 | 2570 ± 37 | 1024 ± 4 | 5325 ± 13 | 8683 ± 5 |
Rotate | 610 ± 4 | 153 ± 2 | 204 ± 1 | 626 ± 3 | 499 ± 5 |
Affine | 1705 ± 67 | 159 ± 1 | 200 ± 1 | - | 663 ± 24 |
Equalize | 1061 ± 14 | 337 ± 1 | 77 ± 1 | - | 845 ± 33 |
RandomCrop64 | 203197 ± 2105 | 15931 ± 27 | 837 ± 2 | 21858 ± 362 | 5681 ± 96 |
RandomResizedCrop | 2998 ± 30 | 1160 ± 4 | 190 ± 1 | - | - |
ShiftRGB | 1400 ± 3 | - | 435 ± 1 | - | 1528 ± 6 |
Resize | 2581 ± 3 | 1239 ± 1 | 197 ± 1 | 431 ± 1 | 1728 ± 1 |
RandomGamma | 4556 ± 3 | 230 ± 1 | 205 ± 1 | - | 2282 ± 110 |
Grayscale | 7234 ± 4 | 1539 ± 7 | 444 ± 3 | 2606 ± 2 | 918 ± 42 |
ColorJitter | 452 ± 43 | 51 ± 1 | 50 ± 1 | 221 ± 1 | - |
RandomPerspective | 465 ± 1 | 121 ± 1 | 115 ± 1 | - | 433 ± 16 |
GaussianBlur | 2315 ± 9 | 106 ± 2 | 72 ± 1 | 161 ± 1 | 1213 ± 3 |
MedianBlur | 3711 ± 2 | - | 2 ± 1 | - | 566 ± 3 |
MotionBlur | 2763 ± 25 | - | 101 ± 4 | - | 508 ± 2 |
Posterize | 4238 ± 51 | 2581 ± 20 | 284 ± 4 | - | 1893 ± 9 |
JpegCompression | 208 ± 1 | - | - | 692 ± 4 | 435 ± 1 |
GaussianNoise | 64 ± 9 | - | - | 67 ± 1 | 212 ± 16 |
Elastic | 129 ± 1 | 3 ± 1 | 1 ± 1 | - | 128 ± 1 |
To create a pull request to the repository, follow the documentation at CONTRIBUTING.md
In some systems, in the multiple GPU regime, PyTorch may deadlock the DataLoader if OpenCV was compiled with OpenCL optimizations. Adding the following two lines before the library import may help. For more details https://github.com/pytorch/pytorch/issues/1355
cv2.setNumThreads(0)
cv2.ocl.setUseOpenCL(False)
If you find this library useful for your research, please consider citing Albumentations: Fast and Flexible Image Augmentations:
@Article{info11020125,
AUTHOR = {Buslaev, Alexander and Iglovikov, Vladimir I. and Khvedchenya, Eugene and Parinov, Alex and Druzhinin, Mikhail and Kalinin, Alexandr A.},
TITLE = {Albumentations: Fast and Flexible Image Augmentations},
JOURNAL = {Information},
VOLUME = {11},
YEAR = {2020},
NUMBER = {2},
ARTICLE-NUMBER = {125},
URL = {https://www.mdpi.com/2078-2489/11/2/125},
ISSN = {2078-2489},
DOI = {10.3390/info11020125}
}
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for albumentations
Similar Open Source Tools

albumentations
Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to increase the quality of trained models. The purpose of image augmentation is to create new training samples from the existing data.

gpupixel
GPUPixel is a real-time, high-performance image and video filter library written in C++11 and based on OpenGL/ES. It incorporates a built-in beauty face filter that achieves commercial-grade beauty effects. The library is extremely easy to compile and integrate with a small size, supporting platforms including iOS, Android, Mac, Windows, and Linux. GPUPixel provides various filters like skin smoothing, whitening, face slimming, big eyes, lipstick, and blush. It supports input formats like YUV420P, RGBA, JPEG, PNG, and output formats like RGBA and YUV420P. The library's performance on devices like iPhone and Android is optimized, with low CPU usage and fast processing times. GPUPixel's lib size is compact, making it suitable for mobile and desktop applications.

Botright
Botright is a tool designed for browser automation that focuses on stealth and captcha solving. It uses a real Chromium-based browser for enhanced stealth and offers features like browser fingerprinting and AI-powered captcha solving. The tool is suitable for developers looking to automate browser tasks while maintaining anonymity and bypassing captchas. Botright is available in async mode and can be easily integrated with existing Playwright code. It provides solutions for various captchas such as hCaptcha, reCaptcha, and GeeTest, with high success rates. Additionally, Botright offers browser stealth techniques and supports different browser functionalities for seamless automation.

cool-ai-stuff
This repository contains an uncensored list of free to use APIs and sites for several AI models. > _This list is mainly managed by @zukixa, the queen of zukijourney, so any decisions may have bias!~_ > > **Scroll down for the sites, APIs come first!** * * * > [!WARNING] > We are not endorsing _any_ of the listed services! Some of them might be considered controversial. We are not responsible for any legal, technical or any other damage caused by using the listed services. Data is provided without warranty of any kind. **Use these at your own risk!** * * * # APIs Table of Contents #### Overview of Existing APIs #### Overview of Existing APIs -- Top LLM Models Available #### Overview of Existing APIs -- Top Image Models Available #### Overview of Existing APIs -- Top Other Features & Models Available #### Overview of Existing APIs -- Available Donator Perks * * * ## API List:* *: This list solely covers all providers I (@zukixa) was able to collect metrics in. Any mistakes are not my responsibility, as I am either banned, or not aware of x API. \ 1: Last Updated 4/14/24 ### Overview of APIs: | Service | # of Users1 | Link | Stablity | NSFW Ok? | Open Source? | Owner(s) | Other Notes | | ----------- | ---------- | ------------------------------------------ | ------------------------------------------ | --------------------------- | ------------------------------------------------------ | -------------------------- | ----------------------------------------------------------------------------------------------------------- | | zukijourney| 4441 | D | High | On /unf/, not /v1/ | ✅, Here | @zukixa | Largest & Oldest GPT-4 API still continuously around. Offers other popular AI-related Bots too. | | Hyzenberg| 1234 | D | High | Forbidden | ❌ | @thatlukinhasguy & @voidiii | Experimental sister API to Zukijourney. Successor to HentAI | | NagaAI | 2883 | D | High | Forbidden | ❌ | @zentixua | Honorary successor to ChimeraGPT, the largest API in history (15k users). | | WebRaftAI | 993 | D | High | Forbidden | ❌ | @ds_gamer | Largest API by model count. Provides a lot of service/hosting related stuff too. | | KrakenAI | 388 | D | High | Discouraged | ❌ | @paninico | It is an API of all time. | | ShuttleAI | 3585 | D | Medium | Generally Permitted | ❌ | @xtristan | Faked GPT-4 Before 1, 2 | | Mandrill | 931 | D | Medium | Enterprise-Tier-Only | ❌ | @fredipy | DALL-E-3 access pioneering API. Has some issues with speed & stability nowadays. | oxygen | 742 | D | Medium | Donator-Only | ❌ | @thesketchubuser | Bri'ish 🤮 & Fren'sh 🤮 | | Skailar | 399 | D | Medium | Forbidden | ❌ | @aquadraws | Service is the personification of the word 'feature creep'. Lots of things announced, not much operational. |

rag-web-ui
RAG Web UI is an intelligent dialogue system based on RAG (Retrieval-Augmented Generation) technology. It helps enterprises and individuals build intelligent Q&A systems based on their own knowledge bases. By combining document retrieval and large language models, it delivers accurate and reliable knowledge-based question-answering services. The system is designed with features like intelligent document management, advanced dialogue engine, and a robust architecture. It supports multiple document formats, async document processing, multi-turn contextual dialogue, and reference citations in conversations. The architecture includes a backend stack with Python FastAPI, MySQL + ChromaDB, MinIO, Langchain, JWT + OAuth2 for authentication, and a frontend stack with Next.js, TypeScript, Tailwind CSS, Shadcn/UI, and Vercel AI SDK for AI integration. Performance optimization includes incremental document processing, streaming responses, vector database performance tuning, and distributed task processing. The project is licensed under the Apache-2.0 License and is intended for learning and sharing RAG knowledge only, not for commercial purposes.

Prompt-Engineering-Holy-Grail
The Prompt Engineering Holy Grail repository is a curated resource for prompt engineering enthusiasts, providing essential resources, tools, templates, and best practices to support learning and working in prompt engineering. It covers a wide range of topics related to prompt engineering, from beginner fundamentals to advanced techniques, and includes sections on learning resources, online courses, books, prompt generation tools, prompt management platforms, prompt testing and experimentation, prompt crafting libraries, prompt libraries and datasets, prompt engineering communities, freelance and job opportunities, contributing guidelines, code of conduct, support for the project, and contact information.

dora
Dataflow-oriented robotic application (dora-rs) is a framework that makes creation of robotic applications fast and simple. Building a robotic application can be summed up as bringing together hardwares, algorithms, and AI models, and make them communicate with each others. At dora-rs, we try to: make integration of hardware and software easy by supporting Python, C, C++, and also ROS2. make communication low latency by using zero-copy Arrow messages. dora-rs is still experimental and you might experience bugs, but we're working very hard to make it stable as possible.

chat-master
ChatMASTER is a self-built backend conversation service based on AI large model APIs, supporting synchronous and streaming responses with perfect printer effects. It supports switching between mainstream models such as DeepSeek, Kimi, Doubao, OpenAI, Claude3, Yiyan, Tongyi, Xinghuo, ChatGLM, Shusheng, and more. It also supports loading local models and knowledge bases using Ollama and Langchain, as well as online API interfaces like Coze and Gitee AI. The project includes Java server-side, web-side, mobile-side, and management background configuration. It provides various assistant types for prompt output and allows creating custom assistant templates in the management background. The project uses technologies like Spring Boot, Spring Security + JWT, Mybatis-Plus, Lombok, Mysql & Redis, with easy-to-understand code and comprehensive permission control using JWT authentication system for multi-terminal support.

TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to project website.

nntrainer
NNtrainer is a software framework for training neural network models on devices with limited resources. It enables on-device fine-tuning of neural networks using user data for personalization. NNtrainer supports various machine learning algorithms and provides examples for tasks such as few-shot learning, ResNet, VGG, and product rating. It is optimized for embedded devices and utilizes CBLAS and CUBLAS for accelerated calculations. NNtrainer is open source and released under the Apache License version 2.0.

SpinQuant
SpinQuant is a tool designed for LLM quantization with learned rotations. It focuses on optimizing rotation matrices to enhance the performance of quantized models, narrowing the accuracy gap to full precision models. The tool implements rotation optimization and PTQ evaluation with optimized rotation, providing arguments for model name, batch sizes, quantization bits, and rotation options. SpinQuant is based on the findings that rotation helps in removing outliers and improving quantization, with specific enhancements achieved through learning rotation with Cayley optimization.

TRACE
TRACE is a temporal grounding video model that utilizes causal event modeling to capture videos' inherent structure. It presents a task-interleaved video LLM model tailored for sequential encoding/decoding of timestamps, salient scores, and textual captions. The project includes various model checkpoints for different stages and fine-tuning on specific datasets. It provides evaluation codes for different tasks like VTG, MVBench, and VideoMME. The repository also offers annotation files and links to raw videos preparation projects. Users can train the model on different tasks and evaluate the performance based on metrics like CIDER, METEOR, SODA_c, F1, mAP, Hit@1, etc. TRACE has been enhanced with trace-retrieval and trace-uni models, showing improved performance on dense video captioning and general video understanding tasks.

kangaroo
Kangaroo is an AI-powered SQL client and admin tool for popular databases like SQLite, MySQL, PostgreSQL, etc. It supports various functionalities such as table design, query, model, sync, export/import, and more. The tool is designed to be comfortable, fun, and developer-friendly, with features like code intellisense and autocomplete. Kangaroo aims to provide a seamless experience for database management across different operating systems.

go-cyber
Cyber is a superintelligence protocol that aims to create a decentralized and censorship-resistant internet. It uses a novel consensus mechanism called CometBFT and a knowledge graph to store and process information. Cyber is designed to be scalable, secure, and efficient, and it has the potential to revolutionize the way we interact with the internet.

ZhiLight
ZhiLight is a highly optimized large language model (LLM) inference engine developed by Zhihu and ModelBest Inc. It accelerates the inference of models like Llama and its variants, especially on PCIe-based GPUs. ZhiLight offers significant performance advantages compared to mainstream open-source inference engines. It supports various features such as custom defined tensor and unified global memory management, optimized fused kernels, support for dynamic batch, flash attention prefill, prefix cache, and different quantization techniques like INT8, SmoothQuant, FP8, AWQ, and GPTQ. ZhiLight is compatible with OpenAI interface and provides high performance on mainstream NVIDIA GPUs with different model sizes and precisions.

web-builder
Web Builder is a low-code front-end framework based on Material for Angular, offering a rich component library for excellent digital innovation experience. It allows rapid construction of modern responsive UI, multi-theme, multi-language web pages through drag-and-drop visual configuration. The framework includes a beautiful admin theme, complete front-end solutions, and AI integration in the Pro version for optimizing copy, creating components, and generating pages with a single sentence.
For similar tasks

albumentations
Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to increase the quality of trained models. The purpose of image augmentation is to create new training samples from the existing data.
For similar jobs

albumentations
Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to increase the quality of trained models. The purpose of image augmentation is to create new training samples from the existing data.

clarifai-python
The Clarifai Python SDK offers a comprehensive set of tools to integrate Clarifai's AI platform to leverage computer vision capabilities like classification , detection ,segementation and natural language capabilities like classification , summarisation , generation , Q&A ,etc into your applications. With just a few lines of code, you can leverage cutting-edge artificial intelligence to unlock valuable insights from visual and textual content.

ailia-models
The collection of pre-trained, state-of-the-art AI models. ailia SDK is a self-contained, cross-platform, high-speed inference SDK for AI. The ailia SDK provides a consistent C++ API across Windows, Mac, Linux, iOS, Android, Jetson, and Raspberry Pi platforms. It also supports Unity (C#), Python, Rust, Flutter(Dart) and JNI for efficient AI implementation. The ailia SDK makes extensive use of the GPU through Vulkan and Metal to enable accelerated computing. # Supported models 323 models as of April 8th, 2024

edenai-apis
Eden AI aims to simplify the use and deployment of AI technologies by providing a unique API that connects to all the best AI engines. With the rise of **AI as a Service** , a lot of companies provide off-the-shelf trained models that you can access directly through an API. These companies are either the tech giants (Google, Microsoft , Amazon) or other smaller, more specialized companies, and there are hundreds of them. Some of the most known are : DeepL (translation), OpenAI (text and image analysis), AssemblyAI (speech analysis). There are **hundreds of companies** doing that. We're regrouping the best ones **in one place** !

Awesome-Segment-Anything
The Segment Anything Model (SAM) is a powerful tool that allows users to segment any object in an image with just a few clicks. This makes it a great tool for a variety of tasks, such as object detection, tracking, and editing. SAM is also very easy to use, making it a great option for both beginners and experienced users.