# SimpleAICV_pytorch_training_examples

SimpleAICV: PyTorch training and testing examples.
Stars: 429
SimpleAICV_pytorch_training_examples is a repository that provides simple training and testing examples for various computer vision tasks such as image classification, object detection, semantic segmentation, instance segmentation, knowledge distillation, contrastive learning, masked image modeling, OCR text detection, OCR text recognition, human matting, salient object detection, interactive segmentation, image inpainting, and diffusion model tasks. The repository includes support for multiple datasets and networks, along with instructions on how to prepare datasets, train and test models, and use gradio demos. It also offers pretrained models and experiment records for download from huggingface or Baidu-Netdisk. The repository requires specific environments and package installations to run effectively.
README contents:
- 📢 News!
- My column
- Introduction
- All task training results
- Environments
- Download my pretrained models and experiments records
- Prepare datasets
- How to train or test a model
- How to use gradio demo
- Reference
- Citation
- 2025/02/16: train light segment-anything model with bf16.
## My column

https://www.zhihu.com/column/c_1692623656205897728
## Introduction

This repository provides simple training and testing examples for the following tasks:
| Task | Supported datasets | Supported networks |
|---|---|---|
| Image classification task | CIFAR100, ImageNet1K(ILSVRC2012), ImageNet21K(Winter 2021 release) | Convformer, DarkNet, ResNet, VAN, ViT |
| Knowledge distillation task | ImageNet1K(ILSVRC2012) | DML loss(ResNet), KD loss(ResNet) |
| Masked image modeling task | ImageNet1K(ILSVRC2012) | MAE(ViT) |
| Object detection task | COCO2017, Objects365(v2, 2020), VOC2007 and VOC2012 | DETR, DINO-DETR, RetinaNet, FCOS |
| Semantic segmentation task | ADE20K, COCO2017 | DeepLabv3+ |
| Instance segmentation task | COCO2017 | SOLOv2, YOLACT |
| Salient object detection task | combined dataset | pfan-segmentation |
| Human matting task | combined dataset | pfan-matting |
| OCR text detection task | combined dataset | DBNet |
| OCR text recognition task | combined dataset | CTC Model |
| Face detection task | combined dataset | RetinaFace |
| Face parsing task | FaceSynthetics, CelebAMask-HQ | pfan-face-parsing, sapiens_face_parsing |
| Human parsing task | LIP, CIHP | pfan-human-parsing, sapiens_human_parsing |
| Interactive segmentation task | combined dataset | SAM(segment-anything), light_sam, light_sam_matting |
| Diffusion model task | CelebA-HQ, CIFAR10, CIFAR100, FFHQ | DDPM, DDIM |
## All task training results

Most experiments were trained on 2 to 8 RTX 4090D GPUs with PyTorch 2.3 on Ubuntu 22.04. See all task training results in results.md.
## Environments

1. This repository only supports running on Ubuntu (version >= 22.04 LTS).
2. This repository only supports single-node single-GPU or single-node multi-GPU training with PyTorch DDP.
3. Please make sure your Python version is >= 3.9 and your PyTorch version is >= 2.0.
4. If you want to use the torch.compile() function, use PyTorch 2.0, 2.2, or 2.3; do not use PyTorch 2.1. A minimal usage sketch follows this list.
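A minimal torch.compile() usage sketch; the torchvision ResNet-50 is only a stand-in for the models in this repository:

```python
import torch
import torchvision

device = "cuda" if torch.cuda.is_available() else "cpu"

# Any nn.Module works; torchvision's ResNet-50 is only a stand-in for
# the models in this repository.
model = torchvision.models.resnet50().to(device)

# Supported on PyTorch 2.0/2.2/2.3 (avoid 2.1, as noted above).
compiled_model = torch.compile(model)

x = torch.randn(2, 3, 224, 224, device=device)
with torch.no_grad():
    out = compiled_model(x)  # the first call triggers compilation
print(out.shape)  # torch.Size([2, 1000])
```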
Use pip or conda to install the following packages in your Python environment (a quick import check follows the list):
torch
torchvision
pillow
numpy
Cython
pycocotools
opencv-python
scipy
einops
scikit-image
pyclipper
shapely
imagesize
nltk
tqdm
yapf
onnx
onnxruntime
onnxsim
thop==0.1.1.post2209072238
gradio==3.50.0
transformers==4.41.2
open-clip-torch==2.24.0
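Optionally, spot-check that the pinned packages resolved correctly after installation; only a few of the packages above are checked in this sketch:

```python
# Spot-check that the pinned packages resolved correctly.
import cv2
import gradio
import torch
import torchvision
import transformers

print("torch", torch.__version__)
print("torchvision", torchvision.__version__)
print("opencv-python", cv2.__version__)
print("gradio", gradio.__version__)              # expected: 3.50.0
print("transformers", transformers.__version__)  # expected: 4.41.2
```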
If you want to use xformers, install the xformers package from the official GitHub repository:
https://github.com/facebookresearch/xformers
If you want to use the DINO-DETR model, install the MultiScaleDeformableAttention package in your Python environment:
cd to simpleAICV/detection/compile_multiscale_deformable_attention, then run:
chmod +x make.sh
./make.sh
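After ./make.sh completes, you can verify the build with a quick import; the module name below is an assumption based on the upstream Deformable-DETR convention, so adjust it if this repository's build names it differently:

```python
# The compiled extension should now be importable; the module name
# assumes the upstream Deformable-DETR convention.
import MultiScaleDeformableAttention  # noqa: F401

print("MultiScaleDeformableAttention extension OK")
```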
## Download my pretrained models and experiments records

You can download all my pretrained models and experiment records/checkpoints from Hugging Face or Baidu-Netdisk.
If you only want the pretrained models (saved via model.state_dict()), download the pretrained_models folder.
From Hugging Face:
https://huggingface.co/zgcr654321/0.classification_training/tree/main
https://huggingface.co/zgcr654321/1.distillation_training/tree/main
https://huggingface.co/zgcr654321/2.masked_image_modeling_training/tree/main
https://huggingface.co/zgcr654321/3.detection_training/tree/main
https://huggingface.co/zgcr654321/4.semantic_segmentation_training/tree/main
https://huggingface.co/zgcr654321/5.instance_segmentation_training/tree/main
https://huggingface.co/zgcr654321/6.salient_object_detection_training/tree/main
https://huggingface.co/zgcr654321/7.human_matting_training/tree/main
https://huggingface.co/zgcr654321/8.ocr_text_detection_training/tree/main
https://huggingface.co/zgcr654321/9.ocr_text_recognition_training/tree/main
https://huggingface.co/zgcr654321/10.face_detection_training/tree/main
https://huggingface.co/zgcr654321/11.face_parsing_training/tree/main
https://huggingface.co/zgcr654321/12.human_parsing_training/tree/main
https://huggingface.co/zgcr654321/13.interactive_segmentation_training/tree/main
https://huggingface.co/zgcr654321/20.diffusion_model_training/tree/main
https://huggingface.co/zgcr654321/pretrained_models/tree/main
From Baidu-Netdisk:
Link: https://pan.baidu.com/s/1yhEwaZhrb2NZRpJ5eEqHBw
Extraction code: rgdo
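Once downloaded, a checkpoint from the pretrained_models folder stores a plain model.state_dict() (per the note above) and can be loaded as in this minimal sketch; the file name and the torchvision architecture are stand-ins, and the repository's own model classes may use different state-dict keys:

```python
import torch
from torchvision.models import resnet50

# Hypothetical file name; use any checkpoint downloaded from the
# pretrained_models folder. These files store a plain model.state_dict().
ckpt_path = "pretrained_models/resnet50.pth"

# torchvision's ResNet-50 is only a stand-in; the repository's own model
# classes may use different state-dict keys.
model = resnet50()
state_dict = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```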
## Prepare datasets

Make sure the folder structure is as follows:
CIFAR10
|
|-----batches.meta unzip from cifar-10-python.tar.gz
|-----data_batch_1 unzip from cifar-10-python.tar.gz
|-----data_batch_2 unzip from cifar-10-python.tar.gz
|-----data_batch_3 unzip from cifar-10-python.tar.gz
|-----data_batch_4 unzip from cifar-10-python.tar.gz
|-----data_batch_5 unzip from cifar-10-python.tar.gz
|-----readme.html unzip from cifar-10-python.tar.gz
|-----test_batch unzip from cifar-10-python.tar.gz
Make sure the folder structure is as follows:
CIFAR100
|
|-----train unzip from cifar-100-python.tar.gz
|-----test unzip from cifar-100-python.tar.gz
|-----meta unzip from cifar-100-python.tar.gz
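As a quick sanity check, the CIFAR-100 files above are Python pickles; this sketch peeks at the label names (the relative path assumes you run it next to the CIFAR100 folder):

```python
import pickle

# The CIFAR-100 files above are Python pickles; peek at the label names.
with open("CIFAR100/meta", "rb") as f:
    meta = pickle.load(f, encoding="latin1")

print(meta["fine_label_names"][:5])    # first 5 of 100 fine labels
print(meta["coarse_label_names"][:5])  # first 5 of 20 coarse labels
```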
Make sure the folder structure is as follows:
ILSVRC2012
|
|-----train----1000 sub classes folders
|-----val------1000 sub classes folders
Please make sure each class uses the same folder name in the train and val folders.
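A quick way to verify this with the standard library; the ILSVRC2012 path below is an assumption, so point it at your dataset root:

```python
from pathlib import Path

root = Path("ILSVRC2012")  # adjust to your dataset location

train_classes = {p.name for p in (root / "train").iterdir() if p.is_dir()}
val_classes = {p.name for p in (root / "val").iterdir() if p.is_dir()}

# The two sets must match exactly, otherwise class indices will not line
# up between training and validation.
assert train_classes == val_classes, train_classes ^ val_classes
print(f"{len(train_classes)} classes; train/val folder names match")
```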
Make sure the folder structure is as follows:
ImageNet21K
|
|-----train-----------10450 sub classes folders
|-----val-------------10450 sub classes folders
|-----small_classes---10450 sub classes folders
|-----imagenet21k_miil_tree.pth
Please make sure each class uses the same folder name in the train and val folders.
Make sure the folder structure is as follows:
ACCV2022
|
|-----train-------------5000 sub classes folders
|-----testa-------------60000 images
|-----accv2022_broken_list.json
Make sure the folder structure is as follows:
VOCdataset
| |----Annotations
| |----ImageSets
|----VOC2007------|----JPEGImages
| |----SegmentationClass
| |----SegmentationObject
|
| |----Annotations
| |----ImageSets
|----VOC2012------|----JPEGImages
| |----SegmentationClass
| |----SegmentationObject
Make sure the folder structure is as follows:
COCO2017
| |----captions_train2017.json
| |----captions_val2017.json
|--annotations---|----instances_train2017.json
| |----instances_val2017.json
| |----person_keypoints_train2017.json
| |----person_keypoints_val2017.json
|
| |----train2017
|----images------|----val2017
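A quick integrity check for the layout above using pycocotools (already in the package list); the relative path assumes you run it next to the COCO2017 folder:

```python
from pycocotools.coco import COCO

# Loads the validation annotations from the layout above.
coco = COCO("COCO2017/annotations/instances_val2017.json")
print(len(coco.getImgIds()), "val images")  # expected: 5000
```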
Make sure the folder structure is as follows:
SAMA-COCO
| |----sama_coco_train.json
| |----sama_coco_validation.json
|--annotations---|----train_labels.json
| |----validation_labels.json
| |----test_labels.json
| |----image_info_test2017.json
| |----image_info_test-dev2017.json
|
| |----train
|----images------|----validation
Make sure the folder structure is as follows:
objects365_2020
|
| |----zhiyuan_objv2_train.json
|--annotations---|----zhiyuan_objv2_val.json
| |----sample_2020.json
|
| |----train all train patch folders
|----images------|----val all val patch folders
|----test all test patch folders
Make sure the folder structure is as follows:
ADE20K
| |----training
|---images--------|----validation
| |----testing
|
| |----training
|---annotations---|----validation
Make sure the folder structure is as follows:
CelebA-HQ
| |----female
|---train---------|----male
|
| |----female
|---val-----------|----male
Make sure the folder structure is as follows:
FFHQ
|
|---images
|---ffhq-dataset-v1.json
|---ffhq-dataset-v2.json
## How to train or test a model

If you want to train or test a model, enter a training experiment folder and run train.sh or test.sh.
For example, enter the folder classification_training/imagenet/resnet50.
If you want to restart training for this model, delete the checkpoints and log folders first, then run train.sh:
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 --master_addr 127.0.1.0 --master_port 10000 ../../../tools/train_classification_model.py --work-dir ./
If you want to test this model, you need a trained model first; modify trained_model_path in test_config.py, then run test.sh:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.run --nproc_per_node=1 --master_addr 127.0.1.0 --master_port 10000 ../../../tools/test_classification_model.py --work-dir ./
CUDA_VISIBLE_DEVICES specifies the GPU IDs used for training. Please make sure nproc_per_node equals the number of GPUs in use, and make sure master_addr/master_port are unique for each training run (a minimal sketch of the DDP setup these commands launch is shown below).
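For reference, a minimal sketch of the DDP setup that these torch.distributed.run commands drive; the repository's tools/train_*.py scripts are more elaborate:

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torch.distributed.run spawns one process per GPU and sets LOCAL_RANK
# (and related env vars) for each of them.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(10, 10).cuda()  # stand-in for a real model
model = DDP(model, device_ids=[local_rank])

# ... build a DataLoader with a DistributedSampler, then run the usual
# forward/backward/optimizer-step loop here ...

dist.destroy_process_group()
```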
Checkpoint and log folders are saved in the experiment folder from which you run training/testing.
You can also modify hyperparameters in train_config.py/test_config.py.
## How to use gradio demo

cd to the gradio_demo folder; the following demos are available:
classification demo
detection demo
semantic_segmentation demo
instance_segmentation demo
salient_object_detection demo
human_matting demo
text_detection demo
text_recognition demo
face_detection demo
face_parsing demo
human_parsing demo
point target segment_anything demo
circle target segment_anything demo
For example, you can run the detection gradio demo (prepare a trained model weight first and modify the model weight load path):
python gradio_detect_single_image.py
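For illustration, a minimal gradio skeleton in the style of these demos, using the pinned gradio==3.50.0 Interface API; the detect function is a hypothetical stand-in for a real model:

```python
import gradio as gr
import numpy as np

# Hypothetical stand-in for the repository's detector: a real demo loads
# trained weights and draws predicted boxes on the image.
def detect(image: np.ndarray) -> np.ndarray:
    return image

# gr.Interface/gr.Image match the pinned gradio==3.50.0 API.
demo = gr.Interface(fn=detect, inputs=gr.Image(), outputs=gr.Image())
demo.launch()
```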
## Reference

https://github.com/facebookresearch/segment-anything
https://github.com/facebookresearch/sam2
## Citation

If you find my work useful in your research, please consider citing:
@inproceedings{zgcr,
  title={SimpleAICV-pytorch-training-examples},
  author={zgcr},
  year={2020-2024}
}