Login

Awesome-CVPR2024-AIGC

A Collection of Papers and Codes for CVPR2024 AIGC

Stars: 251

Visit

screenshot

README:

Awesome-CVPR2024-AIGC

A Collection of Papers and Codes for CVPR2024 AIGC

整理汇总下今年CVPR AIGC相关的论文和代码，具体如下。

欢迎star，fork和PR~

Please feel free to star, fork or PR if helpful~

参考或转载请注明出处

CVPR2024官网：https://cvpr.thecvf.com/Conferences/2024

CVPR完整论文列表：https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers

开会时间：2024年6月17日-6月21日

论文接收公布时间：2024年2月27日

【Contents】

1.图像生成(Image Generation/Image Synthesis)
2.图像编辑（Image Editing)
3.视频生成(Video Generation/Image Synthesis)
4.视频编辑(Video Editing)
5.3D生成(3D Generation/3D Synthesis)
6.3D编辑(3D Editing)
7.多模态大语言模型(Multi-Modal Large Language Model)
8.其他多任务(Others)

1.图像生成(Image Generation/Image Synthesis)

Accelerating Diffusion Sampling with Optimized Time Steps

Paper: https://arxiv.org/abs/2402.17376
Code: https://github.com/scxue/DM-NonUniform

Adversarial Text to Continuous Image Generation

Paper: https://openreview.net/forum?id=9X3UZJSGIg9

Amodal Completion via Progressive Mixed Context Diffusion

Paper: https://arxiv.org/abs/2312.15540
Code: https://github.com/k8xu/amodal

Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder

Paper: https://arxiv.org/abs/2403.10255
Code:

Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

Paper: https://arxiv.org/abs/2312.12471
Code: https://github.com/zkawfanx/Atlantis

Attention Calibration for Disentangled Text-to-Image Personalization

Paper: https://arxiv.org/abs/2403.18551
Code: https://github.com/Monalissaa/DisenDiff

CapHuman: Capture Your Moments in Parallel Universes

Paper: https://arxiv.org/abs/2402.18078
Code: https://github.com/VamosC/CapHuman

CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization

Paper: https://arxiv.org/abs/2404.00521
Code:

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

Paper: https://arxiv.org/abs/2311.15773
Code:

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

Paper: https://arxiv.org/abs/2402.00627
Code: https://github.com/YanzuoLu/CFLD

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation

Paper: https://arxiv.org/abs/2310.01407
Code: https://github.com/fast-codi/CoDi

Condition-Aware Neural Network for Controlled Image Generation

CosmicMan: A Text-to-Image Foundation Model for Humans

Paper: https://arxiv.org/abs/2404.01294
Code: https://github.com/cosmicman-cvpr2024/CosmicMan

Countering Personalized Text-to-Image Generation with Influence Watermarks

Cross Initialization for Personalized Text-to-Image Generation

Paper: https://arxiv.org/abs/2312.15905
Code:

Customization Assistant for Text-to-image Generation

Paper: https://arxiv.org/abs/2312.03045

DeepCache: Accelerating Diffusion Models for Free

Paper: https://arxiv.org/abs/2312.00858
Code: https://github.com/horseee/DeepCache

DemoFusion: Democratising High-Resolution Image Generation With No $

Paper: https://arxiv.org/abs/2311.16973
Code: https://github.com/PRIS-CV/DemoFusion

Desigen: A Pipeline for Controllable Design Template Generation

Paper: https://arxiv.org/abs/2403.09093
Code: https://github.com/whaohan/desigen

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

Paper: https://arxiv.org/abs/2404.01342
Code:https://github.com/OpenGVLab/DiffAgent

Diffusion-driven GAN Inversion for Multi-Modal Facial Image Generation

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Paper: https://arxiv.org/abs/2402.19481
Code: https://github.com/mit-han-lab/distrifuser

Diversity-aware Channel Pruning for StyleGAN Compression

Paper: https://arxiv.org/abs/2403.13548
Code: https://github.com/jiwoogit/DCP-GAN

Discriminative Probing and Tuning for Text-to-Image Generation

Paper: https://www.arxiv.org/abs/2403.04321
Code: https://github.com/LgQu/DPT-T2I

Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation

Paper:
Code: https://github.com/haofengl/DragNoise

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Paper: https://arxiv.org/abs/2402.09812
Code: https://github.com/KU-CVLAB/DreamMatcher

Dynamic Prompt Optimizing for Text-to-Image Generation

Paper: https://arxiv.org/abs/2404.04095
Code: https://github.com/Mowenyii/PAE

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Paper: https://arxiv.org/abs/2312.04655
Code: https://github.com/eclipse-t2i/eclipse-inference

Efficient Dataset Distillation via Minimax Diffusion

Paper: https://arxiv.org/abs/2311.15529
Code: https://github.com/vimar-gu/MinimaxDiffusion

ElasticDiffusion: Training-free Arbitrary Size Image Generation

EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models

Paper: https://arxiv.org/abs/2401.04608
Code: https://github.com/JingyuanYY/EmoGen

Enabling Multi-Concept Fusion in Text-to-Image Models

Paper:
Code:

Exact Fusion via Feature Distribution Matching for Few-shot Image Generation

Paper:
Code:

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

Paper: https://arxiv.org/abs/2403.06775
Code:

Fast ODE-based Sampling for Diffusion Models in Around 5 Steps

Paper: https://arxiv.org/abs/2312.00094
Code: https://github.com/zju-pi/diff-sampler

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Paper: https://arxiv.org/abs/2312.07536
Code: https://github.com/genforce/freecontrol

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

Paper:
Code: https://github.com/aim-uofa/FreeCustom

Generalizable Tumor Synthesis

Generating Daylight-driven Architectural Design via Diffusion Models

Paper: https://arxiv.org/abs/2404.13353
Code:

Generative Unlearning for Any Identity

Paper: https://arxiv.org/abs/2405.09879
Code: https://github.com/JJuOn/GUIDE

HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances

Paper: https://arxiv.org/abs/2403.01693

High-fidelity Person-centric Subject-to-Image Synthesis

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

Paper:
Code: https://github.com/xiefan-guo/initno

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

Paper: https://arxiv.org/abs/2304.03411

InstanceDiffusion: Instance-level Control for Image Generation

Paper: https://arxiv.org/abs/2402.03290
Code: https://github.com/frank-xwang/InstanceDiffusion

Instruct-Imagen: Image Generation with Multi-modal Instruction

Paper: https://arxiv.org/abs/2401.01952
Code:

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

Paper: https://arxiv.org/abs/2306.00973
Code: https://github.com/haoningwu3639/StoryGen

InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model

Paper: https://arxiv.org/abs/2312.05849
Code: https://github.com/jiuntian/interactdiffusion

Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models

Paper: https://arxiv.org/abs/2308.15692

Inversion-Free Image Editing with Natural Language

Paper: https://arxiv.org/abs/2312.04965
Code: https://github.com/sled-group/InfEdit

JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion

Paper:
Code: https://github.com/PanchengZhao/LAKE-RED

Learned representation-guided diffusion models for large-image generation

Learning Continuous 3D Words for Text-to-Image Generation

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

Paper: https://arxiv.org/abs/2311.15841
Code:

Learning Multi-dimensional Human Preference for Text-to-Image Generation

LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model

Paper:
Code: https://github.com/ewrfcas/LeftRefill

MACE: Mass Concept Erasure in Diffusion Models

Paper: https://arxiv.org/abs/2402.05408
Code: https://github.com/Shilin-LU/MACE

MarkovGen: Structured Prediction for Efficient Text-to-Image Generation

Paper: https://arxiv.org/abs/2308.10997

MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

Paper: https://arxiv.org/abs/2403.04290

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

Paper: https://arxiv.org/abs/2402.05408
Code: https://github.com/limuloo/MIGC

MindBridge: A Cross-Subject Brain Decoding Framework

Paper: https://arxiv.org/abs/2404.07850
Code: https://github.com/littlepure2333/MindBridge

MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

On the Scalability of Diffusion-based Text-to-Image Generation

Paper: https://arxiv.org/abs/2404.02883
Code:

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Paper: https://arxiv.org/abs/2404.07990
Code: https://github.com/Picsart-AI-Research/OpenBias

Personalized Residuals for Concept-Driven Text-to-Image Generation

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

Paper: https://arxiv.org/abs/2404.15081
Code:

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Paper: https://arxiv.org/abs/2312.04461
Code: https://github.com/TencentARC/PhotoMaker

PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

Paper:
Code: https://github.com/cszy98/PLACE

Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models

Paper: https://arxiv.org/abs/2305.16223
Code: https://github.com/SHI-Labs/Prompt-Free-Diffusion

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

Paper: https://arxiv.org/abs/2311.17002
Code: https://github.com/ali-vilab/Ranni

Readout Guidance: Learning Control from Diffusion Features

Paper: https://arxiv.org/abs/2312.02150
Code: https://github.com/google-research/readout_guidance

Relation Rectification in Diffusion Model

Paper:
Code: https://github.com/WUyinwei-hah/RRNet

Residual Denoising Diffusion Models

Paper: https://arxiv.org/abs/2308.13712
Code: https://github.com/nachifur/RDDM

Rethinking FID: Towards a Better Evaluation Metric for Image Generation

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

Paper: https://arxiv.org/abs/2404.05384
Code: https://github.com/SmilesDZgk/S-CFG

Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Paper: https://arxiv.org/abs/2311.13602
Code: https://github.com/CyberAgentAILab/RALF

Rich Human Feedback for Text-to-Image Generation

Paper: https://arxiv.org/abs/2312.10240

SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

Paper: https://arxiv.org/abs/2401.08053
Code:

Self-correcting LLM-controlled Diffusion Models

Paper: https://arxiv.org/abs/2311.16090
Code: https://github.com/tsunghan-wu/SLD

Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation

Paper: https://arxiv.org/abs/2311.17216
Code: https://github.com/hangligit/InterpretDiffusion

Shadow Generation for Composite Image Using Diffusion Model

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Paper: https://arxiv.org/abs/2312.04410
Code: https://github.com/SHI-Labs/Smooth-Diffusion

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

Paper: https://arxiv.org/abs/2312.16272
Code: https://github.com/Xiaojiu-z/SSR_Encoder

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Paper: https://arxiv.org/abs/2312.01725
Code: https://github.com/rlawjdghek/StableVITON

Structure-Guided Adversarial Training of Diffusion Models

Paper: https://arxiv.org/abs/2402.17563
Code:

Style Aligned Image Generation via Shared Attention

Paper: https://arxiv.org/abs/2312.02133
Code: https://github.com/google/style-aligned/

SVGDreamer: Text Guided SVG Generation with Diffusion Model

Paper: https://arxiv.org/abs/2312.16476
Code: https://github.com/ximinng/SVGDreamer

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

Paper: https://arxiv.org/abs/2310.08129
Code: https://github.com/zzjchen/Tailored-Visions

Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models

Paper: https://arxiv.org/abs/2403.08381
Code: https://github.com/PangzeCheung/SingDiffusion

Taming Stable Diffusion for Text to 360∘ Panorama Image Generation

TextCraftor: Your Text Encoder Can be Image Quality Controller

Paper: https://arxiv.org/abs/2403.18978
Code:

Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation

Paper: https://arxiv.org/abs/2403.06247

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

Paper: https://arxiv.org/abs/2311.16503
Code: https://github.com/ModelTC/TFMQ-DM

TokenCompose: Grounding Diffusion with Token-level Supervision

Paper: https://arxiv.org/abs/2312.03626
Code: https://github.com/mlpc-ucsd/TokenCompose

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

Paper: https://arxiv.org/abs/2403.05239

Towards Memorization-Free Diffusion Models

Paper: https://arxiv.org/abs/2404.00922
Code:

Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Paper: https://arxiv.org/abs/2311.09257

UniGS: Unified Representation for Image Generation and Segmentation

Paper: https://arxiv.org/abs/2312.01985
Code: https://github.com/qqlu/Entity

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Paper: https://arxiv.org/abs/2311.13231
Code: https://github.com/yk7333/d3po

ViewDiff: 3D-Consistent Image Generation with Text-To-Image Models

Paper: https://arxiv.org/abs/2403.01807
Code: https://github.com/facebookresearch/ViewDiff

When StyleGAN Meets Stable Diffusion: a 𝒲+ Adapter for Personalized Image Generation

Paper: https://arxiv.org/abs/2311.17461
Code: https://github.com/csxmli2016/w-plus-adapter

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Paper: https://arxiv.org/abs/2312.02238
Code: https://github.com/showlab/X-Adapter

2.图像编辑(Image Editing)

An Edit Friendly DDPM Noise Space: Inversion and Manipulations

Paper: https://arxiv.org/abs/2304.06140
Code: https://github.com/inbarhub/DDPM_inversion

Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth

Paper:
Code: https://github.com/Snowfallingplum/CSD-MT

Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Paper: https://arxiv.org/abs/2403.06951
Code: https://github.com/Tianhao-Qi/DEADiff_code

Deformable One-shot Face Stylization via DINO Semantic Guidance

Paper: https://arxiv.org/abs/2403.00459
Code: https://github.com/zichongc/DoesFS

DemoCaricature: Democratising Caricature Generation with a Rough Sketch

Paper: https://arxiv.org/abs/2312.04364
Code: https://github.com/ChenDarYen/DemoCaricature

DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection

Paper:
Code: https://github.com/HansSunY/DiffAM

DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

Paper: https://arxiv.org/abs/2312.07409
Code: https://github.com/Kevin-thu/DiffMorpher

DiffusionLight: Light Probes for Free by Painting a Chrome Ball

Paper: https://arxiv.org/abs/2312.09168
Code: https://github.com/DiffusionLight/DiffusionLight

Diffusion Models Without Attention

Paper: https://arxiv.org/abs/2311.18257
Code: https://github.com/Kevin-thu/DiffMorpher

Doubly Abductive Counterfactual Inference for Text-based Image Editing

Paper: https://arxiv.org/abs/2403.02981
Code: https://github.com/xuesong39/DAC

Edit One for All: Interactive Batch Image Editing

Paper: https://arxiv.org/abs/2401.10219
Code: https://github.com/thaoshibe/edit-one-for-all

Face2Diffusion for Fast and Editable Face Personalization

Paper: https://arxiv.org/abs/2403.05094
Code: https://github.com/mapooon/Face2Diffusion

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image

Image Sculpting: Precise Object Editing with 3D Geometry Control

Paper: https://arxiv.org/abs/2401.01702
Code: https://github.com/vision-x-nyu/image-sculpting

Inversion-Free Image Editing with Natural Language

Paper: hhttps://arxiv.org/abs/2312.04965
Code: https://github.com/sled-group/InfEdit

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models

Paper: https://arxiv.org/abs/2303.17546
Code: https://github.com/Picsart-AI-Research/PAIR-Diffusion

Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Paper: https://arxiv.org/abs/2312.13964
Code: https://github.com/open-mmlab/PIA

FreeDrag: Feature Dragging for Reliable Point-based Image Editing

Paper: https://arxiv.org/abs/2307.04684
Code: https://github.com/LPengYang/FreeDrag

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Paper: https://arxiv.org/abs/2403.00483
Code:

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

Paper: https://arxiv.org/abs/2312.06739
Code: https://github.com/TencentARC/SmartEdit

Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer

Paper: https://arxiv.org/abs/2312.09008
Code: https://github.com/jiwoogit/StyleID

SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

Paper: https://arxiv.org/abs/2402.18848
Code:

Text-Driven Image Editing via Learnable Regions

Paper: https://arxiv.org/abs/2311.16432
Code: https://github.com/yuanze-lin/Learnable_Regions

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On

Paper: https://arxiv.org/abs/2404.01089
Code: https://github.com/Gal4way/TPD

TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing

Paper: https://arxiv.org/abs/2404.11120
Code: https://github.com/SherryXTChen/TiNO-Edit

UniHuman: A Unified Model For Editing Human Images in the Wild

Paper: https://arxiv.org/abs/2312.14985
Code: https://github.com/NannanLi999/UniHuman

ZONE: Zero-Shot Instruction-Guided Local Editing

Paper: https://arxiv.org/abs/2312.16794
Code: https://github.com/lsl001006/ZONE

3.视频生成(Video Generation/Video Synthesis)

360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model

Paper: https://arxiv.org/abs/2401.06578
Code: https://github.com/Akaneqwq/360DVD

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Paper: https://arxiv.org/abs/2312.15770
Code: https://github.com/ali-vilab/VGen

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

Paper: https://arxiv.org/abs/2312.02813
Code: https://github.com/MCG-NJU/BIVDiff

ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis

Paper: https://arxiv.org/abs/2403.17936
Code: https://github.com/m-hamza-mughal/convofusion

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Paper: https://arxiv.org/abs/2404.01862
Code: https://github.com/thuhcsi/S2G-MDDiffusion

Delving Deep into Diffusion Transformers for Image and Video Generation

DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation

DisCo: Disentangled Control for Realistic Human Dance Generation

Paper: https://arxiv.org/abs/2307.00040
Code: https://github.com/Wangt-CN/DisCo

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Generation

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Grid Diffusion Models for Text-to-Video Generation

Hierarchical Patch-wise Diffusion Models for High-Resolution Video Generation

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

LAMP: Learn A Motion Pattern for Few-Shot Video Generation

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

Paper: https://arxiv.org/abs/2402.17364
Code: https://github.com/zhangzc21/DynTet

Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation guided by the Characteristic Dance Primitives

Paper: https://arxiv.org/abs/2403.10518
Code: https://github.com/li-ronghui/LODGE

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

Paper: https://arxiv.org/abs/2311.16498
Code: https://github.com/magic-research/magic-animate

Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework

Paper: https://arxiv.org/abs/2403.16510
Code: https://github.com/ICTMCG/Make-Your-Anchor

Make Your Dream A Vlog

Paper: https://arxiv.org/abs/2401.09414
Code: https://github.com/Vchitect/Vlogger

Make Pixels Dance: High-Dynamic Video Generation

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

Paper: https://arxiv.org/abs/2311.16813
Code: https://github.com/wenyuqing/panacea

PEEKABOO: Interactive Video Generation via Masked-Diffusion

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Paper: https://arxiv.org/abs/2308.13712
Code: https://github.com/yzxing87/Seeing-and-Hearing

SimDA: Simple Diffusion Adapter for Efficient Video Generation

Paper: https://arxiv.org/abs/2308.09710
Code: https://github.com/ChenHsing/SimDA

Simple but Effective Text-to-Video Generation with Grid Diffusion Models

StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN

Paper: https://arxiv.org/abs/2403.14186
Code: https://github.com/jeolpyeoni/StyleCineGAN

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

Paper: https://arxiv.org/abs/2311.17590
Code: https://github.com/ZiqiaoPeng/SyncTalk

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Paper: https://arxiv.org/abs/2311.17590
Code:

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Paper: https://arxiv.org/abs/2404.16306
Code: https://github.com/showlab/Tune-A-Video

VideoBooth: Diffusion-based Video Generation with Image Prompts

Paper: https://arxiv.org/abs/2312.00777
Code: https://github.com/Vchitect/VideoBooth

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Paper: https://arxiv.org/abs/2401.09047
Code: https://github.com/AILab-CVC/VideoCrafter

Video-P2P: Video Editing with Cross-attention Control

Paper: https://arxiv.org/abs/2303.04761
Code: https://github.com/dvlab-research/Video-P2P

4.视频编辑(Video Editing)

A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing

Paper: https://arxiv.org/abs/2312.05856
Code: https://github.com/STEM-Inv/stem-inv

CAMEL: Causal Motion Enhancement tailored for lifting text-driven video editing

Paper:
Code: https://github.com/zhangguiwei610/CAMEL

CCEdit: Creative and Controllable Video Editing via Diffusion Models

Paper: https://arxiv.org/abs/2309.16496
Code: https://github.com/RuoyuFeng/CCEdit

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Paper: https://arxiv.org/abs/2308.07926
Code: https://github.com/qiuyu96/CoDeF

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

Paper: https://arxiv.org/abs/2403.12962
Code: https://github.com/williamyang1991/FRESCO/tree/main

RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

Paper: https://arxiv.org/abs/2312.04524
Code: https://github.com/rehg-lab/RAVE

VidToMe: Video Token Merging for Zero-Shot Video Editing

Paper: https://arxiv.org/abs/2312.10656
Code: https://github.com/lixirui142/VidToMe

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

5.3D生成(3D Generation/3D Synthesis)

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Paper: https://arxiv.org/abs/2310.08528
Code: https://github.com/hustvl/4DGaussians

Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling

Paper: https://arxiv.org/abs/2311.16096
Code: https://github.com/lizhe00/AnimatableGaussians

A Unified Approach for Text- and Image-guided 4D Scene Generation

Paper: https://arxiv.org/abs/2311.16854
Code: https://github.com/NVlabs/dream-in-4d

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

Paper: https://arxiv.org/abs/2312.02136
Code: https://github.com/zqh0253/BerfScene

CAD: Photorealistic 3D Generation via Adversarial Distillation

Paper: https://arxiv.org/abs/2312.06663
Code: https://github.com/raywzy/CAD

CAGE: Controllable Articulation GEneration

Paper: https://arxiv.org/abs/2312.09570
Code: https://github.com/3dlg-hcvc/cage

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

Paper: https://arxiv.org/abs/2309.00610
Code: https://github.com/hzxie/CityDreamer

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

Paper: https://arxiv.org/abs/2401.09050
Code: https://github.com/sail-sg/Consistent3D

ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis

Paper: https://arxiv.org/abs/2311.17123
Code: https://github.com/gaoxiangjun/ConTex-Human

ControlRoom3D: Room Generation using Semantic Proxy Rooms

Paper： https://arxiv.org/abs/2312.05208

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

Paper: https://arxiv.org/abs/2403.13667
Code: https://github.com/Carmenw1203/DanceCamera3D-Official

DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis

Paper: https://arxiv.org/abs/2312.13016
Code: https://github.com/FreedomGu/DiffPortrait3D

DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation

Paper: https://arxiv.org/abs/2401.04747
Code: https://github.com/JeremyCJM/DiffSHEG

DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

Paper: https://arxiv.org/abs/2303.14207
Code: https://github.com/tangjiapeng/DiffuScene

Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features

Paper: https://arxiv.org/abs/2311.17024
Code: https://github.com/niladridutt/Diffusion-3D-Features

Diffusion Time-step Curriculum for One Image to 3D Generation

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

Paper: https://arxiv.org/abs/2304.00916
Code: https://github.com/yukangcao/DreamAvatar

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

Paper: https://arxiv.org/abs/2312.03611
Code: https://github.com/yhyang-myron/DreamComposer

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

Paper: https://arxiv.org/abs/2312.06439
Code: https://github.com/tyhuang0428/DreamControl

Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion

Paper: https://arxiv.org/abs/2312.04466
Code: https://github.com/kiranchhatre/amuse

EscherNet: A Generative Model for Scalable View Synthesis

Paper: https://arxiv.org/abs/2402.03908
Code: https://github.com/hzxie/city-dreamer

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

Paper: https://arxiv.org/abs/2310.08529
Code: https://github.com/hustvl/GaussianDreamer

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

Paper: https://arxiv.org/abs/2401.04092
Code: https://github.com/3DTopia/GPTEval3D

Gaussian Shell Maps for Efficient 3D Human Generation

Paper: https://arxiv.org/abs/2311.17857
Code: https://github.com/computational-imaging/GSM

HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

Paper: https://arxiv.org/abs/2312.15980
Code: https://github.com/byeongjun-park/HarmonyView

HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding

Paper: https://arxiv.org/abs/2312.03050

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Paper: https://arxiv.org/abs/2312.09067
Code: https://github.com/allenai/Holodeck

HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation

Paper: https://arxiv.org/abs/2310.01406

Interactive3D: Create What You Want by Interactive 3D Generation

Paper: https://hub.baai.ac.cn/paper/494efc8d-f4ed-4ca4-8469-b882f9489f5e

InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusio

Paper: https://arxiv.org/abs/2403.17422
Code: https://github.com/jyunlee/InterHandGen

Intrinsic Image Diffusion for Single-view Material Estimation

Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text

Paper: https://arxiv.org/abs/2403.16897
Code: https://github.com/junshutang/Make-It-Vivid

MoMask: Generative Masked Modeling of 3D Human Motions

Paper: https://arxiv.org/abs/2312.00063
Code: https://github.com/EricGuo5513/momask-codes

Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

Paper: https://arxiv.org/abs/2312.06725
Code: https://github.com/huanngzh/EpiDiff

OED: Towards One-stage End-to-End Dynamic Scene Graph Generation

Paper: https://arxiv.org/abs/2405.16925

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

Paper: https://arxiv.org/abs/2311.07885
Code: https://github.com/SUDO-AI-3D/One2345plus

Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering

Paper: https://arxiv.org/abs/2312.11360
Code: https://github.com/postech-ami/Paint-it

PEGASUS: Personalized Generative 3D Avatars with Composable Attributes

Paper: https://arxiv.org/abs/2402.10636
Code: https://github.com/snuvclab/pegasus

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

Paper: https://arxiv.org/abs/2311.12198
Code: https://github.com/XPandora/PhysGaussian

RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D.

Paper: https://arxiv.org/abs/2311.16918
Code: https://github.com/modelscope/richdreamer

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

Paper: https://arxiv.org/abs/2311.17261
Code: https://github.com/daveredrum/SceneTex

SceneWiz3D: Towards Text-guided 3D Scene Composition

Paper: https://arxiv.org/abs/2312.08885
Code: https://github.com/zqh0253/SceneWiz3D

SemCity: Semantic Scene Generation with Triplane Diffusion

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Paper: https://arxiv.org/abs/2312.06655
Code: https://github.com/liuff19/Sherpa3D

SIGNeRF: Scene Integrated Generation for Neural Radiance Fields

Paper: https://arxiv.org/abs/2401.01647
Code: https://github.com/cgtuebingen/SIGNeRF

Single Mesh Diffusion Models with Field Latents for Texture Generation

SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

Paper: https://arxiv.org/abs/2311.15855
Code: https://github.com/SiTH-Diffusion/SiTH

SPAD: Spatially Aware Multiview Diffusers

Paper: https://arxiv.org/abs/2402.05235
Code: https://github.com/yashkant/spad

Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Paper: https://arxiv.org/abs/2312.04963
Code: https://github.com/BiDiff/bidiff

Text-to-3D using Gaussian Splatting

Paper: https://arxiv.org/abs/2309.16585
Code: https://github.com/gsgen3d/gsgen

The More You See in 2D, the More You Perceive in 3D

Paper: https://arxiv.org/abs/2404.03652
Code: https://github.com/sap3d/sap3d

Tiger: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process

Towards Realistic Scene Generation with LiDAR Diffusion Models

Paper: https://arxiv.org/abs/2404.00815
Code: https://github.com/hancyran/LiDAR-Diffusion

UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion

Paper: https://arxiv.org/abs/2404.06851
Code: https://github.com/weiqi-zhang/UDiFF

ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models

Paper: https://arxiv.org/abs/2312.01305
Code: https://github.com/ubc-vision/vivid123

6.3D编辑(3D Editing)

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Paper: https://arxiv.org/abs/2311.14521
Code: https://github.com/buaacyw/GaussianEditor

GenN2N: Generative NeRF2NeRF Translation

Paper: https://arxiv.org/abs/2404.02788
Code: https://github.com/Lxiangyue/GenN2N

Makeup Prior Models for 3D Facial Makeup Estimation and Applications

Paper: https://arxiv.org/abs/2403.17761
Code: https://github.com/YangXingchao/makeup-priors

7.多模态大语言模型(Multi-Modal Large Language Models)

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Paper: https://arxiv.org/abs/2312.03818
Code: https://github.com/SunzeY/AlphaCLIP

Anchor-based Robust Finetuning of Vision-Language Models

Paper: https://arxiv.org/abs/2404.06244
Code: https://github.com/LixDemon/ARF

Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters

Paper: https://arxiv.org/abs/2403.11549
Code: https://github.com/JiazuoYu/MoE-Adapters4CL

Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction

Paper: https://arxiv.org/abs/2403.18447
Code: https://github.com/InhwanBae/LMTrajectory

Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Paper: https://arxiv.org/abs/2311.08046
Code: https://github.com/PKU-YuanGroup/Chat-UniVi

Compositional Chain-of-Thought Prompting for Large Multimodal Models

Paper: https://arxiv.org/abs/2311.17076
Code: https://github.com/chancharikmitra/CCoT

Describing Differences in Image Sets with Natural Language

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

Paper: https://arxiv.org/abs/2403.17589
Code: https://github.com/YBZh/DMN

Efficient Stitchable Task Adaptation

Paper: https://arxiv.org/abs/2311.17352
Code: https://github.com/ziplab/Stitched_LLaMA

Efficient Test-Time Adaptation of Vision-Language Models

Paper: https://arxiv.org/abs/2403.18293
Code: https://github.com/kdiAAA/TDA

Exploring the Transferability of Visual Prompting for Multimodal Large Language Models

FairCLIP: Harnessing Fairness in Vision-Language Learning

FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication

Paper: https://arxiv.org/abs/2404.16123
Code:

Generative Multimodal Models are In-Context Learners

Paper: https://arxiv.org/abs/2312.13286
Code: https://github.com/baaivision/Emu/tree/main/Emu2

GLaMM: Pixel Grounding Large Multimodal Model

Paper: https://arxiv.org/abs/2311.03356
Code: https://github.com/mbzuai-oryx/groundingLMM

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Paper: https://arxiv.org/abs/2312.02980
Code: https://github.com/Pointcept/GPT4Point

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Paper: https://arxiv.org/abs/2312.14238
Code: https://github.com/OpenGVLab/InternVL

Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning

Paper: https://arxiv.org/abs/2404.00909
Code:

Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

Paper: https://arxiv.org/abs/2312.02439
Code: https://github.com/sail-sg/CLoT

LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

Paper: https://arxiv.org/abs/2311.11860
Code: https://github.com/rshaojimmy/JiuTian

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

Paper: https://arxiv.org/abs/2311.18651
Code: https://github.com/Open3DA/LL3DA

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Paper: https://arxiv.org/abs/2311.16922
Code: https://github.com/DAMO-NLP-SG/VCD

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper: https://arxiv.org/abs/2311.17049
Code: https://github.com/apple/ml-mobileclip

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

Paper: https://arxiv.org/abs/2403.07839
Code:

Narrative Action Evaluation with Prompt-Guided Multimodal Interaction

Paper: https://arxiv.org/abs/2404.14471
Code: https://github.com/shiyi-zh0408/NAE_CVPR2024

OneLLM: One Framework to Align All Modalities with Language

Paper: https://arxiv.org/abs/2312.03700
Code: https://github.com/csuhan/OneLLM

One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models

Paper: https://arxiv.org/abs/2403.01849
Code: https://github.com/TreeLLi/APT

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Paper: https://arxiv.org/abs/2402.19479
Code: https://github.com/shikiw/OPERA

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper: https://arxiv.org/abs/2311.17911
Code: https://github.com/snap-research/Panda-70M

PixelLM: Pixel Reasoning with Large Multimodal Model

Paper: https://arxiv.org/abs/2312.02228
Code: https://github.com/MaverickRen/PixelLM

PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization

Paper: https://arxiv.org/abs/2404.09011
Code:

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Paper: https://arxiv.org/abs/2312.04302
Code: https://github.com/dvlab-research/Prompt-Highlighter

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

Paper: https://arxiv.org/abs/2403.02781
Code: https://github.com/zhengli97/PromptKD

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper: https://arxiv.org/abs/2311.06783
Code: https://github.com/Q-Future/Q-Instruct

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

Paper: https://arxiv.org/abs/2403.13263
Code: https://github.com/ivattyue/SC-Tune

SEED-Bench: Benchmarking Multimodal Large Language Models

Paper: https://arxiv.org/abs/2311.17092
Code: https://github.com/AILab-CVC/SEED-Bench

SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining

Paper: https://arxiv.org/abs/2404.01156
Code:

The Manga Whisperer: Automatically Generating Transcriptions for Comics

Paper: https://arxiv.org/abs/2401.10224
Code: https://github.com/ragavsachdeva/magi

UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All

Paper: https://arxiv.org/abs/2403.12532
Code:

VBench: Comprehensive Benchmark Suite for Video Generative Models

Paper: https://arxiv.org/abs/2311.17982
Code: https://github.com/Vchitect/VBench

VideoChat: Chat-Centric Video Understanding

Paper: https://arxiv.org/abs/2305.06355
Code: https://github.com/OpenGVLab/Ask-Anything

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Paper: https://arxiv.org/abs/2312.00784
Code: https://github.com/mu-cai/ViP-LLaVA

ViTamin: Designing Scalable Vision Models in the Vision-language Era

Paper: https://arxiv.org/abs/2404.02132
Code: https://github.com/Beckschen/ViTamin

ViT-Lens: Towards Omni-modal Representations

Paper: https://github.com/TencentARC/ViT-Lens
Code: https://arxiv.org/abs/2308.10185

8.其他任务(Others)

AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error

Paper: https://arxiv.org/abs/2401.17879
Code: https://github.com/jonasricker/aeroblade

Diff-BGM: A Diffusion Model for Video Background Music Generation

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Paper: https://arxiv.org/abs/2310.11440
Code: https://github.com/evalcrafter/EvalCrafter

On the Content Bias in Fréchet Video Distance

Paper: https://arxiv.org/abs/2404.12391
Code: https://github.com/songweige/content-debiased-fvd

TexTile: A Differentiable Metric for Texture Tileability

Paper: https://arxiv.org/abs/2403.12961v1
Code: https://github.com/crp94/textile

持续更新~

参考

CVPR 2024 论文和开源项目合集(Papers with Code)

相关整理

For Tasks:

Click tags to check more tools for each tasks

For Jobs:

Alternative AI tools for Awesome-CVPR2024-AIGC

Similar Open Source Tools

Awesome-CVPR2024-AIGC Screenshot

Awesome-CVPR2024-AIGC

github

: 251

LLMLanding Screenshot

LLMLanding

LLMLanding is a repository focused on practical implementation of large models, covering topics from theory to practice. It provides a structured learning path for training large models, including specific tasks like training 1B-scale models, exploring SFT, and working on specialized tasks such as code generation, NLP tasks, and domain-specific fine-tuning. The repository emphasizes a dual learning approach: quickly applying existing tools for immediate output benefits and delving into foundational concepts for long-term understanding. It offers detailed resources and pathways for in-depth learning based on individual preferences and goals, combining theory with practical application to avoid overwhelm and ensure sustained learning progress.

github

: 95

bedrock-book Screenshot

bedrock-book

This repository contains sample code for hands-on exercises related to the book 'Amazon Bedrock 生成AIアプリ開発入門'. It allows readers to easily access and copy the code. The repository also includes directories for each chapter's hands-on code, settings, and a 'requirements.txt' file listing necessary Python libraries. Updates and error fixes will be provided as needed. Users can report issues in the repository's 'Issues' section, and errata will be published on the SB Creative official website.

github

: 59

llm-resource Screenshot

llm-resource

llm-resource is a comprehensive collection of high-quality resources for Large Language Models (LLM). It covers various aspects of LLM including algorithms, training, fine-tuning, alignment, inference, data engineering, compression, evaluation, prompt engineering, AI frameworks, AI basics, AI infrastructure, AI compilers, LLM application development, LLM operations, AI systems, and practical implementations. The repository aims to gather and share valuable resources related to LLM for the community to benefit from.

github

: 309

Long-Novel-GPT Screenshot

Long-Novel-GPT

Long-Novel-GPT is a long novel generator based on large language models like GPT. It utilizes a hierarchical outline/chapter/text structure to maintain the coherence of long novels. It optimizes API calls cost through context management and continuously improves based on self or user feedback until reaching the set goal. The tool aims to continuously refine and build novel content based on user-provided initial ideas, ultimately generating long novels at the level of human writers.

github

: 396

Korea-Startups Screenshot

Korea-Startups

Korea-Startups is a repository containing a comprehensive list of major tech companies and startups in Korea. It covers a wide range of industries such as mobility, local community trading, food tech, interior design, fintech, AI, natural language processing, computer vision, robotics, legal tech, and more. The repository provides detailed information about each company's field, key services, and unique features, showcasing the diverse and innovative startup ecosystem in Korea.

github

: 109

awesome-chatgpt-zh Screenshot

awesome-chatgpt-zh

The Awesome ChatGPT Chinese Guide project aims to help Chinese users understand and use ChatGPT. It collects various free and paid ChatGPT resources, as well as methods to communicate more effectively with ChatGPT in Chinese. The repository contains a rich collection of ChatGPT tools, applications, and examples.

github

: 10.5k

Stable-Diffusion Screenshot

Stable-Diffusion

Stable Diffusion is a text-to-image AI model that can generate realistic images from a given text prompt. It is a powerful tool that can be used for a variety of creative and practical applications, such as generating concept art, creating illustrations, and designing products. Stable Diffusion is also a great tool for learning about AI and machine learning. This repository contains a collection of tutorials and resources on how to use Stable Diffusion.

github

: 2.4k

LLM-And-More Screenshot

LLM-And-More

LLM-And-More is a one-stop solution for training and applying large models, covering the entire process from data processing to model evaluation, from training to deployment, and from idea to service. In this project, users can easily train models through this project and generate the required product services with one click.

github

: 447

Video-ChatGPT Screenshot

Video-ChatGPT

github

: 1.3k

cloudflare-ai-web Screenshot

cloudflare-ai-web

Cloudflare-ai-web is a lightweight and easy-to-use tool that allows you to quickly deploy a multi-modal AI platform using Cloudflare Workers AI. It supports serverless deployment, password protection, and local storage of chat logs. With a size of only ~638 kB gzip, it is a great option for building AI-powered applications without the need for a dedicated server.

github

: 1.9k

Code-Review-GPT-Gitlab Screenshot

Code-Review-GPT-Gitlab

A project that utilizes large models to help with Code Review on Gitlab, aimed at improving development efficiency. The project is customized for Gitlab and is developing a Multi-Agent plugin for collaborative review. It integrates various large models for code security issues and stays updated with the latest Code Review trends. The project architecture is designed to be powerful, flexible, and efficient, with easy integration of different models and high customization for developers.

github

: 452

meet-libai Screenshot

meet-libai

The 'meet-libai' project aims to promote and popularize the cultural heritage of the Chinese poet Li Bai by constructing a knowledge graph of Li Bai and training a professional AI intelligent body using large models. The project includes features such as data preprocessing, knowledge graph construction, question-answering system development, and visualization exploration of the graph structure. It also provides code implementations for large models and RAG retrieval enhancement.

github

: 1.1k

spring-boot-init-template Screenshot

spring-boot-init-template

github

: 305

ai-paint-today-BE Screenshot

ai-paint-today-BE

AI Paint Today is an API server repository that allows users to record their emotions and daily experiences, and based on that, AI generates a beautiful picture diary of their day. The project includes features such as generating picture diaries from written entries, utilizing DALL-E 2 model for image generation, and deploying on AWS and Cloudflare. The project also follows specific conventions and collaboration strategies for development.

github

: 60

Awesome-Mind-Network Screenshot

Awesome-Mind-Network

github

: 197

For similar tasks

No tools available

For similar jobs

No tools available