Awesome-CVPR2024-AIGC
A Collection of Papers and Codes for CVPR2024 AIGC
A compilation of this year's CVPR AIGC-related papers and code, listed below.
Please feel free to star, fork, or PR if helpful~
CVPR 2024 official website: https://cvpr.thecvf.com/Conferences/2024
Full list of accepted CVPR 2024 papers: https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers
Conference dates: June 17-21, 2024
Paper acceptance announced: February 27, 2024
【Contents】
- 1. Image Generation/Image Synthesis
- 2. Image Editing
- 3. Video Generation/Video Synthesis
- 4. Video Editing
- 5. 3D Generation/3D Synthesis
- 6. 3D Editing
- 7. Multi-Modal Large Language Models
- 8. Others
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
- Paper: https://arxiv.org/abs/2403.10255
- Code:
CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
- Paper: https://arxiv.org/abs/2404.00521
- Code:
- Paper: https://arxiv.org/abs/2311.15773
- Code:
- Paper: https://arxiv.org/abs/2312.15905
- Code:
- Paper:
- Code: https://github.com/haofengl/DragNoise
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
- Paper: https://arxiv.org/abs/2311.18822
- Code: https://github.com/MoayedHajiAli/ElasticDiffusion-official
- Paper:
- Code:
- Paper:
- Code:
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
- Paper: https://arxiv.org/abs/2403.06775
- Code:
- Paper:
- Code: https://github.com/aim-uofa/FreeCustom
- Paper: https://www.cs.jhu.edu/~alanlab/Pubs24/chen2024towards.pdf
- Code: https://github.com/MrGiovanni/DiffTumor
- Paper: https://arxiv.org/abs/2404.13353
- Code:
- Paper: https://arxiv.org/abs/2311.10329
- Code: https://github.com/CodeGoat24/Face-diffuser
- Paper:
- Code: https://github.com/xiefan-guo/initno
- Paper: https://arxiv.org/abs/2401.01952
- Code:
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
- Paper:
- Code: https://github.com/PanchengZhao/LAKE-RED
- Paper: https://arxiv.org/abs/2312.07330
- Code: https://github.com/cvlab-stonybrook/Large-Image-Diffusion
- Paper: https://arxiv.org/abs/2402.08654
- Code: https://github.com/ttchengab/continuous_3d_words_code/
- Paper: https://arxiv.org/abs/2311.15841
- Code:
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
- Paper:
- Code: https://github.com/ewrfcas/LeftRefill
- Paper: https://arxiv.org/abs/2404.02883
- Code:
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
- Paper: https://arxiv.org/abs/2404.15081
- Code:
- Paper:
- Code: https://github.com/cszy98/PLACE
- Paper:
- Code: https://github.com/WUyinwei-hah/RRNet
- Paper: https://arxiv.org/abs/2401.09603
- Code: https://github.com/google-research/google-research/tree/master/cmmd
- Paper: https://arxiv.org/abs/2401.08053
- Code:
- Paper: https://arxiv.org/abs/2308.09972
- Code: https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBAv2
- Paper: https://arxiv.org/abs/2402.17563
- Code:
- Paper: https://arxiv.org/abs/2403.18978
- Code:
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
- Paper: https://arxiv.org/abs/2404.00922
- Code:
- Paper:
- Code: https://github.com/Snowfallingplum/CSD-MT
- Paper: https://arxiv.org/abs/2311.18608
- Code: https://github.com/HyelinNAM/ContrastiveDenoisingScore
- Paper:
- Code: https://github.com/HansSunY/DiffAM
- Paper: https://arxiv.org/abs/2312.10113
- Code: https://github.com/guoqincode/Focus-on-Your-Instruction
- Paper: https://arxiv.org/abs/2403.09632
- Code: https://github.com/guoqincode/Focus-on-Your-Instruction
- Paper: https://arxiv.org/abs/2312.04965
- Code: https://github.com/sled-group/InfEdit
Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
- Paper: https://arxiv.org/abs/2303.17546
- Code: https://github.com/YangChangHee/CVPR2024_Person-In-Place_RELEASE
- Paper: https://arxiv.org/abs/2403.00483
- Code:
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
- Paper: https://arxiv.org/abs/2402.18848
- Code:
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation
- Paper: https://arxiv.org/abs/2404.00234
- Code: https://github.com/taegyeong-lee/Grid-Diffusion-Models-for-Text-to-Video-Generation
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation guided by the Characteristic Dance Primitives
- Paper: https://arxiv.org/abs/2311.17590
- Code:
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
- Paper:
- Code: https://github.com/zhangguiwei610/CAMEL
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
- Paper: https://paperswithcode.com/paper/diffusion-time-step-curriculum-for-one-image
- Code: https://github.com/yxymessi/DTC123
- Paper: https://arxiv.org/abs/2312.12274
- Code: https://github.com/Peter-Kocsis/IntrinsicImageDiffusion
- Paper: https://arxiv.org/abs/2402.05746
- Code: https://github.com/yifanlu0227/ChatSim
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
- Paper: https://arxiv.org/abs/2403.07773
- Code: https://github.com/zoomin-lee/SemCity
- Paper: https://arxiv.org/abs/2312.09250
- Code: https://github.com/google-research/google-research/tree/master/mesh_diffusion
- Paper: https://cvlab.cse.msu.edu/pdfs/Ren_Kim_Liu_Liu_TIGER_supp.pdf
- Code: https://github.com/Zhiyuan-R/Tiger-Diffusion
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
- Paper: https://arxiv.org/abs/2312.02974
- Code: https://github.com/Understanding-Visual-Datasets/VisDiff
- Paper: https://arxiv.org/abs/2404.11207
- Code: https://github.com/zycheiheihei/transferable-visual-prompting
- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
- Paper: https://arxiv.org/abs/2404.16123
- Code:
- Paper: https://arxiv.org/abs/2404.00909
- Code:
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
- Paper: https://arxiv.org/abs/2403.07839
- Code:
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
- Paper: https://arxiv.org/abs/2404.09011
- Code:
- Paper: https://arxiv.org/abs/2404.01156
- Code:
- Paper: https://arxiv.org/abs/2403.12532
- Code:
AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
Continuously updated~