Awesome-CVPR2024-AIGC
A Collection of Papers and Codes for CVPR2024 AIGC
Stars: 251
This repository collects papers and code related to AIGC from this year's CVPR, as listed below.
Please feel free to star, fork, or open a PR if you find it helpful~
CVPR 2024 website: https://cvpr.thecvf.com/Conferences/2024
Full list of accepted CVPR papers: https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers
Conference dates: June 17-21, 2024
Paper acceptance announcement: February 27, 2024
【Contents】
- 1. Image Generation / Image Synthesis
- 2. Image Editing
- 3. Video Generation / Video Synthesis
- 4. Video Editing
- 5. 3D Generation / 3D Synthesis
- 6. 3D Editing
- 7. Multi-Modal Large Language Model
- 8. Others
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
- Paper: https://arxiv.org/abs/2403.10255
- Code:
CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
- Paper: https://arxiv.org/abs/2404.00521
- Code:
- Paper: https://arxiv.org/abs/2311.15773
- Code:
- Paper: https://arxiv.org/abs/2312.15905
- Code:
- Paper:
- Code: https://github.com/haofengl/DragNoise
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
- Paper: https://arxiv.org/abs/2311.18822
- Code: https://github.com/MoayedHajiAli/ElasticDiffusion-official
- Paper:
- Code:
- Paper:
- Code:
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
- Paper: https://arxiv.org/abs/2403.06775
- Code:
- Paper:
- Code: https://github.com/aim-uofa/FreeCustom
- Paper: https://www.cs.jhu.edu/~alanlab/Pubs24/chen2024towards.pdf
- Code: https://github.com/MrGiovanni/DiffTumor
- Paper: https://arxiv.org/abs/2404.13353
- Code:
- Paper: https://arxiv.org/abs/2311.10329
- Code: https://github.com/CodeGoat24/Face-diffuser?tab=readme-ov-file
- Paper:
- Code: https://github.com/xiefan-guo/initno
- Paper: https://arxiv.org/abs/2401.01952
- Code:
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
- Paper:
- Code: https://github.com/PanchengZhao/LAKE-RED
- Paper: https://arxiv.org/abs/2312.07330
- Code: https://github.com/cvlab-stonybrook/Large-Image-Diffusion
- Paper: https://arxiv.org/abs/2402.08654
- Code: https://github.com/ttchengab/continuous_3d_words_code/
- Paper: https://arxiv.org/abs/2311.15841
- Code:
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
- Paper:
- Code: https://github.com/ewrfcas/LeftRefill
- Paper: https://arxiv.org/abs/2404.02883
- Code:
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
- Paper: https://arxiv.org/abs/2404.15081
- Code:
- Paper:
- Code: https://github.com/cszy98/PLACE
- Paper:
- Code: https://github.com/WUyinwei-hah/RRNet
- Paper: https://arxiv.org/abs/2401.09603
- Code: https://github.com/google-research/google-research/tree/master/cmmd
- Paper: https://arxiv.org/abs/2401.08053
- Code:
- Paper: https://arxiv.org/abs/2308.09972
- Code: https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBAv2
- Paper: https://arxiv.org/abs/2402.17563
- Code:
- Paper: https://arxiv.org/abs/2403.18978
- Code:
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
- Paper: https://arxiv.org/abs/2404.00922
- Code:
- Paper:
- Code: https://github.com/Snowfallingplum/CSD-MT
- Paper: https://arxiv.org/abs/2311.18608
- Code: https://github.com/HyelinNAM/ContrastiveDenoisingScore
- Paper:
- Code: https://github.com/HansSunY/DiffAM
- Paper: https://arxiv.org/abs/2312.10113
- Code: https://github.com/guoqincode/Focus-on-Your-Instruction
- Paper: https://arxiv.org/abs/2403.09632
- Code: https://github.com/guoqincode/Focus-on-Your-Instruction
- Paper: https://arxiv.org/abs/2312.04965
- Code: https://github.com/sled-group/InfEdit
Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
- Paper: https://arxiv.org/abs/2303.17546
- Code: https://github.com/YangChangHee/CVPR2024_Person-In-Place_RELEASE?tab=readme-ov-file
- Paper: https://arxiv.org/abs/2403.00483
- Code:
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
- Paper: https://arxiv.org/abs/2402.18848
- Code:
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation
- Paper: https://arxiv.org/abs/2404.00234
- Code: https://github.com/taegyeong-lee/Grid-Diffusion-Models-for-Text-to-Video-Generation
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation guided by the Characteristic Dance Primitives
- Paper: https://arxiv.org/abs/2311.17590
- Code:
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
- Paper:
- Code: https://github.com/zhangguiwei610/CAMEL
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
- Paper: https://paperswithcode.com/paper/diffusion-time-step-curriculum-for-one-image
- Code: https://github.com/yxymessi/DTC123
- Paper: https://arxiv.org/abs/2312.12274
- Code: https://github.com/Peter-Kocsis/IntrinsicImageDiffusion
- Paper: https://arxiv.org/abs/2402.05746
- Code: https://github.com/yifanlu0227/ChatSim?tab=readme-ov-file
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
- Paper: https://arxiv.org/abs/2403.07773
- Code: https://github.com/zoomin-lee/SemCity?tab=readme-ov-file
- Paper: https://arxiv.org/abs/2312.09250
- Code: https://github.com/google-research/google-research/tree/master/mesh_diffusion
- Paper: https://cvlab.cse.msu.edu/pdfs/Ren_Kim_Liu_Liu_TIGER_supp.pdf
- Code: https://github.com/Zhiyuan-R/Tiger-Diffusion
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
- Paper: https://arxiv.org/abs/2312.02974
- Code: https://github.com/Understanding-Visual-Datasets/VisDiff
- Paper: https://arxiv.org/abs/2404.11207
- Code: https://github.com/zycheiheihei/transferable-visual-prompting
- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
- Paper: https://arxiv.org/abs/2404.16123
- Code:
- Paper: https://arxiv.org/abs/2404.00909
- Code:
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
- Paper: https://arxiv.org/abs/2403.07839
- Code:
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
- Paper: https://arxiv.org/abs/2404.09011
- Code:
- Paper: https://arxiv.org/abs/2404.01156
- Code:
- Paper: https://arxiv.org/abs/2403.12532
- Code:
AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
Continuously updated~