CVPR2024-Papers-with-Code-Demo
收集 CVPR 最新的成果,包括论文、代码和demo视频等,欢迎大家推荐!Collect the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos, etc., and welcome recommendations from everyone!
Stars: 1166
This repository contains a collection of papers and code for the CVPR 2024 conference. The papers cover a wide range of topics in computer vision, including object detection, image segmentation, image generation, and video analysis. The code provides implementations of the algorithms described in the papers, making it easy for researchers and practitioners to reproduce the results and build upon the work of others. The repository is maintained by a team of researchers at the University of California, Berkeley.
README:
☪️添加微信: nvshenj125, 备注方向,进交流学习群
欢迎关注公众号:AI算法与图像处理
🌟 CVPR 2024 持续更新最新论文/paper和相应的开源代码/code!
B站demo:https://space.bilibili.com/288489574
✋ 注:欢迎各位大佬提交issue,分享CVPR 2022论文/paper和开源项目!共同完善这个项目
往年顶会论文汇总:
CVPR 2024 论文/paper交流群已成立!已经收录的同学,可以添加微信:nvshenj125,请备注:CVPR+姓名+学校/公司名称!一定要根据格式申请,可以拉你进群。
目录(右侧点击可折叠)
- Backbone
- 数据集/Dataset
- Diffusion Model
- Text-to-Image
- NAS
- NeRF
- Knowledge Distillation
- 多模态 / Multimodal
- 对比学习/Contrastive Learning
- 图神经网络 / Graph Neural Networks
- 胶囊网络 / Capsule Network
- 图像分类 / Image Classification
- 目标检测/Object Detection
- 目标跟踪/Object Tracking
- 轨迹预测/Trajectory Prediction
- 语义分割/Segmentation
- 弱监督语义分割/Weakly Supervised Semantic Segmentation
- 医学图像分割
- 视频目标分割/Video Object Segmentation
- 交互式视频目标分割/Interactive Video Object Segmentation
- Visual Transformer
- 深度估计/Depth Estimation
- 人脸识别/Face Recognition
- 人脸检测/Face Detection
- 人脸活体检测/Face Anti-Spoofing
- 人脸年龄估计/Age Estimation
- 人脸表情识别/Facial Expression Recognition
- 人脸属性识别/Facial Attribute Recognition
- 人脸编辑/Facial Editing
- 人脸重建/Face Reconstruction
- Talking Face
- 换脸/Face Swap
- 姿态估计/Pose Estimation
- 手势姿态估计(重建)/Hand Pose Estimation( Hand Mesh Recovery)
- 视频动作检测/Video Action Detection
- 手语翻译/Sign Language Translation
- 3D人体重建
- 行人重识别/Person Re-identification
- 行人搜索/Person Search
- 人群计数 / Crowd Counting
- GAN
- 彩妆迁移 / Color-Pattern Makeup Transfer
- 字体生成 / Font Generation
- 场景文本检测、识别/Scene Text Detection/Recognition
- 图像、视频检索 / Image Retrieval/Video retrieval
- Image Animation
- 抠图/Image Matting
- 超分辨率/Super Resolution
- 图像复原/Image Restoration
- 图像补全/Image Inpainting
- 图像去噪/Image Denoising
- 图像编辑/Image Editing
- 图像拼接/Image stitching
- 图像匹配/Image Matching
- 图像融合/Image Blending
- 图像去雾/Image Dehazing
- 图像去模糊/Image Deblur
- 图像压缩/Image Compression
- 反光去除/Reflection Removal
- 车道线检测/Lane Detection
- 自动驾驶 / Autonomous Driving
- 流体重建/Fluid Reconstruction
- 场景重建 / Scene Reconstruction
- 3D Reconstruction
- 视频插帧/Frame Interpolation
- 视频超分 / Video Super-Resolution
- 3D点云/3D point cloud
- 标签噪声 / Label-Noise
- 对抗样本/Adversarial Examples
- Anomaly Detection
- 其他/Other
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
- 论文/Paper: http://arxiv.org/pdf/2403.02640
- 代码/Code: None
Traffic Scene Parsing through the TSP6K Dataset
- 论文/Paper: https://arxiv.org/pdf/2303.02835.pdf
- 代码/Code: https://github.com/PengtaoJiang/TSP6K
Balancing Act: Distribution-Guided Debiasing in Diffusion Models
- 论文/Paper: http://arxiv.org/pdf/2402.18206
- 代码/Code: None
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
- 论文/Paper: http://arxiv.org/pdf/2402.19481
- 代码/Code: https://github.com/mit-han-lab/distrifuser
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
- 论文/Paper: http://arxiv.org/pdf/2402.19302
- 代码/Code: https://github.com/iit-pavis/diffassemble
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
- 论文/Paper: http://arxiv.org/pdf/2403.00644
- 代码/Code: None
Few-shot Learner Parameterization by Diffusion Time-steps
- 论文/Paper: http://arxiv.org/pdf/2403.02649
- 代码/Code: https://github.com/yue-zhongqi/tif
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
- 论文/Paper: http://arxiv.org/pdf/2403.04290
- 代码/Code: None
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
- 论文/Paper: https://arxiv.org/abs/2403.06951
- 代码/Code: https://github.com/Tianhao-Qi/DEADiff_code
Face2Diffusion for Fast and Editable Face Personalization
- 论文/Paper: http://arxiv.org/pdf/2403.05094
- 代码/Code: https://github.com/mapooon/Face2Diffusion
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
- 论文/Paper: http://arxiv.org/pdf/2403.06951
- 代码/Code: None
MACE: Mass Concept Erasure in Diffusion Models
- 论文/Paper: http://arxiv.org/pdf/2403.06135
- 代码/Code: https://github.com/Shilin-LU/MACE
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
- 论文/Paper: http://arxiv.org/pdf/2403.07234
- 代码/Code: https://github.com/subhadeepkoley/demosketch2rgb
SemCity: Semantic Scene Generation with Triplane Diffusion
- 论文/Paper: http://arxiv.org/pdf/2403.07773
- 代码/Code: https://github.com/zoomin-lee/semcity
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
- 论文/Paper: http://arxiv.org/pdf/2403.00483
- 代码/Code: None
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
- 论文/Paper: http://arxiv.org/pdf/2403.03485
- 代码/Code: https://github.com/univ-esuty/noisecollage
Discriminative Probing and Tuning for Text-to-Image Generation
- 论文/Paper: http://arxiv.org/pdf/2403.04321
- 代码/Code: None
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
- 论文/Paper: http://arxiv.org/pdf/2403.05239
- 代码/Code: None
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
- 论文/Paper: http://arxiv.org/pdf/2403.06452
- 代码/Code: https://github.com/mulns/Text2QR
Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
- 论文/Paper: http://arxiv.org/pdf/2403.07214
- 代码/Code: None
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
- 论文/Paper: http://arxiv.org/pdf/2403.03608
- 代码/Code: None
DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
- 论文/Paper: http://arxiv.org/pdf/2403.06912
- 代码/Code: https://github.com/fictionarry/dngaussian
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
- 论文/Paper: http://arxiv.org/pdf/2403.06205
- 代码/Code: None
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
- 论文/Paper: http://arxiv.org/pdf/2403.02781
- 代码/Code: https://github.com/zhengli97/PromptKD
Logit Standardization in Knowledge Distillation
- 论文/Paper: https://arxiv.org/abs/2403.01427
- 代码/Code: https://github.com/sunshangquan/logit-standardization-KD
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
- 论文/Paper: http://arxiv.org/pdf/2403.05061
- 代码/Code: None
$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections
- 论文/Paper: http://arxiv.org/pdf/2403.06213
- 代码/Code: https://github.com/roymiles/vkd
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
- 论文/Paper: https://arxiv.org/abs/2312.07472
- 代码/Code: https://github.com/IranQin/MP5
- 主页/Website:https://iranqin.github.io/MP5.github.io/
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
- 论文/Paper: http://arxiv.org/pdf/2402.18091
- 代码/Code: None
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
- 论文/Paper: http://arxiv.org/pdf/2403.02991
- 代码/Code: None
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
- 论文/Paper: http://arxiv.org/pdf/2403.05105
- 代码/Code: https://github.com/hhc1997/L2RM
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
- 论文/Paper: http://arxiv.org/pdf/2403.07839
- 代码/Code: None
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Matching Framework
- 论文/Paper: http://arxiv.org/pdf/2403.07636
- 代码/Code: https://github.com/hieuphan33/mavl
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
- 论文/Paper: http://arxiv.org/pdf/2403.07241
- 代码/Code: None
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
- 论文/Paper: http://arxiv.org/pdf/2403.06122
- 代码/Code: https://github.com/root0yang/blindnet
UniMODE: Unified Monocular 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2402.18573
- 代码/Code: None
CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images
- 论文/Paper: http://arxiv.org/pdf/2403.04198
- 代码/Code: https://github.com/SerCharles/CN-RMA
Memory-based Adapters for Online 3D Scene Perception
- 论文/Paper: https://arxiv.org/abs/2403.06974
- 代码/Code:https://github.com/xuxw98/Online3D
Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement
-
论文/Paper: https://arxiv.org/abs/2403.16131
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
- 论文/Paper: http://arxiv.org/pdf/2403.06093
- 代码/Code: https://github.com/nullmax-vision/QAF2D
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
- 论文/Paper: http://arxiv.org/pdf/2403.05817
- 代码/Code: https://github.com/zhanggang001/hednet
DeconfuseTrack:Dealing with Confusion for Multi-Object Tracking
- 论文/Paper: http://arxiv.org/pdf/2403.02767
- 代码/Code: None
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
- 论文/Paper: http://arxiv.org/pdf/2403.04700
- 代码/Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
- 论文/Paper: http://arxiv.org/pdf/2402.19422
- 代码/Code: https://github.com/niccolocavagnero/pem
Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2403.06462
- 代码/Code: https://github.com/Gavinwxy/DDFP
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
- 论文/Paper: http://arxiv.org/pdf/2403.06247
- 代码/Code: None
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
- 论文/Paper: http://arxiv.org/pdf/2402.18933
- 代码/Code: None
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
- 论文/Paper: http://arxiv.org/pdf/2403.04258
- 代码/Code: None
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
- 论文/Paper: http://arxiv.org/pdf/2403.05419
- 代码/Code: https://github.com/techmn/satmae_pp
Representations for Recognition and Retrieval
- 论文/Paper: https://arxiv.org/pdf/2403.07535.pdf
- 代码/Code: https://github.com/Junda24/AFNet
Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval
- 论文/Paper: http://arxiv.org/pdf/2403.00272
- 代码/Code: None
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
- 论文/Paper: http://arxiv.org/pdf/2403.05105
- 代码/Code: https://github.com/hhc1997/L2RM
How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?
- 论文/Paper: http://arxiv.org/pdf/2403.07203
- 代码/Code: None
SeD: Semantic-Aware Discriminator for Image Super-Resolution
- 论文/Paper: http://arxiv.org/pdf/2402.19387
- 代码/Code: None
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
- 论文/Paper: http://arxiv.org/pdf/2402.19215
- 代码/Code: https://github.com/mandalinadagi/wgsr
CAMixerSR: Only Details Need More "Attention"
- 论文/Paper: http://arxiv.org/pdf/2402.19289
- 代码/Code: https://github.com/icandle/camixersr
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
- 论文/Paper: http://arxiv.org/pdf/2403.02601
- 代码/Code: None
Boosting Image Restoration via Priors from Pre-trained Models
- 论文/Paper: http://arxiv.org/pdf/2403.06793
- 代码/Code: None
Doubly Abductive Counterfactual Inference for Text-based Image Editing
- 论文/Paper: http://arxiv.org/pdf/2403.02981
- 代码/Code: https://github.com/xuesong39/DAC
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
- 论文/Paper: http://arxiv.org/pdf/2403.02611
- 代码/Code: https://github.com/PieceZhang/MPT-CataBlur
Abductive Ego-View Accident Video Understanding for Safe Driving Perception
- 论文/Paper: http://arxiv.org/pdf/2403.00436
- 代码/Code: None
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
- 论文/Paper: http://arxiv.org/pdf/2403.07535
- 代码/Code: website:https://github.com/Junda24/AFNet/
Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
- 论文/Paper: http://arxiv.org/pdf/2402.19298
- 代码/Code: https://github.com/omggggg/mmdg
FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation
- 论文/Paper: http://arxiv.org/pdf/2403.03221
- 代码/Code: None
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
- 论文/Paper: http://arxiv.org/pdf/2403.04381
- 代码/Code: https://github.com/MickeyLLG/S2DHand
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
- 论文/Paper: https://arxiv.org/pdf/2311.12028.pdf
- 代码/Code: https://github.com/NationalGAILab/HoT
UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Data Sets
- 论文/Paper: http://arxiv.org/pdf/2403.05086
- 代码/Code: https://github.com/Youngju-Na/UFORecon
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
- 论文/Paper: http://arxiv.org/pdf/2403.05005
- 代码/Code: None
Memory-based Adapters for Online 3D Scene Perception
- 论文/Paper: http://arxiv.org/pdf/2403.06974
- 代码/Code: None
Bayesian Diffusion Models for 3D Shape Reconstruction
- 论文/Paper: http://arxiv.org/pdf/2403.06973
- 代码/Code: None
Rethinking Few-shot 3D Point Cloud Semantic Segmentation
- 论文/Paper: http://arxiv.org/pdf/2403.00592
- 代码/Code: https://github.com/ZhaochongAn/COSeg
Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension
- 论文/Paper: http://arxiv.org/pdf/2403.03532
- 代码/Code: https://github.com/liuquan98/eyoc
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
- 论文/Paper: http://arxiv.org/pdf/2403.05247
- 代码/Code: https://github.com/TRLou/HiT-ADV
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
- 论文/Paper: http://arxiv.org/pdf/2403.06495
- 代码/Code: https://github.com/mala-lab/inctrl
RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection
- 论文/Paper: http://arxiv.org/pdf/2403.05897
- 代码/Code: https://github.com/cnulab/realnet
DisCo: Disentangled Control for Realistic Human Dance Generation
- 论文/Paper: https://arxiv.org/abs/2307.00040
- 代码/Code: https://github.com/Wangt-CN/DisCo
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
- 论文/Paper: http://arxiv.org/pdf/2402.18528
- 代码/Code: None
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
- 论文/Paper: http://arxiv.org/pdf/2402.18490
- 代码/Code: None
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
- 论文/Paper: http://arxiv.org/pdf/2402.18330
- 代码/Code: https://github.com/tho-kn/egotap
Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
- 论文/Paper: http://arxiv.org/pdf/2402.18277
- 代码/Code: None
Misalignment-Robust Frequency Distribution Loss for Image Transformation
- 论文/Paper: http://arxiv.org/pdf/2402.18192
- 代码/Code: https://github.com/eezkni/FDL
3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
- 论文/Paper: http://arxiv.org/pdf/2402.18146
- 代码/Code: https://github.com/jiangchaokang/3dsflabelling
OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction
- 论文/Paper: http://arxiv.org/pdf/2402.18140
- 代码/Code: None
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
- 论文/Paper: http://arxiv.org/pdf/2402.18115
- 代码/Code: https://github.com/minghanli/univs
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
- 论文/Paper: http://arxiv.org/pdf/2402.18078
- 代码/Code: https://github.com/YanzuoLu/CFLD
Boosting Neural Representations for Videos with a Conditional Decoder
- 论文/Paper: http://arxiv.org/pdf/2402.18152
- 代码/Code: None
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
- 论文/Paper: http://arxiv.org/pdf/2402.18133
- 代码/Code: None
QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction
- 论文/Paper: http://arxiv.org/pdf/2402.17951
- 代码/Code: None
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
- 论文/Paper: http://arxiv.org/pdf/2402.19479
- 代码/Code: None
SeMoLi: What Moves Together Belongs Together
- 论文/Paper: http://arxiv.org/pdf/2402.19463
- 代码/Code: None
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
- 论文/Paper: http://arxiv.org/pdf/2402.19326
- 代码/Code: None
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
- 论文/Paper: http://arxiv.org/pdf/2402.19231
- 代码/Code: https://github.com/lu-feng/cricavpr
MemoNav: Working Memory Model for Visual Navigation
- 论文/Paper: http://arxiv.org/pdf/2402.19161
- 代码/Code: None
VideoMAC: Video Masked Autoencoders Meet ConvNets
- 论文/Paper: http://arxiv.org/pdf/2402.19082
- 代码/Code: https://github.com/nust-machine-intelligence-laboratory/videomac
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
- 论文/Paper: http://arxiv.org/pdf/2402.18975
- 代码/Code: https://github.com/Jittor/JDet
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
- 论文/Paper: http://arxiv.org/pdf/2402.18969
- 代码/Code: None
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
- 论文/Paper: http://arxiv.org/pdf/2402.18956
- 代码/Code: None
Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
- 论文/Paper: http://arxiv.org/pdf/2402.18920
- 代码/Code: None
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
- 论文/Paper: http://arxiv.org/pdf/2402.18848
- 代码/Code: None
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
- 论文/Paper: http://arxiv.org/pdf/2402.18842
- 代码/Code: None
OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition
- 论文/Paper: http://arxiv.org/pdf/2402.18786
- 代码/Code: None
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
- 论文/Paper: http://arxiv.org/pdf/2402.18771
- 代码/Code: None
Towards Generalizable Tumor Synthesis
- 论文/Paper: http://arxiv.org/pdf/2402.19470
- 代码/Code: None
Rethinking Multi-domain Generalization with A General Learning Objective
- 论文/Paper: http://arxiv.org/pdf/2402.18853
- 代码/Code: None
Rethinking Inductive Biases for Surface Normal Estimation
- 论文/Paper: http://arxiv.org/pdf/2403.00712
- 代码/Code: https://github.com/baegwangbin/DSINE
SURE: SUrvey REcipes for building reliable and robust deep networks
- 论文/Paper: http://arxiv.org/pdf/2403.00543
- 代码/Code: https://github.com/YutingLi0606/SURE
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
- 论文/Paper: http://arxiv.org/pdf/2403.00486
- 代码/Code: https://github.com/Windsrain/Selective-Stereo.
Deformable One-shot Face Stylization via DINO Semantic Guidance
- 论文/Paper: http://arxiv.org/pdf/2403.00459
- 代码/Code: https://github.com/zichongc/DoesFS
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
- 论文/Paper: http://arxiv.org/pdf/2403.00274
- 代码/Code: None
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
- 论文/Paper: http://arxiv.org/pdf/2403.03122
- 代码/Code: None
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
- 论文/Paper: http://arxiv.org/pdf/2403.02782
- 代码/Code: None
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
- 论文/Paper: http://arxiv.org/pdf/2403.02769
- 代码/Code: None
Learning Group Activity Features Through Person Attribute Prediction
- 论文/Paper: http://arxiv.org/pdf/2403.02753
- 代码/Code: https://github.com/chihina/GAFL-CVPR2024.
Interactive Continual Learning: Fast and Slow Thinking
- 论文/Paper: http://arxiv.org/pdf/2403.02628
- 代码/Code: None
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
- 论文/Paper: http://arxiv.org/pdf/2403.03122
- 代码/Code: None
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
- 论文/Paper: http://arxiv.org/pdf/2403.02782
- 代码/Code: None
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
- 论文/Paper: http://arxiv.org/pdf/2403.02769
- 代码/Code: None
Learning Group Activity Features Through Person Attribute Prediction
- 论文/Paper: http://arxiv.org/pdf/2403.02753
- 代码/Code: https://github.com/chihina/GAFL-CVPR2024.
Interactive Continual Learning: Fast and Slow Thinking
- 论文/Paper: http://arxiv.org/pdf/2403.02628
- 代码/Code: None
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
- 论文/Paper: http://arxiv.org/pdf/2403.03890
- 代码/Code: None
DART: Implicit Doppler Tomography for Radar Novel View Synthesis
- 论文/Paper: http://arxiv.org/pdf/2403.03896
- 代码/Code: None
MeaCap: Memory-Augmented Zero-shot Image Captioning
- 论文/Paper: http://arxiv.org/pdf/2403.03715
- 代码/Code: https://github.com/joeyz0z/MeaCap
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
- 论文/Paper: http://arxiv.org/pdf/2403.03561
- 代码/Code: None
Continual Segmentation with Disentangled Objectness Learning and Class Recognition
- 论文/Paper: http://arxiv.org/pdf/2403.03477
- 代码/Code: https://github.com/jordangong/CoMasTRe
HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
- 论文/Paper: http://arxiv.org/pdf/2403.03447
- 代码/Code: None
LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
- 论文/Paper: http://arxiv.org/pdf/2403.03421
- 代码/Code: https://github.com/ispc-lab/lead
F$^3$Loc: Fusion and Filtering for Floorplan Localization
- 论文/Paper: http://arxiv.org/pdf/2403.03370
- 代码/Code: None
Enhancing Vision-Language Pre-training with Rich Supervisions
- 论文/Paper: http://arxiv.org/pdf/2403.03346
- 代码/Code: None
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
- 论文/Paper: http://arxiv.org/pdf/2403.04765
- 代码/Code: None
Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
- 论文/Paper: http://arxiv.org/pdf/2403.04492
- 代码/Code: https://github.com/rashindrie/dipa
Learning to Remove Wrinkled Transparent Film with Polarized Prior
- 论文/Paper: http://arxiv.org/pdf/2403.04368
- 代码/Code: https://github.com/jqtangust/filmremoval
LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
- 论文/Paper: http://arxiv.org/pdf/2403.04303
- 代码/Code: None
Active Generalized Category Discovery
- 论文/Paper: http://arxiv.org/pdf/2403.04272
- 代码/Code: https://github.com/mashijie1028/activegcd
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
- 论文/Paper: http://arxiv.org/pdf/2403.04149
- 代码/Code: https://github.com/ispc-lab/map
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
- 论文/Paper: http://arxiv.org/pdf/2403.04245
- 代码/Code: https://github.com/dalision/modalbiasavsr
Seamless Human Motion Composition with Blended Positional Encodings
- 论文/Paper: https://arxiv.org/abs/2402.15509
- 代码/Code:https://github.com/BarqueroGerman/FlowMDM
DiffusionLight: Light Probes for Free by Painting a Chrome Ball
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
- 论文/Paper: http://arxiv.org/pdf/2403.05087
- 代码/Code: https://github.com/initialneil/SplattingAvatar
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
- 论文/Paper: http://arxiv.org/pdf/2403.06946
- 代码/Code: https://github.com/tl-uestc/unimos
Real-Time Simulated Avatar from Head-Mounted Sensors
- 论文/Paper: http://arxiv.org/pdf/2403.06862
- 代码/Code: None
DiaLoc: An Iterative Approach to Embodied Dialog Localization
- 论文/Paper: http://arxiv.org/pdf/2403.06846
- 代码/Code: None
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
- 论文/Paper: http://arxiv.org/pdf/2403.06775
- 代码/Code: https://github.com/modelscope/facechain
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
- 论文/Paper: http://arxiv.org/pdf/2403.06758
- 代码/Code: https://github.com/gmberton/earthloc
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective
- 论文/Paper: http://arxiv.org/pdf/2403.06676
- 代码/Code: https://github.com/snskysk/cam-back-again
Distributionally Generative Augmentation for Fair Facial Attribute Classification
- 论文/Paper: http://arxiv.org/pdf/2403.06606
- 代码/Code: https://github.com/heqianpei/diga
Exploiting Style Latent Flows for Generalizing Deepfake Detection Video Detection
- 论文/Paper: http://arxiv.org/pdf/2403.06592
- 代码/Code: None
MoST: Motion Style Transformer between Diverse Action Contents
- 论文/Paper: http://arxiv.org/pdf/2403.06225
- 代码/Code: https://github.com/Boeun-Kim/MoST.
Coherent Temporal Synthesis for Incremental Action Segmentation
- 论文/Paper: http://arxiv.org/pdf/2403.06102
- 代码/Code: None
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
- 论文/Paper: http://arxiv.org/pdf/2403.06092
- 代码/Code: None
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
- 论文/Paper: http://arxiv.org/pdf/2403.05854
- 代码/Code: None
PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor
- 论文/Paper: http://arxiv.org/pdf/2403.06668
- 代码/Code: None
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
- 论文/Paper: http://arxiv.org/pdf/2403.03170
- 代码/Code: None
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
- 论文/Paper: https://arxiv.org/abs/2403.17749
- 代码/Code: https://github.com/YuqiYang213/MLoRE
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
- 论文/Paper: http://arxiv.org/pdf/2403.07874
- 代码/Code: https://github.com/zh460045050/v2l-tokenizer
Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis
- 论文/Paper: http://arxiv.org/pdf/2403.07719
- 代码/Code: https://github.com/wonderlandxd/wikg
Robust Synthetic-to-Real Transfer for Stereo Matching
- 论文/Paper: http://arxiv.org/pdf/2403.07705
- 代码/Code: https://github.com/jiaw-z/dkt-stereo
CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers
- 论文/Paper: http://arxiv.org/pdf/2403.07700
- 代码/Code: https://github.com/shahaf-arica/cuvler
Masked AutoDecoder is Effective Multi-Task Vision Generalist
- 论文/Paper: http://arxiv.org/pdf/2403.07692
- 代码/Code: https://github.com/hanqiu-hq/mad
PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
- 论文/Paper: http://arxiv.org/pdf/2403.07589
- 代码/Code: None
Unleashing Network Potentials for Semantic Scene Completion
- 论文/Paper: http://arxiv.org/pdf/2403.07560
- 代码/Code: https://github.com/fereenwong/ammnet
Open-World Semantic Segmentation Including Class Similarity
- 论文/Paper: http://arxiv.org/pdf/2403.07532
- 代码/Code: https://github.com/PRBonn/ContMAV
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
- 论文/Paper: http://arxiv.org/pdf/2403.07392
- 代码/Code: https://github.com/Traffic-X/ViT-CoMer
FSC: Few-point Shape Completion
- 论文/Paper: http://arxiv.org/pdf/2403.07359
- 代码/Code: None
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
- 论文/Paper: http://arxiv.org/pdf/2403.07347
- 代码/Code: https://github.com/jiafei127/fd4mm
A Bayesian Approach to OOD Robustness in Image Classification
- 论文/Paper: http://arxiv.org/pdf/2403.07277
- 代码/Code: None
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for CVPR2024-Papers-with-Code-Demo
Similar Open Source Tools
CVPR2024-Papers-with-Code-Demo
This repository contains a collection of papers and code for the CVPR 2024 conference. The papers cover a wide range of topics in computer vision, including object detection, image segmentation, image generation, and video analysis. The code provides implementations of the algorithms described in the papers, making it easy for researchers and practitioners to reproduce the results and build upon the work of others. The repository is maintained by a team of researchers at the University of California, Berkeley.
Anim
Anim v0.1.0 is an animation tool that allows users to convert videos to animations using mixamorig characters. It features FK animation editing, object selection, embedded Python support (only on Windows), and the ability to export to glTF and FBX formats. Users can also utilize Mediapipe to create animations. The tool is designed to assist users in creating animations with ease and flexibility.
Awesome-LLM-RAG-Application
Awesome-LLM-RAG-Application is a repository that provides resources and information about applications based on Large Language Models (LLM) with Retrieval-Augmented Generation (RAG) pattern. It includes a survey paper, GitHub repo, and guides on advanced RAG techniques. The repository covers various aspects of RAG, including academic papers, evaluation benchmarks, downstream tasks, tools, and technologies. It also explores different frameworks, preprocessing tools, routing mechanisms, evaluation frameworks, embeddings, security guardrails, prompting tools, SQL enhancements, LLM deployment, observability tools, and more. The repository aims to offer comprehensive knowledge on RAG for readers interested in exploring and implementing LLM-based systems and products.
omnia
Omnia is a deployment tool designed to turn servers with RPM-based Linux images into functioning Slurm/Kubernetes clusters. It provides an Ansible playbook-based deployment for Slurm and Kubernetes on servers running an RPM-based Linux OS. The tool simplifies the process of setting up and managing clusters, making it easier for users to deploy and maintain their infrastructure.
agenta
Agenta is an open-source LLM developer platform for prompt engineering, evaluation, human feedback, and deployment of complex LLM applications. It provides tools for prompt engineering and management, evaluation, human annotation, and deployment, all without imposing any restrictions on your choice of framework, library, or model. Agenta allows developers and product teams to collaborate in building production-grade LLM-powered applications in less time.
HaE
HaE is a framework project in the field of network security (data security) that combines artificial intelligence (AI) large models to achieve highlighting and information extraction of HTTP messages (including WebSocket). It aims to reduce testing time, focus on valuable and meaningful messages, and improve vulnerability discovery efficiency. The project provides a clear and visual interface design, simple interface interaction, and centralized data panel for querying and extracting information. It also features built-in color upgrade algorithm, one-click export/import of data, and integration of AI large models API for optimized data processing.
NeuroAI_Course
Neuromatch Academy NeuroAI Course Syllabus is a repository that contains the schedule and licensing information for the NeuroAI course. The course is designed to provide participants with a comprehensive understanding of artificial intelligence in neuroscience. It covers various topics related to AI applications in neuroscience, including machine learning, data analysis, and computational modeling. The content is primarily accessed from the ebook provided in the repository, and the course is scheduled for July 15-26, 2024. The repository is shared under a Creative Commons Attribution 4.0 International License and software elements are additionally licensed under the BSD (3-Clause) License. Contributors to the project are acknowledged and welcomed to contribute further.
Nocode-Wep
Nocode/WEP is a forward-looking office visualization platform that includes modules for document building, web application creation, presentation design, and AI capabilities for office scenarios. It supports features such as configuring bullet comments, global article comments, multimedia content, custom drawing boards, flowchart editor, form designer, keyword annotations, article statistics, custom appreciation settings, JSON import/export, content block copying, and unlimited hierarchical directories. The platform is compatible with major browsers and aims to deliver content value, iterate products, share technology, and promote open-source collaboration.
activepieces
Activepieces is an open source replacement for Zapier, designed to be extensible through a type-safe pieces framework written in Typescript. It features a user-friendly Workflow Builder with support for Branches, Loops, and Drag and Drop. Activepieces integrates with Google Sheets, OpenAI, Discord, and RSS, along with 80+ other integrations. The list of supported integrations continues to grow rapidly, thanks to valuable contributions from the community. Activepieces is an open ecosystem; all piece source code is available in the repository, and they are versioned and published directly to npmjs.com upon contributions. If you cannot find a specific piece on the pieces roadmap, please submit a request by visiting the following link: Request Piece Alternatively, if you are a developer, you can quickly build your own piece using our TypeScript framework. For guidance, please refer to the following guide: Contributor's Guide
anylabeling
AnyLabeling is a tool for effortless data labeling with AI support from YOLO and Segment Anything. It combines features from LabelImg and Labelme with an improved UI and auto-labeling capabilities. Users can annotate images with polygons, rectangles, circles, lines, and points, as well as perform auto-labeling using YOLOv5 and Segment Anything. The tool also supports text detection, recognition, and Key Information Extraction (KIE) labeling, with multiple language options available such as English, Vietnamese, and Chinese.
llm-action
This repository provides a comprehensive guide to large language models (LLMs), covering various aspects such as training, fine-tuning, compression, and applications. It includes detailed tutorials, code examples, and explanations of key concepts and techniques. The repository is maintained by Liguo Dong, an AI researcher and engineer with expertise in LLM research and development.
bitcart
Bitcart is a platform designed for merchants, users, and developers, providing easy setup and usage. It includes various linked repositories for core daemons, admin panel, ready store, Docker packaging, Python library for coins connection, BitCCL scripting language, documentation, and official site. The platform aims to simplify the process for merchants and developers to interact and transact with cryptocurrencies, offering a comprehensive ecosystem for managing transactions and payments.
how-to-optim-algorithm-in-cuda
This repository documents how to optimize common algorithms based on CUDA. It includes subdirectories with code implementations for specific optimizations. The optimizations cover topics such as compiling PyTorch from source, NVIDIA's reduce optimization, OneFlow's elementwise template, fast atomic add for half data types, upsample nearest2d optimization in OneFlow, optimized indexing in PyTorch, OneFlow's softmax kernel, linear attention optimization, and more. The repository also includes learning resources related to deep learning frameworks, compilers, and optimization techniques.
DecryptPrompt
This repository does not provide a tool, but rather a collection of resources and strategies for academics in the field of artificial intelligence who are feeling depressed or overwhelmed by the rapid advancements in the field. The resources include articles, blog posts, and other materials that offer advice on how to cope with the challenges of working in a fast-paced and competitive environment.
RAG-Retrieval
RAG-Retrieval provides full-chain RAG retrieval fine-tuning and inference code. It supports fine-tuning any open-source RAG retrieval models, including vector (embedding, graph a), delayed interactive models (ColBERT, graph d), interactive models (cross encoder, graph c). For inference, RAG-Retrieval focuses on ranking (reranker) and has developed a lightweight Python library rag-retrieval, providing a unified way to call any different RAG ranking models.
For similar tasks
CVPR2024-Papers-with-Code-Demo
This repository contains a collection of papers and code for the CVPR 2024 conference. The papers cover a wide range of topics in computer vision, including object detection, image segmentation, image generation, and video analysis. The code provides implementations of the algorithms described in the papers, making it easy for researchers and practitioners to reproduce the results and build upon the work of others. The repository is maintained by a team of researchers at the University of California, Berkeley.
ezlocalai
ezlocalai is an artificial intelligence server that simplifies running multimodal AI models locally. It handles model downloading and server configuration based on hardware specs. It offers OpenAI Style endpoints for integration, voice cloning, text-to-speech, voice-to-text, and offline image generation. Users can modify environment variables for customization. Supports NVIDIA GPU and CPU setups. Provides demo UI and workflow visualization for easy usage.
ms-copilot-play
Microsoft Copilot Play is a Cloudflare Worker service that accelerates Microsoft Copilot functionalities in China. It allows high-speed access to Microsoft Copilot features like chatting, notebook, plugins, image generation, and sharing. The service filters out meaningless requests used for statistics, saving up to 80% of Cloudflare Worker requests. Users can deploy the service easily with Cloudflare Worker, ensuring fast and unlimited access with no additional operations. The service leverages the power of Microsoft Copilot, based on OpenAI GPT-4, and utilizes Bing search to answer questions.
kaapana
Kaapana is an open-source toolkit for state-of-the-art platform provisioning in the field of medical data analysis. The applications comprise AI-based workflows and federated learning scenarios with a focus on radiological and radiotherapeutic imaging. Obtaining large amounts of medical data necessary for developing and training modern machine learning methods is an extremely challenging effort that often fails in a multi-center setting, e.g. due to technical, organizational and legal hurdles. A federated approach where the data remains under the authority of the individual institutions and is only processed on-site is, in contrast, a promising approach ideally suited to overcome these difficulties. Following this federated concept, the goal of Kaapana is to provide a framework and a set of tools for sharing data processing algorithms, for standardized workflow design and execution as well as for performing distributed method development. This will facilitate data analysis in a compliant way enabling researchers and clinicians to perform large-scale multi-center studies. By adhering to established standards and by adopting widely used open technologies for private cloud development and containerized data processing, Kaapana integrates seamlessly with the existing clinical IT infrastructure, such as the Picture Archiving and Communication System (PACS), and ensures modularity and easy extensibility.
MONAI
MONAI is a PyTorch-based, open-source framework for deep learning in healthcare imaging. It provides a comprehensive set of tools for medical image analysis, including data preprocessing, model training, and evaluation. MONAI is designed to be flexible and easy to use, making it a valuable resource for researchers and developers in the field of medical imaging.
PyTorch-Tutorial-2nd
The second edition of "PyTorch Practical Tutorial" was completed after 5 years, 4 years, and 2 years. On the basis of the essence of the first edition, rich and detailed deep learning application cases and reasoning deployment frameworks have been added, so that this book can more systematically cover the knowledge involved in deep learning engineers. As the development of artificial intelligence technology continues to emerge, the second edition of "PyTorch Practical Tutorial" is not the end, but the beginning, opening up new technologies, new fields, and new chapters. I hope to continue learning and making progress in artificial intelligence technology with you in the future.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. It provides a common API to deliver inference solutions on various platforms, including CPU, GPU, NPU, and heterogeneous devices. OpenVINO™ supports pre-trained models from Open Model Zoo and popular frameworks like TensorFlow, PyTorch, and ONNX. Key components of OpenVINO™ include the OpenVINO™ Runtime, plugins for different hardware devices, frontends for reading models from native framework formats, and the OpenVINO Model Converter (OVC) for adjusting models for optimal execution on target devices.
For similar jobs
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. It provides a common API to deliver inference solutions on various platforms, including CPU, GPU, NPU, and heterogeneous devices. OpenVINO™ supports pre-trained models from Open Model Zoo and popular frameworks like TensorFlow, PyTorch, and ONNX. Key components of OpenVINO™ include the OpenVINO™ Runtime, plugins for different hardware devices, frontends for reading models from native framework formats, and the OpenVINO Model Converter (OVC) for adjusting models for optimal execution on target devices.
peft
PEFT (Parameter-Efficient Fine-Tuning) is a collection of state-of-the-art methods that enable efficient adaptation of large pretrained models to various downstream applications. By only fine-tuning a small number of extra model parameters instead of all the model's parameters, PEFT significantly decreases the computational and storage costs while achieving performance comparable to fully fine-tuned models.
jetson-generative-ai-playground
This repo hosts tutorial documentation for running generative AI models on NVIDIA Jetson devices. The documentation is auto-generated and hosted on GitHub Pages using their CI/CD feature to automatically generate/update the HTML documentation site upon new commits.
emgucv
Emgu CV is a cross-platform .Net wrapper for the OpenCV image-processing library. It allows OpenCV functions to be called from .NET compatible languages. The wrapper can be compiled by Visual Studio, Unity, and "dotnet" command, and it can run on Windows, Mac OS, Linux, iOS, and Android.
MMStar
MMStar is an elite vision-indispensable multi-modal benchmark comprising 1,500 challenge samples meticulously selected by humans. It addresses two key issues in current LLM evaluation: the unnecessary use of visual content in many samples and the existence of unintentional data leakage in LLM and LVLM training. MMStar evaluates 6 core capabilities across 18 detailed axes, ensuring a balanced distribution of samples across all dimensions.
VLMEvalKit
VLMEvalKit is an open-source evaluation toolkit of large vision-language models (LVLMs). It enables one-command evaluation of LVLMs on various benchmarks, without the heavy workload of data preparation under multiple repositories. In VLMEvalKit, we adopt generation-based evaluation for all LVLMs, and provide the evaluation results obtained with both exact matching and LLM-based answer extraction.
llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.