Awesome-Segment-Anything
This repository is for the first comprehensive survey on Meta AI's Segment Anything Model (SAM).
Awesome-Segment-Anything is a curated, continuously updated list of papers and open-source projects built on or related to Meta AI's Segment Anything Model (SAM), spanning medical imaging, remote sensing, video, 3D, robotics, and other domains.
The First Comprehensive SAM Survey: A Comprehensive Survey on Segment Anything Model for Vision and Beyond. Chunhui Zhang, Li Liu, Yawen Cui, Guanjie Huang, Weilin Lin, Yiqian Yang, Yuehong Hu. [paper] [homepage] [Chinese interpretation]
Abstract:
Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence similar to that of a human being. This is in contrast to narrow or specialized AI, which is designed to perform specific tasks with a high degree of efficiency. Therefore, it is urgent to design a general class of models, which we term foundation models, that are trained on broad data and can be adapted to various downstream tasks. The recently proposed segment anything model (SAM) has made significant progress in breaking the boundaries of segmentation, greatly promoting the development of foundation models for computer vision. To fully comprehend SAM, we conduct a survey study. As the first to comprehensively review the progress of the segment anything task for vision and beyond based on the foundation model SAM, this work focuses on its applications to various tasks and data types by discussing its historical development, recent progress, and profound impact on broad applications. We first introduce the background and terminology for foundation models including SAM, as well as state-of-the-art methods contemporaneous with SAM that are significant for the segment anything task. Then, we analyze and summarize the advantages and limitations of SAM across various image processing applications, including software scenes, real-world scenes, and complex scenes. Importantly, many insights are drawn to guide future research to develop more versatile foundation models and improve the architecture of SAM. We also summarize many other notable applications of SAM in vision and beyond. Finally, we maintain a continuously updated paper list and an open-source project summary for the foundation model SAM here.
Awesome Segment Anything Models: A curated list of awesome segment anything models in computer vision and beyond. This repository supplements our survey paper. We intend to continuously update it.
We strongly encourage authors of relevant works to make a pull request and add their paper's information [here].
💥SAM 2: Segment Anything in Images and Videos has been released.
💥The first survey on SAM for videos, Segment Anything for Videos: A Systematic Survey, is now online.
- 2024.07.31: The first survey on SAM for videos was released online.
- 2024.07.30: SAM 2 was released.
- 2023.07.14: "Segment Anything" was accepted by ICCV 2023.
- 2023.05.16: Released an initial version of the list of recent papers and projects.
- 2023.04.05: The "Segment Anything" paper was released online.
If you find our work useful in your research, please consider citing:
@article{chunhui2023samsurvey,
title={A Comprehensive Survey on Segment Anything Model for Vision and Beyond},
author={Zhang, Chunhui and Liu, Li and Cui, Yawen and Huang, Guanjie and Lin, Weilin and Yang, Yiqian and Hu, Yuehong},
journal={arXiv:2305.08196},
year={2023}
}
@article{chunhui2024samforvideos,
title={Segment Anything for Videos: A Systematic Survey},
author={Zhang, Chunhui and Cui, Yawen and Lin, Weilin and Huang, Guanjie and Rong, Yan and Liu, Li and Shan, Shiguang},
journal={arXiv},
year={2024}
}
-
The first comprehensive SAM survey: Chunhui Zhang, Li Liu, Yawen Cui, Guanjie Huang, Weilin Lin, Yiqian Yang, Yuehong Hu.
"A Comprehensive Survey on Segment Anything Model for Vision and Beyond." ArXiv (2024). [paper] [homepage] [Chinese interpretation] [2023.05] -
SAM for Videos: Chunhui Zhang, Yawen Cui, Weilin Lin, Guanjie Huang, Yan Rong, Li Liu, Shiguang Shan.
"Segment Anything for Videos: A Systematic Survey." ArXiv (2024). [ArXiv] [ChinaXiv] [ResearchGate] [Project] [Chinese interpretation] [2024.07] -
SAM4MIS: Yichi Zhang, Rushi Jiao.
"Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey." CBM (2024). [paper] [project] [2023.05] -
Yichi Zhang, Zhenrong Shen.
"Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey." ArXiv (2024). [paper] [code] [2024.08] -
Tianfei Zhou, Fei Zhang, Boyu Chang, Wenguan Wang, Ye Yuan, Ender Konukoglu, Daniel Cremers.
"Image Segmentation in Foundation Model Era: A Survey." ArXiv (2024). [paper] [2024.08] -
Chaoning Zhang, Fachrina Dewi Puspitasari, Sheng Zheng, Chenghao Li, Yu Qiao, Taegoo Kang, Xinru Shan, Chenshuang Zhang, Caiyan Qin, Francois Rameau, Lik-Hang Lee, Sung-Ho Bae, Choong Seon Hong.
"A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering." ArXiv (2024). [paper] [2023.05]
-
SAM: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick.
"Segment Anything." ICCV (2023) Best Paper Honorable Mention. [paper] [homepage] [code] [Zhihu] [Reddit] [2023.04] -
SAM 2: Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer.
"SAM 2: Segment Anything in Images and Videos." ArXiv (2024). [paper] [demo] [code] [project] [dataset] [blog] [2024.07] -
GPT-4V: OpenAI.
"GPT-4V(ision) System Card." ArXiv (2023). [paper] [homepage] [2023.09] -
Gemini: Gemini Team, Google.
"Gemini: A Family of Highly Capable Multimodal Models." ArXiv (2023). [paper] [homepage] [blog] [2023.12] -
SEEM: Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Gao, Yong Jae Lee.
"Segment Everything Everywhere All at Once." NeurIPS (2023). [paper] [code] [2023.04] -
SegGPT: Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang.
"SegGPT: Segmenting Everything In Context." ICCV (2023). [paper] [code] [2023.04] -
Grounding DINO: Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection." ArXiv (2023). [paper] [code] [2023.04] -
ImageBind: Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra.
"ImageBind: One Embedding Space To Bind Them All." CVPR (2023). [paper] [homepage] [code] [2023.05] -
LanguageBind: Bin Zhu, Bin Lin, Munan Ning, Yang Yan, Jiaxi Cui, HongFa Wang, Yatian Pang, Wenhao Jiang, Junwu Zhang, Zongwei Li, Wancai Zhang, Zhifeng Li, Wei Liu, Li Yuan.
"LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment." ArXiv (2023). [paper] [code] -
Meta-Transformer: Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue.
"Meta-Transformer: A Unified Framework for Multimodal Learning." ArXiv (2023). [paper] [homepage] [code] [Chinese interpretation] [2023.07] -
OpenSeeD: Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang.
"A Simple Framework for Open-Vocabulary Segmentation and Detection." ICCV (2023). [paper] [code] [2023.03] -
RAM: Youcai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang.
"Recognize Anything: A Strong Image Tagging Model." ArXiv (2023). [paper] [homepage] [code] [2023.06] -
PACGen: Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee.
"Generate Anything Anywhere in Any Scene." ArXiv (2023). [paper] [homepage] [code] [2023.06] -
ASM: Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao.
"The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World." ArXiv (2023). [paper] [homepage] [demo] [2023.08] -
OneFormer: Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.
"OneFormer: One Transformer to Rule Universal Image Segmentation." CVPR (2023). [paper] [homepage] [code] [2022.11] -
OVSeg: Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu.
"Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP." CVPR (2023). [paper] [homepage] [code] [2022.10]
💥Chen, Haoyuan, Sihang Zhou, Kuan Li, Jianping Yin, and Jian Huang.
"A Hybrid Framework for Referring Image Segmentation: Dual-Decoder Model with SAM Complementation." Mathematics (2024).
[paper]
[2024.10]
💥DFQ-SAM: Zhikai Li, Jing Zhang, Qingyi Gu.
"Privacy-Preserving SAM Quantization for Efficient Edge Intelligence in Healthcare." ArXiv (2024).
[paper]
[2024.10]
💥SinkSAM: Osher Rafaeli, Tal Svoray, Ariel Nahlieli.
"SinkSAM: A Monocular Depth-Guided SAM Framework for Automatic Sinkhole Segmentation." ArXiv (2024).
[paper]
[2024.10]
💥Qingyuan Liu, Avideh Zakhor.
"Adapting Segment Anything Model to Melanoma Segmentation in Microscopy Slide Images." ArXiv (2024).
[paper]
[2024.10]
💥PixelCLIP: Heeseong Shin, Chaehyun Kim, Sunghwan Hong, Seokju Cho, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim.
"Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels." NeurIPS (2024).
[paper]
[code]
[2024.09]
💥Automatic MedSAM: Mélanie Gaillochet, Christian Desrosiers, Hervé Lombaert.
"Automating MedSAM by Learning Prompts with Weak Few-Shot Supervision." MICCAI-MedAGI (2024).
[paper]
[code]
[2024.09]
💥Iira Häkkinen, Iaroslav Melekhov, Erik Englesson, Hossein Azizpour, Juho Kannala.
"Medical Image Segmentation with SAM-generated Annotations." ECCVW (2024).
[paper]
[2024.09]
💥SCEF: Fulong Ma, Guoyang Zhao, Weiqing Qi, Ming Liu, Jun Ma.
"Task-Oriented Pre-Training for Drivable Area Detection." ArXiv (2024).
[paper]
[code]
[2024.09]
💥VideoLISA: Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou.
"One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos." NeurIPS (2024).
[paper]
[code]
[2024.09]
💥RoboNurse-VLA: Shunlei Li, Jin Wang, Rui Dai, Wanyu Ma, Wing Yin Ng, Yingbai Hu, Zheng Li.
"RoboNurse-VLA: Robotic Scrub Nurse System based on Vision-Language-Action Model." ArXiv (2024).
[paper]
[code]
[2024.09]
💥MedCLIP-SAMv2: Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao.
"MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation." ArXiv (2024).
[paper]
[code]
[2024.09]
💥SAM2-VCOS: Yuli Zhou, Guolei Sun, Yawei Li, Luca Benini, Ender Konukoglu.
"When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation." ArXiv (2024).
[paper]
[code]
[2024.09]
💥SAM-ICE: Zhao, Ruhao and Zhong, Xian and Liao, Liang and Liu, Wenxuan and Huang, Wenxin and Wang, Zheng.
"Localization of Image Splicing Under Segment Anything Model With Integrated Compression and Edge Artifacts." ICIP (2024).
[paper]
[2024.09]
💥SAM-SPB: Zhao, Quan and Wu, Siying and Zhang, Yueyi and Sun, Xiaoyan.
"Semantic-Enhanced Point-Box Joint Prompting for Video Object Segmentation." ICIP (2024).
[paper]
[2024.09]
💥OpenDet-D: Luo, Sheng and Zhou, Yi.
"Open World Object Detection Via Cooperative Foundation Models for Driving Scenes." ICIP (2024). [paper]
[2024.09]
💥CERM: Zhao, Xueqiang, Zheng Wu, Yangbo Chen, Wei Zhou, and Mingan Wei.
"Fine-Grained High-Resolution Remote Sensing Image Change Detection by SAM-UNet Change Detection Model." Remote Sensing (2024).
[paper]
[2024.09]
💥VP-SAM: Zhixue Fang, Yuzhi Liu, Huisi Wu, and Jin Qin.
"VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network." ECCV (2024).
[paper]
[code]
[2024.09]
💥Pro2SAM: Xi Yang, Songsong Duan, Nannan Wang, and Xinbo Gao.
"Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization." ECCV (2024).
[paper]
[2024.09]
💥EPLD: Jing Li, Junsong Fan, and Zhaoxiang Zhang.
"Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance." ECCV (2024).
[paper]
[2024.09]
💥Gaussian Grouping: Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke.
"Gaussian Grouping: Segment and Edit Anything in 3D Scenes (Supplementary Material)." ECCV (2024).
[paper]
[2024.09]
💥GiT: Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang.
"Supplementary to GiT: Towards Generalist Vision Transformer through Universal Language Interface." ECCV (2024).
[paper]
[2024.09]
💥SFRecSAM: Wanting Zhang, Huisi Wu, and Jing Qi.
"Domesticating SAM for Breast Ultrasound Image Segmentation via Spatial-frequency Fusion and Uncertainty Correction." ECCV (2024).
[paper]
[code]
[2024.09]
💥PQ-SAM: Xiaoyu Liu, Xin Ding, Lei Yu, Yuanyuan Xi, Wei Li, Zhijun Tu, Jie Hu, Hanting Chen, Baoqun Yin, and Zhiwei Xiong.
"PQ-SAM: Post-training Quantization for Segment Anything Model." ECCV (2024).
[paper]
[2024.09]
💥Zhi, Junjun, Lin Li, Hong Zhu, Zipeng Li, Mian Wu, Rui Dong, Xinyue Cao, Wangbing Liu, Le’an Qu, Xiaoqing Song, et al.
"Comparison of Deep Learning Models and Feature Schemes for Detecting Pine Wilt Diseased Trees." Forests (2024).
[paper]
[2024.09]
💥Zhewei Chen, Wai Keung Wong, Zuofeng Zhong, Jinpiao Liao, and Ying Qu.
"Efficient Domain Knowledge Injection for Bridging the Gap Between Generalized Large Vision Models and Specialized Fabric Defect Tasks." Journal of Natural Fibers (2024).
[paper]
[2024.09]
💥SAM+nnUNet: Fang, Z. and Lu, Z. and Liu, H. and Liu, Y. and Mok, G. S.P.
"SAM+nnUNet: Deep-learning-based Head and Neck Tumor Segmentation on FDG PET." IEEE Nuclear Science Symposium (NSS), Medical Imaging Conference (MIC) and Room Temperature Semiconductor Detector Conference (RTSD) (2024).
[paper]
[2024.09]
💥Khajvand, N. and Ahmadyar, Y. and Samimi, R. and Mehrban, Q. and Kamali-Asl, A. and Arabi, H. and Zaidi, H.
"Whole-Body PET Tumor Lesion Segmentation Using a Transformer-Based Model." IEEE Nuclear Science Symposium (NSS), Medical Imaging Conference (MIC) and Room Temperature Semiconductor Detector Conference (RTSD) (2024).
[paper]
[2024.09]
💥Robotic-CLIP: Nghia Nguyen, Minh Nhat Vu, Tung D. Ta, Baoru Huang, Thieu Vo, Ngan Le, Anh Nguyen.
"Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications." ArXiv (2024).
[paper]
[2024.09]
💥Jiangshan Liu, Wenlong Dong, Jiankun Wang, Max Q.-H. Meng.
"Leveraging Semantic and Geometric Information for Zero-Shot Robot-to-Human Handover." ArXiv (2024).
[paper]
[code]
[2024.09]
💥3D-SAutoMed: Junjie Liang, Peng Cao, Wenju Yang, Jinzhu Yang, and Osmar R. Zaiane.
"3D-SAutoMed: Automatic Segment Anything Model for 3D Medical Image Segmentation from Local-Global Perspective." MICCAI (2024).
[paper]
[2024.09]
💥Rezzag Bedida T, Hammouya A.
"Improving SAM model for medical image segmentation." ArXiv (2024).
[paper]
[2024.09]
💥Stiles, Nicole.
"Efficient Segment Anything on the Edge." ArXiv (2024).
[paper]
[2024.09]
-
DarkSAM: Ziqi Zhou, Yufei Song, Minghui Li, Shengshan Hu, Xianlong Wang, Leo Yu Zhang, Dezhong Yao, Hai Jin.
"DarkSAM: Fooling Segment Anything Model to Segment Nothing." NeurIPS (2024). [paper] [code] [2024.09] -
SegVLAD: Kartik Garg, Sai Shubodh Puligilla, Shishir Kolathaya, Madhava Krishna, Sourav Garg.
"Revisit Anything: Visual Place Recognition via Image Segment Retrieval." ECCV (2024). [paper] [code] [2024.09] -
SAMAL: Alvaro Patricio, Joao Valente, Atabak Dehban, Ines Cadilha, Daniel Reis, Rodrigo Ventura.
"AI-Powered Augmented Reality for Satellite Assembly, Integration and Test." ArXiv (2024). [paper] [2024.09] -
GMed-SA: Meng Wang, Yarong Feng, Yongwei Tang, Tian Zhang, Yuxin Liang, Chao Lv.
"Global-Local Medical SAM Adaptor Based on Full Adaption." ArXiv (2024). [paper] [2024.09] -
UW-COT: Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang.
"Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2." ArXiv (2024). [paper] [project] -
Illia Tsiporenko, Pavel Chizhov, Dmytro Fishman.
"Going Beyond U-Net: Assessing Vision Transformers for Semantic Segmentation in Microscopy Image Analysis." ECCVW (2024). [paper] [2024.09] -
Xi Wang, Tianxing Chen, Qiaojun Yu, Tianling Xu, Zanxin Chen, Yiting Fu, Cewu Lu, Yao Mu, Ping Luo.
"Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking." ArXiv (2024). [paper] [code] [2024.09] -
Ali Badiezadeh, Amin Malekmohammadi, Seyed Mostafa Mirhassani, Parisa Gifani, Majid Vafaeezadeh.
"Segmentation Strategies in Deep Learning for Prostate Cancer Diagnosis: A Comparative Study of Mamba, SAM, and YOLO." ArXiv (2024). [paper] [code] [2024.09] -
Sunoh Lee, Minsik Jeon, Jihong Min, Junwon Seo.
"Open-World Object Detection with Instance Representation Learning." ArXiv (2024). [paper] [code] [2024.09] -
CAD: Joohyeok Kim, Joonhyeon Song, Seohwan Yun, Seongho Yoon, Sangmin Lee.
"CAD: Memory Efficient Convolutional Adapter for Segment Anything." ArXiv (2024). [paper] [code] [2024.09] -
UOIS-SAM: Rui Cao, Chuanxin Song, Biqi Yang, Jiangliu Wang, Pheng-Ann Heng, Yun-Hui Liu.
"Adapting Segment Anything Model for Unseen Object Instance Segmentation." ArXiv (2024). [paper] [2024.09] -
SOS: Christian Wilms, Tim Rolff, Maris Hillemann, Robert Johanson, Simone Frintrop.
"SOS: Segment Object System for Open-World Instance Segmentation With Object Priors." ECCV (2024). [paper] [code] [2024.09] -
EvanySeg: Ahjol Senbi, Tianyu Huang, Fei Lyu, Qing Li, Yuhui Tao, Wei Shao, Qiang Chen, Chengyan Wang, Shuo Wang, Tao Zhou, Yizhe Zhang.
"Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images." ArXiv (2024). [paper] [code] [2024.09] -
SAMEdge: Rui Lu, Siping Shi, Yanting Liu, Dan Wang.
"SAMEdge: An Edge-cloud Video Analytics Architecture for the Segment Anything Model." ArXiv (2024). [paper] [2024.09] -
AdvImmu: Wei-Bin Kou, Guangxu Zhu, Rongguang Ye, Shuai Wang, Qingfeng Lin, Ming Tang, Yik-Chung Wu.
"An Adverse Weather-Immune Scheme with Unfolded Regularization and Foundation Model Knowledge Distillation for Street Scene Understanding." ArXiv (2024). [paper] [code] [2024.09] -
VTA: Yuchen Hu, Yu Gu, Chenxing Li, Rilin Chen, Dong Yu.
"Video-to-Audio Generation with Fine-grained Temporal Semantics." ArXiv (2024). [paper] [code] [2024.09] -
Kundan Chaudhary, Subhei Shaar, Raja Muthinti.
"Deep learning for fast segmentation and critical dimension metrology & characterization enabling AR/VR design and fabrication." ArXiv (2024). [paper] [2024.09] -
S-AModal: Jasmin Breitenstein, Franz Jünger, Andreas Bär, Tim Fingscheidt.
"Foundation Models for Amodal Video Instance Segmentation in Automated Driving." ECCV VCAD Workshop (2024). [paper] [code] [2024.09] -
MGLMM: Li Zhou, Xu Yuan, Zenghui Sun, Zikun Zhou, Jingsong Lan.
"Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model." ArXiv (2024). [paper] [code] [2024.09] -
PointSAM: Nanqing Liu, Xun Xu, Yongyi Su, Haojie Zhang, Heng-Chao Li.
"PointSAM: Pointly-Supervised Segment Anything Model for Remote Sensing Images." ArXiv (2024). [paper] [code] [2024.09] -
MCICSAM: Guantian Huang, Beibei Li, Xiaobing Fan, Aritrick Chatterjee, Cheng Wei, Shouliang Qi, Wei Qian, Dianning He.
"MCICSAM: Monte Carlo-guided Interpolation Consistency Segment Anything Model for Semi-Supervised Prostate Zone Segmentation." ArXiv (2024). [paper] [2024.09] -
Francis Ogoke, Sumesh Kalambettu Suresh, Jesse Adamczyk, Dan Bolintineanu, Anthony Garland, Michael Heiden, Amir Barati Farimani.
"Deep Learning based Optical Image Super-Resolution via Generative Diffusion Models for Layerwise in-situ LPBF Monitoring." ArXiv (2024). [paper] [2024.09] -
Michele Carlo La Greca, Mirko Usuelli, Matteo Matteucci.
"Enhancing Agricultural Environment Perception via Active Vision and Zero-Shot Learning." ArXiv (2024). [paper] [code] [2024.09] -
Ali, L., Alnajjar, F., Swavaf, M. et al.
"Evaluating segment anything model (SAM) on MRI scans of brain tumors." Scientific Reports (2024). [paper] [2024.09] -
SAM-RSIS: Luo, Muying and Zhang, Tao and Wei, Shiqing and Ji, Shunping.
"SAM-RSIS: progressively adapting SAM with box prompting to remote sensing image instance segmentation." IEEE Transactions on Geoscience and Remote Sensing (2024). [paper] [2024.09] -
EmbSAM: Cunmin Zhao, Zelin Li, Pei Zhang, Yixuan Chen, Pohao Ye, Ming-Kin Wong, Lu-Yan Chan, Hong Yan, Chao Tang, Guoye Guan, Zhongying Zhao.
"EmbSAM: Cell boundary localization and Segment Anything Model for fast images of developing embryos." ArXiv (2024). [paper] [2024.09] -
Bengtsson B R, Bengs J.
"Accelerated Segmentation with Mixed-Precision Quantization of EfficientViT-SAM." ArXiv (2024). [paper] [2024.09] -
Pengzhou Cai, Xueyuan Zhang, Libin Lan, Ze Zhao.
"Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation using Rein to Fine-tune Vision Foundation Models." ArXiv (2024). [paper] [code] [2024.09] -
Sigurðardóttir, Andrea Rakel, Hildur Inga Sveinsdóttir, Nette Schultz, Hafsteinn Einarsson, and María Gudjónsdóttir.
"Sequence Segmentation of Nematodes in Atlantic Cod with Multispectral Imaging Data." Foods (2024). [paper] [2024.09] -
Kulyabin, Mikhail, Aleksei Zhdanov, Andrey Pershin, Gleb Sokolov, Anastasia Nikiforova, Mikhail Ronkin, Vasilii Borisov, and Andreas Maier.
"Segment Anything in Optical Coherence Tomography: SAM 2 for Volumetric Segmentation of Retinal Biomarkers." Bioengineering (2024). [paper] [2024.09] -
DAPSAM: Zhikai Wei, Wenhui Dong, Peilin Zhou, Yuliang Gu, Zhou Zhao, Yongchao Xu.
"Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation." MICCAI (2024). [paper] [code] [2024.09] -
GraspSAM: Sangjun Noh, Jongwon Kim, Dongwoo Nam, Seunghyeok Back, Raeyoung Kang, Kyoobin Lee.
"GraspSAM: When Segment Anything Model Meets Grasp Detection." ArXiv (2024). [paper] [code] [2024.09] -
FGSA-Net: Shizhou Zhang, Dexuan Kong, Yinghui Xing, Yue Lu, Lingyan Ran, Guoqiang Liang, Hexu Wang, Yanning Zhang.
"Frequency-Guided Spatial Adaptation for Camouflaged Object Detection." IEEE TMM (2024). [paper] [2024.09] -
SAM4MLLM: Yi-Chia Chen, Wei-Hua Li, Cheng Sun, Yu-Chiang Frank Wang, Chu-Song Chen.
"SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation." ECCV (2024). [paper] [code] [2024.09] -
AMRF: Zheming Zuo, Joseph Smith, Jonathan Stonehouse, Boguslaw Obara.
"An Augmentation-based Model Re-adaptation Framework for Robust Image Segmentation." ECCVW (2024). [paper] [2024.09] -
Xin Hu, Janet Wang, Jihun Hamm, Rie R Yotsu, Zhengming Ding.
"Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM Empowerment." ArXiv (2024). [paper] [2024.09] -
YOLO-SAM 2: Mobina Mansoori, Sajjad Shahabodini, Jamshid Abouei, Konstantinos N. Plataniotis, Arash Mohammadi.
"Self-Prompting Polyp Segmentation in Colonoscopy using Hybrid Yolo-SAM 2 Model." ArXiv (2024). [paper] [code] [2024.09] -
SAM-OCTA2: Xinrun Chen, Chengliang Wang, Haojian Ning, Mengzhan Zhang, Mei Shen, Shiying Li.
"SAM-OCTA2: Layer Sequence OCTA Segmentation with Fine-tuned Segment Anything Model 2." ArXiv (2024). [paper] [code] [2024.09] -
TP-Mamba: Hualiang Wang, Yiqun Lin, Xinpeng Ding, Xiaomeng Li.
"Tri-Plane Mamba: Efficiently Adapting Segment Anything Model for 3D Medical Images." ArXiv (2024). [paper] [code] [2024.09] -
Henninger, S.; Kellner, M.; Rombach, B.; Reiterer, A.
"Reducing Training Data Using Pre-Trained Foundation Models: A Case Study on Traffic Sign Segmentation Using the Segment Anything Model." Journal of Imaging (2024). [paper] [2024.09] -
Hayoung Lee, Kwangseob Kim, Kiwon Lee.
"Application of Geo-Segment Anything Model (SAM) Scheme to Water Body Segmentation: An Experiment Study Using CAS500-1 Images." Korean Journal of Remote Sensing (2024). [paper] [2024.09] -
YOLOv5-n and EdgeSAM-3×: Hongtao Li, Yong Yang, Shengping Wang, Zhigao Chen, Linbang He.
"Automatic detection and extraction of lost shipping containers based on YOLO and the segment anything model." Remote Sensing Letters (2024). [paper] [2024.09] -
Kulyabin M, Zhdanov A, Pershin A, et al.
"Segment Anything in OCT: SAM 2 for Volumetric Segmentation of Retinal Biomarkers." ArXiv (2024). [paper] [2024.09] -
WaterSAM: Hong, Yang, Xiaowei Zhou, Ruzhuang Hua, Qingxuan Lv, and Junyu Dong.
"WaterSAM: Adapting SAM for Underwater Object Segmentation." Journal of Marine Science and Engineering (2024). [paper] [2024.09] -
HQHSAM: Kerem Cekmeceli, Meva Himmetoglu, Guney I. Tombak, Anna Susmelj, Ertunc Erdil, Ender Konukoglu.
"Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation?" ArXiv (2024). [paper] [code] [2024.09] -
SimMAT: Chenyang Lei, Liyi Chen, Jun Cen, Xiao Chen, Zhen Lei, Felix Heide, Ziwei Liu, Qifeng Chen, Zhaoxiang Zhang.
"SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality." ArXiv (2024). [paper] [code] [2024.09] -
Shilin Hu, Hieu Le, ShahRukh Athar, Sagnik Das, Dimitris Samaras.
"Shadow Removal Refinement via Material-Consistent Shadow Edges." ArXiv (2024). [paper] [2024.09] -
SoftShadow: Xinrui Wang, Lanqing Guo, Xiyu Wang, Siyu Huang, Bihan Wen.
"SoftShadow: Leveraging Penumbra-Aware Soft Masks for Shadow Removal." ArXiv (2024). [paper] [code] [2024.09] -
PaveSAM: Neema Jakisa Owor, Yaw Adu-Gyamfi, Armstrong Aboah, Mark Amo-Boateng.
"PaveSAM: Segment Anything for Pavement Distress." Road Materials and Pavement Design (2024). [paper] [2024.09] -
Swin-LiteMedSAM: Ruochen Gao, Donghang Lyu, Marius Staring.
"Swin-LiteMedSAM: A Lightweight Box-Based Segment Anything Model for Large-Scale Medical Image Datasets." ArXiv (2024). [paper] [code] [2024.09] -
Sam2Rad: Assefa Seyoum Wahd, Banafshe Felfeliyan, Yuyue Zhou, Shrimanti Ghosh, Adam McArthur, Jiechen Zhang, Jacob L. Jaremko, Abhilash Hareendranathan.
"Sam2Rad: A Segmentation Model for Medical Images with Learnable Prompts." ArXiv (2024). [paper] [2024.09] -
SimIRSTD: Mingjin Zhang, Chi Zhang, Qiming Zhang, Yunsong Li, Xinbo Gao, Jing Zhang.
"Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection." ACM MM (2024). [paper] [code] [2024.09] -
SSFam: Zhengyi Liu, Sheng Deng, Xinrui Wang, Linbo Wang, Xianyong Fang, Bin Tang.
"SSFam: Scribble Supervised Salient Object Detection Family." TMM (2024). [paper] [code] [2024.09] -
TAVP: Jiaqi Yang, Ye Huang, Xiangjian He, Linlin Shen, Guoping Qiu.
"TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation." ArXiv (2024). [paper] [2024.09] -
AnomalyCD: Jingtao Li, Qian Zhu, Xinyu Wang, Hengwei Zhao, Yanfei Zhong.
"AnomalyCD: A benchmark for Earth anomaly change detection with high-resolution and time-series observations." ArXiv (2024). [paper] [2024.09] -
GeSCF: Jaewoo Kim, Uehwan Kim.
"Towards Generalizable Scene Change Detection." ArXiv (2024). [paper] [2024.09] -
Giulio Passerotti, Alberto Alberello, Marcello Vichi, Luke G. Bennetts, Alessandro Toffoli.
"Segmenting sea ice floes in close-range optical imagery with active contour and foundation models." ArXiv (2024). [paper] [2024.09] -
EGFS: Ting-Ru Liu, Hsuan-Kung Yang, Jou-Min Liu, Chun-Wei Huang, Tsung-Chih Chiang, Quan Kong, Norimasa Kobori, Chun-Yi Lee.
"Reprojection Errors as Prompts for Efficient Scene Coordinate Regression." ECCV (2024). [paper] [code] [2024.09] -
FS-MedSAM2: Yunhao Bai, Qinji Yu, Boxiang Yun, Dakai Jin, Yingda Xia, Yan Wang.
"FS-MedSAM2: Exploring the Potential of SAM2 for Few-Shot Medical Image Segmentation without Fine-tuning." ArXiv (2024). [paper] [code] [2024.09] -
M2Trans: Ni, Zhangkai and Xiao, Runyu and Yang, Wenhan and Wang, Hanli and Wang, Zhihua and Xiang, Lihua and Sun, Liping.
"M2Trans: Multi-Modal Regularized Coarse-to-Fine Transformer for Ultrasound Image Super-Resolution." IEEE Journal of Biomedical and Health Informatics (2024). [paper] [2024.09] -
Pablo Delgado-Rodriguez, Roman Kinakh, Rafael Aldabe, Arrate Munoz-Barrutia.
"SAM-based Automatic Workflow for Histology Cyst Segmentation in Autosomal Dominant Polycystic Kidney Disease." ArXiv (2024). [paper] [2024.09] -
pix2pixGAN: Amir Syahmi, Xiangrong Lu, Yinxuan Li, Haoxuan Yao, Hanjun Jiang, Ishita Acharya, Shiyi Wang, Yang Nan, Xiaodan Xing, Guang Yang.
"Coupling AI and Citizen Science in Creation of Enhanced Training Dataset for Medical Image Segmentation." ArXiv (2024). [paper] [code] [2024.09] -
BF-SAM: Zhaoya Gong, Binbo Li, Chenglong Wang, Jun Chen, Pengjun Zhao.
"BF-SAM: enhancing SAM through multi-modal fusion for fine-grained building function identification." International Journal of Geographical Information Science (2024). [paper] [2024.09] -
LOBSTAR: Yanming Xiu, Tim Scargill, Maria Gorlatova.
"LOBSTAR: Language Model-based Obstruction Detection for Augmented Reality." ArXiv (2024). [paper] [2024.09] -
Sebastian Sepulveda, Benjamin Bustos, Ivan Sipiran.
"Repetitive Patterns Recognition in Textures of Ancient Peruvian Pottery." ACM Journal on Computing and Cultural Heritage (2024). [paper] [2024.09] -
RoVi-Aug: Lawrence Yunliang Chen, Chenfeng Xu, Karthik Dharmarajan, Zubair Irshad, Richard Cheng, Kurt Keutzer, Masayoshi Tomizuka, Quan Vuong, Ken Goldberg.
"RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning." CoRL (2024). [paper] [2024.09] -
DecoAD: Chenglizhao Chen, Xinyu Liu, Mengke Song, Luming Li, Xu Yu, Shanchen Pang.
"Unveiling Context-Related Anomalies: Knowledge Graph Empowered Decoupling of Scene and Action for Human-Related Video Anomaly Detection." ArXiv (2024). [paper] [code] [2024.09] -
MouseSIS: Friedhelm Hamann, Hanxiong Li, Paul Mieske, Lars Lewejohann, Guillermo Gallego.
"MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice." ECCVW (2024). [paper] [code] [2024.09] -
FrozenSeg: Xi Chen, Haosen Yang, Sheng Jin, Xiatian Zhu, Hongxun Yao.
"FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation." ArXiv (2024). [paper] [code] [2024.09] -
TG-LMM: Yihao Zhao, Enhao Zhong, Cuiyun Yuan, Yang Li, Man Zhao, Chunxia Li, Jun Hu, Chenbin Liu.
"TG-LMM: Enhancing Medical Image Segmentation Accuracy through Text-Guided Large Multi-Modal Model." ArXiv (2024). [paper] [2024.09] -
InstanceSAM2Eval: Tiantian Zhang, Zhangjun Zhou, Jialun Pei.
"Evaluation Study on SAM 2 for Class-agnostic Instance-level Segmentation." ArXiv (2024). [paper] [code] [2024.09] -
Curriculum-Prompting: Xiuqi Zheng, Yuhang Zhang, Haoran Zhang, Hongrui Liang, Xueqi Bao, Zhuqing Jiang, Qicheng Lao.
"Curriculum Prompting Foundation Models for Medical Image Segmentation." MICCAI (2024). [paper] [code] [2024.09] -
SAMTooth: Yifan Liu, Wuyang Li, Cheng Wang, Hui Chen, Yixuan Yuan.
"When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels." MICCAI (2024). [paper] [code] [2024.09] -
DPDEdit: Xiaolong Wang, Zhi-Qi Cheng, Jue Wang, Xiaojiang Peng.
"DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing." ArXiv (2024). [paper] [2024.09] -
Liang Geng.
"Improving Apple Object Detection with Occlusion-Enhanced Distillation." ArXiv (2024). [paper] [2024.09] -
MedSAM-U: Nan Zhou, Ke Zou, Kai Ren, Mengting Luo, Linchao He, Meng Wang, Yidi Chen, Yi Zhang, Hu Chen, Huazhu Fu.
"MedSAM-U: Uncertainty-Guided Auto Multi-Prompt Adaptation for Reliable MedSAM." ArXiv (2024). [paper] [2024.09] -
FCSAM: Dongzhi He, Zeyuan Ma, Chenxi Li, Yunqi Li.
"Dual‑branch Fully Convolutional Segment Anything Model for Lesion Segmentation in Endoscopic Images." ACCESS (2024). [paper] [2024.08] -
PMHO: Shun Zhang, Jihui Long, Yaohui Xu, Shaohui Mei.
"PMHO: Point-supervised Oriented Object Detection Based on Segmentation-Driven Proposal Generation." TGRS (2024). [paper] [2024.08] -
CSAD: Yu-Hsuan Hsieh, Shang-Hong Lai.
"CSAD: Unsupervised Component Segmentation for Logical Anomaly Detection." ArXiv (2024). [paper] [2024.08] -
Tanner D. Harms, Steven L. Brunton, Beverley J. McKeon.
"Estimating Dynamic Flow Features in Groups of Tracked Objects." ArXiv (2024). [paper] [2024.08] -
J. H. Setu, M. Islam, S. T. Pasha, N. Halder, E. Hossain, A. Mahmud, A. Islam, M. Z. Alam, M. A. Amin.
"Segment Anything Model (SAM 2) Unveiled: Functionality, Applications, and Practical Implementation Across Multiple Domains." ArXiv (2024). [paper] [2024.08] -
Mani Abedini.
"Skin Cancer Classification by Leveraging Segment Anything Model for Semantic Segmentation of Skin Lesion." IJARCCE (2024). [paper] [2024.08] -
Ensembled-SAMs: Fei Chen, Junyao Ge, Yang Zheng, Kaitai Guo, Feng Cao, Jimin Liang.
"Ensembled-SAMs for Enhanced Small Coronary Artery Segmentation in CCTA Images." JBHI (2024). [paper] [2024.08] -
Deshui Miao, Yameng Gu, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang.
"Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS." ArXiv (2024). [paper] [code] [2024.08] -
SAM2Point: Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Chengzhuo Tong, Peng Gao, Chunyuan Li, Pheng-Ann Heng.
"SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners." ArXiv (2024). [paper] [code] [2024.08] -
SlotSAM: Luyao Tang, Yuxuan Yuan, Chaoqi Chen, Kunze Huang, Xinghao Ding, Yue Huang.
"Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning." ECCVW (2024). [paper] [code] [2024.08] -
Zhirui Gao, Renjiao Yi, Chenyang Zhu, Ke Zhuang, Wei Chen, Kai Xu.
"Generic Objects as Pose Probes for Few-Shot View Synthesis." ArXiv (2024). [paper] [2024.08] -
AL-Ref-SAM2: Shaofei Huang, Rui Ling, Hongyu Li, Tianrui Hui, Zongheng Tang, Xiaoming Wei, Jizhong Han, Si Liu.
"Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation." ArXiv (2024). [paper] [code] [2024.08] -
SegmentWithSAM: Zafer Yildiz, Yuwen Chen, Maciej A. Mazurowski.
"SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images." ArXiv (2024). [paper] [code] [2024.08] -
Sammese: Kunpeng Wang, Keke Chen, Chenglong Li, Zhengzheng Tu, Bin Luo.
"Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance." ArXiv (2024). [paper] [2024.08] -
YOLO + SAM: Samir Kassam, Angelo Markham, Katie Vo, Yashas Revanakara, Michael Lam, Kevin Zhu.
"Intraoperative Glioma Segmentation with YOLO + SAM for Improved Accuracy in Tumor Resection." ArXiv (2024). [paper] [2024.08] -
PropSAM: Zifan Chen, Xinyu Nan, Jiazheng Li, Jie Zhao, Haifeng Li, Zilin Lin, Haoshen Li, Heyun Chen, Yiting Liu, Bin Dong, Li Zhang, Lei Tang.
"PropSAM: A Propagation-Based Model for Segmenting Any 3D Objects in Multi-Modal Medical Images." ArXiv (2024). [paper] [2024.08] -
FusionSAM: Daixun Li, Weiying Xie, Mingxiang Cao, Yunke Wang, Jiaqing Zhang, Yunsong Li, Leyuan Fang, Chang Xu.
"FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation." ArXiv (2024). [paper] [2024.08] -
SAMesh: George Tang, William Zhao, Logan Ford, David Benhaim, Paul Zhang.
"Segment Any Mesh: Zero-shot Mesh Part Segmentation via Lifting Segment Anything 2 to 3D." ArXiv (2024). [paper] [code] [2024.08] -
Kai Nichols, Matthew Hauwiller, Nicholas Propes, Shaowei Wu, Stephanie Hernandez, Mike Kautzky.
"Segment Anything Model for Grain Characterization in Hard Drive Design." CVPRW (2024). [paper] [2024.08] -
VALE: Purushothaman Natarajan, Athira Nambiar.
"VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models." ArXiv (2024). [paper] [2024.08] -
S3Simulator: Kamal Basha S, Athira Nambiar.
"S3Simulator: A benchmarking Side Scan Sonar Simulator dataset for Underwater Image Analysis." ArXiv (2024). [paper] [code] [2024.08] -
frg-bgr-modeling: Lukas Picek, Lukas Neumann, Jiri Matas.
"Animal Identification with Independent Foreground and Background Modeling." ArXiv (2024). [paper] [code] [2024.08] -
3DSAM-adapter: Shizhan Gong, Yuan Zhong, Wenao Ma, Jinpeng Li, Zhao Wang, Jingyang Zhang, Pheng-Ann Heng, Qi Dou.
"3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation." MIA (2024). [paper] [code] [2024.08] -
Xiangru Li, Yifei Zhang, Liang Zhao.
"Multi-prompt Fine-Tuning of Foundation Models for Enhanced Biomedical Image Segmentation." ArXiv (2024). [paper] [2024.08] -
Xinyu Liu, Jing Zhang, Kexin Zhang, Xu Liu, Lingling Li.
"LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS." ArXiv (2024). [paper] [2024.08] -
SAM-SP: Chunpeng Zhou, Kangjie Ning, Qianqian Shen, Sheng Zhou, Zhi Yu, Haishuai Wang.
"SAM-SP: Self-Prompting Makes SAM Great Again." ArXiv (2024). [paper] [2024.08] -
GSAM: Sota Kato, Hinako Mitsuoka, Kazuhiro Hotta.
"Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes." ECCVW (2024). [paper] [2024.08] -
Tuyen Tran.
"The 2nd Solution for LSVOS Challenge RVOS Track: Spatial-temporal Refinement for Consistent Semantic Segmentation." ArXiv (2024). [paper] [2024.08] -
EmbodiedSAM: Xiuwei Xu, Huangxing Chen, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu.
"EmbodiedSAM: Online Segment Any 3D Thing in Real Time." ArXiv (2024). [paper] [code] [2024.08] -
NuSegDG: Zhenye Lou, Qing Xu, Zekun Jiang, Xiangjian He, Zhen Chen, Yi Wang, Chenxin Li, Maggie M. He, Wenting Duan.
"NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation." ArXiv (2024). [paper] [code] [2024.08] -
SAM-REF: Chongkai Yu, Anqi Li, Xiaochao Qu, Luoqi Liu, Ting Liu.
"SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything." ArXiv (2024). [paper] [2024.08] -
VISTA: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Daguang Xu, Wenqi Li.
"A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation." ArXiv (2024). [paper] [code] [2024.08] -
SAM-COD: Huafeng Chen, Pengxu Wei, Guangqian Guo, Shan Gao.
"SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection." ECCV (2024). [paper] [2024.08] -
Bin Cao, Yisi Zhang, Hanyi Wang, Xingjian He, Jing Liu.
"The Instance-centric Transformer for the RVOS Track of LSVOS Challenge: 3rd Place Solution." ArXiv (2024). [paper] [2024.08] -
U-MedSAM: Xin Wang, Xiaoyu Liu, Peng Huang, Pu Huang, Shu Hu, Hongtu Zhu.
"U-MedSAM: Uncertainty-aware MedSAM for Medical Image Segmentation." ArXiv (2024). [paper] [code] [2024.08] -
SAM-driven MAE: Dong Wang, Qi Wang, Weidong Min, Di Gai, Qing Han, Longfei Li, Yuhan Geng.
"SAM-driven MAE pre-training and background-aware meta-learning for unsupervised vehicle re-identification." CVM (2024). [paper] [2024.08] -
MCA-SAM: Ke Zhou, Zhongwei Qiu, Dongmei Fu.
"Multi-scale contrastive adaptor learning for segmenting anything in underperformed scenes." Neurocomputing (2024). [paper] [2024.08] -
SAM_MLoRA: Xiaoyan Lu, Qihao Weng.
"Multi-LoRA Fine-Tuned Segment Anything Model for Urban Man-Made Object Extraction." IEEE TGRS (2024). [paper] [code] [2024.08] -
Wenhui Dong, Bo Du, Yongchao Xu.
"Source domain prior-assisted segment anything model for single domain generalization in medical image segmentation." Image and Vision Computing (2024). [paper] [2024.08] -
SAM-UNet: Sihan Yang, Haixia Bi, Hai Zhang, Jian Sun.
"SAM-UNet: Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images." ArXiv (2024). [paper] [code] [2024.08] -
LCE: Weiji Kong, Xun Gong, Juan Wang.
"LCE: A Framework for Explainability of DNNs for Ultrasound Image Based on Concept Discovery." ArXiv (2024). [paper] [2024.08] -
MM-SAM: Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Naoto Yokoya, Shijian Lu.
"Segment Anything with Multiple Modalities." ArXiv (2024). [paper] [code] [2024.08] -
GoodSAM++: Weiming Zhang, Yexin Liu, Xu Zheng, Lin Wang.
"GoodSAM++: Bridging Domain and Capacity Gaps via Segment Anything Model for Panoramic Semantic Segmentation." ArXiv (2024). [paper] [2024.08] -
IAVVC: Jun Yan, Pengyu Wang, Danni Wang, Weiquan Huang, Daniel Watzenig, Huilin Yin.
"Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving." ArXiv (2024). [paper] [code] [2024.08] -
MVP-TIME: Feiyu Pan, Hao Fang, Runmin Cong, Wei Zhang, Xiankai Lu.
"Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track." ArXiv (2024). [paper] [2024.08] -
Surgical SAM 2: Haofeng Liu, Erli Zhang, Junde Wu, Mingxuan Hong, Yueming Jin.
"Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning." ArXiv (2024). [paper] [code] [2024.08] -
SAM2-UNet: Xinyu Xiong, Zihuang Wu, Shuangyi Tan, Wenxue Li, Feilong Tang, Ying Chen, Siying Li, Jie Ma, Guanbin Li.
"SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation." ArXiv (2024). [paper] [code] [2024.08] -
Lin Zhao, Xiao Chen, Eric Z. Chen, Yikang Liu, Terrence Chen, Shanhui Sun.
"Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models." ArXiv (2024). [paper] [2024.08] -
OBMv2: Kai Li, Jingbo Chen, Yupeng Deng, Yu Meng, Diyou Liu, Junxian Ma, Chenhao Wang.
"Extracting polygonal footprints in off-nadir images with Segment Anything Model." ArXiv (2024). [paper] [code] [2024.08] -
MC-SAM SEG: Linghao Zheng, Xinyang Pu, Feng Xu.
"Tuning a SAM-Based Model with Multi-Cognitive Visual Adapter to Remote Sensing Instance Segmentation." ArXiv (2024). [paper] [2024.08] -
Sanchayan Vivekananthan.
"Comparative Analysis of Generative Models: Enhancing Image Synthesis with VAEs, GANs, and Stable Diffusion." ArXiv (2024). [paper] [2024.08] -
DoRL: Yongcheng Li, Lingcong Cai, Ying Lu, Cheng Lin, Yupeng Zhang, Jingyan Jiang, Genan Dai, Bowen Zhang, Jingzhou Cao, Xiangzhong Zhang, Xiaomao Fan.
"Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification." ArXiv (2024). [paper] [code] [2024.08] -
CWMV: Pratik Vora, Sudipan Saha.
"Segment Using Just One Example." ArXiv (2024). [paper] [2024.08] -
S-SAM: Jay N. Paranjape, Shameema Sikder, S. Swaroop Vedula, Vishal M. Patel.
"S-SAM: SVD-based Fine-Tuning of Segment Anything Model for Medical Image Segmentation." MICCAI (2024). [paper] [code] [2024.08] -
Osher Rafaeli, Tal Svoray, Ariel Nahlieli.
"Prompt-Based Segmentation at Multiple Resolutions and Lighting Conditions using Segment Anything Model 2." ArXiv (2024). [paper] [2024.08] -
BC-SAM: Yongcheng Li, Lingcong Cai, Ying Lu, Yupeng Zhang, Jingyan Jiang, Genan Dai, Bowen Zhang, Jingzhou Cao, Xiangzhong Zhang, Xiaomao Fan.
"Towards Cross-Domain Single Blood Cell Image Classification via Large-Scale LoRA-based Segment Anything Model." ArXiv (2024). [paper] [code] [2024.08] -
Tahir Ahmad, Sudipan Saha.
"Specialized Change Detection using Segment Anything." ArXiv (2024). [paper] [2024.08] -
Multi SAM 50: Sophia J. Abraham, Jin Huang, Brandon RichardWebster, Michael Milford, Jonathan D. Hauenstein, Walter Scheirer.
"Enhancing Ecological Monitoring with Multi-Objective Optimization: A Novel Dataset and Methodology for Segmentation Algorithms." ArXiv (2024). [paper] [2024.08] -
SAM-FNet: Jia Wei, Yun Li, Meiyu Qiu, Hongyu Chen, Xiaomao Fan, Wenbin Lei.
"SAM-FNet: SAM-Guided Fusion Network for Laryngo-Pharyngeal Tumor Detection." ArXiv (2024). [paper] [code] [2024.08] -
Athulya Sundaresan Geetha, Muhammad Hussain.
"From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model." ArXiv (2024). [paper] [2024.08] -
Yosuke Yamagishi, Shouhei Hanaoka, Tomohiro Kikuchi, Takahiro Nakao, Yuta Nakamura, Yukihiro Nomura, Soichiro Miki, Takeharu Yoshikawa, Osamu Abe.
"Zero-shot 3D Segmentation of Abdominal Organs in CT Scans Using Segment Anything Model 2: Adapting Video Tracking Capabilities for 3D Medical Imaging." ArXiv (2024). [paper] [2024.08] -
MCA-SAM: Ke Zhou, Zhongwei Qiu, Dongmei Fu.
"Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes." ArXiv (2024). [paper] [2024.08] -
Polyp SAM 2: Mobina Mansoori, Sajjad Shahabodini, Jamshid Abouei, Konstantinos N. Plataniotis, Arash Mohammadi.
"Polyp SAM 2: Advancing Zero shot Polyp Segmentation in Colorectal Cancer Detection." ArXiv (2024). [paper] [code] [2024.08] -
One-shot-IRSTS: Bingbing Dan, Meihui Li, Tao Tang, Jing Zhang.
"One Shot is Enough for Sequential Infrared Small Target Segmentation." ArXiv (2024). [paper] [code] [2024.08] -
Andrew Seohwan Yu, Mohsen Hariri, Xuecen Zhang, Mingrui Yang, Vipin Chaudhary, Xiaojuan Li.
"Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2." ArXiv (2024). [paper] [2024.08] -
DuneSA: Lu A, Jiang Z, Wu Z, et al.
"DuneSA: A SAM-based Approach with Domain-Specific Knowledge for Aeolian Dune Segmentation." GoodIT (2024). [paper] [2024.08] -
Jieming Yu, An Wang, Wenzhen Dong, Mengya Xu, Mobarakol Islam, Jie Wang, Long Bai, Hongliang Ren.
"SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation." ArXiv (2024). [paper] [2024.08] -
SAM2-Adapter: Tianrun Chen, Ankang Lu, Lanyun Zhu, Chaotao Ding, Chunan Yu, Deyi Ji, Zejian Li, Lingyun Sun, Papa Mao, Ying Zang.
"SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More." ArXiv (2024). [paper] [code] [2024.08] -
MDSAM: Shixuan Gao, Pingping Zhang, Tianyu Yan, Huchuan Lu.
"Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection." ACM MM (2024). [paper] [code] [2024.08] -
Sourya Sengupta, Satrajit Chakrabarty, Ravi Soni.
"Is SAM 2 Better than SAM in Medical Image Segmentation?" ArXiv (2024). [paper] [2024.08] -
PaveCap: Blessing Agyei Kyem, Eugene Kofi Okrah Denteh, Joshua Kofi Asamoah, Armstrong Aboah.
"PaveCap: The First Multimodal Framework for Comprehensive Pavement Condition Assessment with Dense Captioning and PCI Estimation." ArXiv (2024). [paper] [2024.08] -
Yiqing Shen, Hao Ding, Xinyuan Shao, Mathias Unberath.
"Performance and Non-adversarial Robustness of the Segment Anything Model 2 in Surgical Video Segmentation." ArXiv (2024). [paper] [2024.08] -
SAM2-PATH: Mingya Zhang, Liang Wang, Limei Gu, Zhao Li, Yaohui Wang, Tingshen Ling, Xianping Tao.
"SAM2-PATH: A better segment anything model for semantic segmentation in digital pathology." ArXiv (2024). [paper] [code] [2024.08] -
MedSAM: Jun Ma, Sumin Kim, Feifei Li, Mohammed Baharoon, Reza Asakereh, Hongwei Lyu, Bo Wang.
"Segment Anything in Medical Images and Videos: Benchmark and Deployment." ArXiv (2024). [paper] [code] [2024.08] -
BioSAM 2: Zhiling Yan, Weixiang Sun, Rong Zhou, Zhengqing Yuan, Kai Zhang, Yiwei Li, Tianming Liu, Quanzheng Li, Xiang Li, Lifang He, Lichao Sun.
"Biomedical SAM 2: Segment Anything in Biomedical Images and Videos." ArXiv (2024). [paper] [2024.08] -
UnderwaterSAM2Eval: Shijie Lian, Hua Li.
"Evaluation of Segment Anything Model 2: The Role of SAM2 in the Underwater Environment." ArXiv (2024). [paper] [code] [2024.08] -
SAM_2_Medical_3D: Chuyun Shen, Wenhao Li, Yuhang Shi, Xiangfeng Wang.
"Interactive 3D Medical Image Segmentation with SAM 2." ArXiv (2024). [paper] [code] [2024.08] -
PromptSAM+: Xingyuan Wei, Yichen Liu, Ce Li, Ning Li, Degang Sun, Yan Wang.
"PromptSAM+: Malware Detection based on Prompt Segment Anything Model." ArXiv (2024). [paper] [2024.08] -
PanicleNeRF: Xin Yang, Xuqi Lu, Pengyao Xie, Ziyue Guo, Hui Fang, Haowei Fu, Xiaochun Hu, Zhenbiao Sun, Haiyan Cen.
"PanicleNeRF: low-cost, high-precision in-field phenotyping of rice panicles with smartphone." ArXiv (2024). [paper] [2024.08] -
TS-SAM: Yang Yu, Chen Xu, Kai Wang.
"TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks." ArXiv (2024). [paper] [code] [2024.08] -
CUB FG: Yuetian Wang, Wenjin Hou, Qinmu Peng, Xinge You.
"What Happens Without Background? Constructing Foreground-Only Data for Fine-Grained Tasks." ArXiv (2024). [paper] [2024.08] -
Ange Lou, Yamin Li, Yike Zhang, Robert F. Labadie, Jack Noble.
"Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2." ArXiv (2024). [paper] [project] [2024.08] -
MedSAM-2: Jiayuan Zhu, Yunli Qi, Junde Wu.
"Medical SAM 2: Segment medical images as video via Segment Anything Model 2." ArXiv (2024). [paper] [code] [2024.08] -
SnapSeg: Yu N, Cai Z, Huang Y, et al.
"SnapSeg: Training-Free Few-Shot Medical Image Segmentation with Segment Anything Model." Trustworthy Artificial Intelligence for Healthcare: Second International Workshop (2024). [paper] [2024.08] -
Feng Liu, Qinlong Zhang, Weijie Zhang, Deqiang Cheng, Feng Zhang, Yating Deng, Guanzhen Yu.
"An analysis on efficacy of applying β-elemene intervention on chemically-induced tongue lesions using SAM algorithm." Anatomia, Histologia, Embryologia (2024). [paper] [2024.08] -
MSP-MVS: Zhenlong Yuan, Cong Liu, Fei Shen, Zhaoxin Li, Tianlu Mao, Zhaoqi Wang.
"MSP-MVS: Multi-granularity Segmentation Prior Guided Multi-View Stereo." AAAI (2024). [paper] [2024.08] -
Sae-Jin Park, Cheonghwa Lee, Kisu Ok, Sung-Hoon Ahn.
"Enhancing Aviation Safety: An Automated System for FOD Detection and Removal in Support Vehicle Tires." AIAA Aviation Forum and ASCEND (2024). [paper] [2024.08] -
Theia: Jinghuan Shang, Karl Schmeckpeper, Brandon B. May, Maria Vittoria Minniti, Tarik Kelestemur, David Watkins, Laura Herlant.
"Theia: Distilling Diverse Vision Foundation Models for Robot Learning." ArXiv (2024). [paper] [code] [2024.08] -
Haoyu Dong, Hanxue Gu, Yaqian Chen, Jichen Yang, Maciej A. Mazurowski.
"Segment anything model 2: an application to 2D and 3D medical images." ArXiv (2024). [paper] [2024.08] -
Xiaofeng Liu, Jonghye Woo, Chao Ma, Jinsong Ouyang, Georges El Fakhri.
"Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM." IEEE NSS and MIC (2024). [paper] [2024.08] -
DMESA: Yesheng Zhang, Xu Zhao.
"DMESA: Densely Matching Everything by Segmenting Anything." ArXiv (2024). [paper] [code] [2024.08] -
CC-SAM: Shreyank N Gowda, David A. Clifton.
"CC-SAM: SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation." ECCV (2024). [paper] [2024.07] -
SAMCOD: Lv Tang, Bo Li.
"Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2." ArXiv (2024). [paper] [code] [2024.07] -
FLAP-SAM: Mothilal Asokan, Joseph Geo Benjamin, Mohammad Yaqub, Karthik Nandakumar.
"A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation." ArXiv (2024). [paper] [code] [2024.07] -
RoBox-SAM: Yuhao Huang, Xin Yang, Han Zhou, Yan Cao, Haoran Dou, Fajin Dong, Dong Ni.
"Robust Box Prompt based SAM for Medical Image Segmentation." MICCAI MLMI (2024). [paper] [2024.07] -
Pascal Spiegler, Amirhossein Rasoulian, Yiming Xiao.
"Weakly Supervised Intracranial Hemorrhage Segmentation with YOLO and an Uncertainty Rectified Segment Anything Model." ArXiv (2024). [paper] [2024.07] -
ICH: Pascal Spiegler, Amirhossein Rasoulian, Yiming Xiao.
"Uncertainty-Rectified YOLO-SAM for Weakly Supervised ICH Segmentation." SWITCH (2024). [paper] [2024.07] -
ASI-Seg: Zhen Chen, Zongming Zhang, Wenwu Guo, Xingjian Luo, Long Bai, Jinlin Wu, Hongliang Ren, Hongbin Liu.
"ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding." IROS (2024). [paper] [code] [2024.07] -
Library Dataset: Artemis Llabrés, Arka Ujjal Dey, Dimosthenis Karatzas, Ernest Valveny.
"Image-text matching for large-scale book collections." ArXiv (2024). [paper] [code] [2024.07] -
AdaCLIP: Yunkang Cao, Jiangning Zhang, Luca Frittoli, Yuqi Cheng, Weiming Shen, Giacomo Boracchi.
"AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection." ECCV (2024). [paper] [code] [2024.07] -
VDST-Net: Guiqiu Liao, Matjaz Jogan, Sai Koushik, Eric Eaton, Daniel A. Hashimoto.
"Disentangling spatio-temporal knowledge for weakly supervised object detection and segmentation in surgical video." ArXiv (2024). [paper] [2024.07] -
PixLabelCV: Dominik Schraml, et al.
"PixLabelCV - Labeling images for semantic segmentation fast, pixel-precise and offline." WSCG (2024). [paper] [2024.07] -
ESOD: Kai Liu, Zhihang Fu, Sheng Jin, Ze Chen, Fan Zhou, Rongxin Jiang, Yaowu Chen, Jieping Ye.
"ESOD: Efficient Small Object Detection on High-Resolution Images." ArXiv (2024). [paper] [2024.07] -
Ren Z, Zhang Y, Wang S.
"Large Foundation Model for Cancer Segmentation." Technology in Cancer Research & Treatment (2024). [paper] [2024.07] -
Guo Y, Xu Y, Cui H, Dang M, Li S.
"Segment anything model-based crack segmentation using low-rank adaption fine-tuning." Structural Health Monitoring (2024). [paper] [2024.07] -
SSTD: Zijian Zhu, Ali Zia, Xuesong Li, Bingbing Dan, Yuebo Ma, Enhai Liu, Rujin Zhao.
"SSTD: Stripe-Like Space Target Detection using Single-Point Supervision." ArXiv (2024). [paper] [2024.07] -
KneeSegmentWithSAM: Yaxi Chen, Aleksandra Ivanova, Shaheer U. Saeed, Rikin Hargunani, Jie Huang, Chaozong Liu, Yipeng Hu.
"Segmentation by registration-enabled SAM prompt engineering using five reference images." WBIR (2024). [paper] [code] [2024.07] -
SAM-MIL: Heng Fang, Sheng Huang, Wenhao Tang, Luwen Huangfu, Bo Liu.
"SAM-MIL: A Spatial Contextual Aware Multiple Instance Learning Approach for Whole Slide Image Classification." ACM MM (2024). [paper] [code] [2024.07] -
SAM-CP: Pengfei Chen, Lingxi Xie, Xinyue Huo, Xuehui Yu, Xiaopeng Zhang, Yingfei Sun, Zhenjun Han, Qi Tian.
"SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation." ArXiv (2024). [paper] [code] [2024.07] -
Jiyeop Kim, Jongwoo Lim.
"Integrating Meshes and 3D Gaussians for Indoor Scene Reconstruction with SAM Mask Guidance." ArXiv (2024). [paper] [2024.07] -
SAM2CLIP2SAM: Dimitrios Kollias, Anastasios Arsenos, James Wingate, Stefanos Kollias.
"SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection." ArXiv (2024). [paper] [2024.07] -
MedSAGa: Navyansh Mahla, Annie D'souza, Shubh Gupta, Bhavik Kanekar, Kshitij Sharad Jadhav.
"MedSAGa: Few-shot Memory Efficient Medical Image Segmentation using Gradient Low-Rank Projection in SAM." ArXiv (2024). [paper] [2024.07] -
ESP-MedSAM: Qing Xu, Jiaxuan Li, Xiangjian He, Ziyu Liu, Zhen Chen, Wenting Duan, Chenxin Li, Maggie M. He, Fiseha B. Tesema, Wooi P. Cheah, Yi Wang, Rong Qu, Jonathan M. Garibaldi.
"ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation." IEEE TMI (2024). [paper] [2024.07] -
Semantic-CC: Yongshuo Zhu, Lu Li, Keyan Chen, Chenyang Liu, Fugen Zhou, Zhenwei Shi.
"Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance." ArXiv (2024). [paper] [2024.07] -
Seismic Fault SAM: Ran Chen, Zeren Zhang, Jinwen Ma.
"Seismic Fault SAM: Adapting SAM with Lightweight Modules and 2.5D Strategy for Fault Detection." ArXiv (2024). [paper] [2024.07] -
VISA: Cilin Yan, Haochen Wang, Shilin Yan, Xiaolong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves.
"VISA: Reasoning Video Object Segmentation via Large Language Models." ArXiv (2024). [paper] [code] [2024.07] -
G-SAM: Xiaoxiao Liu, Yan Zhao, Shigang Wang, Jian Wei.
"G-SAM: GMM-based segment anything model for medical image classification and segmentation." Cluster Comput (2024). [paper] [2024.07] -
UniFSS: Shijie Chang, Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu.
"Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation." ArXiv (2024). [paper] [2024.07] -
TeethDreamer: Chenfan Xu, Zhentao Liu, Yuan Liu, Yulong Dou, Jiamin Wu, Jiepeng Wang, Minjiao Wang, Dinggang Shen, Zhiming Cui.
"TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs." MICCAI (2024). [paper] [code] [2024.07] -
Corrosion SAM: Chengzhang Chai, Yan Gao, Haijiang Li, Xiaofeng Zhu.
"Corrosion SAM: Adapting Segment Anything Model with Parameter-Efficient Fine-Tuning for Structural Corrosion Inspection." ArXiv (2024). [paper] [2024.07] -
PartImageNet++: Xiao Li, Yining Liu, Na Dong, Sitian Qin, Xiaolin Hu.
"PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition." ECCV (2024). [paper] [code] [2024.07] -
TeSO: Yaoting Wang, Peiwen Sun, Yuanchao Li, Honggang Zhang, Di Hu.
"Can Textual Semantics Mitigate Sounding Object Segmentation Preference?" ECCV (2024). [paper] [code] [2024.07] -
FoodMem: Ahmad AlMughrabi, Adrián Galán, Ricardo Marques, Petia Radeva.
"FoodMem: Near Real-time and Precise Food Video Segmentation." ArXiv (2024). [paper] [2024.07] -
SeFi-CD: Ling Zhao, Zhenyang Huang, Dongsheng Kuang, Chengli Peng, Jun Gan, Haifeng Li.
"SeFi-CD: A Semantic First Change Detection Paradigm That Can Detect Any Change You Want." ArXiv (2024). [paper] [2024.07] -
Jaime Duque-Domingo, et al.
"Segmentación semántica bajo paradigma one-shot learning utilizando SAM y CP-CVV (Semantic Segmentation under a One-Shot Learning Paradigm Using SAM and CP-CVV)." Visión por Computador (2024). [paper] [2024.07] -
DriveSAM: K Kwakye, Y Seong, S Yi, A Aboah.
"DriveSAM: Cognitive Perspective on Driving Maneuvers Based on Drivers’ Attention Using Eye Gaze Data." IEOM International Conference on Smart Mobility and Vehicle Electrification (2024). [paper] [2024.07] -
Swiss DINO: Kirill Paramonov, Jia-Xing Zhong, Umberto Michieli, Jijoong Moon, Mete Ozay.
"Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search." IROS (2024). [paper] [code] [2024.07] -
ECoT: Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine.
"Robotic Control via Embodied Chain-of-Thought Reasoning." ArXiv (2024). [paper] [code] [2024.07] -
CLIPtra: Tong Shao, Zhuotao Tian, Hang Zhao, Jingyong Su.
"Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation." ECCV (2024). [paper] [code] [2024.07] -
PaveSAM: Neema Jakisa Owor, Yaw Adu-Gyamfi, Armstrong Aboah, Mark Amo-Boateng.
"PaveSAM – segment anything for pavement distress." RMPD (2024). [paper] [2024.07] -
UCE: Shaozhe Hao, Kai Han, Zhengyao Lv, Shihao Zhao, Kwan-Yee K. Wong.
"ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction." ECCV (2024). [paper] [code] [2024.07] -
Pseudo-RIS: Seonghoon Yu, Paul Hongsuck Seo, Jeany Son.
"Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation." ECCV (2024). [paper] [code] [2024.07] -
MeshSegmenter: Ziming Zhong, Yanxu Xu, Jing Li, Jiale Xu, Zhengxin Li, Chaohui Yu, Shenghua Gao.
"MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis." ECCV (2024). [paper] [code] [2024.07] -
SOSS: Mario Francisco Munoz, Hoang Vu Huy, Thanh-Dung Le.
"Hybrid Deep Learning-Based for Enhanced Occlusion Segmentation in PICU Patient Monitoring." ArXiv (2024). [paper] [2024.07] -
Xingyue Zhao, Peiqi Li, Xiangde Luo, Meng Yang, Shi Chang, Zhongyu Li.
"SAM-Driven Weakly Supervised Nodule Segmentation with Uncertainty-Aware Cross Teaching." ISBI (2024). [paper] [2024.07] -
Fryderyk Kögl, Anna Reithmeir, Vasiliki Sideri-Lampretsa, Ines Machado, Rickmer Braren, Daniel Rückert, Julia A. Schnabel, Veronika A. Zimmer.
"General Vision Encoder Features as Guidance in Medical Image Registration." WBIR MICCAI (2024). [paper] [code] [2024.07] -
DSAM: Zhenni Yu, Xiaoqin Zhang, Li Zhao, Yi Bin, Guobao Xiao.
"Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection." ACM MM (2024). [paper] [code] [2024.07] -
OMG-Net: Zhuoyan Shen, Mikael Simard, Douglas Brand, Vanghelita Andrei, Ali Al-Khader, Fatine Oumlil, Katherine Trevers, Thomas Butters, Simon Haefliger, Eleanna Kara, Fernanda Amary, Roberto Tirabosco, Paul Cool, Gary Royle, Maria A. Hawkins, Adrienne M. Flanagan, Charles-Antoine Collins Fekete.
"OMG-Net: A Deep Learning Framework Deploying Segment Anything to Detect Pan-Cancer Mitotic Figures from Haematoxylin and Eosin-Stained Slides." ArXiv (2024). [paper] [2024.07] -
FastSAM-3DSlicer: Yiqing Shen, Xinyuan Shao, Blanca Inigo Romillo, David Dreizin, Mathias Unberath.
"FastSAM-3DSlicer: A 3D-Slicer Extension for 3D Volumetric Segment Anything Model with Uncertainty Quantification." ArXiv (2024). [paper] [code] [2024.07] -
Crowd-SAM: Zhi Cai, Yingjie Gao, Yaoyan Zheng, Nan Zhou, Di Huang.
"Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes." ECCV (2024). [paper] [code] [2024.07] -
SLF: Jianhao Li, Tianyu Sun, Zhongdao Wang, Enze Xie, Bailan Feng, Hongbo Zhang, Ze Yuan, Ke Xu, Jiaheng Liu, Ping Luo.
"Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts." ECCV (2024). [paper] [2024.07] -
Yunya Gao.
"Leveraging Segment Anything Model in Identifying Buildings within Refugee Camps (SAM4Refugee) from Satellite Imagery for Humanitarian Operations." ArXiv (2024). [paper] [2024.07] -
WPS-SAM: Xinjian Wu, Ruisong Zhang, Jie Qin, Shijie Ma, Cheng-Lin Liu.
"WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models." ECCV (2024). [paper] [code] [2024.07] -
Lite-SAM: Jianhai Fu, Yuanjie Yu, Ningchuan Li, Yi Zhang, Qichao Chen, Jianping Xiong, Jun Yin, Zhiyu Xiang.
"Lite-SAM Is Actually What You Need for Segment Everything." ECCV (2024). [paper] [2024.07] -
WSESeg: Robin Schön, Daniel Kienzle, Rainer Lienhart.
"WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation." CBMI (2024). [paper] [2024.07] -
RAT: Zhiwen Yang, Haowei Chen, Ziniu Qian, Yang Zhou, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu.
"Region Attention Transformer for Medical Image Restoration." MICCAI (2024). [paper] [code] [2024.07] -
RveRNet: Seonwhee Jin.
"Knowledge distillation to effectively attain both region-of-interest and global semantics from an image where multiple objects appear." ArXiv (2024). [paper] [code] [2024.07] -
CACP: Qiushi Guo.
"Enrich the content of the image Using Context-Aware Copy Paste." ArXiv (2024). [paper] [2024.07] -
PRISM-placenta: Hao Li, Baris Oguz, Gabriel Arenas, Xing Yao, Jiacheng Wang, Alison Pouch, Brett Byram, Nadav Schwartz, Ipek Oguz.
"Interactive Segmentation Model for Placenta Segmentation from 3D Ultrasound images." ArXiv (2024). [paper] [code] [2024.07] -
IRSAM: Mingjin Zhang, Yuchun Wang, Jie Guo, Yunsong Li, Xinbo Gao, Jing Zhang.
"IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection." ECCV (2024). [paper] [code] [2024.07] -
ProtoSAM: Lev Ayzenberg, Raja Giryes, Hayit Greenspan.
"ProtoSAM - One Shot Medical Image Segmentation With Foundational Models." ArXiv (2024). [paper] [2024.07] -
CycleSAM: Aditya Murali, Pietro Mascagni, Didier Mutter, Nicolas Padoy.
"CycleSAM: One-Shot Surgical Scene Segmentation using Cycle-Consistent Feature Matching to Prompt SAM." ArXiv (2024). [paper] [2024.07] -
EWMA: Ahmed Maged, Herman Shen.
"Unsupervised Fault Detection using SAM with a Moving Window Approach." ArXiv (2024). [paper] [2024.07] -
DiffPNG: Danni Yang, Ruohan Dong, Jiayi Ji, Yiwei Ma, Haowei Wang, Xiaoshuai Sun, Rongrong Ji.
"Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model." ECCV (2024). [paper] [code] [2024.07] -
AO-Planner: Jiaqi Chen, Bingqian Lin, Xinmin Liu, Xiaodan Liang, Kwan-Yee K. Wong.
"Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation." ArXiv (2024). [paper] [2024.07] -
MBA-Net: Yifan Gao, Wei Xia, Wenkui Wang, Xin Gao.
"MBA-Net: SAM-driven Bidirectional Aggregation Network for Ovarian Tumor Segmentation." MICCAI (2024). [paper] [2024.07] -
SAM-TAPIR: Athena Psalta, Vasileios Tsironis, Andreas El Saer, Konstantinos Karantzalos.
"Addressing single object tracking in satellite imagery through prompt-engineered solutions." IGARSS (2024). [paper] [2024.07] -
CPC-SAM: Juzheng Miao, Cheng Chen, Keli Zhang, Jie Chuai, Quanzheng Li, Pheng-Ann Heng.
"Cross Prompting Consistency with Segment Anything Model for Semi-supervised Medical Image Segmentation." MICCAI (2024). [paper] [code] [2024.07] -
SAM-Med3D-MoE: Guoan Wang, Jin Ye, Junlong Cheng, Tianbin Li, Zhaolin Chen, Jianfei Cai, Junjun He, Bohan Zhuang.
"SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation." ArXiv (2024). [paper] [2024.07] -
Longfei Huang, Feng Yu, Zhihao Guan, Zhonghua Wan, Yang Yang.
"The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge." ArXiv (2024). [paper] [2024.07] -
Xudong Ma, Yuqi Zhang, Chenchong Wang, Wei Xu.
"Revolutionizing Alloy Microstructure Segmentation through SAM and Domain Knowledge without Extra Training." ArXiv (2024). [paper] [2024.07] -
SA4D: Shengxiang Ji, Guanjun Wu, Jiemin Fang, Jiazhong Cen, Taoran Yi, Wenyu Liu, Qi Tian, Xinggang Wang.
"Segment Any 4D Gaussians." ArXiv (2024). [paper] [code] [2024.07] -
Weiyi Xie, Nathalie Willems, Shubham Patil, Yang Li, Mayank Kumar.
"SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images." WACV (2024). [paper] [2024.07] -
JFS: Seonghyeon Moon, Haein Kong, Muhammad Haris Khan.
"Success or Failure? Analyzing Segmentation Refinement with Few-Shot Segmentation." ArXiv (2024). [paper] [2024.07] -
CS3: Yi Shi, Xu-Peng Tian, Yun-Kai Wang, Tie-Yi Zhang, Bin Yao, Hui Wang, Yong Shao, Cen-Cen Wang, Rong Zeng, De-Chuan Zhan.
"CS3: Cascade SAM for Sperm Segmentation." MICCAI (2024). [paper] [2024.07] -
OneSAM: Khanh-Binh Nguyen, Chae Jung Park.
"OneSAM: One model for segment anything model in medical images on Laptop." ArXiv (2024). [paper] [2024.07] -
Zdravko Marinov, Alexander Jaus, Jens Kleesiek, Rainer Stiefelhagen.
"Filters, Thresholds, and Geodesic Distances for Scribble-based Interactive Segmentation of Medical Images." CVPR Workshop (2024). [paper] [2024.07] -
Zdravko Marinov, Alexander Jaus, Jens Kleesiek, Rainer Stiefelhagen.
"Taking a Step Back: Revisiting Classical Approaches for Efficient Interactive Segmentation of Medical Images." CVPR Workshop (2024). [paper] [2024.07] -
Songxiao Yang, Yizhou Li, Ye Chen, Zhuofeng Wu, Masatoshi Okutomi.
"A Light-weight Universal Medical Segmentation Network for Laptops Based on Knowledge Distillation." ArXiv (2024). [paper] [2024.07] -
Raphael Stock, Yannick Kirchhoff, Maximilian Rouven Rokuss, Ashis Ravindran, Klaus Maier-Hein.
"Segment Anything in Medical Images with nnUNet." CVPR Workshop (2024). [paper] [2024.07] -
QMedSAM: Haisheng Lu, Yujie Fu, Fan Zhang, Le Zhang.
"Efficient Quantization-Aware Training on Segment Anything Model in Medical Images and Its Deployment." ArXiv (2024). [paper] [code] [2024.07] -
RepMedSAM: Zehan Zhang, Rui Huang, Ning Huang.
"RepMedSAM: Segment Anything in Medical Images with Lightweight CNN." CVPR Workshop (2024). [paper] [2024.07] -
DAFT: Alexander Tobias Pfefferle, Lennart Purucker, Frank Hutter.
"DAFT: Data-Aware Fine-Tuning of Foundation Models for Efficient and Effective Medical Image Segmentation." CVPR Workshop (2024). [paper] [2024.07] -
Swin-LiteMedSAM: Ruochen Gao, Donghang Lyu.
"Swin-LiteMedSAM: A Lightweight Multiple-Prompt-Based Segmentation Model for Large-Scale Medical Image Datasets." CVPR Workshop (2024). [paper] [code] [2024.07] -
Haotian Guan, Bingze Dai, Jiajing Zhang.
"Lite Class-prompt Tiny-VIT for Multi-Modality Medical Image Segmentation." CVPR Workshop (2024). [paper] [2024.07] -
Zhi Li, Yaqi Wang.
"ExpertsMedSAM: Faster Medical Image Segment Anything with Mixture-of-Experts." ArXiv (2024). [paper] [2024.07] -
GraysAnatomySAM: YoungHwan Choi, In Kyu Lee, Jonghoe Ku.
"Gray’s Anatomy for Segment Anything Model: Optimizing Grayscale Medical Images for Fast and Lightweight Segmentation." CVPR Workshop (2024). [paper] [code] [2024.07] -
Wentao Liu, Weijin Xu, Ruifeng Bian, Haoyuan Li, Tong Tian.
"LiteMedSAM with Low-Rank Adaptation and Multi-Box Efficient Inference for Medical Image Segmentation." ArXiv (2024). [paper] [2024.07] -
Lei Yu.
"Efficient and Robust Medical Image Segmentation Using Lightweight ViT-Tiny based SAM and Model Quantization." ArXiv (2024). [paper] [2024.07] -
Rep-MedSAM: Muxin Wei, Shuqing Chen, Silin Wu, Dabin Xu.
"Rep-MedSAM: Towards Real-time and Universal Medical Image Segmentation." CVPR Workshop (2024). [paper] [2024.07] -
RepViT-MedSAM: Muhammad Qasim Ali, Alexander Wong, Yuhao Chen.
"RepViT-MedSAM: Efficient Segment Anything in the Medical Images." CVPR Workshop (2024). [paper] [code] [2024.07] -
GBMSeg: Xueyu Liu, Guangze Shi, Rui Wang, Yexin Lai, Jianan Zhang, Lele Sun, Quan Yang, Yongfei Wu, Ming Li, Weixia Han, Wen Zheng.
"Feature-prompting GBMSeg: One-Shot Reference Guided Training-Free Prompt Engineering for Glomerular Basement Membrane Segmentation." MICCAI (2024). [paper] [code] [2024.07] -
MedficientSAM: Bao-Hiep Le, Dang-Khoa Nguyen-Vu, Trong-Hieu Nguyen-Mau, Hai-Dang Nguyen, Minh-Triet Tran.
"MedficientSAM: A Robust Medical Segmentation Model with Optimized Inference Pipeline for Limited Clinical Settings." CVPR Workshop (2024). [paper] [code] [2024.07] -
AS-OCT: Boyu Chen, Ameenat L. Solebo, Paul Taylor.
"Advancing Cell Detection in Anterior Segment Optical Coherence Tomography Images." ArXiv (2024). [paper] [code] [2024.07] -
YOLOv8n-DDA-SAM: Gengming Zhang, Hao Cao, Yangwen Jin, Yi Zhong, Anbang Zhao, Xiangjun Zou, Hongjun Wang.
"YOLOv8n-DDA-SAM: Accurate Cutting-Point Estimation for Robotic Cherry-Tomato Harvesting." Agriculture (2024). [paper] [2024.07] -
MMRo: Jinming Li, Yichen Zhu, Zhiyuan Xu, Jindong Gu, Minjie Zhu, Xin Liu, Ning Liu, Yaxin Peng, Feifei Feng, Jian Tang.
"MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?" ArXiv (2024). [paper] [code] [2024.07] -
OTVP: Takayuki Nishimura, Katsuyuki Kuyo, Motonari Kambara, Komei Sugiura.
"Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models." IROS (2024). [paper] [2024.07] -
MST_MIXER: Adnen Abdessaied, Lei Shi, Andreas Bulling.
"Multi-Modal Video Dialog State Tracking in the Wild." ECCV (2024). [paper] [code] [2024.07] -
MMedAgent: Binxu Li, Tiankai Yan, Yuanting Pan, Zhe Xu, Jie Luo, Ruiyang Ji, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang.
"MMedAgent: Learning to Use Medical Tools with Multi-modal Agent." ArXiv (2024). [paper] [2024.07] -
U-SAM: Shouhong Wan, Hantao Zhang, Weidong Guo et al.
"Tuning Vision Foundation Models for Rectal Cancer Segmentation from CT Scans: Development and Validation of U-SAM." ArXiv (2024). [paper] [2024.07] -
DisFormer: Sanket Gandhi, Atul, Samanyu Mahajan, Vishal Sharma, Rushil Gupta, Arnab Kumar Mondal, Parag Singla.
"Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers." ArXiv (2024). [paper] [2024.07] -
BACON: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng.
"BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations." ArXiv (2024). [paper] [code] [2024.07] -
CADe: Furqan Shaukat, Syed Muhammad Anwar, Abhijeet Parida, Van Khanh Lam, Marius George Linguraru, Mubarak Shah.
"Lung-CADex: Fully automatic Zero-Shot Detection and Classification of Lung Nodules in Thoracic CT Images." ArXiv (2024). [paper] [2024.07] -
ISAMS: Katja Löwenstein, Johanna Rehrl, Anja Schuster, Michael Gadermayr.
"Virtually Objective Quantification of in vitro Wound Healing Scratch Assays with the Segment Anything Model." ArXiv (2024). [paper] [2024.07] -
HRSAM: You Huang, Wenbin Lai, Jiayi Ji, Liujuan Cao, Shengchuan Zhang, Rongrong Ji.
"HRSAM: Efficiently Segment Anything in High-Resolution Images." ArXiv (2024). [paper] [code] [2024.07] -
Label Anything: Pasquale De Marinis, Nicola Fanelli, Raffaele Scaringi, Emanuele Colonna, Giuseppe Fiameni, Gennaro Vessio, Giovanna Castellano.
"Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts." ArXiv (2024). [paper] [code] [2024.07] -
SAVE: Khanh-Binh Nguyen, Chae Jung Park.
"SAVE: Segment Audio-Visual Easy way using Segment Anything Model." ArXiv (2024). [paper] [2024.07] -
Pratyush Tripathy, Kathy Baylis, Kyle Wu, Jyles Watson, Ruizhe Jiang.
"Investigating the Segment Anything Foundation Model for Mapping Smallholder Agriculture Field Boundaries Without Training Labels." ArXiv (2024). [paper] [2024.07] -
MaskField: Zihan Gao, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Yuwei Guo, Shuyuan Yang.
"Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation." ArXiv (2024). [paper] [2024.07] -
ASPS: Huiqian Li, Dingwen Zhang, Jieru Yao, Longfei Han, Zhongyu Li, Junwei Han.
"ASPS: Augmented Segment Anything Model for Polyp Segmentation." MICCAI (2024). [paper] [code] [2024.07] -
Zongshuo Li, Ding Huo, Markus Meurer, Thomas Bergs.
"Efficient Cutting Tool Wear Segmentation Based on Segment Anything Model." MSEC (2024). [paper] [2024.07] -
HATs: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo.
"HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis." ArXiv (2024). [paper] [code] [2024.07] -
SolarSAM: Guohao Wang.
"SolarSAM: Building-scale Photovoltaic Potential Assessment Based on Segment Anything Model (SAM) and Remote Sensing for Emerging City." ArXiv (2024). [paper] [2024.07] -
Depth Anything V2: Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
"Depth Anything V2." ArXiv (2024). [paper] [project] [code] [2024.06] -
UnSAM: XuDong Wang, Jingfeng Yang, Trevor Darrell.
"Segment Anything without Supervision." NeurIPS (2024). [paper] [code] [2024.06] -
EVF-SAM: Yuxuan Zhang, Tianheng Cheng, Rui Hu, ei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang.
"EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model." ArXiv (2024). [paper] [2024.06] -
Tianli Liao, Ce Wang, Lei Li, Guangen Liu, Nan Li.
"Parallax-tolerant Image Stitching via Segmentation-guided Multi-homography Warping." ArXiv (2024). [paper] [code] [2024.06] -
RWKV-SAM: Haobo Yuan, Xiangtai Li, Lu Qi, Tao Zhang, Ming-Hsuan Yang, Shuicheng Yan, Chen Change Loy.
"Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model." ArXiv (2024). [paper] [code] [2024.06] -
REC: Fuseini Mumuni, Alhassan Mumuni.
"Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO." ArXiv (2024). [paper] [2024.06] -
UNest: Vu Minh Hieu Phan, Yutong Xie, Bowen Zhang, Yuankai Qi, Zhibin Liao, Antonios Perperidis, Son Lam Phung, Johan W. Verjans, Minh-Son To.
"Structural Attention: Rethinking Transformer for Unpaired Medical Image Synthesis." MICCAI (2024). [paper] [code] [2024.06] -
Qiushi Guo.
"A Universal Railway Obstacle Detection System based on Semi-supervised Segmentation And Optical Flow." ArXiv (2024). [paper] [2024.06] -
Mingxiao Huo, Pengliang Ji, Haotian Lin, Junchen Liu, Yixiao Wang, Yijun Chen.
"Composition Vision-Language Understanding via Segment and Depth Anything Model." ArXiv (2024). [paper] [code] [2024.06] -
D2GPLan: Jialun Pei, Ruize Cui, Yaoqian Li, Weixin Si, Jing Qin, Pheng-Ann Heng.
"Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection." MICCAI (2024). [paper] [code] [2024.06] -
Point-SAM: Yuchen Zhou, Jiayuan Gu, Tung Yen Chiang, Fanbo Xiang, Hao Su.
"Point-SAM: Promptable 3D Segmentation Model for Point Clouds." ArXiv (2024). [paper] [code] [2024.06] -
GIM: Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu.
"GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization." ArXiv (2024). [paper] [code] [2024.06] -
TP-DRSeg: Wenxue Li, Xinyu Xiong, Peng Xia, Lie Ju, Zongyuan Ge.
"TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM." ArXiv (2024). [paper] [code] [2024.06] -
SAM-EG: Quoc-Huy Trinh, Hai-Dang Nguyen, Bao-Tram Nguyen Ngoc, Debesh Jha, Ulas Bagci, Minh-Triet Tran.
"SAM-EG: Segment Anything Model with Edge Guidance framework for efficient Polyp Segmentation." ArXiv (2024). [paper] [2024.06] -
TraceNet: Mingyuan Wu, Zichuan Liu, Haozhen Zheng, Hongpeng Guo, Bo Chen, Xin Lu, Klara Nahrstedt.
"TraceNet: Segment one thing efficiently." ArXiv (2024). [paper] [2024.06] -
MUTR: Bin Cao, Yisi Zhang, Xuanxu Lin, Xingjian He, Bo Zhao, Jing Liu.
"2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation." ArXiv (2024). [paper] [2024.06] -
SSAD: Zijian Cai, Xinquan Yang, Xuguang Li, Xiaoling Luo, Xuechen Li, Linlin Shen, He Meng, Yongqiang Deng.
"SSAD: Self-supervised Auxiliary Detection Framework for Panoramic X-ray based Dental Disease Diagnosis." ArXiv (2024). [paper] [code] [2024.06] -
SF-CLIP: Sepehr Sameni, Kushal Kafle, Hao Tan, Simon Jenni.
"Building Vision-Language Models on Solid Foundations with Masked Distillation." CVPR (2024). [paper] [2024.06] -
LU-AVS: Chen Liu, Peike Patrick Li, Qingtao Yu, Hongwei Sheng, Dadong Wang, Lincheng Li, Xin Yu.
"Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos." CVPR (2024). [paper] [code] [2024.06] -
ROSA: Yuhan Shen, Huiyu Wang, Xitong Yang, Matt Feiszli, Ehsan Elhamifar, Lorenzo Torresani, Effrosyni Mavroudi.
"Learning to Segment Referred Objects from Narrated Egocentric Videos." CVPR (2024). [paper] [2024.06] -
TSP-SAM: Wenjun Hui, Zhenfeng Zhu, Shuai Zheng, Yao Zhao.
"Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection." CVPR (2024). [paper] [2024.06] -
OV3D: Li Jiang, Shaoshuai Shi, Bernt Schiele.
"Open-Vocabulary 3D Semantic Segmentation with Foundation Models." CVPR (2024). [paper] [2024.06] -
FM-FSOD: Guangxing Han, Ser-Nam Lim.
"Few-Shot Object Detection with Foundation Models." CVPR (2024). [paper] [2024.06] -
OV-DAR: Keyan Chen, Xiaolong Jiang, Haochen Wang, Cilin Yan, Yan Gao, Xu Tang, Yao Hu, Weidi Xie.
"OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition." IJCV (2024). [paper] [2024.06] -
Yang Su, Shunquan Tan, Jiwu Huang.
"A Novel Universal Image Forensics Localization Model Based on Image Noise and Segment Anything Model." IH&MMSec (2024). [paper] [2024.06] -
SGF: Guanlin Li, Bin Zhao, Xuelong Li.
"Low-light Image Enhancement with SAM-based Structure Priors and Guidance." TMM (2024). [paper] [code] [2024.06] -
Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Yajing Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang.
"An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation." MICCAI (2024). [paper] [2024.06] -
S2C: Hyeokjun Kweon, Kuk-Jin Yoon.
"From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation." CVPR (2024). [paper] [code] [2024.06] -
SAMAug-C: Pengfei Gu, Zihan Zhao, Hongxiao Wang, Yaopeng Peng, Yizhe Zhang, Nishchal Sapkota, Chaoli Wang, Danny Z. Chen.
"Boosting Medical Image Classification with Segmentation Foundation Model." ArXiv (2024). [paper] [2024.06] -
ALPS: Song Zhang, Qingzhong Wang, Junyi Liu, Haoyi Xiong.
"ALPS: An Auto-Labeling and Pre-training Scheme for Remote Sensing Segmentation With Segment Anything Model." ArXiv (2024). [paper] [code] [2024.06] -
EBSeg: Xiangheng Shan, Dongyue Wu, Guilin Zhu, Yuanjie Shao, Nong Sang, Changxin Gao.
"Open-Vocabulary Semantic Segmentation with Image Embedding Balancing." CVPR (2024). [paper] [code] [2024.06] -
RobustSAM: Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhuo Ma, Jian Wang.
"RobustSAM: Segment Anything Robustly on Degraded Images." CVPR (2024). [paper] [project] [code] [2024.06] -
4M-21: Roman Bachmann, Oğuzhan Fatih Kar, David Mizrahi, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir.
"4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities." ArXiv (2024). [paper] [code] [2024.06] -
ICE-G: Vishnu Jaganathan, Hannah Hanyun Huang, Muhammad Zubair Irshad, Varun Jampani, Amit Raj, Zsolt Kira.
"ICE-G: Image Conditional Editing of 3D Gaussian Splats." CVPR AI4CC Workshop (2024). [paper] [code] [2024.06] -
APSeg: Weizhao He, Yang Zhang, Wei Zhuo, Linlin Shen, Jiaqi Yang, Songhe Deng, Liang Sun.
"APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation." ArXiv (2024). [paper] [2024.06] -
RiVEG: Jinyuan Li, Ziyan Li, Han Li, Jianfei Yu, Rui Xia, Di Sun, Gang Pan.
"Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation." ArXiv (2024). [paper] [code] [2024.06] -
Marian Longa, João F. Henriques.
"Unsupervised Object Detection with Theoretical Guarantees." ArXiv (2024). [paper] [2024.06] -
ST-BAVA: Juhyeong Seon, Woobin Im, Sebin Lee, Jumin Lee, Sung-Eui Yoon.
"Extending Segment Anything Model into Auditory and Temporal Dimensions for Audio-Visual Segmentation." ICIP (2024). [paper] [2024.06] -
CRSTM: Xiaoli Wei, Zhaoqing Wang, Yandong Guo, Chunxia Zhang, Tongliang Liu, Mingming Gong.
"Training-Free Robust Interactive Video Object Segmentation." ArXiv (2024). [paper] [2024.06] -
USE: Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han Wei Shen, Liu Ren.
"USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation." ArXiv (2024). [paper] [2024.06] -
SAM-PM: Muhammad Nawfal Meeran, Gokul Adethya T, Bhanu Pratyush Mantha.
"SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention." ArXiv (2024). [paper] [code] [2024.06] -
Matthias Pijarowski, Alexander Wolpert, Martin Heckmann, Michael Teutsch.
"Utilizing grounded SAM for self-supervised frugal camouflaged human detection." Automatic Target Recognition XXXIV. SPIE (2024). [paper] [2024.06] -
ASDeM: Xiaohu Liu, Yichuang Luo, Wei Sun.
"ASDeM: Augmenting SAM With Decoupled Memory for Video Object Segmentation." ACCESS (2024). [paper] [2024.06] -
LNDVI: Ananthakrishnan Balasundaram, Alabhya Sharma, Swaathy Kumaravelan, Ayesha Shaik, Muthu Subash Kavitha.
"An Improved Normalized Difference Vegetation Index (NDVI) Estimation Using Grounded Dino and Segment Anything Model for Plant Health Classification." ACCESS (2024). [paper] [2024.06] -
PDM: Dvir Samuel, Rami Ben-Ari, Matan Levy, Nir Darshan, Gal Chechik.
"Unveiling the Power of Diffusion Features For Personalized Segmentation and Retrieval." ArXiv (2024). [paper] [2024.06] -
LeSAM: Yunbo Gu, Qianyu Wu, Hui Tang, Xiaoli Mai, Huazhong Shu, Baosheng Li, Yang Chen.
"LeSAM: Adapt Segment Anything Model for medical lesion segmentation." JBHI (2024). [paper] [2024.06] -
DF2LCZ-Net: Qianqian Wu, Xianping Ma, Jialu Sui, Man-On Pun.
"A SAM-Empowered Dual-Stream Framework for Scene-Level Local Climate Zone Classification Using Google Earth and Sentinel Images." IGARSS (2024). [paper] [2024.06] -
U-SAM: Hantao Zhang, Weidong Guo, Shouhong Wan, Bingbing Zou, Wanqin Wang, Chenyang Qiu, Kaige Liu, Peiquan Jin, Jiancheng Yang.
"Deep-Learning-Assisted Segmentation of Rectal Cancer from CT Scans: Development and Validation of U-SAM." Available at SSRN (2024). [paper] [2024.06] -
SAM-PR: Ricardo Montoya-del-Angel, Marawan Elbatel, Joel Vidal, Robert Martí.
"SAM-PR: enhancing 3D automated breast ultrasound imaging segmentation with probabilistic refinement of SAM." IWBI (2024). [paper] [2024.06] -
Yunho Kim, Jeong Hyun Lee, Choongin Lee, Juhyeok Mun, Donghoon Youm, Jeongsoo Park, Jemin Hwangbo.
"Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy." ArXiv (2024). [paper] [2024.06] -
Heather Doig, Oscar Pizarro, Jacquomo Monk, Stefan Williams.
"Detecting Endangered Marine Species in Autonomous Underwater Vehicle Imagery Using Point Annotations and Few-Shot Learning." IROS (2024). [paper] [2024.06] -
VCP: Senyun Kuang, Yang Liu, Xin Wang, Xiaobo Qu, Yintao Wei.
"A Universal Crack Detection Framework for Intelligent Road-Perceptive Vehicles." TIV (2024). [paper] [2024.06] -
SCD-SAM: Liye Mei, Zhaoyi Ye, Chuan Xu, Hongzhu Wang, Ying Wang, Cheng Lei, Wei Yang, Yansheng Li.
"SCD-SAM: Adapting Segment Anything Model for Semantic Change Detection in Remote Sensing Imagery." TGRS (2024). [paper] [code] [2024.06] -
MASA: Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu.
"Matching Anything by Segmenting Anything." CVPR (2024). [paper] [project] [code] [2024.06] -
Immunocto: Mikaël Simard, Zhuoyan Shen, Maria A. Hawkins, Charles-Antoine Collins-Fekete.
"Immunocto: a massive immune cell database auto-generated for histopathology." ArXiv (2024). [paper] [code] [2024.06] -
OpenGaussian: Yanmin Wu, Jiarui Meng, Haijie Li, Chenming Wu, Yahao Shi, Xinhua Cheng, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Jian Zhang.
"OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding." ArXiv (2024). [paper] [code] [2024.06] -
Open-YOLO 3D: Mohamed El Amine Boudjoghra, Angela Dai, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan.
"Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation." ArXiv (2024). [paper] [code] [2024.06] -
FastLGS: Yuzhou Ji, He Zhu, Junshu Tang, Wuyi Liu, Zhizhong Zhang, Yuan Xie, Lizhuang Ma, Xin Tan.
"FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping." ArXiv (2024). [paper] [code] [2024.06] -
Yang Nan, Guang Yang.
"Deep asymmetric mixture model for unsupervised cell segmentation." ArXiv (2024). [paper] [2024.06] -
SemiRES: Danni Yang, Jiayi Ji, Yiwei Ma, Tianyu Guo, Haowei Wang, Xiaoshuai Sun, Rongrong Ji.
"SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation." ICML (2024). [paper] [code] [2024.06] -
AlignSAM: Duojun Huang, Xinyu Xiong, Jie Ma, Jichang Li, Zequn Jie, Lin Ma, Guanbin Li.
"AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning." CVPR (2024). [paper] [code] [2024.06] -
AuxOL: Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang.
"Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation." ArXiv (2024). [paper] [code] [2024.06] -
SimSAM: Benjamin Towle, Xin Chen, Ke Zhou.
"SimSAM: Zero-shot Medical Image Segmentation via Simulated Interaction." ISBI (2024). [paper] [code] [2024.06] -
SAM-LAD: Yun Peng, Xiao Lin, Nachuan Ma, Jiayuan Du, Chuangwei Liu, Chengju Liu, Qijun Chen.
"SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection." ArXiv (2024). [paper] [2024.06] -
Jimmy Xuekai Li, Tiancheng Zhang, Yiran Zhu, Zhongwei Chen.
"Artificial General Intelligence (AGI) for the oil and gas industry: a review." ArXiv (2024). [paper] [2024.06] -
SAM-VMNet: Xueying Zeng, Baixiang Huang, Yu Luo, Guangyu Wei, Songyan He, Yushuang Shao.
"SAM-VMNet: Deep Neural Networks For Coronary Angiography Vessel Segmentation." ArXiv (2024). [paper] [2024.06] -
DISAM: Ruipeng Zhang, Ziqing Fan, Jiangchao Yao, Ya Zhang, Yanfeng Wang.
"Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts." ICLR (2024). [paper] [code] [2024.05] -
SAM-E: Junjie Zhang, Chenjia Bai, Haoran He, Wenke Xia, Zhigang Wang, Bin Zhao, Xiu Li, Xuelong Li.
"SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation." ICML (2024). [paper] [project] [code] [2024.05] -
Haodi He, Colton Stearns, Adam W. Harley, Leonidas J. Guibas.
"View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields." ArXiv (2024). [paper] [code] [2024.05] -
FMARS: Edoardo Arnaudo, Jacopo Lungo Vaschetti, Lorenzo Innocenti, Luca Barco, Davide Lisi, Vanina Fissore, Claudio Rossi.
"FMARS: Annotating Remote Sensing Images for Disaster Management using Foundation Models." IGARSS (2024). [paper] [code] [2024.05] -
Finetuned-SAM: FNU Shivam, Megan Leight, Mary Kate Kelly, Claire Davis, Kelsey Clodfelter, Jacob Thrasher, Yenumula Reddy, Prashnna Gyawali.
"Segmentation of Maya hieroglyphs through fine-tuned foundation models." ArXiv (2024). [paper] [2024.05] -
Qi Zhang, Guanyu Xing, Jianwei Zhang, Yanli Liu.
"Adaptive active contours driven by the squared Hellinger distance and local correlation features for inhomogeneous image segmentation." Multimed Tools Appl (2024). [paper] [2024.05] -
FocSAM: You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji.
"FocSAM: Delving Deeply into Focused Objects in Segmenting Anything." CVPR (2024). [paper] [code] [2024.05] -
Reasoning3D: Tianrun Chen, Chunan Yu, Jing Li, Jianqi Zhang, Lanyun Zhu, Deyi Ji, Yong Zhang, Ying Zang, Zejian Li, Lingyun Sun.
"Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models." ArXiv (2024). [paper] [code] [2024.05] -
Aditya Gunturu, Yi Wen, Jarin Thundathil, Nandi Zhang, Rubaiat Habib Kazi, Ryo Suzuki.
"Augmented Physics: A Machine Learning-Powered Tool for Creating Interactive Physics Simulations from Static Diagrams." ArXiv (2024). [paper] [2024.05] -
PLUG: Zhaochen Liu, Limeng Qiao, Xiangxiang Chu, Tingting Jiang.
"PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus." ArXiv (2024). [paper] [2024.05] -
NIDS-Net: Yangxiao Lu, Jishnu Jaykumar P, Yunhui Guo, Nicholas Ruozzi, Yu Xiang.
"Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation." ArXiv (2024). [paper] [code] [2024.05] -
MemSAM: Xiaolong Deng, Huisi Wu, Runhao Zeng, Jing Qin.
"MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation." CVPR (2024). [paper] [code] [2024.05] -
Part123: Anran Liu, Cheng Lin, Yuan Liu, Xiaoxiao Long, Zhiyang Dou, Hao-Xiang Guo, Ping Luo, Wenping Wang.
"Part123: Part-aware 3D Reconstruction from a Single-view Image." SIGGRAPH (2024). [paper] [code] [2024.05] -
PP-SAM: Md Mostafijur Rahman, Mustafa Munir, Debesh Jha, Ulas Bagci, Radu Marculescu.
"PP-SAM: Perturbed Prompts for Robust Adaptation of Segment Anything Model for Polyp Segmentation." CVPRW (2024). [paper] [code] [2024.05] -
SA-GS: Butian Xiong, Xiaoyu Ye, Tze Ho Elden Tse, Kai Han, Shuguang Cui, Zhen Li.
"SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain." ArXiv (2024). [paper] [code] [2024.05] -
OV-SAM3D: Hanchen Tai, Qingdong He, Jiangning Zhang, Yijie Qian, Zhenyu Zhang, Xiaobin Hu, Yabiao Wang, Yong Liu.
"Open-Vocabulary SAM3D: Understand Any 3D Scene." ArXiv (2024). [paper] [project] [code] [2024.05] -
Yuchun Guo, Zhiqing Lu, Yanling Zhou, Xin Jiang.
"Autonomous Quilt Spreading for Caregiving Robots." ArXiv (2024). [paper] [2024.05] -
MoME: Xinru Zhang, Ni Ou, Berke Doga Basaran, Marco Visentin, Mengyun Qiao, Renyang Gu, Cheng Ouyang, Yaou Liu, Paul M. Matthew, Chuyang Ye, Wenjia Bai.
"A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts." MICCAI (2024). [paper] [2024.05] -
OLIVINE: Yifan Zhang, Junhui Hou.
"Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models." ArXiv (2024). [paper] [code] [2024.05] -
FreeTuner: Youcan Xu, Zhen Wang, Jun Xiao, Wei Liu, Long Chen.
"FreeTuner: Any Subject in Any Style with Training-free Diffusion." ArXiv (2024). [paper] [2024.05] -
UroSAM: Jixuan Leng, Junfei Liu, Galen Cheng, Haohan Wang, Scott Orzech Quarrier, Jiebo Luo, Rajat Jain.
"Development of UroSAM: a machine learning model to automatically identify kidney stone composition from endoscopic video." Journal of Endourology (2024). [paper] [2024.05] -
DBA-CLIP: Xiaobo Yang, Xiaojin Gong.
"Tuning-free Universally-Supervised Semantic Segmentation." ArXiv (2024). [paper] [2024.05] -
FAM: Qijian Zhang, Junhui Hou, Wenping Wang, Ying He.
"Flatten Anything: Unsupervised Neural Surface Parameterization." ArXiv (2024). [paper] [2024.05] -
Zipeng Qi, Chenyang Liu, Zili Liu, Hao Chen, Yongchang Wu, Zhengxia Zou, Zhenwei Shi.
"Multi-view Remote Sensing Image Segmentation With SAM priors." ArXiv (2024). [paper] [2024.05] -
DTLLM-VLT: Xuchen Li, Xiaokun Feng, Shiyu Hu, Meiqi Wu, Dailing Zhang, Jing Zhang, Kaiqi Huang.
"DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM." CVPRW (2024). [paper] -
WorldAfford: Changmao Chen, Yuren Cong, Zhen Kan.
"WorldAfford: Affordance Grounding based on Natural Language Instructions." ArXiv (2024). [paper] [2024.05] -
UO-SAM: Tingting Li, Gensheng Pei, Xinhao Cai, Huafeng Liu, Qiong Wang, Yazhou Yao.
"Universal Organizer of SAM for Unsupervised Semantic Segmentation." ICME (2024). [paper] [code] [2024.05] -
TAR: Tharun V. Puthanveettil, Fnu Obaid ur Rahman.
"Track Anything Rapter (TAR)." ArXiv (2024). [paper] [code] [2024.05] -
Zhiyu Xu, Qingliang Chen.
"NubbleDrop: A Simple Way to Improve Matching Strategy for Prompted One-Shot Segmentation." ArXiv (2024). [paper] [2024.05] -
Mounes Zaval, Sedat Ozer.
"Improving the Explain-Any-Concept by Introducing Nonlinearity to the Trainable Surrogate Model." IEEE SIU (2024). [paper] [2024.05] -
SAMReg: Shiqi Huang, Tingfa Xu, Ziyi Shen, Shaheer Ullah Saeed, Wen Yan, Dean Barratt, Yipeng Hu.
"One registration is worth two segmentations." MICCAI (2024). [paper] [2024.05] -
USIS10K & USIS-SAM: Shijie Lian, Ziyi Zhang, Hua Li, Wenjie Li, Laurence Tianruo Yang, Sam Kwong, Runmin Cong.
"Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset." ICML (2024). [paper] [code] [2024.05] -
M4oE: Yufeng Jiang, Yiqing Shen.
"M4oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts." ArXiv (2024). [paper] [code] [2024.05] -
SLIP: Saaketh Koundinya Gundavarapu, Arushi Arora, Shreya Agarwal.
"Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP)." ArXiv (2024). [paper] [2024.05] -
SAM3D: Trevor J. Chan, Aarush Sahni, Jie Li, Alisha Luthra, Amy Fang, Alison Pouch, Chamith S. Rajapakse.
"SAM3D: Zero-Shot Semi-Automatic Segmentation in 3D Medical Images with the Segment Anything Model." ArXiv (2024). [paper] [2024.05] -
Jin Kousaka, Atsuko H. Iwane, Yuichi Togashi.
"Automated Cell Structure Extraction for 3D Electron Microscopy by Deep Learning." ArXiv (2024). [paper] [2024.05] -
Elham Ravanbakhsh, Cheng Niu, Yongqing Liang, J. Ramanujam, Xin Li.
"Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach." ArXiv (2024). [paper] [2024.05] -
DiffMatch: Kaiyu Li, Xiangyong Cao, Yupeng Deng, Deyu Meng.
"DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector." ArXiv (2024). [paper] [2024.05] -
SegAD: Aimira Baitieva, David Hurych, Victor Besnier, Olivier Bernard.
"Supervised Anomaly Detection for Complex Industrial Images." ArXiv (2024). [paper] [code] [2024.05] -
WBNet: Yi Wang et al.
"WBNet: Weakly-supervised salient object detection via scribble and pseudo-background priors." Pattern Recognition (2024). [paper] [code] [2024.05] -
Kevin Charles Bierlich, Sagar Karki, Clara N. Bird, Alan Fern, Leigh G. Torres.
"Automated body length and body condition measurements of whales from drone videos for rapid assessment of population health." Marine Mammal Science (2024). [paper] [2024.05] -
WSPoly-SAM: Tingting Cai and Hongping Yan and Kun Ding and Yan Zhang and Yueyue Zhou.
"WSPoly-SAM: Weakly-Supervised and Self-Guided Fine-Tuning of SAM for Colonoscopy Polyp Segmentation." ArXiv (2024). [paper] [2024.05] -
ELiTe: Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin.
"ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation." ICME (2024). [paper] [2024.05] -
PTQ4SAM: Chengtao Lv, Hong Chen, Jinyang Guo, Yifu Ding, Xianglong Liu.
"PTQ4SAM: Post-Training Quantization for Segment Anything." CVPR (2024). [paper] [code] [2024.05] -
UnSAMFlow: Shuai Yuan, Lei Luo, Zhuo Hui, Can Pu, Xiaoyu Xiang, Rakesh Ranjan, Denis Demandolx.
"UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model." CVPR (2024). [paper] [code] [2024.05] -
YOLO-SAM: Yu Zhu, Qiang Yang, Li Xu.
"Active Learning Enabled Low-cost Cell Image Segmentation Using Bounding Box Annotation." ArXiv (2024). [paper] [2024.05] -
M2Depth: Yingshuang Zou, Yikang Ding, Xi Qiu, Haoqian Wang, Haotian Zhang.
"M2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation." ArXiv (2024). [paper] [code] [2024.05] -
Prateek Verma, Minh-Hao Van, Xintao Wu.
"Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis." ArXiv (2024). [paper] [2024.04] -
ASAM: Bo Li, Haoke Xiao, Lv Tang.
"ASAM: Boosting Segment Anything Model with Adversarial Tuning." CVPR (2024). [paper] [code] [2024.04] -
MoPEFT: Rajat Sahay, Andreas Savakis.
"MoPEFT: A Mixture-of-PEFTs for the Segment Anything Model." CVPR Workshops (2024). [paper] [2024.04] -
Shimian Zhang, Qiuhong Lu.
"Innovative Integration of Visual Foundation Model with a Robotic Arm on a Mobile Platform." ArXiv (2024). [paper] [2024.04] -
SAGHOG: Marco Peer, Florian Kleber, Robert Sablatnig.
"SAGHOG: Self-Supervised Autoencoder for Generating HOG Features for Writer Retrieval." ICDAR (2024). [paper] [code] [2024.04] -
Auto-Generate-WLs: Tanvi Deshpande, Eva Prakash, Elsie Gyang Ross, Curtis Langlotz, Andrew Ng, Jeya Maria Jose Valanarasu.
"Auto-Generating Weak Labels for Real & Synthetic Data to Improve Label-Scarce Medical Image Segmentation." MIDL (2024). [paper] [code] [2024.04] -
Dr-SAM: Vazgen Zohranyan, Vagner Navasardyan, Hayk Navasardyan, Jan Borggrefe, Shant Navasardyan.
"Dr-SAM: An End-to-End Framework for Vascular Segmentation, Diameter Estimation, and Anomaly Detection on Angiography Images." ArXiv (2024). [paper] [code] [2024.04] -
MAS-SAM: Tianyu Yan, Zifu Wan, Xinhao Deng, Pingping Zhang, Yang Liu, Huchuan Lu.
"MAS-SAM: Segment Any Marine Animal with Aggregated Features." IJCAI (2024). [paper] [code] [2024.04] -
Kuan-I Chung, Daniel Moyer.
"Does SAM dream of EIG? Characterizing Interactive Segmenter Performance using Expected Information Gain." ArXiv (2024). [paper] [2024.04] -
OMEGAS: Lizhi Wang, Feng Zhou, Jianqin Yin.
"OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation." ArXiv (2024). [paper] [code] [2024.04] -
HOIST-Former: Supreeth Narasimhaswamy, Huy Anh Nguyen, Lihan Huang, Minh Hoai.
"HOIST-Former: Hand-held Objects Identification, Segmentation, and Tracking in the Wild." ArXiv (2024). [paper] [code] [2024.04] -
CLIP-GS: Guibiao Liao, Jiankun Li, Zhenyu Bao, Xiaoqing Ye, Jingdong Wang, Qing Li, Kanglin Liu.
"CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding." ArXiv (2024). [paper] [code] [2024.04] -
X-Ray: Tao Hu, Wenhang Ge, Yuyang Zhao, Gim Hee Lee.
"X-Ray: A Sequential 3D Representation for Generation." ArXiv (2024). [paper] [2024.04] -
BUSSAM: Zhengzheng Tu, Le Gu, Xixi Wang, Bo Jiang.
"Ultrasound SAM Adapter: Adapting SAM for Breast Lesion Segmentation in Ultrasound Images." ArXiv (2024). [paper] [code] [2024.04] -
GeoDiffuser: Rahul Sajnani, Jeroen Vanbaar, Jie Min, Kapil Katyal, Srinath Sridhar.
"GeoDiffuser: Geometry-Based Image Editing with Diffusion Models." ArXiv (2024). [paper] [code] [2024.04] -
Yuyan Shi, Jialu Ma, Jin Yang, Shasha Wang, Yichi Zhang.
"Beyond Pixel-Wise Supervision for Medical Image Segmentation: From Traditional Models to Foundation Models." ArXiv (2024). [paper] [2024.04] -
PM-VIS: Zhangjing Yang, Dun Liu, Wensheng Cheng, Jinqiao Wang, Yi Wu.
"PM-VIS: High-Performance Box-Supervised Video Instance Segmentation." ArXiv (2024). [paper] [2024.04] -
Surgical-DeSAM: Yuyang Sheng, Sophia Bano, Matthew J. Clarkson, Mobarakol Islam.
"Surgical-DeSAM: Decoupling SAM for Instrument Segmentation in Robotic Surgery." ArXiv (2024). [paper] [2024.04] -
UrbanCross: Siru Zhong, Xixuan Hao, Yibo Yan, Ying Zhang, Yangqiu Song, Yuxuan Liang.
"UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation." ArXiv (2024). [paper] [2024.04] -
ELEV-VISION-SAM: Yu-Hsuan Ho, Longxiang Li, Ali Mostafavi.
"ELEV-VISION-SAM: Integrated Vision Language and Foundation Model for Automated Estimation of Building Lowest Floor Elevation." ArXiv (2024). [paper] [2024.04] -
Uni3DR^2: Tao Chu, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Qiong Liu, Jiaqi Wang.
"Unified Scene Representation and Reconstruction for 3D Large Language Models." ArXiv (2024). [paper] [code] [2024.04] -
MM-ScatterNet: Yilong Chen, Zongyi Xu, Xiaoshui Huang, Ruicheng Zhang, Xinqi Jiang, Xinbo Gao.
"Foundation Model assisted Weakly Supervised LiDAR Semantic Segmentation." ArXiv (2024). [paper] [2024.04] -
Zip: Cheng Shi, Sibei Yang.
"The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models." ICLR (2024). [paper] [code] [2024.04] -
PPT: Qiyuan Dai, Sibei Yang.
"Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation." CVPR (2024). [paper] [2024.04] -
FlowSAM: Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman.
"Moving Object Segmentation: All You Need Is SAM (and Flow)." ArXiv (2024). [paper] [code] [2024.04] -
SOHES: Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang.
"SOHES: Self-supervised Open-world Hierarchical Entity Segmentation." ICLR (2024). [paper] [code] [2024.04] -
Yona Falinie A. Gaus, Neelanjan Bhowmik, Brian K. S. Isaac-Medina, Toby P. Breckon.
"Performance Evaluation of Segment Anything Model with Variational Prompting for Application to Non-Visible Spectrum Imagery." ArXiv (2024). [paper] [2024.04] -
Yiqun Xie, Zhihao Wang, Weiye Chen, Zhili Li, Xiaowei Jia, Yanhua Li, Ruichen Wang, Kangyang Chai, Ruohan Li, Sergii Skakun.
"When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery." ArXiv (2024). [paper] [2024.04] -
LAECIPS: Shijing Hu, Ruijun Deng, Xin Du, Zhihui Lu, Qiang Duan, Yi He, Shih-Chia Huang, Jie Wu.
"LAECIPS: Large Vision Model Assisted Adaptive Edge-Cloud Collaboration for IoT-based Perception System." ArXiv (2024). [paper] [2024.04] -
ESD: Jieming Yu, Long Bai, Guankun Wang, An Wang, Xiaoxiao Yang, Huxin Gao, Hongliang Ren.
"Adapting SAM for Surgical Instrument Tracking and Segmentation in Endoscopic Submucosal Dissection Videos." IEEE ICRA C4SR+ Workshop (2024). [paper] [2024.04] -
Finetune-SAM: Hanxue Gu, Haoyu Dong, Jichen Yang, Maciej A. Mazurowski.
"How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model." ArXiv (2024). [paper] [code] [2024.04] -
Offline-evt: Fangwei Zhong, Kui Wu, Hai Ci, Churan Wang, Hao Chen.
"Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL." ArXiv (2024). [paper] [code] [2024.04] -
VFMM3D: Bonan Ding, Jin Xie, Jing Nie, Jiale Cao.
"VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection." ArXiv (2024). [paper] [2024.04] -
LLM-Seg: Junchi Wang, Lei Ke.
"LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning." ArXiv (2024). [paper] [code] [2024.04] -
June Moh Goo, Zichao Zeng, Jan Boehm.
"Zero-shot detection of buildings in mobile LiDAR using Language Vision Model." ArXiv (2024). [paper] [2024.04] -
Auto-Prom: Abu Bakor Hayat Arnob, Xiangxue Wang, Yiping Jiao, Xiao Gan, Wenlong Ming, Jun Xu.
"Pathological Primitive Segmentation Based on Visual Foundation Model with Zero-Shot Mask Generation." ArXiv (2024). [paper] [code] [2024.04] -
Robin Schön, Julian Lorenz, Katja Ludwig, Rainer Lienhart.
"Adapting the Segment Anything Model During Usage in Novel Situations." ArXiv (2024). [paper] [2024.04] -
S-RA & T-RA: Yifan Shen, Zhengyuan Li, Gang Wang.
"Practical Region-level Attack against Segment Anything Models." ArXiv (2024). [paper] [2024.04] -
MedRG: Ke Zou, Yang Bai, Zhihao Chen, Yang Zhou, Yidi Chen, Kai Ren, Meng Wang, Xuedong Yuan, Xiaojing Shen, Huazhu Fu.
"MedRG: Medical Report Grounding with Multi-modal Large Language Model." ArXiv (2024). [paper] [2024.04] -
O2V-Mapping: Muer Tie, Julong Wei, Zhengjun Wang, Ke Wu, Shansuai Yuan, Kaizhao Zhang, Jie Jia, Jieru Zhao, Zhongxue Gan, Wenchao Ding.
"O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation." ArXiv (2024). [paper] [2024.04] -
Liu M, Cui M, Wei W, Xu X, Sun C, Li F, Song Z, Lu Y, Zhang J, Tian F, et al.
"Sorting of Mountage Cocoons Based on MobileSAM and Target Detection." Agriculture (2024). [paper] [2024.04] -
ShadowSAM: Zeheng Qian and Wen Wu and Xian-Tao Wu and Xiao-Diao Chen.
"Omni-supervised shadow detection with vision foundation model." JVCI (2024). [paper] [code] [2024.04] -
Moghimi, Armin and Welzel, Mario and Celik, Turgay and Schlurmann, Torsten.
"A Comparative Performance Analysis of Popular Deep Learning Models and Segment Anything Model (SAM) for River Water Segmentation in Close-Range Remote Sensing Imagery." IEEE Access (2024). [paper] [code] [Dataset] [2024.04] -
SAMPA: Handi Deng, Yucheng Zhou, Jiaxuan Xiang, Liujie Gu, Yan Luo, Hai Feng, Mingyuan Liu, Cheng Ma.
"Streamlined Photoacoustic Image Processing with Foundation Models: A Training-Free Solution." ArXiv (2024). [paper] [code] [2024.04] -
SAM-I-Am: Waqwoya Abebe, Jan Strube, Luanzheng Guo, Nathan R. Tallent, Oceane Bel, Steven Spurgeon, Christina Doty, Ali Jannesari.
"SAM-I-Am: Semantic Boosting for Zero-shot Atomic-Scale Electron Micrograph Segmentation." ArXiv (2024). [paper] [2024.04] -
SaLIP: Sidra Aleem, Fangyijie Wang, Mayug Maniparambil, Eric Arazo, Julia Dietlmeier, Kathleen Curran, Noel E. O'Connor, Suzanne Little.
"Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero-shot Medical Image Segmentation." CVPR Workshops (2024). [paper] [code] [2024.04] -
CTL: Anas Gouda, Max Schwarz, Christopher Reining, Sven Behnke, Alice Kirchheim.
"Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping." ArXiv (2024). [paper] [code] [2024.04] -
Dual-SAM: Pingping Zhang, Tianyu Yan, Yang Liu, Huchuan Lu.
"Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM." CVPR (2024). [paper] [code] [2024.04] -
Yu Sheng, Lu Zhang, Xingchen Li, Yifan Duan, Yanyong Zhang, Yu Zhang, Jianmin Ji.
"Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes." ArXiv (2024). [paper] [2024.04] -
DL-EWF: Fatemeh Asghari, Mohammad Reza Soheili, Faezeh Gholamrezaie.
"DL-EWF: Deep Learning Empowering Women's Fashion with Grounded-Segment-Anything Segmentation for Body Shape Classification." ArXiv (2024). [paper] [2024.04] -
MuDI: Sangwon Jang, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang.
"Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models." ArXiv (2024). [paper] [code] [2024.04] -
David Jurado-Rodríguez, et al.
"SAM-Based Detection of Structural Anomalies in 3D Models for Preserving Cultural Heritage." ArXiv (2024). [paper] [2024.04] -
Liqun Shan, Yanchang Liu, Ke Du, Shovon Paul, Xingli Zhang, Xiali Hei.
"Drilling rock image segmentation and analysis using segment anything model." Advances in Geo-Energy Research (2024). [paper] [2024.04] -
Sen Deng, et al.
"Semi-supervised TEE Segmentation via Interacting with SAM Equipped with Noise-Resilient Prompting." AAAI (2024). [paper] [2024.04] -
BSDSNet: Wang, Y.; Zhang, W.; Chen, W.; Chen, C.
"BSDSNet: Dual-Stream Feature Extraction Network Based on Segment Anything Model for Synthetic Aperture Radar Land Cover Classification." Remote Sens (2024). [paper] [2024.04] -
SweepMM: Weichen Xu; Xinxin Xu; Tianhao Fu; Jian Cao; Xiaoyang Xu; Yuetian Huang; Xixin Cao; Xing Zhang.
"SweepMM: A High-Quality Multimodal Dataset for Sweeping Robots in Home Scenarios for Vision-Language Model." ICASSP (2024). [paper] [2024.04] -
3DSAM: Shangjie Wang; Yan Zhang.
"3DSAM: Segment Anything in NeRF." ICASSP (2024). [paper] [2024.04] -
SAM-GEBD: Pranay Kashyap; Sourabh Vasant Gothe; Vibhav Agarwal; Jayesh Rajkumar Vachhani.
"SAM-GEBD: Zero-Cost Approach for Generic Event Boundary Detection." ICASSP (2024). [paper] [2024.04] -
SAM-CD: Zixuan Sun; Huihui Song; Kaihua Zhang; Gang Dong; Lingyan Liang; Yaqian Zhao.
"Segment Anything Model Guided Semantic Knowledge Learning For Remote Sensing Change Detection." ICASSP (2024). [paper] [2024.04] -
SkinSAM: Mingzhe Hu, Yuheng Li, Xiaofeng Yang.
"SkinSAM: adapting the segmentation anything model for skin cancer segmentation." SPIE (2024). [paper] [2024.04] -
BreastSAM: Mingzhe Hu, Yuheng Li, Xiaofeng Yang.
"BreastSAM: adapting the segmentation anything model for breast tumor segmentation in ultrasound imaging." SPIE (2024). [paper] [2024.04] -
Yiqiao Liu, et al.
"Universal 3D CT lesion segmentation using SAM with RECIST annotation." SPIE (2024). [paper] [2024.04] -
FNPC-SAM: Xing Yao, et al.
"FNPC-SAM: uncertainty-guided false negative/positive control for SAM on noisy medical images." SPIE (2024). [paper] [code] [2024.04] -
SAM-Att: Zhu, Yaqi and Xiong, Changchun and Zhao, Heng and Yao, Yudong.
"SAM-Att: A Prompt-free SAM-related Model with an Attention Module for Automatic Segmentation of the Left Ventricle in Echocardiography." IEEE Access (2024). [paper] [2024.04] -
BarelySAM: Ding, Yuhang and Liu, Hongmin.
"Barely-supervised Brain Tumor Segmentation via Employing Segment Anything Model." TCSVT (2024). [paper] [2024.04] -
Design2Cloth: Jiali Zheng, Rolandos Alexandros Potamias, Stefanos Zafeiriou.
"Design2Cloth: 3D Cloth Generation from 2D Masks." CVPR (2024). [paper] [code] [2024.04] -
FBM: Huang, Peng and Shu, Xiangbo and Yan, Rui and Tu, Zhewei and Tang, Jinhui.
"Appearance-Agnostic Representation Learning for Compositional Action Recognition." TCSVT (2024). [paper] [2024.04] -
Seda Camalan, Muhammad Khalid Khan Niazi, Charles Elmaraghy, Aaron C. Moberly, Metin N. Gurcan.
"Tympanic membrane segmentation of video frames to create composite images using SAM." SPIE (2024). [paper] [2024.04] -
Meyer, A., Mazellier, JP., Dana, J. et al.
"On-the-fly point annotation for fast medical video labeling." Int J CARS (2024). [paper] [2024.04] -
Uygun, T., Ozguven, M.M.
"Determination of tomato leafminer: Tuta absoluta (Meyrick) (Lepidoptera: Gelechiidae) damage on tomato using deep learning instance segmentation method." Eur Food Res Technol (2024). [paper] [2024.04] -
iSeg: Itai Lang, Fei Xu, Dale Decatur, Sudarshan Babu, Rana Hanocka.
"iSeg: Interactive 3D Segmentation via Interactive Attention." ArXiv (2024). [paper] [code] [2024.04] -
Gen3DSR: Andreea Dogaru, Mert Özer, Bernhard Egger.
"Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View." ArXiv (2024). [paper] [code] [2024.04] -
OW-VISCap: Anwesa Choudhuri, Girish Chowdhary, Alexander G. Schwing.
"OW-VISCap: Open-World Video Instance Segmentation and Captioning." ArXiv (2024). [paper] [code] [2024.04] -
UAD: Jiahao Lu, Xingyi Yang, Xinchao Wang.
"Unsegment Anything by Simulating Deformation." CVPR (2024). [paper] [code] [2024.04] -
FIGA: Krzysztof Jankowski, Bartlomiej Sobieski, Mateusz Kwiatkowski, Jakub Szulc, Michal Janik, Hubert Baniecki, Przemyslaw Biecek.
"Red-Teaming Segment Anything Model." CVPR Workshops (2024). [paper] [2024.04] -
DHR: Sanghyun Jo, Fei Pan, In-Jae Yu, Kyungsu Kim.
"DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation." ArXiv (2024). [paper] [code] [2024.04] -
DIT: Xiaorui Huang, Gen Luo, Chaoyang Zhu, Bo Tong, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji.
"Deep Instruction Tuning for Segment Anything Model." ArXiv (2024). [paper] [code] [2024.04] -
SegNext: Qin Liu, Jaemin Cho, Mohit Bansal, Marc Niethammer.
"Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts." CVPR (2024). [paper] [code] [2024.04] -
Detect2Interact: Jialou Wang, Manli Zhu, Yulei Li, Honglei Li, Longzhi Yang, Wai Lok Woo.
"Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs." IEEE Intelligent Systems (2024). [paper] [2024.04] -
CoCoCo: Bojia Zi, Shihao Zhao, Xianbiao Qi, Jianan Wang, Yukai Shi, Qianyu Chen, Bin Liang, Kam-Fai Wong, Lei Zhang.
"CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility." ArXiv (2024). [paper] [project] [code] [2024.03] -
DeepSeek-VL: Haoyu Lu, Wen Liu, Bo Zhang, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan.
"DeepSeek-VL: Towards Real-World Vision-Language Understanding." ArXiv (2024). [paper] [code] [2024.03] -
MedCLIP-SAM: Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao.
"MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation." ArXiv (2024). [paper] [2024.03] -
COCO-ReM: Shweta Singh, Aayan Yadav, Jitesh Jain, Humphrey Shi, Justin Johnson, Karan Desai.
"Benchmarking Object Detectors with COCO: A New Path Forward." ArXiv (2024). [paper] [dataset] [code] [2024.03] -
Yuiko Sakuma, Masakazu Yoshimura, Junji Otsuka, Atsushi Irie, Takeshi Ohashi.
"Mixed-precision Supernet Training from Vision Foundation Models using Low Rank Adapter." ArXiv (2024). [paper] [2024.03] -
Total-Decom: Xiaoyang Lyu, Chirui Chang, Peng Dai, Yang-tian Sun, Xiaojuan Qi.
"Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction." CVPR (2024). [paper] [code] [2024.03] -
SAM-dPCR: Yuanyuan Wei, Shanhang Luo, Changran Xu, Yingqi Fu, Qingyue Dong, Yi Zhang, Fuyang Qu, Guangyao Cheng, Yi-Ping Ho, Ho-Pui Ho, Wu Yuan.
"SAM-dPCR: Real-Time and High-throughput Absolute Quantification of Biological Samples Using Zero-Shot Segment Anything Model." ArXiv (2024). [paper] [2024.03] -
H-SAM: Zhiheng Cheng, Qingyue Wei, Hongru Zhu, Yan Wang, Liangqiong Qu, Wei Shao, Yuyin Zhou.
"Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding." CVPR (2024). [paper] [code] [2024.03] -
Annolid: Chen Yang, Thomas A. Cleland.
"Annolid: Annotate, Segment, and Track Anything You Need." ArXiv (2024). [paper] [2024.03] -
SAMME: Yihao Liu, Jiaming Zhang, Andres Diaz-Pinto, Haowei Li, Alejandro Martin-Gomez, Amir Kheradmand, Mehran Armand.
"Segment Any Medical Model Extended." ArXiv (2024). [paper] [2024.03] -
EgoLifter: Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney.
"EgoLifter: Open-world 3D Segmentation for Egocentric Perception." ArXiv (2024). [paper] [code] [2024.03] -
David Jurado-Rodríguez, et al.
"SAM-Based Detection of Structural Anomalies in 3D Models for Preserving Cultural Heritage." VISAPP (2024). [paper] [2024.03] -
MAkE-able: Christoph Pohl, Fabian Reister, Fabian Peller-Konrad and Tamim Asfour.
"MAkE-able: Memory-centered and Affordance-based Task Execution Framework for Transferable Mobile Manipulation Skills." ArXiv (2024). [paper] [code] [2024.03] -
GoodSAM: Weiming Zhang, Yexin Liu, Xu Zheng, Lin Wang.
"GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation." CVPR (2024). [paper] [code] [2024.03] -
SPF+SPD: Quan Zhang, Xiaoyu Liu, Wei Li, Hanting Chen, Junchao Liu, Jie Hu, Zhiwei Xiong, Chun Yuan, Yunhe Wang.
"Distilling Semantic Priors from SAM to Efficient Image Restoration Models." ArXiv (2024). [paper] [2024.03] -
SAM-Road: Congrui Hetang, Haoru Xue, Cindy Le, Tianwei Yue, Wenping Wang, Yihui He.
"Segment Anything Model for Road Network Graph Extraction." ArXiv (2024). [paper] [code] [2024.03] -
SAM_DataAnnotation: Pranav Kulkarni, Adway Kanhere, Dharmam Savani, Andrew Chan, Devina Chatterjee, Paul H. Yi, Vishwa S. Parekh.
"Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations." ArXiv (2024). [paper] [code] [2024.03] -
CT-SAM3D: Heng Guo, Jianfeng Zhang, Jiaxing Huang, Tony C. W. Mok, Dazhou Guo, Ke Yan, Le Lu, Dakai Jin, Minfeng Xu.
"Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans." ArXiv (2024). [paper] [2024.03] -
ALC: Hoyoung Kim, Sehyun Hwang, Suha Kwak, Jungseul Ok.
"Active Label Correction for Semantic Segmentation with Foundation Models." ArXiv (2024). [paper] [2024.03] -
DiffCriticEdit: Ruicheng Wang, Jianfeng Xiang, Jiaolong Yang, Xin Tong.
"Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors." ArXiv (2024). [paper] [code] [2024.03] -
Rafaela Orenga Panizza, et al.
"Labeling Construction, Renovation, and Demolition Waste through Segment Anything Model (SAM)." Construction Research Congress (2024). [paper] [2024.03] -
SAM-AutoMed: Jiakang Sun, Ke Chen, Zhiyi He, Siyuan Ren, Xinyang He, Xu Liu, Cheng Peng.
"Medical Image Analysis using Improved SAM-Med2D: Segmentation and Classification Perspectives." ArXiv (2024). [paper] [2024.03] -
RASP: Minghui Zhao, Junxi Xia, Kaiyuan Hou, Yanchen Liu, Stephen Xia, Xiaofan Jiang.
"RASP: A Drone-based Reconfigurable Actuation and Sensing Platform Towards Ambient Intelligent Systems." ArXiv (2024). [paper] [2024.03] -
WebSAM-Adapter: Ren, B., Qian, Z., Sun, Y., Gao, C., Zhang, C.
"WebSAM-Adapter: Adapting Segment Anything Model for Web Page Segmentation." ECIR (2024). [paper] [2024.03] -
ProMamba: Jianhao Xie, Ruofan Liao, Ziang Zhang, Sida Yi, Yuesheng Zhu, Guibo Luo.
"ProMamba: Prompt-Mamba for polyp segmentation." ArXiv (2024). [paper] [2024.03] -
MTP: Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, Liangpei Zhang.
"MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining." ArXiv (2024). [paper] [code] [2024.03] -
Connor Lee, Saraswati Soedarmadji, Matthew Anderson, Anthony J. Clark, and Soon-Jo Chung.
"Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots." ArXiv (2024). [paper] [code] [2024.03] -
Luna, Miguel, Philip Chikontwe, and Sang Hyun Park.
"Enhanced Nuclei Segmentation and Classification via Category Descriptors in the SAM Model." Bioengineering (2024). [paper] [2024.03] -
MoCA: Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiangkang Deng, Xiatian Zhu.
"Unsupervised Audio-Visual Segmentation with Modality Alignment." ArXiv (2024). [paper] [2024.03] -
LLaVASeg: Yuqi Yang, Peng-Tao Jiang, Jing Wang, Hao Zhang, Kai Zhao, Jinwei Chen, Bo Li.
"Empowering Segmentation Ability to Multi-modal Large Language Models." ArXiv (2024). [paper] [code] [2024.03] -
MaskSAM: Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan.
"MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation." ArXiv (2024). [paper] [2024.03] -
SAL: Aljoša Ošep, Tim Meinhardt, Francesco Ferroni, Neehar Peri, Deva Ramanan, Laura Leal-Taixé.
"Better Call SAL: Towards Learning to Segment Anything in Lidar." ArXiv (2024). [paper] [code] [2024.03] -
SAMCT: Xian Lin, Yangyang Xiang, Zhehao Wang, Kwang-Ting Cheng, Zengqiang Yan, Li Yu.
"SAMCT: Segment Any CT Allowing Labor-Free Task-Indicator Prompts." ArXiv (2024). [paper] [code] [2024.03] -
Efrain Torres-Lomas, Jimena Lado-Jimena, Guillermo Garcia-Zamora, Luis Diaz-Garcia.
"Segment Anything for comprehensive analysis of grapevine cluster architecture and berry properties." ArXiv (2024). [paper] [2024.03] -
Roland Gruber, Steffen Rüger, Thomas Wittenberg.
"Adapting SAM for Volumetric X-Ray Data-sets of Arbitrary Sizes." ArXiv (2024). [paper] [2024.03] -
LocalStyleFool: Yuxin Cao, Jinghao Li, Xi Xiao, Derui Wang, Minhui Xue, Hao Ge, Wei Liu, Guangwu Hu.
"LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model." SPW (2024). [paper] [2024.03] -
CCC++: Mrityunjoy Gain, Avi Deb Raha, Rameswar Debnath.
"CCC++: Optimized Color Classified Colorization with Segment Anything Model (SAM) Empowered Object Selective Color Harmonization." ArXiv (2024). [paper] [2024.03] -
CFR: Shumeng Li, Lei Qi, Qian Yu, Jing Huo, Yinghuan Shi, Yang Gao.
"Concatenate, Fine-tuning, Re-training: A SAM-enabled Framework for Semi-supervised 3D Medical Image Segmentation." ArXiv (2024). [paper] [code] [2024.03] -
TA-LoRA: Xuehao Wang, Feiyang Ye, Yu Zhang.
"Task-Aware Low-Rank Adaptation of Segment Anything Model." ArXiv (2024). [paper] [2024.03] -
UA-SAM: Mingzhou Jiang, Jiaying Zhou, Junde Wu, Tianyang Wang, Yueming Jin, Min Xu.
"Uncertainty-Aware Adapter: Adapting Segment Anything Model (SAM) for Ambiguous Medical Image Segmentation." ArXiv (2024). [paper] [code] [2024.03] -
MS-UGCML: Shichao Kan, Yuhai Deng, Yixiong Liang, Lihui Cen, Zhe Qu, Yigang Cen, Zhihai He.
"Unsupervised Collaborative Metric Learning with Mixed-Scale Groups for General Object Retrieval." ArXiv (2024). [paper] [code] [2024.03] -
SAOM: Mariia Khan, Yue Qiu, Yuren Cong, Jumana Abu-Khalaf, David Suter, Bodo Rosenhahn.
"Segment Any Object Model (SAOM): Real-to-Simulation Fine-Tuning Strategy for Multi-Class Multi-Instance Segmentation." ArXiv (2024). [paper] [2024.03] -
FastSAM3D: Yiqing Shen, Jingxing Li, Xinyuan Shao, Blanca Inigo Romillo, Ankush Jindal, David Dreizin, Mathias Unberath.
"FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images." ArXiv (2024). [paper] [code] [2024.03] -
CMR2D+T-SAM: Zhennong Chen, Sekeun Kim, Hui Ren, Quanzheng Li, Xiang Li.
"Cardiac Magnetic Resonance 2D+T Short- and Long-axis Segmentation via Spatio-temporal SAM Adaptation." ArXiv (2024). [paper] [2024.03] -
Group-Mix SAM: Wu Liang, X.-G. Ma.
"Group-Mix SAM: Lightweight Solution for Industrial Assembly Line Applications." ArXiv (2024). [paper] [2024.03] -
TransLandSeg: Changhong Hou, Junchuan Yu, Daqing Ge, Liu Yang, Laidian Xi, Yunxuan Pang, Yi Wen.
"TransLandSeg: A Transfer Learning Approach for Landslide Semantic Segmentation Based on Vision Foundation Model." ArXiv (2024). [paper] [2024.03] -
Grasp Anything: Malte Mosbach, Sven Behnke.
"Grasp Anything: Combining Teacher-Augmented Policy Gradient Learning with Instance Segmentation to Grasp Arbitrary Objects." ArXiv (2024). [paper] [code] [2024.03] -
RDC: Meixuan Li, Tianyu Li, Guoqing Wang, Peng Wang, Yang Yang, Heng Tao Shen.
"Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning." ArXiv (2024). [paper] [2024.03] -
VISE: Tian Meng, Yang Tao, Ruilin Lyu, Wuliang Yin.
"Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models." ArXiv (2024). [paper] [2024.03] -
DiffuMatting: Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, Zhengkai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji.
"DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation." ArXiv (2024). [paper] [2024.03] -
ClickVOS: Pinxue Guo, Lingyi Hong, Xinyu Zhou, Shuyong Gao, Wanyun Li, Jinglun Li, Zhaoyu Chen, Xiaoqiang Li, Wei Zhang, Wenqiang Zhang.
"ClickVOS: Click Video Object Segmentation." ArXiv (2024). [paper] [code] [2024.03] -
Kong, L.; Huang, M.; Zhang, L.; Chan, L.W.C.
"Enhancing Diagnostic Images to Improve the Performance of the Segment Anything Model in Medical Image Segmentation." Bioengineering (2024). [paper] [2024.03] -
FSViewFusion: Rukhshanda Hussain, Hui Xian Grace Lim, Borchun Chen, Mubarak Shah, Ser Nam Lim.
"FSViewFusion: Few-Shots View Generation of Novel Objects." ArXiv (2024). [paper] [2024.03] -
V-PRISM: Herbert Wright, Weiming Zhi, Matthew Johnson-Roberson, Tucker Hermans.
"V-PRISM: Probabilistic Mapping of Unknown Tabletop Scenes." ArXiv (2024). [paper] [code] [2024.03] -
CCSpO2Net: Sun, Xiantao and Wen, Tao and Chen, Weihai and Huang, Bin.
"CCSpO2Net: Camera-Based Contactless Oxygen Saturation Measurement Foundation Model in Clinical Settings." TIM (2024). [paper] [2024.03] -
Unveiling the Truth: Cartella, Giuseppe and Cuculo, Vittorio and Cornia, Marcella and Cucchiara, Rita.
"Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images." APL (2024). [paper] [code] [2024.03] -
Stefan Denner, David Zimmerer, Dimitrios Bounias, Markus Bujotzek, Shuhan Xiao, Lisa Kausch, Philipp Schader, Tobias Penzkofer, Paul F. Jäger, Klaus Maier-Hein.
"Leveraging Foundation Models for Content-Based Medical Image Retrieval in Radiology." ArXiv (2024). [paper] [2024.03] -
Chaoyi Wang, Yaozhe Song, Yafeng Zhang, Jun Pei, Lijie Xia, Jianpo Liu.
"Video Generation with Consistency Tuning." ArXiv (2024). [paper] [2024.03] -
Sam-Rsp: Jiaguang Li, et al.
"Sam-Rsp: A New Few-Shot Segmentation Method Based on Segment Anything Model and Rough Segmentation Prompts." SSRN (2024). [paper] [code] [2024.03] -
Lumen: Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang.
"Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models." ArXiv (2024). [paper] [code] [2024.03] -
RSBuilding: Mingze Wang, Keyan Chen, Lili Su, Cilin Yan, Sheng Xu, Haotian Zhang, Pengcheng Yuan, Xiaolong Jiang, Baochang Zhang.
"RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model." ArXiv (2024). [paper] [code] [2024.03] -
DragAnything: Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang.
"DragAnything: Motion Control for Anything using Entity Representation." ArXiv (2024). [paper] [code] [homepage] [2024.03] -
ARtVista: Trong-Vu Hoang, Quang-Binh Nguyen, Duy-Nam Ly, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le.
"ARtVista: Gateway To Empower Anyone Into Artist." CHI (2024). [paper] [code] [2024.03] -
ChemSAM: Bowen Tang, et al.
"Automated molecular structure segmentation from documents using ChemSAM." ArXiv (2024). [paper] [2024.03] -
DAL: Zhang, Fayong and Liu, Kejun and Liu, Yuanyuan and Wang, Chaofan and Zhou, Wujie and Zhang, Hongyan and Wang, Lizhe.
"Multi-target Domain Adaptation Building Instance Extraction of Remote Sensing Imagery with Domain-common Approximation learning." TGRS (2024). [paper] [2024.03] -
VisionGPT: Chris Kelly, Luhui Hu, Bang Yang, Yu Tian, Deshun Yang, Cindy Yang, Zaoshan Huang, Zihao Li, Jiayin Hu, Yuexian Zou.
"VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework." ArXiv (2024). [paper] [2024.03] -
WeakSurg: Qiyuan Wang, Yanzhe Liu, Shang Zhao, Rong Liu, S. Kevin Zhou.
"WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity." ArXiv (2024). [paper] [2024.03] -
Ref LDM-Seg: Chaoyang Wang, Xiangtai Li, Henghui Ding, Lu Qi, Jiangning Zhang, Yunhai Tong, Chen Change Loy, Shuicheng Yan.
"Explore In-Context Segmentation via Latent Diffusion Models." ArXiv (2024). [paper] [code] [2024.03] -
GaussianGrasper: Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, Chao Yang, Dawei Wang, Zhen Chen, Xiaoxiao Long, Meiqing Wang.
"GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping." ArXiv (2024). [paper] [code] [2024.03] -
Soroush Seifi, Daniel Olmeda Reino, Fabien Despinoy, Rahaf Aljundi.
"Annotation Free Semantic Segmentation with Vision Foundation Models." ArXiv (2024). [paper] [2024.03] -
SLCF-Net: Helin Cao, Sven Behnke.
"SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net." ICRA (2024). [paper] [2024.03] -
SAM-Lightening: Yanfei Song, Bangzheng Pu, Peng Wang, Hongxu Jiang, Dong Dong, Yiqing Shen.
"SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration." ArXiv (2024). [paper] [code] [2024.03] -
DF4LCZ: Qianqian Wu, Xianping Ma, Jialu Sui, and Man-On Pun.
"DF4LCZ: A SAM-Empowered Data Fusion Framework for Scene-Level Local Climate Zone Classification." ArXiv (2024). [paper] [code] [2024.03] -
PosSAM: Vibashan VS, Shubhankar Borse, Hyojin Park, Debasmit Das, Vishal Patel, Munawar Hayat, Fatih Porikli.
"PosSAM: Panoptic Open-vocabulary Segment Anything." ArXiv (2024). [paper] [code] [2024.03] -
PLM+PMM: Hyung-Il Kim, Kimin Yun, Jun-Seok Yun, Yuseok Bae.
"Customizing Segmentation Foundation Model via Prompt Learning for Instance Segmentation." ArXiv (2024). [paper] [2024.03] -
WSI-SAM: Hong Liu, Haosen Yang, Paul J. van Diest, Josien P.W. Pluim, Mitko Veta.
"WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images." ArXiv (2024). [paper] [code] [2024.03] -
SAMDA: Yiran Wang, Li Xiao.
"SAMDA: Leveraging SAM on Few-Shot Domain Adaptation for Electronic Microscopy Segmentation." ArXiv (2024). [paper] [2024.03] -
Zijian Wu, Adam Schmidt, Peter Kazanzides, Septimiu E. Salcudean.
"Real-time Surgical Instrument Segmentation in Video Using Point Tracking and Segment Anything." ArXiv (2024). [paper] [2024.03] -
FluoroSAM: Benjamin D. Killeen, Liam J. Wang, Han Zhang, Mehran Armand, Russell H. Taylor, Greg Osgood, Mathias Unberath.
"FluoroSAM: A Language-aligned Foundation Model for X-ray Image Segmentation." ArXiv (2024). [paper] [code] [2024.03] -
ReimaginedAct: Lan Wang, Vishnu Boddeti, Sernam Lim.
"Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions." ArXiv (2024). [paper] [2024.03] -
GEOBIA: He, Tao and Chen, Jianyu and Kang, Linchong and Zhu, Qiankun.
"Evaluation of Global-Scale and Local-Scale Optimized Segmentation Algorithms in GEOBIA with SAM on Land Use and Land Cover." JSTARS (2024). [paper] [2024.03] -
ObjectCompose: Hashmat Shadab Malik, Muhammad Huzaifa, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan.
"ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes." ArXiv (2024). [paper] [2024.03] -
Yao Jiang, Xinyu Yan, Ge-Peng Ji, Keren Fu, Meijun Sun, Huan Xiong, Deng-Ping Fan, Fahad Shahbaz Khan.
"Effectiveness Assessment of Recent Large Vision-Language Models." ArXiv (2024). [paper] [2024.03] -
Giannakis, I., Bhardwaj, A., Sam, L., and Leontidis, G.
"Segment Anything Model (SAM) for Automatic Crater Detection." EGU General Assembly (2024). [paper] [2024.03] -
Bocchino, F., Sergi, G., Ravanelli, R., and Crespi, M.
"Preliminary analysis of the potentialities of the Segment Anything Model (SAM) in the segmentation of Sentinel-2 imagery for water reservoir monitoring." EGU General Assembly (2024). [paper] [2024.03] -
Ruiqing Yan, et al.
"Weakly-semi supervised extraction of rooftop photovoltaics from high-resolution images based on segment anything model and class activation map." Applied Energy (2024). [paper] [2024.03] -
CSFwinformer: Xie, Zhifeng and Wang, Sen and Yu, Qiucheng and Tan, Xin and Xie, Yuan.
"CSFwinformer: Cross-Space-Frequency Window Transformer for Mirror Detection." TIP (2024). [paper] [code] [2024.03] -
Xiaoyuan Liu, et al.
"Stereo Vision Meta-Lens-Assisted Driving Vision." ACS Photonics (2024). [paper] [2024.03] -
PointSeg: Qingdong He, Jinlong Peng, Zhengkai Jiang, Xiaobin Hu, Jiangning Zhang, Qiang Nie, Yabiao Wang, Chengjie Wang.
"PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models." ArXiv (2024). [paper] [2024.03] -
MEA: Hairong Shi, Songhao Han, Shaofei Huang, Yue Liao, Guanbin Li, Xiangxing Kong, Hua Zhu, Xiaomu Wang, Si Liu.
"Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation." ArXiv (2024). [paper] [2024.03] -
GAM-3DSC: Feibo Jiang, Yubo Peng, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan, Xiaohu You.
"Large Generative Model Assisted 3D Semantic Communication." ArXiv (2024). [paper] [2024.03] -
APPLE: Zikang Xu, Fenghe Tang, Quan Quan, Qingsong Yao, S. Kevin Zhou.
"APPLE: Adversarial Privacy-aware Perturbations on Latent Embedding for Unfairness Mitigation." ArXiv (2024). [paper] [2024.03] -
FedFMS: Yuxi Liu, Guibo Luo, Yuesheng Zhu.
"FedFMS: Exploring Federated Foundation Models for Medical Image Segmentation." ArXiv (2024). [paper] [code] [2024.03] -
OmniCount: Anindya Mondal, Sauradip Nag, Xiatian Zhu, Anjan Dutta.
"OmniCount: Multi-label Object Counting with Semantic-Geometric Priors." ArXiv (2024). [paper] [2024.03] -
P^2SAM: Chenhui Zhao, Liyue Shen.
"Part-aware Personalized Segment Anything Model for Patient-Specific Segmentation." ArXiv (2024). [paper] [2024.03] -
SAM-PD: Tao Zhou, Wenhan Luo, Qi Ye, Zhiguo Shi, Jiming Chen.
"SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising." ArXiv (2024). [paper] [code] [2024.03] -
SA-ICM: Takahiro Shindo, Kein Yamada, Taiju Watanabe, Hiroshi Watanabe.
"Image Coding for Machines with Edge Information Learning Using Segment Anything." ArXiv (2024). [paper] [2024.03] -
ProMISe: Jinfeng Wang, Sifan Song, Xinkun Wang, Yiyi Wang, Yiyi Miao, Jionglong Su, S. Kevin Zhou.
"ProMISe: Promptable Medical Image Segmentation using SAM." ArXiv (2024). [paper] [2024.03] -
Popeye: Wei Zhang, Miaoxin Cai, Tong Zhang, Guoqiang Lei, Yin Zhuang, Xuerui Mao.
"Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery." ArXiv (2024). [paper] [2024.03] -
Kevin Shen, Surabhi S Nath, Aenne Brielmann, Peter Dayan.
"Simplicity in Complexity." ArXiv (2024). [paper] [2024.03] -
CCC: Mrityunjoy Gain, Avi Deb Raha, Rameswar Debnath.
"CCC: Color Classified Colorization." ArXiv (2024). [paper] [2024.03] -
CAC: Yuhao Lin, Haiming Xu, Lingqiao Liu, Javen Qinfeng Shi.
"A Simple-but-effective Baseline for Training-free Class-Agnostic Counting." ArXiv (2024). [paper] [2024.03] -
Khatua, A., Bhattacharya, A., Goswami, A.K. et al.
"Developing approaches in building classification and extraction with synergy of YOLOV8 and SAM models." Spatial Information Research (2024). [paper] [2024.03] -
Toki Tahmid Inan, Mingrui Liu, Amarda Shehu.
"Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms." AAAI (2024). [paper] [2024.03] -
Xu, Binwei and Jiang, Qiuping and Zhao, Xing and Lu, Chenyang and Liang, Haoran and Liang, Ronghua.
"Multidimensional Exploration of Segment Anything Model for Weakly Supervised Video Salient Object Detection." TCSVT (2024). [paper] [2024.02] -
GVA: Xinqi Liu, Chenming Wu, Jialun Liu, Xing Liu, Jinbo Wu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang.
"GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos." ArXiv (2024). [paper] [code] [2024.02] -
OHTA: Xiaozheng Zheng, Chao Wen, Zhuo Su, Zeran Xu, Zhaohu Li, Yang Zhao, Zhou Xue.
"OHTA: One-shot Hand Avatar via Data-driven Implicit Priors." CVPR (2024). [paper] [code] [2024.02] -
FusionVision: Safouane El Ghazouali, Youssef Mhirit, Ali Oukhrid, Umberto Michelucci, Hichem Nouira.
"FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything." ArXiv (2024). [paper] [code] [2024.02] -
GROUNDHOG: Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai.
"GROUNDHOG: Grounding Large Language Models to Holistic Segmentation." ArXiv (2024). [paper] [code] [2024.02] -
POC: Pau de Jorge, Riccardo Volpi, Puneet K. Dokania, Philip H. S. Torr, Gregory Rogez.
"Placing Objects in Context via Inpainting for Out-of-distribution Segmentation." ArXiv (2024). [paper] [code] [2024.02] -
GEA: Xinqi Liu, Chenming Wu, Xing Liu, Jialun Liu, Jinbo Wu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang.
"GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video." ArXiv (2024). [paper] [code] [2024.02] -
Surgment: Jingying Wang, Haoran Tang, Taylor Kantor, Tandis Soltani, Vitaliy Popov, Xu Wang.
"Surgment: Segmentation-enabled Semantic Search and Creation of Visual Question and Feedback to Support Video-Based Surgery Learning." ArXiv (2024). [paper] [2024.02] -
sViT: Young Kyung Kim, J. Matías Di Martino, Guillermo Sapiro.
"Vision Transformers with Natural Language Semantics." ArXiv (2024). [paper] [2024.02] -
Huang, Wenjun, Anzhu Yu, Qing Xu, Qun Sun, Wenyue Guo, Song Ji, Bowei Wen, and Chunping Qiu.
"Sea Ice Extraction via Remote Sensing Imagery: Algorithms, Datasets, Applications and Challenges." Remote Sensing (2024). [paper] [2024.02] -
OpenMEDLab: Xiaosong Wang, Xiaofan Zhang, Guotai Wang, Junjun He, Zhongyu Li, Wentao Zhu, Yi Guo, Qi Dou, Xiaoxiao Li, Dequan Wang, Liang Hong, Qicheng Lao, Tong Ruan, Yukun Zhou, Yixue Li, Jie Zhao, Kang Li, Xin Sun, Lifeng Zhu, Shaoting Zhang.
"OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models in Medicine." ArXiv (2024). [paper] [code] [2024.02] -
RSAM-Seg: Jie Zhang, Xubing Yang, Rui Jiang, Wei Shao, Li Zhang.
"RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation." ArXiv (2024). [paper] [code] [2024.02] -
STLM: Chenghao Li, Lei Qi, Xin Geng.
"A SAM-guided Two-stream Lightweight Model for Anomaly Detection." ArXiv (2024). [paper] [2024.02] -
Kanyifeechukwu J. Oguine, Roger D. Soberanis-Mukul, Nathan Drenkow, Mathias Unberath.
"From Generalization to Precision: Exploring SAM for Tool Segmentation in Surgical Environments." ArXiv (2024). [paper] [2024.02] -
VRP-SAM: Yanpeng Sun, Jiahui Chen, Shan Zhang, Xinyu Zhang, Qiang Chen, Gang Zhang, Errui Ding, Jingdong Wang, Zechao Li.
"VRP-SAM: SAM with Visual Reference Prompt." CVPR (2024). [paper] [2024.02] -
SAM-DiffSR: Chengcheng Wang, Zhiwei Hao, Yehui Tang, Jianyuan Guo, Yujie Yang, Kai Han, Yunhe Wang.
"SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution." ArXiv (2024). [paper] [code] [2024.02] -
Jintao Ren, Mathis Rasmussen, Jasper Nijkamp, Jesper Grau Eriksen, Stine Korreman.
"Segment anything model for head and neck tumor segmentation with CT, PET and MRI multi-modality images." ICCR (2024). [paper] [2024.02] -
AdaSEEM: Jia Wan, Qiangqiang Wu, Wei Lin, Antoni B. Chan.
"Robust Unsupervised Crowd Counting and Localization with Adaptive Resolution SAM." ArXiv (2024). [paper] [2024.02] -
BLO-SAM: Li Zhang, Youwei Liang, Pengtao Xie.
"BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning of SAM." ICML (2024). [paper] [code] [2024.02] -
UN-SAM: Zhen Chen, Qing Xu, Xinyu Liu, Yixuan Yuan.
"UN-SAM: Universal Prompt-Free Segmentation for Generalized Nuclei Images." TMI (2024). [paper] [code] [2024.02] -
TV-SAM: Zekun Jiang, Dongjie Cheng, Ziyuan Qin, Jun Gao, Qicheng Lao, Kang Li, Le Zhang.
"Increasing SAM Zero-Shot Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation." ArXiv (2024). [paper] [code] [2024.02] -
CoFRIDA: Peter Schaldenbrand, Gaurav Parmar, Jun-Yan Zhu, James McCann, Jean Oh.
"CoFRIDA: Self-Supervised Fine-Tuning for Human-Robot Co-Painting." ArXiv (2024). [paper] [code] [2024.02] -
CVLM: Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang.
"Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment." ArXiv (2024). [paper] [2024.02] -
SAM-EDA: Wang, Ziquan, Yongsheng Zhang, Zhenchao Zhang, Zhipeng Jiang, Ying Yu, Li Li, and Lei Li.
"Exploring Semantic Prompts in the Segment Anything Model for Domain Adaptation." Remote Sensing (2024). [paper] [2024.02] -
DSAIL-TreeVision: Cedric Kiplimo and Collins Emasi Epege and Ciira wa Maina and Billy Okal.
"DSAIL-TreeVision: A software tool for extracting tree biophysical parameters from stereoscopic images." SoftwareX (2024). [paper] [code] [2024.02] -
Xia, Jiahao and Gong, Gavin and Liu, Jiawei and Zhu, Zhigang and Tang, Hao.
"Pedestrian-Accessible Infrastructure Inventory: Enabling and Assessing Zero-Shot Segmentation on Multi-Mode Geospatial Data for All Pedestrian Types." Journal of Imaging (2024). [paper] [2024.02] -
LIMP: Benedict Quartey, Eric Rosen, Stefanie Tellex, George Konidaris.
"Verifiably Following Complex Robot Instructions with Foundation Models." ArXiv (2024). [paper] [code] [2024.02] -
LMPC: Jacky Liang, Fei Xia, Wenhao Yu, et al.
"Learning to Learn Faster from Human Feedback with Language Model Predictive Control." ArXiv (2024). [paper] [code] [2024.02] -
Chong Di, Jie Gong.
"An AI-based approach to create spatial inventory of safety-related architectural features for school buildings." DBE (2024). [paper] [2024.02] -
WeakSAM: Lianghui Zhu, Junwei Zhou, Yan Liu, Xin Hao, Wenyu Liu, Xinggang Wang.
"WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition." ArXiv (2024). [paper] [code] [2024.02] -
SeqAE: Delong Chen, Samuel Cahyawijaya, Jianfeng Liu, Baoyuan Wang, Pascale Fung.
"Subobject-level Image Tokenization." ArXiv (2024). [paper] [code] [2024.02] -
DeiSAM: Hikaru Shindo, Manuel Brack, Gopika Sudhakaran, Devendra Singh Dhami, Patrick Schramowski, Kristian Kersting.
"DeiSAM: Segment Anything with Deictic Prompting." NeurIPS (2024). [paper] [2024.02] -
OBJ-GSP: Wenxiao Cai, Wankou Yang.
"Object-level Geometric Structure Preserving for Natural Image Stitching." ArXiv (2024). [paper] [code] [2024.02] -
ISCUTE: Shir Kozlovsky, Omkar Joglekar, Dotan Di Castro.
"ISCUTE: Instance Segmentation of Cables Using Text Embedding." ArXiv (2024). [paper] [2024.02] -
MATT: James E. Gallagher, Aryav Gogia, Edward J. Oughton.
"A Multispectral Automated Transfer Technique (MATT) for machine-driven image labeling utilizing the Segment Anything Model (SAM)." ArXiv (2024). [paper] [2024.02] -
DPSM: Xin Zhang, Keren Fu, Qijun Zhao.
"Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification." ArXiv (2024). [paper] [2024.02] -
LaserSAM: Alexander Krawciw, Sven Lilge, Timothy D. Barfoot.
"LaserSAM: Zero-Shot Change Detection Using Visual Segmentation of Spinning LiDAR." ArXiv (2024). [paper] [2024.02] -
Zero SAM: Tal Shaharabany, Lior Wolf.
"Zero Shot Medical Image Segmentation Based on Sparse Prompt Using Finetuned SAM." ArXiv (2024). [paper] [2024.02] -
Aviad Dahan, Tal Shaharabany, Raja Giryes, Lior Wolf.
"Video Polyp Segmentation using Implicit Networks." ArXiv (2024). [paper] [2024.02] -
Gurunath Reddy, Dattesh D. Shanbhag, Deepa Anand, Uday Patil.
"Data Adaptive few-shot multi label segmentation with Foundation models." ArXiv (2024). [paper] [2024.02] -
Jiesi Hu, Yang Shang, Yanwu Yang, Guo Xutao, Hanyang Peng, Ting Ma.
"Synergizing In-context Learning Model and SAM in Medical Image Segmentation." ArXiv (2024). [paper] [2024.02] -
UnCLe SAM: Amin Ranem, Mohamed Afham Mohamed Aflal, Moritz Fuchs, Anirban Mukhopadhyay.
"UnCLe SAM: Unleashing SAM’s Potential for Continual Prostate MRI Segmentation." ArXiv (2024). [paper] [2024.02] -
Lester: Ruben Tous.
"Lester: rotoscope animation through video object segmentation and tracking." ArXiv (2024). [paper] [2024.02] -
Fine-Tune Distillation: .
"Domain Adaptable Fine-Tune Distillation Framework For Advancing Farm Surveillance." ArXiv (2024). [paper] [code] [2024.02] -
YOLO + SAM: Henry Gann, Josiah Bull, Trevor Gee, Mahla Nejati.
"Improving Pallet Detection Using Synthetic Data." ACRA (2023). [paper] [2024.02] -
EfficientViT-SAM: Zhuoyang Zhang, Han Cai, Song Han.
"EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss." ArXiv (2024). [paper] [code] [2024.02] -
ClickSAM: Aimee Guo, Gace Fei, Hemanth Pasupuletic, Jing Wang.
"ClickSAM: Fine-tuning Segment Anything Model using click prompts for ultrasound image segmentation." SPIE Medical Imaging Conference (2024). [paper] [2024.02] -
Iris-SAM: Parisa Farmanifard, Arun Ross.
"Iris-SAM: Iris Segmentation Using a Foundational Model." ArXiv (2024). [paper] [2024.02] -
CAT-SAM: Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Shijian Lu.
"CAT-SAM: Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model." ECCV (2024). [paper] [code] [2024.02] -
COMRP: Zihan Ma, Yongshang Li, Ronggui Ma, Chen Liang.
"Unsupervised semantic segmentation of high-resolution UAV imagery for road scene parsing." ArXiv (2024). [paper] [2024.02] -
SAM+SLIC: Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao, Yuqun Wu, Sethuraman T V, Heyi Tao, Jae Yong Lee, Wilfredo Torres, Yu-Xiong Wang, Derek Hoiem.
"Region-Based Representations Revisited." ArXiv (2024). [paper] [2024.02] -
Polyp-DAM: Zhuoran Zheng, Chen Wu, Wei Wang, Yeying Jin, Xiuyi Jia.
"Polyp-DAM: Polyp segmentation via depth anything model." ArXiv (2024). [paper] [code] [2024.02] -
AnyChange: Zhuo Zheng, Yanfei Zhong, Liangpei Zhang, Stefano Ermon.
"Segment Any Change." ArXiv (2024). [paper] [2024.02] -
Sureka Thiruchittampalam, Bikram P. Banerjee, Nancy F. Glenn, Simit Raval.
"Comparative Evaluation of Traditional and Deep Learning-Based Segmentation Methods for Spoil Pile Delineation Using UAV Images." ArXiv (2024). [paper] [2024.02] -
Hi-SAM: Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, Dacheng Tao.
"Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation." ArXiv (2024). [paper] [code] [2024.01] -
Conv-LoRA: Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan.
"Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model." ICLR (2024). [paper] [2024.01] -
Kangcheng Liu, Xinhu Zheng, Chaoqun Wang, Hesheng Wang, Ming Liu, Kai Tang.
"Online Robot Navigation and Manipulation with Distilled Vision-Language Models." ICRA (2024). [paper] [2024.01] -
MouSi: Xiaoran Fan, Tao Ji, Changhao Jiang, Shuo Li, Senjie Jin, Sirui Song, Junke Wang, Boyang Hong, Lu Chen, Guodong Zheng, Ming Zhang, Caishuang Huang, Rui Zheng, Zhiheng Xi, Yuhao Zhou, Shihan Dou, Junjie Ye, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang.
"MouSi: Poly-Visual-Expert Vision-Language Models." ArXiv (2024). [paper] [code] [2024.01] -
SA-GS: Xu Hu, Yuxi Wang, Lue Fan, Junsong Fan, Junran Peng, Zhen Lei, Qing Li, Zhaoxiang Zhang.
"Semantic Anything in 3D Gaussians." ArXiv (2024). [paper] [2024.01] -
SimAda: Yiran Song, Qianyu Zhou, Xuequan Lu, Zhiwen Shao, Lizhuang Ma.
"SimAda: A Simple Unified Framework for Adapting Segment Anything Model in Underperformed Scenes." ArXiv (2024). [paper] [code] [2024.01] -
MESA: Yesheng Zhang, Xu Zhao.
"MESA: Matching Everything by Segmenting Anything." ArXiv (2024). [paper] [2024.01] -
MixSup: Yuxue Yang, Lue Fan, Zhaoxiang Zhang.
"MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection." ICLR (2024). [paper] [code] [2024.01] -
GEM: Jing Hao, Moyun Liu, Kuo Feng Hung.
"GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis." ArXiv (2024). [paper] [code] [2024.01] -
LoRA-SAM: Zehao Ye, Lucy Lovell, Asaad Faramarzi, Jelena Ninic.
"SAM-based instance segmentation models for the automation of masonry crack detection." ArXiv (2024). [paper] [2024.01] -
SSR: Yanqi Ge, Ye Huang, Wen Li, Lixin Duan.
"SSR: SAM is a Strong Regularizer for domain adaptive semantic segmentation." ArXiv (2024). [paper] [2024.01] -
ScaleFlow: Chengbo Yuan, Chuan Wen, Tong Zhang, Yang Gao.
"General Flow as Foundation Affordance for Scalable Robot Learning." ArXiv (2024). [paper] [code] [2024.01] -
HAZARD: Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan.
"HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments." ICLR (2024). [paper] [code] [2024.01] -
Laura J. Brooks, Daniel Pearce, Kenton Kwok, Nikhil Jawade, Man Qi, Erola Fenollosa, Deniz Beker, James Whicker, Katrina Davis, Roberto Salguero-Gómez, Robin Wang, and Steve Chappell.
"A video-rate hyperspectral camera for monitoring plant health and biodiversity." ArXiv (2024). [paper] [2024.01] -
SAM-OBC: Hu, Yixin and Qi, Zhixin and Zhou, Zhexun and Qin, Yan.
"Detection of Benggang in Remote Sensing Imagery through Integration of Segmentation Anything Model with Object-Based Classification." ArXiv (2024). [paper] [2024.01] -
OK-Robot: Peiqi Liu, Yaswanth Orru, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto.
"OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics." ArXiv (2024). [paper] [code] [2024.01] -
Bowei Xue, Han Cheng, Qingqing Yang, Yi Wang, and Xiaoning He.
"Adapting Segment Anything Model to Aerial Land Cover Classification with Low Rank Adaptation." IEEE LGRS (2024). [paper] [2024.01] -
Peng Qian, Tomer Ullman.
"Shape Guides Visual Pretense." ArXiv (2024). [paper] [2024.01] -
MultiDance-Zero: Zhe Xu, Kun Wei, Xu Yang, Cheng Deng.
"Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons." ArXiv (2024). [paper] [2024.01] -
Vary-toy: Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, En Yu, Jianjian Sun, Chunrui Han, Xiangyu Zhang.
"Small Language Model Meets with Reinforced Vision Vocabulary." ArXiv (2024). [paper] [code] [2024.01] -
WildRGB-D: Hongchi Xia, Yang Fu, Sifei Liu, Xiaolong Wang.
"RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos." ArXiv (2024). [paper] [code] [2024.01] -
Tyche: Marianne Rakic, Hallee E. Wong, Jose Javier Gonzalez Ortiz, Beth Cimini, John Guttag, Adrian V. Dalca.
"Tyche: Stochastic In-Context Learning for Medical Image Segmentation." ArXiv (2024). [paper] [code] [2024.01] -
Grounded SAM: Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang.
"Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks." ArXiv (2024). [paper] [code] [2024.01] -
TriSAM: Jia Wan, Wanhua Li, Atmadeep Banerjee, Jason Ken Adhinarta, Evelina Sjostedt, Jingpeng Wu, Jeff Lichtman, Hanspeter Pfister, Donglai Wei.
"TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images." ArXiv (2024). [paper] [2024.01] -
Kesi Xu, Lea Goetz, Nasir Rajpoot.
"On generalisability of segment anything model for nuclear instance segmentation in histology images." MIUA (2023). [paper] [2024.01] -
PA-SAM: Zhaozhi Xie, Bochen Guan, Weihao Jiang, Muyang Yi, Yue Ding, Hongtao Lu, Lei Zhang.
"PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation." ArXiv (2024). [paper] [code] [2024.01] -
SAC: Saiyang Na, Yuzhi Guo, Feng Jiang, Hehuan Ma, Junzhou Huang.
"Segment Any Cell: A SAM-based Auto-prompting Fine-tuning Framework for Nuclei Segmentation." ArXiv (2024). [paper] [2024.01] -
ClipSAM: Shengze Li, Jianjian Cao, Peng Ye, Yuhan Ding, Chongjun Tu, Tao Chen.
"ClipSAM: CLIP and SAM Collaboration for Zero-Shot Anomaly Segmentation." ArXiv (2024). [paper] [code] [2024.01] -
SegmentAnyBone: Hanxue Gu, Roy Colglazier, Haoyu Dong, Jikai Zhang, Yaqian Chen, Zafer Yildiz, Yuwen Chen, Lin Li, Jichen Yang, Jay Willhite, Alex M. Meyer, Brian Guo, Yashvi Atul Shah, Emily Luo, Shipra Rajput, Sally Kuehn, Clark Bulleit, Kevin A. Wu, Jisoo Lee, Brandon Ramirez, Darui Lu, Jay M. Levin, Maciej A. Mazurowski.
"SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI." ArXiv (2024). [paper] [code] [2024.01] -
Reda Bensaid, Vincent Gripon, François Leduc-Primeau, Lukas Mauch, Ghouthi Boukli Hacene, Fabien Cardinaux.
"A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models." ArXiv (2024). [paper] [2024.01] -
Tunnel SAM Adapter: Chen, Junxin and Yu, Xiaojie and Liu, Shichang and Chen, Tao and Wang, Wei and Jeon, Gwanggil and He, Ben-Guo.
"Tunnel SAM Adapter: Adapting Segment Anything Model for Tunnel Water Leakage Inspection." Geohazard Mechanics (2024). [paper] [2024.01] -
GEMO: Yinuo Zhao, Kun Wu, Tianjiao Yi, Zhiyuan Xu, Xiaozhu Ju, Zhengping Che, Qinru Qiu, Chi Harold Liu, Jian Tang.
"An Efficient Generalizable Framework for Visuomotor Policies via Control-aware Augmentation and Privilege-guided Distillation." ArXiv (2024). [paper] [2024.01] -
Zhan, Youyi and Wang, Tuanfeng Y. and Shao, Tianjia and Zhou, Kun.
"Pattern Guided UV Recovery for Realistic Video Garment Texturing." ArXiv (2024). [paper] [2024.01] -
Efficient4D: Zijie Pan, Zeyu Yang, Xiatian Zhu, Li Zhang.
"Fast Dynamic 3D Object Generation from a Single-view Video." ArXiv (2024). [paper] [code] [2024.01] -
Hangbin Zheng, Shimin Liu, Hengjun Zhang, Jiayi Yu and Jinsong Bao.
"Visual-triggered contextual guidance for lithium battery disassembly: a multi-modal event knowledge graph approach." ArXiv (2024). [paper] [2024.01] -
Chenghao Lu, Emmanuel Nnadozie, Moritz Paul Camenzind, Yuncai Hu and Kang Yu.
"Maize plant detection using UAV-based RGB imaging and YOLOv5." Frontiers in Plant Science (2024). [paper] [2024.01] -
OMG-Seg: Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy.
"OMG-Seg: Is One Model Good Enough For All Segmentation?." ArXiv (2024). [paper] [code] [2024.01] -
RAP-SAM: Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang.
"RAP-SAM: Towards Real-Time All-Purpose Segment Anything." ArXiv (2024). [paper] [code] [2024.01] -
PRS: Chen-Bin Feng, Qi Lai, Kangdao Liu, Houcheng Su, Chi-Man Vong.
"Boosting Few-Shot Semantic Segmentation Via Segment Anything Model." ArXiv (2024). [paper] [2024.01] -
Wenwen Li, Chia-Yu Hsu, Sizhe Wang, Yezhou Yang, Hyunho Lee, Anna Liljedahl, Chandi Witharana, Yili Yang, Brendan M. Rogers, Samantha T. Arundel, Matthew B. Jones, Kenton McHenry, Patricia Solis.
"Segment Anything Model Can Not Segment Anything: Assessing AI Foundation Model's Generalizability in Permafrost Mapping." ArXiv (2024). [paper] [2024.01] -
SAM-MCD: Hongruixuan Chen, Jian Song, Naoto Yokoya.
"Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM)." ArXiv (2024). [paper] [2024.01] -
GARField: Chung Min Kim, Mingxuan Wu, Justin Kerr, Ken Goldberg, Matthew Tancik, Angjoo Kanazawa.
"GARField: Group Anything with Radiance Fields." ArXiv (2024). [paper] [code] [2024.01] -
CPAB: Hexiang Wang, Fengqi Liu, Qianyu Zhou, Ran Yi, Xin Tan, Lizhuang Ma.
"Continuous Piecewise-Affine Based Motion Model for Image Animation." ArXiv (2024). [paper] [code] [2024.01] -
SAM4UDASS: Weihao Yan, Yeqiang Qian, Xingyuan Chen, Hanyang Zhuang, Chunxiang Wang, Ming Yang.
"SAM4UDASS: When SAM Meets Unsupervised Domain Adaptive Semantic Segmentation in Intelligent Vehicles." ArXiv (2024). [paper] [code] [2024.01] -
Forge_VFM4AD: Xu Yan, Haiming Zhang, Yingjie Cai, Jingming Guo, Weichao Qiu, Bin Gao, Kaiqiang Zhou, Yue Zhao, Huan Jin, Jiantao Gao, Zhen Li, Lihui Jiang, Wei Zhang, Hongbo Zhang, Dengxin Dai, Bingbing Liu.
"Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities." ArXiv (2024). [paper] [code] [2024.01] -
Ho Hin Lee, Yu Gu, Theodore Zhao, Yanbo Xu, Jianwei Yang, Naoto Usuyama, Cliff Wong, Mu Wei, Bennett A. Landman, Yuankai Huo, Alberto Santamaria-Pang, Hoifung Poon.
"Foundation Models for Biomedical Image Segmentation: A Survey." ArXiv (2024). [paper] [2024.01] -
SAM-OIL: Wenhui Wu, Man Sing Wong, Xinyu Yu, Guoqiang Shi, Coco Yin Tung Kwok, Kang Zou.
"Compositional Oil Spill Detection Based on Object Detector and Adapted Segment Anything Model from SAR Images." ArXiv (2024). [paper] [2024.01] -
UV-SAM: Xin Zhang, Yu Liu, Yuming Lin, Qingming Liao, Yong Li.
"UV-SAM: Adapting Segment Anything Model for Urban Village Identification." AAAI (2024). [paper] [code] [2024.01] -
“AttEN”: Ching-Hao Chiu, Yu-Jen Chen, Yawen Wu, Yiyu Shi, Tsung-Yi Ho.
"Achieve Fairness without Demographics for Dermatological Disease Diagnosis." ArXiv (2024). [paper] [2024.01] -
LandmarkBreaker: Yuezun Li and Pu Sun and Honggang Qi and Siwei Lyu.
"LandmarkBreaker: A proactive method to obstruct DeepFakes via disrupting facial landmark extraction." CVIU (2024). [paper] [2024.01] -
GSC: Luis Bolanos, Shih-Yang Su, Helge Rhodin.
"Gaussian Shadow Casting for Neural Characters." ArXiv (2024). [paper] [2024.01] -
Liu, Yue, Tao Sun, Kaixing Wu, Hongwei Zhang, Jingwei Zhang, Xinwen Jiang, Quanwei Lin, and Mei Feng.
"Fractal-Based Pattern Quantification of Mineral Grains: A Case Study of Yichun Rare-Metal Granite." Fractal and Fractional (2024). [paper] [2024.01] -
SD-MVS: Zhenlong Yuan, Jiakai Cao, Zhaoxin Li, Hao Jiang, Zhaoqi Wang.
"SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization." AAAI (2024). [paper] [2024.01] -
SamLP: Haoxuan Ding, Junyu Gao, Yuan Yuan, Qi Wang.
"SamLP: A Customized Segment Anything Model for License Plate Detection." ArXiv (2024). [paper] [code] [2024.01] -
RePLan: Marta Skreta, Zihan Zhou, Jia Lin Yuan, Kourosh Darvish, Alán Aspuru-Guzik, Animesh Garg.
"RePLan: Robotic Replanning with Perception and Language Models." ArXiv (2024). [paper] [code] [2024.01] -
SOS-SLAM: Jouko Kinnari, Annika Thomas, Parker Lusk, Kota Kondo, Jonathan P. How.
"SOS-SLAM: Segmentation for Open-Set SLAM in Unstructured Environments." ArXiv (2024). [paper] [code] [2024.01] -
LRV: Yunhua Zhang, Hazel Doughty, Cees G.M. Snoek.
"Low-Resource Vision Challenges for Foundation Models." ArXiv (2024). [paper] [code] [2024.01] -
PartSTAD: Hyunjin Kim, Minhyuk Sung.
"PartSTAD: 2D-to-3D Part Segmentation Task Adaptation." ArXiv (2024). [paper] [2024.01] -
MatSAM: Changtai Li, Xu Han, Chao Yao, Xiaojuan Ban.
"MatSAM: Efficient Materials Microstructure Extraction via Visual Large Model." ArXiv (2024). [paper] [2024.01] -
Galib Muhammad Shahriar Himel, Md. Masudul Islam, Kh Abdullah Al-Aff, Shams Ibne Karim, Md. Kabir Uddin Sikder.
"Skin Cancer Segmentation and Classification Using Vision Transformer for Automatic Analysis in Dermatoscopy-based Non-invasive Digital System." IJBI (2024). [paper] [2024.01] -
SSPrompt: Learning to Prompt Segment Anything Models.
"Learning to Prompt Segment Anything Models." ArXiv (2024). [paper] [2024.01] -
SBSM: Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu.
"Learning the 3D Fauna of the Web." ArXiv (2023). [paper] [code] [2024.01] -
DeepBID: Binglin Shen, Chenggui Luo, Wen Pang, Yajing Jiang, Wenbo Wu, Rui Hu, Junle Qu, Bobo Gu, Liwei Liu.
"Surmounting photon limits and motion artifacts for biological dynamics imaging via dual-perspective self-supervised learning." PhotoniX (2024). [paper] [2024.01] -
DSALVANet: Jinghui He, Bo Liu, Fan Cao, Jian Xu, Yanshan Xiao.
"Few-Shot Object Counting with Dynamic Similarity-Aware in Latent Space." TGRS (2024). [paper] [code] [2024.01] -
Fengtian Lu, Yuzhi Li, Feng Tian.
"Exploring challenge and explainable shot type classification using SAM-guided approaches." SIVP (2024). [paper] [2024.01] -
RoboFusion: Ziying Song, Guoxing Zhang, Lin Liu, Lei Yang, Shaoqing Xu, Caiyan Jia, Feiyang Jia, Li Wang.
"RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM." ArXiv (2024). [paper] [2024.01] -
SAM4MIS: Yichi Zhang, Zhenrong Shen, Rushi Jiao.
"Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions." ArXiv (2024). [paper] [code] [2024.01] -
OV-SAM: Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, Chen Change Loy.
"Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively." ECCV (2024). [paper] [project page] [code] [2024.01] -
DSR: Yanni Wang, Hecheng Jia, Shilei Fu, Huiping Lin, Feng Xu.
"Reinforcement Learning for SAR View Angle Inversion with Differentiable SAR Renderer." ArXiv (2024). [paper] [2024.01] -
Thomas Lips, Victor-Louis De Gusseme, Francis wyffels.
"Learning Keypoints for Robotic Cloth Manipulation using Synthetic Data." ArXiv (2024). [paper] [code] [2024.01] -
SwinSAM: Zhoushan Feng, Yuliang Zhang, Yanhong Chen, Yu Liu, Wen Sun, Lili Du, Dunjin Chen.
"SwinSAM: Fine-Grained Polyp Segmentation in Colonoscopy Images via Segment Anything Model Integrated with a Swin Transformer Decoder." ArXiv (2024). [paper] [2024.01] -
BA-SAM: Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma.
"BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model." CVPR (2024). [paper] [2024.01] -
SAMMed: Hanhui Wang, Huaize Ye, Yi Xia, Xueyan Zhang.
"Leveraging SAM for Single-Source Domain Generalization in Medical Image Segmentation." ArXiv (2024). [paper] [code] [2024.01] -
CWSAM: Xinyang Pu, Hecheng Jia, Linghao Zheng, Feng Wang, Feng Xu.
"ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment Anything to SAR Domain for Semantic Segmentation." ArXiv (2024). [paper] [code] [2024.01] -
UCAD: Jiaqi Liu, Kai Wu, Qiang Nie, Ying Chen, Bin-Bin Gao, Yong Liu, Jinbao Wang, Chengjie Wang, Feng Zheng.
"Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt." AAAI (2024). [paper] [code] [2024.01] -
TrackGPT: Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie.
"Tracking with Human-Intent Reasoning." ArXiv (2023). [paper] [code] [2023.12] -
Wild2Avatar: Tiange Xiang, Adam Sun, Scott Delp, Kazuki Kozuka, Li Fei-Fei, Ehsan Adeli.
"Wild2Avatar: Rendering Humans Behind Occlusions." ArXiv (2023). [paper] [code] [2023.12] -
IS5Net: Xianjie Liu, Keren Fu, Qijun Zhao.
"Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation." ArXiv (2023). [paper] [2023.12] -
DN-SLAM: Chenyu Ruan, Qiuyu Zang, Kehua Zhang, Kai Huang.
"DN-SLAM: A Visual SLAM with ORB Features and NeRF Mapping in Dynamic Environments." IEEE Sensors Journal (2024). [paper] [2023.12] -
ZONE: Shanglin Li, Bohan Zeng, Yutang Feng, Sicheng Gao, Xuhui Liu, Jiaming Liu, Li Lin, Xu Tang, Yao Hu, Jianzhuang Liu, Baochang Zhang.
"ZONE: Zero-Shot Instruction-Guided Local Editing." ArXiv (2023). [paper] [2023.12] -
Segment3D: Rui Huang, Songyou Peng, Ayca Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang, Francis Engelmann.
"Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels." ArXiv (2023). [paper] [code] [2023.12] -
Unified-IO 2: Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi.
"Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action." ArXiv (2023). [paper] [code] [2023.12] -
EventSAM: Zhiwen Chen, Zhiyu Zhu, Yifan Zhang, Junhui Hou, Guangming Shi, Jinjian Wu.
"Segment Any Events via Weighted Adaptation of Pivotal Tokens." CVPR (2024). [paper] [code] [2023.12] -
SCM: Xiaoliang Tan, Guanzhou Chen, Tong Wang, Jiaqi Wang, Xiaodong Zhang.
"Segment Change Model (SCM) for Unsupervised Change detection in VHR Remote Sensing Images: a Case Study of Buildings." ArXiv (2023). [paper] [code] [2023.12] -
SAM-G: Ziyu Wang, Yanjie Ze, Yifei Sun, Zhecheng Yuan, Huazhe Xu.
"Generalizable Visual Reinforcement Learning with Segment Anything Model." ArXiv (2023). [paper] [code] [2023.12] -
SAT-Nano: Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie.
"One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts." ArXiv (2023). [paper] [code] [2023.12] -
TTP: Keyan Chen, Chengyang Liu, Wenyuan Li, Zili Liu, Hao Chen, Haotian Zhang, Zhengxia Zou, Zhenwei Shi.
"Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection." ArXiv (2023). [paper] [code] [2023.12] -
UniRef++: Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo.
"UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces." ArXiv (2023). [paper] [code] [2023.12] -
LangSplat: Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister.
"LangSplat: 3D Language Gaussian Splatting." CVPR (2024). [paper] [project] [code] [2023.12] -
HRFFM: Yan Han, Xiaogang Xu, Yingqi Lin, Jiafei Wu, Zhe Liu.
"Video Frame Interpolation with Region-Distinguishable Priors from SAM." ArXiv (2023). [paper] [2023.12] -
MSCL: Ruoqing Zhao, Xi Wang, Hongliang Dai, Pan Gao, Piji Li.
"Medical Report Generation based on Segment-Enhanced Contrastive Representation Learning." NLPCC (2023). [paper] [2023.12] -
SAPNet: Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han.
"Semantic-aware SAM for Point-Prompted Instance Segmentation." ArXiv (2023). [paper] [2023.12] -
Dingkun Guo.
"Learning Multi-Step Manipulation Tasks from A Single Human Demonstration." ArXiv (2023). [paper] [code] [2023.12] -
ASSISTGUI: Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou.
"ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation." ArXiv (2023). [paper] [2023.12] -
VStar: Penghao Wu, Saining Xie.
"V∗: Guided Visual Search as a Core Mechanism in Multimodal LLMs." ArXiv (2023). [paper] [code] [2023.12] -
FM-OV3D: Dongmei Zhang, Chang Li, Ray Zhang, Shenghao Xie, Wei Xue, Xiaodong Xie, Shanghang Zhang.
"FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection." AAAI (2024). [paper] [code] [2023.12] -
SP-SAM: Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, Zongyuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang.
"Part to Whole: Collaborative Prompting for Surgical Instrument Segmentation." ArXiv (2023). [paper] [code] [2023.12] -
Customize-It-3D: Nan Huang, Ting Zhang, Yuhui Yuan, Dong Chen, Shanghang Zhang.
"Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior." ArXiv (2023). [paper] [code] [2023.12] -
Ins-HOI: Jiajun Zhang, Yuxiang Zhang, Hongwen Zhang, Boyao Zhou, Ruizhi Shao, Zonghai Hu, Yebin Liu.
"Ins-HOI: Instance Aware Human-Object Interactions Recovery." ArXiv (2023). [paper] [code] [2023.12] -
GPS: Zhi Zhang, Qizhe Zhang, Zijun Gao, Renrui Zhang, Ekaterina Shutova, Shiji Zhou, Shanghang Zhang.
"Gradient-based Parameter Selection for Efficient Fine-Tuning." ArXiv (2023). [paper] [2023.12] -
PixelLLM: Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid.
"Pixel Aligned Language Models." ArXiv (2023). [paper] [code] [2023.12] -
VectorTalker: Hao Hu, Xuan Wang, Jingxiang Sun, Yanbo Fan, Yu Guo, Caigui Jiang.
"VectorTalker: SVG Talking Face Generation with Progressive Vectorisation." ArXiv (2023). [paper] [2023.12] -
Open3DIS: Phuc D.A. Nguyen, Tuan Duc Ngo, Chuang Gan, Evangelos Kalogerakis, Anh Tran, Cuong Pham, Khoi Nguyen.
"Open3DIS: Open-vocabulary 3D Instance Segmentation with 2D Mask Guidance." ArXiv (2023). [paper] [code] [2023.12] -
RadOcc: Haiming Zhang, Xu Yan, Dongfeng Bai, Jiantao Gao, Pan Wang, Bingbing Liu, Shuguang Cui, Zhen Li.
"RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation." AAAI (2024). [paper] [2023.12] -
CreativeConnect: DaEun Choi, Sumin Hong, Jeongeon Park, John Joon Young Chung, Juho Kim.
"CreativeConnect: Supporting Reference Recombination for Graphic Design Ideation with Generative AI." ArXiv (2023). [paper] [2023.12] -
Fei Pan, Sangryul Jeon, Brian Wang, Frank Mckenna, Stella X. Yu.
"Zero-Shot Building Attribute Extraction From Large-Scale Vision and Language Models." WACV (2024). [paper] [2023.12] -
MSFM: Shijian Zheng, Rujing Wang, Shitao Zheng, Fenmei Wang, Liusan Wang, Zhigui Liu.
"A Multi-scale feature modulation network for efficient underwater image enhancement." JKSUCI (2023). [paper] [code] [2023.12] -
Pranjay Shyam, HyunJin Yoo.
"Lightweight Thermal Super-Resolution and Object Detection for Robust Perception in Adverse Weather Conditions." WACV (2024). [paper] [2023.12] -
Weiyi Xie, Nathalie Willems, Shubham Patil, Yang Li, Mayank Kumar.
"SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images." WACV (2024). [paper] [2023.12] -
PMVC: Chushan Zhang, Jinguang Tong, Tao Jun Lin, Chuong Nguyen, Hongdong Li.
"PMVC: Promoting Multi-View Consistency for 3D Scene Reconstruction." WACV (2024). [paper] [2023.12] -
BBPM: Zachery Morton Colbert, Daniel Arrington, Matthew Foote, Jonas Gårding, Dominik Fay, Michael Huo, Mark Pinkham, Prabhakar Ramachandran.
"Repurposing Traditional U-Net Predictions for Sparse SAM Prompting in Medical Image Segmentation." BPEE (2023). [paper] [2023.12] -
Hi-Viscont: Weiwei Gu, Anant Sah, Nakul Gopalan.
"Interactive Visual Task Learning for Robots." AAAI (2024). [paper] [2023.12] -
Sushil Sharma, Aryan Singh, Ganesh Sistu, Mark Halton, Ciarán Eising.
"Optimizing Ego Vehicle Trajectory Prediction: The Graph Enhancement Approach." EI-AVM (2024). [paper] [2023.12] -
Loitering: Johnny Núñez, Zenjie Li, Sergio Escalera, Kamal Nasrollahi.
"Identifying Loitering Behavior with Trajectory Analysis." WACV Workshop (2024). [paper] [code] [2023.12] -
Emu2: Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang.
"Generative Multimodal Models are In-Context Learners." ArXiv (2023). [paper] [homepage] [code] [2023.12] -
Giorgos Savathrakis, Antonis Argyros.
"An Automated Method for the Creation of Oriented Bounding Boxes in Remote Sensing Ship Detection Datasets." WACV Workshop (2024). [paper] [2023.12] -
TinySAM: Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yihao Chen, Houqiang Li, Yunhe Wang, Xinghao Chen.
"TinySAM: Pushing the Envelope for Efficient Segment Anything Model." ArXiv (2023). [paper] [code] [weights] [2023.12] -
SRIN: Haoxing Chen, Yaohui Li, Zhangxuan Gu, Zhuoer Xu, Jun Lan, Huaxiong Li.
"Segment Anything Model Meets Image Harmonization." ICASSP (2024). [paper] [2023.12] -
José Guilherme de Almeida, Nuno M. Rodrigues, Sara Silva, Nickolas Papanikolaou.
"Testing the Segment Anything Model on radiology data." ArXiv (2023). [paper] [2023.12] -
WSOVOD: Jianghang Lin, Yunhang Shen, Bingquan Wang, Shaohui Lin, Ke Li, Liujuan Cao.
"Weakly Supervised Open-Vocabulary Object Detection." AAAI (2024). [paper] [code] [2023.12] -
SAI3D: Yingda Yin, Yuzheng Liu, Yang Xiao, Daniel Cohen-Or, Jingwei Huang, Baoquan Chen.
"SAI3D: Segment Any Instance in 3D Scenes." ArXiv (2023). [paper] [code] [2023.12] -
SAMBA: Mohannad Barakat, Noha Magdy, Jjuuko George William, Ethel Phiri, Raymond Confidence, Dong Zhang, Udunna C Anazodo.
"Towards SAMBA: Segment Anything Model for Brain Tumor Segmentation in Sub-Saharan African Populations." ArXiv (2023). [paper] [2023.12] -
EVI-SAM: Weipeng Guan, Peiyu Chen, Huibin Zhao, Yu Wang, Peng Lu.
"EVI-SAM: Robust, Real-time, Tightly-coupled Event-Visual-Inertial State Estimation and 3D Dense Mapping." ArXiv (2023). [paper] [2023.12] -
GSVA: Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang.
"GSVA: Generalized Segmentation via Multimodal Large Language Models." ArXiv (2023). [paper] [code] [2023.12] -
ABR: Junyu Xie, Weidi Xie, Andrew Zisserman.
"Appearance-based Refinement for Object-Centric Motion Segmentation." ArXiv (2023). [paper] [2023.12] -
Yixin Zhang, Shen Zhao, Hanxue Gu, Maciej A. Mazurowski.
"How to Efficiently Annotate Images for Best-Performing Deep Learning Based Segmentation Models: An Empirical Study with Weak and Noisy Annotations and Segment Anything Model." ArXiv (2023). [paper] [2023.12] -
Isabelle Tingzon, Nuala Margaret Cowan, Pierre Chrzanowski.
"Mapping Housing Stock Characteristics from Drone Images for Climate Resilience in the Caribbean." NeurIPS Workshop (2023). [paper] [2023.12] -
TIFace: Ruijie Zhu, Jiahao Chang, Ziyang Song, Jiahuan Yu, Tianzhu Zhang.
"TIFace: Improving Facial Reconstruction through Tensorial Radiance Fields and Implicit Surfaces." ICCV Workshop (2023). [paper] [code] [2023.12] -
Osprey: Yuqian Yuan, Wentong Li, Jian Liu, Dongqi Tang, Xinjie Luo, Chi Qin, Lei Zhang, Jianke Zhu.
"Osprey: Pixel Understanding with Visual Instruction Tuning." CVPR (2024). [paper] [code] [2023.12] -
SQA-SAM: Yizhe Zhang, Shuo Wang, Tao Zhou, Qi Dou, Danny Z. Chen.
"SQA-SAM: Segmentation Quality Assessment for Medical Images Utilizing the Segment Anything Model." ArXiv (2023). [paper] [code] [2023.12] -
CLOUDS: Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière.
"Collaborating Foundation models for Domain Generalized Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.12] -
MobileSAMv2: Chaoning Zhang, Dongshen Han, Sheng Zheng, Jinwoo Choi, Tae-Ho Kim, Choong Seon Hong.
"MobileSAMv2: Faster Segment Anything to Everything." ArXiv (2023). [paper] [code] [2023.12] -
Alpha-CLIP: Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang.
"Alpha-CLIP: A CLIP Model Focusing on Wherever You Want." CVPR (2024). [paper] [homepage] [code] [2023.12] -
WonderJourney: Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann.
"WonderJourney: Going from Anywhere to Everywhere." ArXiv (2023). [paper] [code] [2023.12] -
MobileSAM-Track: Yehui Liu, Yuliang Zhao, Xinyue Zhang, Xiaoai Wang, Chao Lian, Jian Li, Peng Shan, Changzeng Fu, Xiaoyong Lyu, Lianjiang Li, et al.
"MobileSAM-Track: Lightweight One-Shot Tracking and Segmentation of Small Objects on Edge Devices." Remote Sensing (2023). [paper] [2023.12] -
IT3DEgo: Yunhan Zhao, Haoyu Ma, Shu Kong, Charless Fowlkes.
"Instance Tracking in 3D Scenes from Egocentric Videos." ArXiv (2023). [paper] [code] [2023.12] -
Rein: Zhixiang Wei, Lin Chen, Yi Jin, Xiaoxiao Ma, Tianle Liu, Pengyang Lin, Ben Wang, Huaian Chen, Jinjin Zheng.
"Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation." CVPR (2024). [paper] [code] [2023.12] -
MR.HARM: Hongzhan Lin, Ziyang Luo, Jing Ma, Long Chen.
"Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models." EMNLP (2023). [paper] [2023.12] -
ControlRoom3D: Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou.
"ControlRoom3D: Room Generation using Semantic Proxy Rooms." ArXiv (2023). [paper] [code] [2023.12] -
SmartEdit: Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan.
"SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models." ArXiv (2023). [paper] [code] [2023.12] -
COMBO: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang.
"Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation." ArXiv (2023). [paper] [code] [2023.12] -
Vary: Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang.
"Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models." ArXiv (2023). [paper] [code] [2023.12] -
Annolid: Chen Yang, Jeremy Forest, Matthew Einhorn, Thomas A. Cleland.
"Automated Behavioral Analysis Using Instance Segmentation." ArXiv (2023). [paper] [code] [2023.12] -
FIND: Xueyan Zou, Linjie Li, Jianfeng Wang, Jianwei Yang, Mingyu Ding, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang.
"Interfacing Foundation Models’ Embeddings." ArXiv (2023). [paper] [code] [2023.12] -
ScribblePrompt: Hallee E. Wong, Marianne Rakic, John Guttag, Adrian V. Dalca.
"ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Medical Image." ArXiv (2023). [paper] [code] [2023.12] -
MWSIS: Guangfeng Jiang, Jun Liu, Yuzhi Wu, Wenlong Liao, Tao He, Pai Peng.
"MWSIS: Multimodal Weakly Supervised Instance Segmentation with 2D Box Annotations for Autonomous Driving." ArXiv (2023). [paper] [code] [2023.12] -
Mask as Supervision: Yuchen Yang, Yu Qiao, Xiao Sun.
"Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation." ArXiv (2023). [paper] [code] [2023.12] -
IPSL: Yujun Chen, Xin Tan, Zhizhong Zhang, Yanyun Qu, Yuan Xie.
"Beyond the Label Itself: Latent Labels Enhance Semi-supervised Point Cloud Panoptic Segmentation." ArXiv (2023). [paper] [2023.12] -
SESAME: Tsung-Han Wu, Giscard Biamby, David Chan, Lisa Dunlap, Ritwik Gupta, Xudong Wang, Joseph E. Gonzalez, Trevor Darrell.
"See, Say, and Segment: Teaching LMMs to Overcome False Premises." ArXiv (2023). [paper] [code] [2023.12] -
RefCOCOm: Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu.
"Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation." ArXiv (2023). [paper] [code] [2023.12] -
TAP: Ting Pan, Lulu Tang, Xinlong Wang, Shiguang Shan.
"Tokenize Anything via Prompting." ECCV (2024). [paper] [code] [2023.12] -
Josh Stein, Maxime Di Folco, Julia A. Schnabel.
"Influence of Prompting Strategies on Segment Anything Model (SAM) for Short-axis Cardiac MRI segmentation." ArXiv (2023). [paper] [2023.12] -
SAM-Graph: Haoyu Guo, He Zhu, Sida Peng, Yuang Wang, Yujun Shen, Ruizhen Hu, Xiaowei Zhou.
"SAM-guided Graph Cut for 3D Instance Segmentation." ArXiv (2023). [paper] [code] [2023.12] -
ASLseg: Shiyun Chen, Li Lin, Pujin Cheng, Xiaoying Tang.
"ASLseg: Adapting SAM in the Loop for Semi-supervised Liver Tumor Segmentation." ArXiv (2023). [paper] [2023.12] -
GenSAM: Jian Hu, Jiayi Lin, Weitong Cai, Shaogang Gong.
"Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects." AAAI (2024). [paper] [code] [2023.12] -
AM-RADIO: Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov.
"AM-RADIO: Agglomerative Model – Reduce All Domains Into One." CVPR (2024). [paper] [code] [2023.12] -
SqueezeSAM: Balakrishnan Varadarajan, Bilge Soran, Forrest Iandola, Xiaoyu Xiang, Yunyang Xiong, Chenchen Zhu, Raghuraman Krishnamoorthi, Vikas Chandra.
"SqueezeSAM: User friendly mobile interactive segmentation." ArXiv (2023). [paper] [2023.12] -
EdgeSAM: Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai.
"EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM." ArXiv (2023). [paper] [code] [2023.12] -
SeCo: Dong Zhao, Ruizhi Yang, Shuang Wang, Qi Zang, Yang Hu, Licheng Jiao, Nicu Sebe, Zhun Zhong.
"Semantic Connectivity-Driven Pseudo-labeling for Cross-domain Segmentation." ArXiv (2023). [paper] [code] [2023.12] -
SemiSAM: Yichi Zhang, Yuan Cheng, Yuan Qi.
"SemiSAM: Exploring SAM for Enhancing Semi-Supervised Medical Image Segmentation with Extremely Limited Annotations." ArXiv (2023). [paper] [2023.12] -
RepViT-SAM: Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding.
"RepViT-SAM: Towards Real-Time Segmenting Anything." ArXiv (2023). [paper] [code] [2023.12] -
SlimSAM: Zigeng Chen, Gongfan Fang, Xinyin Ma, Xinchao Wang.
"0.1% Data Makes Segment Anything Slim." NeurIPS (2024). [paper] [code] [2023.12] -
CAR: Fangzhou Song, Bin Zhu, Yanbin Hao, Shuo Wang, Xiangnan He.
"CAR: Consolidation, Augmentation and Regulation for Recipe Retrieval." ArXiv (2023). [paper] [2023.12] -
HOLD: Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Muhammed Kocabas, Xu Chen, Michael J. Black, Otmar Hilliges.
"HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video." ArXiv (2023). [paper] [code] [2023.12] -
ViP-LLaVA: Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee.
"Making Large Multimodal Models Understand Arbitrary Visual Prompts." ArXiv (2023). [paper] [code] [2023.12] -
SPEC: Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu.
"Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding." ArXiv (2023). [paper] [code] [2023.12] -
HiFi Tuner: Zhonghao Wang, Wei Wei, Yang Zhao, Zhisheng Xiao, Mark Hasegawa-Johnson, Humphrey Shi, Tingbo Hou.
"HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models." ArXiv (2023). [paper] [2023.12] -
TrafficMOT: Lihao Liu, Yanqi Cheng, Zhongying Deng, Shujun Wang, Dongdong Chen, Xiaowei Hu, Pietro Liò, Carola-Bibiane Schönlieb, Angelica Aviles-Rivero.
"TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios." ArXiv (2023). [paper] [2023.12] -
VideoBooth: Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu.
"VideoBooth: Diffusion-based Video Generation with Image Prompts." ArXiv (2023). [paper] [code] [2023.12] -
Portrait Diffusion: Jin Liu, Huaibo Huang, Chao Jin, Ran He.
"Portrait Diffusion: Training-free Face Stylization with Chain-of-Painting." ArXiv (2023). [paper] [code] [2023.12] -
NPGs: Devikalyan Das, Christopher Wewer, Raza Yunus, Eddy Ilg, Jan Eric Lenssen.
"Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction." ArXiv (2023). [paper] [2023.12] -
Diffusion Handles: Karran Pandey, Paul Guerrero, Matheus Gadelha, Yannick Hold-Geoffroy, Karan Singh, Niloy Mitra.
"Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D." ArXiv (2023). [paper] [code] [2023.12] -
VLTSeg: Christoph Hümmer, Manuel Schwonberg, Liangwei Zhong, Hu Cao, Alois Knoll, Hanno Gottschalk.
"VLTSeg: Simple Transfer of CLIP-Based Vision-Language Representations for Domain Generalized Semantic Segmentation." ArXiv (2023). [paper] [2023.12] -
CustomNeRF: Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu.
"Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training." ArXiv (2023). [paper] [code] [2023.12] -
Yilin Ye, Qian Zhu, Shishi Xiao, Kang Zhang, Wei Zeng.
"The Contemporary Art of Image Search: Iterative User Intent Expansion via Vision-Language Model." CSCW (2024). [paper] [2023.12] -
StoryGPT-V: Xiaoqian Shen, Mohamed Elhoseiny.
"Large Language Models as Consistent Story Visualizers." ArXiv (2023). [paper] [code] [2023.12] -
PixelLM: Zhongwei Ren, Zhicheng Huang, Yunchao Wei, Yao Zhao, Dongmei Fu, Jiashi Feng, Xiaojie Jin.
"PixelLM: Pixel Reasoning with Large Multimodal Model." ArXiv (2023). [paper] [code] [2023.12] -
SAGE: Haoran Geng, Songlin Wei, Congyue Deng, Bokui Shen, He Wang, Leonidas Guibas.
"SAGE: Bridging Semantic and Actionable Parts for GEneralizable Articulated-Object Manipulation under Language Instructions." ArXiv (2023). [paper] [code] [2023.12] -
TranSegPGD: Xiaojun Jia, Jindong Gu, Yihao Huang, Simeng Qin, Qing Guo, Yang Liu, Xiaochun Cao.
"TranSegPGD: Improving Transferability of Adversarial Examples on Semantic Segmentation." ArXiv (2023). [paper] [2023.12] -
MANUS: Chandradeep Pokhariya, Ishaan N Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar.
"MANUS: Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians." ArXiv (2023). [paper] [code] [2023.12] -
Yao-Chih Lee, Zhoutong Zhang, Kevin Blackburn-Matzen, Simon Niklaus, Jianming Zhang, Jia-Bin Huang, Feng Liu.
"Fast View Synthesis of Casual Videos with Soup-of-Planes." ArXiv (2023). [paper] [code] [2023.12] -
UniLSeg: Yong Liu, Cairong Zhang, Yitong Wang, Jiahao Wang, Yujiu Yang, Yansong Tang.
"Universal Segmentation at Arbitrary Granularity with Language Instruction." ArXiv (2023). [paper] [code] [2023.12] -
LooseControl: Shariq Farooq Bhat, Niloy J. Mitra, Peter Wonka.
"LooseControl: Lifting ControlNet for Generalized Depth Conditioning." ArXiv (2023). [paper] [code] [2023.12] -
Yankun Wu, Yuta Nakashima, Noa Garcia.
"Stable Diffusion Exposed: Gender Bias from Prompt to Image." ArXiv (2023). [paper] [2023.12] -
Drag-A-Video: Yao Teng, Enze Xie, Yue Wu, Haoyu Han, Zhenguo Li, Xihui Liu.
"Drag-A-Video: Non-rigid Video Editing with Point-based Interaction." ArXiv (2023). [paper] [code] [2023.12] -
SAVE: Yeji Song, Wonsik Shin, Junsoo Lee, Jeesoo Kim, Nojun Kwak.
"SAVE: Protagonist Diversification with Structure Agnostic Video Editing." ArXiv (2023). [paper] [code] [2023.12] -
RA-SRGT: Mengke Song, Linfeng Li, Dunquan Wu, Wenfeng Song, Chenglizhao Chen.
"Rethinking Object Saliency Ranking: A Novel Whole-flow Processing Paradigm." IEEE TIP (2023). [paper] [code] [2023.12] -
TokenCompose: Zirui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu.
"TokenCompose: Grounding Diffusion with Token-level Supervision." ArXiv (2023). [paper] [code] [2023.12] -
FoodFusion: Olivia Markham, Yuhao Chen, Chi-en Amy Tai, Alexander Wong.
"FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation." ArXiv (2023). [paper] [code] [2023.12] -
CrackSAM: Kang Ge, Chen Wang, Yutao Guo.
"Fine-tune vision foundation model for crack segmentation in civil infrastructures." ArXiv (2023). [paper] [2023.12] -
SAMBA: Ronan Docherty, Isaac Squires, Antonis Vamvakeros, Samuel J. Cooper.
"SAMBA: A Trainable Segmentation Web-App with Smart Labelling." ArXiv (2023). [paper] [code] [2023.12] -
Israt Zarin Era, Imtiaz Ahmed, Zhichao Liu, Srinjoy Das.
"An unsupervised approach towards promptable defect segmentation in laser-based additive manufacturing by Segment Anything." ArXiv (2023). [paper] [2023.12] -
PartSLIP++: Yuchen Zhou, Jiayuan Gu, Xuanlin Li, Minghua Liu, Yunhao Fang, Hao Su.
"PartSLIP++: Enhancing Low-Shot 3D Part Segmentation via Multi-View Instance Segmentation and Maximum Likelihood Estimation." ArXiv (2023). [paper] [code] [2023.12] -
Sambor: Xumeng Han, Longhui Wei, Xuehui Yu, Zhiyang Dou, Xin He, Kuiran Wang, Zhenjun Han, Qi Tian.
"Boosting Segment Anything Model Towards Open-Vocabulary Learning." ArXiv (2023). [paper] [code] [2023.12] -
SAMS: Xiaobo Yang, Xiaojin Gong.
"Foundation Model Assisted Weakly Supervised Semantic Segmentation." ArXiv (2023). [paper] [2023.12] -
WeSAM: Haojie Zhang, Yongyi Su, Xun Xu, Kui Jia.
"Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation." CVPR (2024). [paper] [code] [2023.12] -
Feature 3DGS: Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi.
"Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields." ArXiv (2023). [paper] [code] [2023.12] -
AI-SAM: Yimu Pan, Sitao Zhang, Alison D. Gernand, Jeffery A. Goldstein, James Z. Wang.
"AI-SAM: Automatic and Interactive Segment Anything Model." ArXiv (2023). [paper] [code] [2023.12] -
SSRS: Xianping Ma, Qianqian Wu, Xingyu Zhao, Xiaokang Zhang, Man-On Pun, Bo Huang.
"SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints." ArXiv (2023). [paper] [code] [2023.12] -
GranSAM: Rohit Kundu, Sudipta Paul, Rohit Lal, Amit K. Roy-Chowdhury.
"Towards Granularity-adjusted Pixel-level Semantic Annotation." ArXiv (2023). [paper] [2023.12] -
SAGA: Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian.
"Segment Any 3D Gaussians." ArXiv (2023). [paper] [homepage] [2023.12] -
APE: Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji.
"Aligning and Prompting Everything All at Once for Universal Visual Perception." ArXiv (2023). [paper] [code] [2023.12] -
SANeRF-HQ: Yichen Liu, Benran Hu, Chi-Keung Tang, Yu-Wing Tai.
"SANeRF-HQ: Segment Anything for NeRF in High Quality." ArXiv (2023). [paper] [code] [2023.12] -
ObjectChangeDetection: Aikaterini Adam, Konstantinos Karantzalos, Lazaros Grammatikopoulos, Torsten Sattler.
"Has Anything Changed? 3D Change Detection by 2D Segmentation Masks." ArXiv (2023). [paper] [code] [2023.12] -
SCA: Xiaoke Huang, Jianfeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu.
"Segment and Caption Anything." ArXiv (2023). [paper] [code] [2023.12] -
EfficientSAM: Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra.
"EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything." ArXiv (2023). [paper] [2023.12] -
U-BDD++: Yiyun Zhang, Zijian Wang, Yadan Luo, Xin Yu, Zi Huang.
"Learning Efficient Unsupervised Satellite Image-based Building Damage Detection." ArXiv (2023). [paper] [code] [2023.12] -
Gaussian Grouping: Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke.
"Gaussian Grouping: Segment and Edit Anything in 3D Scenes." ECCV (2024). [paper] [code] [2023.12] -
SAM-CLNet: Yiming Zhao, Tao Zhou, Yunqi Gu, Yi Zhou, Yizhe Zhang, Ye Wu, Huazhu Fu.
"Segment Anything Model-guided Collaborative Learning Network for Scribble-supervised Polyp Segmentation." ArXiv (2023). [paper] [2023.12] -
S2M: Wenjie Zhao, Jia Li, Xin Dong, Yu Xiang, Yunhui Guo.
"Segment Every Out-of-Distribution Object." ArXiv (2023). [paper] [2023.11] -
ZeroPS: Yuheng Xue, Nenglun Chen, Jun Liu, Wenyun Sun.
"ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation." ArXiv (2023). [paper] [2023.11] -
GigaPose: Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann, Vincent Lepetit.
"GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence." ArXiv (2023). [paper] [code] [2023.11] -
ToddlerDiffusion: Eslam Mohamed Bakr, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny.
"ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model." ArXiv (2023). [paper] [2023.11] -
GaussianEditor: Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin.
"GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting." ArXiv (2023). [paper] [code] [2023.11] -
SEGIC: Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang.
"SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation." ArXiv (2023). [paper] [code] [2023.11] -
Le Quan Nguyen, Jihye Shin, Sanghuyn Ryu, L. Minh Dang, Han Yong Park, O New Lee, Hyeonjoon Moon.
"Innovative Cucumber Phenotyping: A Smartphone-Based and Data-Labeling-Free Model." ArXiv (2023). [paper] [2023.11] -
Nicholas Lui, Bryan Chia, William Berrios, Candace Ross, Douwe Kiela.
"Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision." ArXiv (2023). [paper] [2023.11] -
RO-LLaMA: Kwanyoung Kim, Yujin Oh, Sangjoon Park, Hwa Kyung Byun, Jin Sung Kim, Yong Bae Kim, Jong Chul Ye.
"RO-LLaMA: Generalist LLM for Radiation Oncology via Noise Augmentation and Consistency Regularization." ArXiv (2023). [paper] [2023.11] -
SiTH: Hsuan-I Ho, Jie Song, Otmar Hilliges.
"SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion." ArXiv (2023). [paper] [code] [2023.11] -
GaussianEditor: Jiemin Fang, Junjie Wang, Xiaopeng Zhang, Lingxi Xie, Qi Tian.
"GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions." ArXiv (2023). [paper] [homepage] [2023.11] -
VLPrompt: Zijian Zhou, Miaojing Shi, Holger Caesar.
"VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation." ArXiv (2023). [paper] [2023.11] -
MotionZero: Sitong Su, Litao Guo, Lianli Gao, Hengtao Shen, Jingkuan Song.
"MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation." ArXiv (2023). [paper] [2023.11] -
SEED-Bench-2: Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan.
"SEED-Bench-2: Benchmarking Multimodal Large Language Models." ArXiv (2023). [paper] [code] [2023.11] -
MLKG: Shupeng Cheng, Ge-Peng Ji, Pengda Qin, Deng-Ping Fan, Bowen Zhou, Peng Xu.
"Large Model Based Referring Camouflaged Object Detection." ArXiv (2023). [paper] [2023.11] -
APAP: Seungwoo Yoo, Kunho Kim, Vladimir G. Kim, Minhyuk Sung.
"As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors." ArXiv (2023). [paper] [homepage] [2023.11] -
ROSO: Yusuke Miyashita, Dimitris Gahtidis, Colin La, Jeremy Rabinowicz, Jurgen Leitner.
"ROSO: Improving Robotic Policy Inference via Synthetic Observations." ACRA (2023). [paper] [code] [2023.11] -
Exo2EgoDVC: Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato.
"Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos." ArXiv (2023). [paper] [2023.11] -
ESAM: Chengwen Zhang, Yingwei Zhao.
"Efficient SAM for Medical Image Analysis." ArXiv (2023). [paper] [2023.11] -
MMA-Diffusion: Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Nan Xu, Qiang Xu.
"MMA-Diffusion: MultiModal Attack on Diffusion Models." ArXiv (2023). [paper] [2023.11] -
LLM-State: Siwei Chen, Anxing Xiao, David Hsu.
"LLM-State: Expandable State Representation for Long-horizon Task Planning in the Open World." ArXiv (2023). [paper] [2023.11] -
Narendra Dev, J. John Soundar Jerome, Hélène Scolan, Jean-Philippe Matas.
"Liquid inertia versus bubble cloud buoyancy in circular plunging jet experiments." ArXiv (2023). [paper] [2023.11] -
HUGS: Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan.
"HUGS: Human Gaussian Splats." ArXiv (2023). [paper] [code] [2023.11] -
SAM-ILP: Aayush Kumar Tyagi, Vaibhav Mishra, Prathosh A. P., Mausam.
"Guided Prompting in SAM for Weakly Supervised Cell Segmentation in Histopathological Images." ArXiv (2023). [paper] [code] [2023.11] -
SAMPro3D: Mutian Xu, Xingyilang Yin, Lingteng Qiu, Yang Liu, Xin Tong, Xiaoguang Han.
"SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation." ArXiv (2023). [paper] [code] [2023.11] -
SAM-COBOT: Zelin Peng, Zhengqin Xu, Zhilin Zeng, Lingxi Xie, Qi Tian, Wei Shen.
"Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model." CVPR (2024). [paper] [2023.11] -
SemReID: Siyuan Huang, Yifan Zhou, Ram Prabhakar Kathirvel, Rama Chellappa, Chun Pong Lau.
"Self-Supervised Learning of Whole and Component-Based Semantic Representations for Person Re-Identification." ArXiv (2023). [paper] [2023.11] -
I-MedSAM: Xiaobao Wei, Jiajun Cao, Yizhu Jin, Ming Lu, Guangyu Wang, Shanghang Zhang.
"I-MedSAM: Implicit Medical Image Segmentation with Segment Anything." ArXiv (2023). [paper] [2023.11] -
RAH-Bench: Zhiyang Chen, Yousong Zhu, Yufei Zhan, Zhaowen Li, Chaoyang Zhao, Jinqiao Wang, Ming Tang.
"Mitigating Hallucination in Visual Language Models with Visual Supervision." ArXiv (2023). [paper] [2023.11] -
Fei He, Zhiyuan Yang, Mingyue Gao, Biplab Poudel, Newgin Sam Ebin Sam Dhas, Rajan Gyawali, Ashwin Dhakal, Jianlin Cheng, Dong Xu.
"Adapting Segment Anything Model (SAM) through Prompt-based Learning for Enhanced Protein Identification in Cryo-EM Micrographs." ArXiv (2023). [paper] [2023.11] -
Obj-NeRF: Zhiyi Li, Lihe Ding, Tianfan Xue.
"Obj-NeRF: Extract Object NeRFs from Multi-view Images." ArXiv (2023). [paper] [code] [2023.11] -
Ming Li, Guang Yang.
"Where to Begin? From Random to Foundation Model Instructed Initialization in Federated Learning for Medical Image Segmentation." ArXiv (2023). [paper] [2023.11] -
SAM-6D: Jiehong Lin, Lihua Liu, Dekun Lu, Kui Jia.
"SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation." CVPR (2024). [paper] [code] [2023.11] -
MARIS: Mengxi Zhang, Yiming Liu, Xiangjun Yin, Huanjing Yue, Jingyu Yang.
"MARIS: Referring Image Segmentation via Mutual-Aware Attention Features." ArXiv (2023). [paper] [2023.11] -
Stable-SAM: Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang.
"Stable Segment Anything Model." ArXiv (2023). [paper] [code] [2023.11] -
PromptNucSeg: Zhongyi Shui, Yunlong Zhang, Kai Yao, Chenglu Zhu, Yuxuan Sun, Lin Yang.
"Unleashing the Power of Prompt-driven Nucleus Instance Segmentation." ECCV (2024). [paper] [code] [2023.11] -
Rutuja Gurav, Het Patel, Zhuocheng Shang, Ahmed Eldawy, Jia Chen, Elia Scudiero, Evangelos Papalexakis.
"Can SAM recognize crops? Quantifying the zero-shot performance of a semantic segmentation foundation model on generating crop-type maps using satellite imagery for precision agriculture." NeurIPS (2023). [paper] [code] [2023.11] -
Francesco Croce, Matthias Hein.
"Segment (Almost) Nothing: Prompt-Agnostic Adversarial Attacks on Segmentation Models." ArXiv (2023). [paper] [2023.11] -
P2RBox: Guangming Cao, Xuehui Yu, Wenwen Yu, Xumeng Han, Xue Yang, Guorong Li, Jianbin Jiao, Zhenjun Han.
"P2RBox: A Single Point is All You Need for Oriented Object Detection." ArXiv (2023). [paper] [2023.11] -
PG-Video-LLaVA: Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan.
"PG-Video-LLaVA: Pixel Grounding Large Video-Language Models." ArXiv (2023). [paper] [code] [2023.11] -
MetaDreamer: Lincong Feng, Muyu Wang, Maoyu Wang, Kuo Xu, Xiaoli Liu.
"MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture." ArXiv (2023). [paper] [code] [2023.11] -
Emu Edit: Shelly Sheynin, Adam Polyak, Uriel Singer, Yuval Kirstain, Amit Zohar, Oron Ashual, Devi Parikh, Yaniv Taigman.
"Emu Edit: Precise Image Editing via Recognition and Generation Tasks." ArXiv (2023). [paper] [homepage] [2023.11] -
Duy Minh Ho Nguyen, Tan Ngoc Pham, Nghiem Tuong Diep, Nghi Quoc Phan, Quang Pham, Vinh Tong, Binh T. Nguyen, Ngan Hoang Le, Nhat Ho, Pengtao Xie, Daniel Sonntag, Mathias Niepert.
"On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation." ArXiv (2023). [paper] [2023.11] -
Yu Ando, Nora Jee-Young Park, Gun Oh Chong, Seokhwan Ko, Donghyeon Lee, Junghwan Cho, Hyungsoo Han.
"Interpretable pap smear cell representation for cervical cancer screening." ArXiv (2023). [paper] [2023.11] -
GT-Maps: Yimeng Li, Navid Rajabi, Sulabh Shrestha, Md Alimoor Reza, Jana Kosecka.
"Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models." ArXiv (2023). [paper] [2023.11] -
Nam V. Nguyen, Hieu Trung Huynh, and Phuc-Lu Le.
"Deep Learning Techniques for Segmenting Breast Lesion Regions and Classifying Mammography Images." ArXiv (2023). [paper] [2023.11] -
Ren Li, Corentin Dumery, Benoît Guillard, Pascal Fua.
"Garment Recovery with Shape and Deformation Priors." ArXiv (2023). [paper] [2023.11] -
MROS: Kechen Song, Hongwei Wen, Xiaotong Xue, Liming Huang, Yingying Ji, Yunhui Yan.
"Modality Registration and Object Search Framework for UAV-based Unregistered RGB-T Image Salient Object Detection." ArXiv (2023). [paper] [code] [2023.11] -
CPVLF: Lv Tang, Peng-Tao Jiang, Zhihao Shen, Hao Zhang, Jinwei Chen, Bo Li.
"Generalization and Hallucination of Large Vision-Language Models through a Camouflaged Lens." ArXiv (2023). [paper] [2023.11] -
Zixuan Xie, Rengan Xie, Rong Li, Kai Huang, Pengju Qiao, Jingsen Zhu, Xu Yin, Qi Ye, Wei Hua, Yuchi Huo, Hujun Bao.
"Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning." ArXiv (2023). [paper] [2023.11] -
Clarity ChatGPT: Yanyan Wei, Zhao Zhang, Jiahuan Ren, Xiaogang Xu, Richang Hong, Yi Yang, Shuicheng Yan, Meng Wang.
"Clarity ChatGPT: An Interactive and Adaptive Processing System for Image Restoration and Enhancement." ArXiv (2023). [paper] [2023.11] -
GCDSS: Zhengyuan Peng, Qijian Tian, Jianqing Xu, Yizhang Jin, Xuequan Lu, Xin Tan, Yuan Xie, Lizhuang Ma.
"Generalized Category Discovery in Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.11] -
Few-shot SLVM: Xiyu Qi, Yifan Wu, Yongqiang Mao, Wenhui Zhang, Yidan Zhang.
"Self-guided Few-shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models." ArXiv (2023). [paper] [2023.11] -
OCT-mosaicking: Jiacheng Wang, Hao Li, Dewei Hu, Yuankai K. Tao, Ipek Oguz.
"Novel OCT mosaicking pipeline with Feature- and Pixel-based registration." ArXiv (2023). [paper] [code] [2023.11] -
PseCo: Huang Zhizhong, Dai Mingliang, Zhang Yi, Zhang Junping, Shan Hongming.
"Point, Segment and Count: A Generalized Framework for Object Counting." CVPR (2024). [paper] [code] [2023.11] -
FreeKD: Yuan Zhang, Tao Huang, Jiaming Liu, Tao Jiang, Kuan Cheng, Shanghang Zhang.
"FreeKD: Knowledge Distillation via Semantic Frequency Prompt." ArXiv (2023). [paper] [2023.11] -
Rohit Bharadwaj, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan.
"Enhancing Novel Object Detection via Cooperative Foundational Models." ArXiv (2023). [paper] [code] [2023.11] -
GMISeg: Jing Xu.
"GMISeg: General Medical Image Segmentation without Re-Training." ArXiv (2023). [paper] [2023.11] -
Tian Meng, Yang Tao, Wuliang Yin.
"Few-Shot Classification & Segmentation Using Large Language Models Agent." ArXiv (2023). [paper] [2023.11] -
MorSeg-CAM-SAM: Xin Yue, Qing Zhao, Jianqiang Li, Xiaoling Liu, Changwei Song, Suqin Liu, Guanghui Fu.
"Morphology-Enhanced CAM-Guided SAM for weakly supervised Breast Lesion Segmentation." ArXiv (2023). [paper] [code] [2023.11] -
SA-Med2D-20M: Jin Ye, Junlong Cheng, Jianpin Chen, Zhongying Deng, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Min Zhu, Shaoting Zhang, Junjun He, Yu Qiao.
"SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks." ArXiv (2023). [paper] [code] [2023.11] -
OmniSeg3D: Haiyang Ying, Yixuan Yin, Jinzhi Zhang, Fan Wang, Tao Yu, Ruqi Huang, Lu Fang.
"OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning." ArXiv (2023). [paper] [homepage] [2023.11] -
GeoSAM: Rafi Ibn Sultan, Chengyin Li, Hui Zhu, Prashant Khanduri, Marco Brocanelli, Dongxiao Zhu.
"GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure." ArXiv (2023). [paper] [2023.11] -
CellSAM: Uriah Israel, Markus Marks, Rohit Dilip, Qilin Li, Morgan Schwartz, Elora Pradhan, Edward Pao, Shenyi Li, Alexander Pearson-Goulart, Pietro Perona, Georgia Gkioxari, Ross Barnowski, Yisong Yue, David Van Valen.
"A Foundation Model for Cell Segmentation." ArXiv (2023). [paper] [code] [2023.11] -
RockSAM: Zhaoyang Ma, Xupeng He, Shuyu Sun, Bicheng Yan, Hyung Kwak, Jun Gao.
"Zero-Shot Digital Rock Image Segmentation with a Fine-Tuned Segment Anything Model." ArXiv (2023). [paper] [2023.11] -
DMV3D: Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang.
"DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model." ArXiv (2023). [paper] [code] [2023.11] -
InterpAny-Clearer: Zhihang Zhong, Gurunandan Krishnan, Xiao Sun, Yu Qiao, Sizhuo Ma, Jian Wang.
"Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation." ArXiv (2023). [paper] [homepage] [code] [2023.11] -
OSM: Qihang Yu, Xiaohui Shen, Liang-Chieh Chen.
"Towards Open-Ended Visual Recognition with Large Language Model." ArXiv (2023). [paper] [code] [2023.11] -
UR-SAM: Yichi Zhang, Shiyao Hu, Chen Jiang, Yuan Cheng, Yuan Qi.
"Segment Anything Model with Uncertainty Rectification for Auto-Prompting Medical Image Segmentation." ArXiv (2023). [paper] [2023.11] -
DefectSAM: Bozhen Hu, Bin Gao, Cheng Tan, Tongle Wu, Stan Z. Li.
"Segment Anything in Defect Detection." ArXiv (2023). [paper] [2023.11] -
UnifiedVisionGPT: Chris Kelly, Luhui Hu, Cindy Yang, Yu Tian, Deshun Yang, Bang Yang, Zaoshan Huang, Zihao Li, Yuexian Zou.
"UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework." ArXiv (2023). [paper] [code] [2023.11] -
Slide-SAM: Quan Quan, Fenghe Tang, Zikang Xu, Heqin Zhu, S. Kevin Zhou.
"Slide-SAM: Medical SAM Meets Sliding Window." ArXiv (2023). [paper] [2023.11] -
MM-Navigator: An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang.
"GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation." ArXiv (2023). [paper] [code] [2023.11] -
TriDental: Tomáš Kunzo, Viktor Kocur, Lukáš Gajdošech, Martin Madaras.
"Processing and Segmentation of Human Teeth from 2D Images using Weakly Supervised Learning." DISA (2023). [paper] [2023.11] -
Hyungeun Lee, Ung Hwang, Seungwon Yu, Chang-Hun Lee, Kijung Yoon.
"Processing and Segmentation of Human Teeth from 2D Images using Weakly Supervised Learning." ML4H (2023). [paper] [2023.11] -
AdapterShadow: Leiping Jie, Hui Zhang.
"AdapterShadow: Adapting Segment Anything Model for Shadow Detection." ArXiv (2023). [paper] [code] [2023.11] -
Uni-COAL: Zhiyun Song, Zengxin Qi, Xin Wang, Xiangyu Zhao, Zhenrong Shen, Sheng Wang, Manman Fei, Zhe Wang, Di Zang, Dongdong Chen, Linlin Yao, Qian Wang, Xuehai Wu, Lichi Zhang.
"Uni-COAL: A Unified Framework for Cross-Modality Synthesis and Super-Resolution of MR Images." ArXiv (2023). [paper] [2023.11] -
SAMIHS: Yinuo Wang, Kai Chen, Weimin Yuan, Cai Meng, XiangZhi Bai.
"SAMIHS: Adaptation of Segment Anything Model for Intracranial Hemorrhage Segmentation." ArXiv (2023). [paper] [code] [2023.11] -
Virmarie Maquiling, Sean Anthony Byrne, Diederick C. Niehorster, Marcus Nyström, Enkelejda Kasneci.
"Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM)." ArXiv (2023). [paper] [2023.11] -
GlanceSeg: Hongyang Jiang, Mengdi Gao, Zirong Liu, Chen Tang, Xiaoqing Zhang, Shuai Jiang, Wu Yuan, Jiang Liu.
"GlanceSeg: Real-time microaneurysm lesion segmentation with gaze-map-guided foundation model for early detection of diabetic retinopathy." ArXiv (2023). [paper] [2023.11] -
EviPrompt: Yinsong Xu, Jiaqi Tang, Aidong Men, Qingchao Chen.
"EviPrompt: A Training-Free Evidential Prompt Generation Method for Segment Anything Model in Medical Images." ArXiv (2023). [paper] [2023.11] -
FDNet: Xiang Feng, Chengkai Wang, Chengyu Wu, Yunxiang Li, Yongbo He, Shuai Wang, Yaiqi Wang.
"FDNet: Feature Decoupled Segmentation Network for Tooth CBCT Image." ArXiv (2023). [paper] [2023.11] -
GISCup23: Xuanshu Luo, Paul Walther, Wejdene Mansour, Balthasar Teuscher, Johann Maximilian Zollner, Hao Li, Martin Werner.
"Exploring GeoAI Methods for Supraglacial Lake Mapping on Greenland Ice Sheet." ArXiv (2023). [paper] [code] [2023.11] -
u-LLaVA: Jinjin Xu, Liwu Xu, Yuzhe Yang, Xiang Li, Yanchun Xie, Yi-Jie Huang, Yaqian Li.
"u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model." ArXiv (2023). [paper] [2023.11] -
LLaVA-Plus: Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li.
"LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents." ArXiv (2023). [paper] [code] [2023.11] -
EVA-VOS: Thanos Delatolas, Vicky Kalogeiton, Dim P. Papadopoulos.
"Learning the What and How of Annotation in Video Object Segmentation." WACV (2023). [paper] [code] [2023.11] -
NExT-Chat: Ao Zhang, Liming Zhao, Chen-Wei Xie, Yun Zheng, Wei Ji, Tat-Seng Chua.
"NExT-Chat: An LMM for Chat, Detection and Segmentation." ArXiv (2023). [paper] [code] [2023.11] -
SAMVG: Haokun Zhu, Juang Ian Chong, Teng Hu, Ran Yi, Yu-Kun Lai, Paul L. Rosin.
"SAMVG: A Multi-stage Image Vectorization Model with the Segment-Anything Model." ArXiv (2023). [paper] [2023.11] -
Danielle Ferreira, Rima Arnaout.
"Are foundation models efficient for medical image segmentation?" ArXiv (2023). [paper] [code] [2023.11] -
VFMV: Kejun Wu, Qiong Liu, Kim-Hui Yap, and You Yang.
"High dimensional optical data — varifocal multiview imaging, compression and evaluation." Optics Express (2023). [paper] [2023.11] -
T-NT: Zhenjun Yu, Wenqiang Xu, Siqiong Yao, Jieji Ren, Tutian Tang, Yutong Li, Guoying Gu, Cewu Lu.
"Precise Robotic Needle-Threading with Tactile Perception and Reinforcement Learning." ArXiv (2023). [paper] [code] [2023.11] -
GLaMM: Hanoona Rasheed, Muhammad Maaz, Sahal Shaji, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan.
"GLaMM: Pixel Grounding Large Multimodal Model." ArXiv (2023). [paper] [code] [2023.11] -
Masking: Elias Arbash, Andréa de Lima Ribeiro, Sam Thiele, Nina Gnann, Behnood Rasti, Margret Fuchs, Pedram Ghamisi, Richard Gloaguen.
"Masking Hyperspectral Imaging Data with Pretrained Models." ArXiv (2023). [paper] [code] [2023.11] -
Yiran Li, Junpeng Wang, Prince Aboagye, Michael Yeh, Yan Zheng, Liang Wang, Wei Zhang, Kwan-Liu Ma.
"Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning." ArXiv (2023). [paper] [2023.11] -
CSF: Shichao Dong, Fayao Liu, Guosheng Lin.
"Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation." ArXiv (2023). [paper] [2023.11] -
RegionSpot: Haosen Yang, Chuofan Ma, Bin Wen, Yi Jiang, Zehuan Yuan, Xiatian Zhu.
"Recognize Any Regions." ArXiv (2023). [paper] [code] [2023.11] -
MSMedCap: Gaoang Wang, Zhenyu Zhang, Benlu Wang, Weijie Liang, Yizhi Li, Xuechen Guo, Guanhong Wang, Shiyan Li.
"SAM-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning." ArXiv (2023). [paper] [2023.11] -
MVS: Mykhailo Shvets, Dongxu Zhao, Marc Niethammer, Roni Sengupta, Alexander C. Berg.
"Joint Depth Prediction and Semantic Segmentation with Multi-View SAM." WACV (2024). [paper] [2023.11] -
EditAnything: Shanghua Gao, Zhijie Lin, Xingyu Xie, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan.
"EditAnything: Empowering Unparalleled Flexibility in Image Editing and Generation." ACM MM (2023). [paper] [code] [2023.10] -
ImEW: Tasnim Mohiuddi, Tianyi Zhang, Maowen Nie, Jing Huang, Qianqian Chen, Wei Shi.
"ImEW: A Framework for Editing Image in the Wild." LGM3A Workshop (2023). [paper] [2023.10] -
Fen Fang, Yi Cheng, Ying Sun, Qianli Xu.
"Team I2R-VI-FF Technical Report on EPIC-KITCHENS VISOR Hand Object Segmentation Challenge 2023." ArXiv (2023). [paper] [2023.10] -
InsDet: Qianqian Shen, Yunhan Zhao, Nahyun Kwon, Jeeeun Kim, Yanan Li, Shu Kong.
"A High-Resolution Dataset for Instance Detection with Multi-View Instance Capture." NeurIPS Datasets and Benchmarks Track (2023). [paper] [code] [2023.10] -
Deepa Anand, Gurunath Reddy M, Vanika Singhal, Dattesh D. Shanbhag, Shriram KS, Uday Patil, Chitresh Bhushan, Kavitha Manickam, Dawei Gui, Rakesh Mullick, Avinash Gopal, Parminder Bhatia, Taha Kass-Hout.
"One-shot Localization and Segmentation of Medical Images with Foundation Models." NeurIPS Workshop (2023). [paper] [2023.10] -
AVIS: Ruohao Guo, Yaru Chen, Yanyu Qi, Wenzhen Yue, Dantong Niu, Xianghua Ying.
"Audio-Visual Instance Segmentation." ArXiv (2023). [paper] [2023.10] -
ProMISe: Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz.
"ProMISe: Prompt-driven 3D Medical Image Segmentation Using Image Models." ArXiv (2023). [paper] [code] [2023.10] -
Joana Palés Huix, Adithya Raju Ganeshan, Johan Fredin Haslum, Magnus Söderberg, Christos Matsoukas, Kevin Smith.
"Are Natural Domain Foundation Models Useful for Medical Image Classification?" ArXiv (2023). [paper] [2023.10] -
OBM: Kai Li, Yupeng Deng, Yunlong Kong, Diyou Liu, Jingbo Chen, Yu Meng, Junxian Ma.
"Rebuild City Buildings from Off-Nadir Aerial Images with Offset-Building Model (OBM)." ArXiv (2023). [paper] [code] [2023.10] -
TGVE: Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, Jinbin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest Iandola.
"CVPR 2023 Text Guided Video Editing Competition." ArXiv (2023). [paper] [code] [2023.10] -
ViewControl: Jinbin Bai, Zhen Dong, Aosong Feng, Xiao Zhang, Tian Ye, Kaicheng Zhou, Mike Zheng Shou.
"Integrating View Conditions for Image Synthesis." ArXiv (2023). [paper] [2023.10] -
SparseDFF: Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, Leonidas Guibas.
"SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation." ArXiv (2023). [paper] [2023.10] -
SAMPOT: Rachana Sathish, Rahul Venkataramani, K S Shriram, Prasad Sudhakar.
"Task-driven Prompt Evolution for Foundation Models." ArXiv (2023). [paper] [2023.10] -
SonoSAM: Hariharan Ravishankar, Rohan Patil, Vikram Melapudi, Parminder Bhatia, Kass-Hout Taha, Pavan Annangi.
"SonoSAM -- Segment Anything on Ultrasound Images." ASMUS (2023). [paper] [2023.10] -
Bertrand Chauveau, Pierre Merville.
"Segment Anything by Meta as a foundation model for image segmentation: a new era for histopathological images." Pathology (2023). [paper] [2023.10] -
MAFT: Siyu Jiao, Yunchao Wei, Yaowei Wang, Yao Zhao, Humphrey Shi.
"Learning Mask-aware CLIP Representations for Zero-Shot Segmentation." NeurIPS (2023). [paper] [code] [2023.10] -
Ardiansyah Koeshidayatullah.
"Riding the Wave: One-Touch Automatic Salt Segmentation by Coupling SAM and SegGPT." ArXiv (2023). [paper] [2023.10] -
LuGSAM: Dhanush Babu Ramesh, Rishika Iytha Sridhar, Pulakesh Upadhyaya and Rishikesan Kamaleswaran.
"Lung Grounded-SAM (LuGSAM): A Novel Framework for Integrating Text prompts to Segment Anything Model (SAM) for Segmentation Tasks of ICU Chest X-Rays." ArXiv (2023). [paper] [2023.10] -
Zero123++: Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, Hao Su.
"Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model." ArXiv (2023). [paper] [code] [2023.10] -
CoralVOS: Zheng Ziqiang, Xie Yaofeng, Liang Haixin, Yu Zhibin, Sai-Kit Yeung.
"CoralVOS: Dataset and Benchmark for Coral Video Segmentation." ArXiv (2023). [paper] [2023.10] -
ConceptFusion: Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Alaa Maalouf, Shuang Li, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, Ayush Tewari, Joshua B. Tenenbaum, Celso Miguel de Melo, Madhava Krishna, Liam Paull, Florian Shkurti, Antonio Torralba.
"ConceptFusion: Open-set Multimodal 3D Mapping." RSS (2023). [paper] [code] [2023.10] -
CryoSegNet: Rajan Gyawali, Ashwin Dhakal, Liguo Wang, Jianlin Cheng.
"Accurate cryo-EM protein particle picking by integrating the foundational AI image segmentation model and specialized U-Net." ArXiv (2023). [paper] [2023.10] -
CISRU: Silvia Romero-Azpitarte, Cristina Luna, Alba Guerra, Mercedes Alonso, Pablo Romeo Manrique, Marina L. Seoane, Daniel Olayo, Almudena Moreno, Pablo Castellanos, Fernando Gandía, Gianfranco Visentin.
"Enabling In-Situ Resources Utilisation by leveraging collaborative robotics and astronaut-robot interaction." IAC (2023). [paper] [2023.10] -
DiffPrompter: Sanket Kalwar, Mihir Ungarala, Shruti Jain, Aaron Monis, Krishna Reddy Konda, Sourav Garg, K Madhava Krishna.
"DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions." ArXiv (2023). [paper] [code] [2023.10] -
Alessandro Saviolo, Pratyaksh Rao, Vivek Radhakrishnan, Jiuhong Xiao, Giuseppe Loianno.
"Unifying Foundation Models with Quadrotor Control for Visual Tracking Beyond Object Categories." ArXiv (2023). [paper] [2023.10] -
Ruoqing Zhao, Xi Wang, Hongliang Dai, Pan Gao, Piji Li.
"Medical Report Generation Based on Segment-Enhanced Contrastive Representation Learning." NLPCC (2023). [paper] [2023.10] -
Robin Karlsson, Francisco Lepe-Salazar, Kazuya Takeda.
"Compositional Semantics for Open Vocabulary Spatio-semantic Representations." ArXiv (2023). [paper] [2023.10] -
InstructDET: Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song.
"InstructDET: Diversifying Referring Object Detection with Generalized Instructions." ArXiv (2023). [paper] [code] [2023.10] -
HICOME: Peng Zheng.
"Discriminative Consensus Mining with A Thousand Group for More Accurate Co-Salient Object Detection." ArXiv (2023). [paper] [code] [2023.10] -
Ferret: Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang.
"Ferret: Refer and Ground Anything Anywhere at Any Granularity." ArXiv (2023). [paper] [code] [2023.10] -
OVTracktor: Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki.
"Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models." ArXiv (2023). [paper] [code] [2023.10] -
OpenAnnotate3D: Yijie Zhou, Likun Cai, Xianhui Cheng, Zhongxue Gan, Xiangyang Xue, Wenchao Ding.
"OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data." ArXiv (2023). [paper] [code] [2023.10] -
SSC: Francisco Eiras, Kemal Oksuz, Adel Bibi, Philip H.S. Torr, Puneet K. Dokania.
"Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation." ArXiv (2023). [paper] [code] [2023.10] -
Shichang Liu, Junxin Chen, Ben-Guo He, Tao Chen, Gwanggil Jeon, Wei Wang.
"Adapting Segment Anything Model for Shield Tunnel Water Leakage Segmentation." AMC-SME Workshop (2023). [paper] [2023.10] -
Sofia H. Gelado, César Quilodrán-Casas, Loïc Chagot.
"Enhancing Microdroplet Image Analysis with Deep Learning." Micromachines (2023). [paper] [2023.10] -
EdgeCalib: Xingchen Li, Yifan Duan, Beibei Wang, Haojie Ren, Guoliang You, Yu Sheng, Jianmin Ji, Yanyong Zhang.
"EdgeCalib: Multi-Frame Weighted Edge Features for Automatic Targetless LiDAR-Camera Calibration." ArXiv (2023). [paper] [2023.10] -
Open-NeRF: Hao Zhang, Fang Li, Narendra Ahuja.
"Open-NeRF: Towards Open Vocabulary NeRF Decomposition." WACV (2024). [paper] [2023.10] -
SAM-CLIP: Haoxiang Wang, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari.
"SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding." ArXiv (2023). [paper] [2023.10] -
SAM-Med3D: Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, Yu Qiao.
"SAM-Med3D." ArXiv (2023). [paper] [code] [2023.10] -
SAMCLR: Benjamin Missaoui, Chongbin Yuan.
"SAMCLR: Contrastive pre-training on complex scenes using SAM for view sampling." ArXiv (2023). [paper] [2023.10] -
Zhaozheng Chen, Qianru Sun.
"Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models." ArXiv (2023). [paper] [2023.10] -
Sumit Pandey, Kuan-Fu Chen, Erik B. Dam.
"Comprehensive Multimodal Segmentation in Medical Imaging: Combining YOLOv8 with SAM and HQ-SAM Models." ArXiv (2023). [paper] [2023.10] -
Mammo-SAM: Xinyu Xiong, Churan Wang, Wenxue Li, Guanbin Li.
"Mammo-SAM: Adapting Foundation Segment Anything Model for Automatic Breast Mass Segmentation in Whole Mammograms." ResearchGate (2023). [paper] [2023.10] -
Dongshen Han, Sheng Zheng, Chaoning Zhang.
"Segment Anything Meets Universal Adversarial Perturbation." ArXiv (2023). [paper] [2023.10] -
SoM-GPT4V: Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao.
"Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V." ArXiv (2023). [paper] [homepage] [code] [2023.10] -
IPSeg: Lv Tang, Peng-Tao Jiang, Hao-Ke Xiao, Bo Li.
"Towards Training-free Open-world Segmentation via Image Prompting Foundation Models." ArXiv (2023). [paper] [2023.10] -
SAM_Interactive_Histopathology: SeungKyu Kim, Hyun-Jic Oh, Seonghui Min, Won-Ki Jeong.
"Evaluation and improvement of Segment Anything Model for interactive histopathology image segmentation." MICCAI Workshop (2023). [paper] [code] [2023.10] -
Yao Qianxiang, Bin Jiang.
"Recursive Segmentation Living Image: An eXplainable AI (XAI) Approach for Computing Structural Beauty of Images or the Livingness of Space." ArXiv (2023). [paper] [2023.10] -
Sheng Zheng, Chaoning Zhang.
"Black-box Targeted Adversarial Attack on Segment Anything (SAM)." ArXiv (2023). [paper] [2023.10] -
Jiahao Xia, Gavin Gong, Jiawei Liu, Zhigang Zhu, Hao Tang.
"Segment Anything Model for Pedestrian Infrastructure Inventory: Assessing Zero-Shot Segmentation on Multi-Mode Geospatial Data." ArXiv (2023). [paper] [2023.10] -
PUCD: Youngtack Oh, Minseok Seo, Doyi Ki, Junghoon Seo.
"Prototype-oriented Unsupervised Change Detection for Disaster Management." ArXiv (2023). [paper] [2023.10] -
SAM-guided UDA: Xidong Peng, Runnan Chen, Feng Qiao, Lingdong Kong, Youquan Liu, Tai Wang, Xinge Zhu, Yuexin Ma.
"SAM-guided Unsupervised Domain Adaptation for 3D Segmentation." ArXiv (2023). [paper] [2023.10] -
SemCom: Avi Deb Raha, Md. Shirajum Munir, Apurba Adhikary, Yu Qiao, Choong Seon Hong.
"Generative AI-driven Semantic Communication Framework for NextG Wireless Network." ArXiv (2023). [paper] [2023.10] -
Christian A. Schiller.
"Virtual Augmented Reality for Atari Reinforcement Learning." ArXiv (2023). [paper] [2023.10] -
MCREA: Xu Chen, Yunde Jia, Yuwei Wu.
"Fine-Grained Annotation for Face Anti-Spoofing." ArXiv (2023). [paper] [2023.10] -
SAM-OCTA: Xinrun Chen, Chengliang Wang, Haojian Ning, Shiying Li.
"SAM-OCTA: Prompting Segment-Anything for OCTA Image Segmentation." ArXiv (2023). [paper] [code] [2023.10] -
MED: Haijie Ren, Weiqiang Wang, Wentao Tang, Rui Zhang.
"Machine Eye for Defects: Machine Learning-Based Solution to Identify and Characterize Topological Defects in Textured Images of Nematic Materials." ArXiv (2023). [paper] [2023.10] -
Mohammad Peivandi, Jason Zhang, Michael Lu, Dongxiao Zhu, Zhifeng Kou.
"Empirical Evaluation of the Segment Anything Model (SAM) for Brain Tumor Segmentation." ArXiv (2023). [paper] [2023.10] -
Tree-GPT: Siqi Du, Shengjun Tang, Weixi Wang, Xiaoming Li, Renzhong Guo.
"Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis." ArXiv (2023). [paper] [2023.10] -
TiC: Song Zhang, Qingzhong Wang, Jiang Bian, Haoyi Xiong.
"TiC: Exploring Vision Transformer in Convolution." ArXiv (2023). [paper] [code] [2023.10] -
SLP: David Balaban, Justin Medich, Pranay Gosar, Justin Hart.
"Propagating Semantic Labels in Video Data." ArXiv (2023). [paper] [homepage] [2023.10] -
Amin Ranem, Niklas Babendererde, Moritz Fuchs, Anirban Mukhopadhyay.
"Exploring SAM Ablations for Enhancing Medical Segmentation in Radiology and Pathology." ArXiv (2023). [paper] [2023.10] -
Xiangru Li, Yifei Zhang, Liang Zhao.
"Multi-Prompt Fine-Tuning of Foundation Models for Enhanced Medical Image Segmentation." ArXiv (2023). [paper] [2023.10] -
Ali Mayladan, Hasan Nasrallah, Hasan Moughnieh, Mustafa Shukor, Ali J. Ghandour.
"Zero-Shot Refinement of Buildings' Segmentation Models using SAM." ArXiv (2023). [paper] [2023.10] -
GroupPrompter: Yichuang Luo, Fang Wang, Jing Xing, Xiaohu Liu.
"GroupPrompter: A Prompting Method for Semantic Segmentation Based on SAM." IEEE Access (2023). [paper] [2023.09] -
GAVS: Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li.
"Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer." AAAI (2024). [paper] [2023.09] -
Avi Deb Raha, Apurba Adhikary, Md. Shirajum Munir, Yu Qiao, Choong Seon Hong.
"Segment Anything Model Aided Beam Prediction for the Millimeter Wave Communication." APNOMS (2023). [paper] [2023.09] -
PVLFF: Haoran Chen, Kenneth Blomqvist, Francesco Milano, Roland Siegwart.
"Panoptic Vision-Language Feature Fields." ArXiv (2023). [paper] [code] [2023.09] -
SAMStyler: Konstantinos Psychogyios, Helen C. Leligou, Filisia Melissari, Stavroula Bourou, Zacharias Anastasakis, Theodore Zahariadis.
"SAMStyler: Enhancing Visual Creativity With Neural Style Transfer and Segment Anything Model (SAM)." IEEE Access (2023). [paper] [2023.09] -
Aneesh Rangnekar, Jue Jiang, Harini Veeraraghavan.
"3D Swin Transformer for Partial Medical Auto Segmentation." MICCAI-FLARE (2023). [paper] [2023.09] -
ASA: Yaqin Li, Dandan Wang, Cao Yuan, Hao Li, Jing Hu.
"Enhancing Agricultural Image Segmentation with an Agricultural Segment Anything Model Adapter." Sensors (2023). [paper] [2023.09] -
SCROD: Valentyn Boreiko, Matthias Hein, Jan Hendrik Metzen.
"Identifying Systematic Errors in Object Detectors with the SCROD Pipeline." ICCV Workshop (2023). [paper] [2023.09] -
Iraklis Giannakis, Anshuman Bhardwaj, Lydia Sam, Georgios Leontidis.
"A flexible deep learning crater detection scheme using Segment Anything Model (SAM)." Icarus (2023). [paper] [2023.09] -
SuPerPM: Shan Lin, Albert J. Miao, Ali Alabiad, Fei Liu, Kaiyuan Wang, Jingpei Lu, Florian Richter, Michael C. Yip.
"SuPerPM: A Large Deformation-Robust Surgical Perception Framework Based on Deep Point Matching Learned from Physical Constrained Simulation Data." ArXiv (2023). [paper] [2023.09] -
Bi-SAM: Ying Zhao, Kechen Song, Wenqi Cui, Hang Ren, Yunhui Yan.
"MFS enhanced SAM: Achieving superior performance in bimodal few-shot segmentation." JVCIR (2023). [paper] [code] [2023.09] -
BaDLAD: Kazi Reyazul Hasan, Mubasshira Musarrat, Sadif Ahmed, Shahriar Raj.
"Framework and Model Analysis on Bengali Document Layout Analysis Dataset: BaDLAD." ArXiv (2023). [paper] [2023.09] -
SAM-Adapter: Tianrun Chen, Lanyun Zhu, Chaotao Deng, Runlong Cao, Yan Wang, Shangzhan Zhang, Zejian Li, Lingyun Sun, Ying Zang, Papa Mao.
"SAM-Adapter: Adapting Segment Anything in Underperformed Scenes." ICCV Workshop (2023). [paper] [code] [2023.09] -
UniQuadric: Linghao Yang, Yanmin Wu, Yu Deng, Rui Tian, Xinggang Hu, Tiefeng Ma.
"UniQuadric: A SLAM Backend for Unknown Rigid Object 3D Tracking and Light-Weight Modeling." ArXiv (2023). [paper] [2023.09] -
SAMFeat: Jingqian Wu, Rongtao Xu, Zach Wood-Doughty, Changwei Wang.
"Segment Anything Model is a Good Teacher for Local Feature Learning." ArXiv (2023). [paper] [code] [2023.09] -
nnSAM: Yunxiang Li, Bowen Jing, Xiang Feng, Zihan Li, Yongbo He, Jing Wang, You Zhang.
"nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance." ArXiv (2023). [paper] [code] [2023.09] -
Mayara E. Bonani, Max Schwarz, Sven Behnke.
"Learning from SAM: Harnessing a Segmentation Foundation Model for Sim2Real Domain Adaptation through Regularization." ArXiv (2023). [paper] [2023.09] -
Khoa Dang Nguyen, Thanh-Hai Phung, Hoang-Giang Cao.
"A SAM-based Solution for Hierarchical Panoptic Segmentation of Crops and Weeds Competition." ICCV Workshop (2023). [paper] [2023.09] -
MediViSTA-SAM: Sekeun Kim, Kyungsang Kim, Jiang Hu, Cheng Chen, Zhiliang Lyu, Ren Hui, Sunghwan Kim, Zhengliang Liu, Aoxiao Zhong, Xiang Li, Tianming Liu, Quanzheng Li.
"MediViSTA-SAM: Zero-shot Medical Video Analysis with Spatio-temporal SAM Adaptation." ArXiv (2023). [paper] [code] [2023.09] -
PointSSC: Yuxiang Yan, Boda Liu, Jianfei Ai, Qinbu Li, Ru Wan, Jian Pu.
"PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion." ICRA (2024). [paper] [2023.09] -
NOC: Xiaobao Wei, Renrui Zhang, Jiarui Wu, Jiaming Liu, Ming Lu, Yandong Guo, Shanghang Zhang.
"NOC: High-Quality Neural Object Cloning with 3D Lifting of Segment Anything." ArXiv (2023). [paper] [2023.09] -
SAM-OCTA: Chengliang Wang, Xinrun Chen, Haojian Ning, Shiying Li.
"SAM-OCTA: A Fine-Tuning Strategy for Applying Foundation Model to OCTA Image Segmentation Tasks." ArXiv (2023). [paper] [code] [2023.09] -
MoPA: Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Shenghai Yuan, Lihua Xie.
"MoPA: Multi-Modal Prior Aided Domain Adaptation for 3D Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.09] -
Deshadow-Anything: Xiao Feng Zhang, Tian Yi Song, Jia Wei Yao.
"Deshadow-Anything: When Segment Anything Model Meets Zero-shot shadow removal." ArXiv (2023). [paper] [2023.09] -
3D-U-SAM: Yifu Zhang, Zuozhu Liu, Yang Feng, Renjing Xu.
"3D-U-SAM Network For Few-shot Tooth Segmentation in CBCT Images." ArXiv (2023). [paper] [2023.09] -
OCTA-FRNet: Haojian Ning, Chengliang Wang, Xinrun Chen, Shiying Li.
"An Accurate and Efficient Neural Network for OCTA Vessel Segmentation and a New Dataset." ArXiv (2023). [paper] [code] [2023.09] -
MA-SAM: Cheng Chen, Juzheng Miao, Dufan Wu, Zhiling Yan, Sekeun Kim, Jiang Hu, Aoxiao Zhong, Zhengliang Liu, Lichao Sun, Xiang Li, Tianming Liu, Pheng-Ann Heng, Quanzheng Li.
"MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.09] -
samgeo: Qiusheng Wu, Lucas Prado Osco.
"samgeo: A Python package for segmenting geospatial data with the Segment Anything Model (SAM)." JOSS (2023). [paper] [code] [2023.09] -
Peng Zhang, Yaping Wang.
"Segment Anything Model for Brain Tumor Segmentation." ArXiv (2023). [paper] [2023.09] -
SAMUS: Xian Lin, Yangyang Xiang, Li Zhang, Xin Yang, Zengqiang Yan, Li Yu.
"SAMUS: Adapting Segment Anything Model for Clinically-Friendly and Generalizable Ultrasound Image Segmentation." ArXiv (2023). [paper] [code] [2023.09] -
CMSF: Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Xiatian Zhu.
"Leveraging Foundation models for Unsupervised Audio-Visual Segmentation." ArXiv (2023). [paper] [2023.09] -
Xiaodan Xing, Chunling Tang, Yunzhe Guo, Nicholas Kurniawan, Guang Yang.
"SegmentAnything helps microscopy images based automatic and quantitative organoid detection and analysis." ArXiv (2023). [paper] [2023.09] -
Chenbin Liu, Zhengliang Liu, Jason Holmes, Lu Zhang, Lian Zhang, Yuzhen Ding, Peng Shu, Zihao Wu, Haixing Dai, Yiwei Li, Dinggang Shen, Ninghao Liu, Quanzheng Li, Xiang Li, Dajiang Zhu, Tianming Liu, Wei Liu.
"Artificial General Intelligence for Radiation Oncology." ArXiv (2023). [paper] [2023.09] -
DEVA: Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, Joon-Young Lee.
"Tracking Anything with Decoupled Video Segmentation." ICCV (2023). [paper] [project page] [code] [2023.09] -
SAM3D: Nhat-Tan Bui, Dinh-Hieu Hoang, Minh-Triet Tran, Ngan Le.
"SAM3D: Segment Anything Model in Volumetric Medical Images." ArXiv (2023). [paper] [2023.09] -
CropFormer: Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang.
"High-Quality Entity Segmentation." ICCV (2023). [paper] [project page] [code] [中文解读] [2023.09] -
CIP-WPIS: Qingtao Yu, Heming Du, Chen Liu, Xin Yu.
"When 3D Bounding-Box Meets SAM: Point Cloud Instance Segmentation with Weak-and-Noisy Supervision." ArXiv (2023). [paper] [2023.09] -
SAM-Deblur: Siwei Li, Mingxuan Liu, Yating Zhang, Shu Chen, Haoxiang Li, Hong Chen, Zifei Dou.
"SAM-Deblur: Let Segment Anything Boost Image Deblurring." ArXiv (2023). [paper] [code] [2023.09] -
Hassan El-Hajj, Matteo Valleriani.
"Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models." ICIAP (2023), AI4DH workshop. [paper] [2023.09] -
SAM-CD: Lei Ding, Kun Zhu, Daifeng Peng, Hao Tang, Haitao Guo.
"Adapting Segment Anything Model for Change Detection in HR Remote Sensing Images." ArXiv (2023). [paper] [2023.09] -
SAM-LIV: Junyao Shi, Jianing Qian, Yecheng Jason Ma, Dinesh Jayaraman.
"Plug-And-Play Object-Centric Representations From “What” and “Where” Foundation Models." ArXiv (2023). [paper] [2023.08] -
UGainS: Alexey Nekrasov, Alexander Hermans, Lars Kuhnert, Bastian Leibe.
"UGainS: Uncertainty Guided Anomaly Instance Segmentation." GCPR (2023). [paper] [code] [2023.08] -
Chaoqin Huang, Aofan Jiang, Ya Zhang, Yanfeng Wang.
"Multi-Scale Memory Comparison for Zero-/Few-Shot Anomaly Detection." ArXiv (2023). [paper] [2023.08] -
Dwith Chenna, Suyash Bhogawar.
"Segment Anything Model (SAM) For Brain Extraction in fMRI Studies." IJAIMED (2023). [paper] [2023.08] -
OSTRA: Jiexiong Xu, Weikun Zhao, Zhiyan Tang, Xiangchao Gan.
"A One Stop 3D Target Reconstruction and multilevel Segmentation Method." ArXiv (2023). [paper] [code] [2023.08] -
ROSGPT_Vision: Bilel Benjdira, Anis Koubaa, Anas M. Ali.
"ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts." ArXiv (2023). [paper] [code] [2023.08] -
CoDeF: Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen.
"CoDeF: Content Deformation Fields for Temporally Consistent Video Processing." ArXiv (2023). [paper] [code] [2023.08] -
WALL-E: Tianyu Wang, Yifan Li, Haitao Lin, Xiangyang Xue, Yanwei Fu.
"WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model." ArXiv (2023). [paper] [code] [2023.08] -
Ref-Diff: Minheng Ni, Yabo Zhang, Kailai Feng, Xiaoming Li, Yiwen Guo, Wangmeng Zuo.
"Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models." ArXiv (2023). [paper] [code] [2023.08] -
Anwai Archit, Sushmita Nair, Nabeel Khalid, Paul Hilt, Vikas Rajashekar, Marei Freitag, Sagnik Gupta, Andreas Dengel, Sheraz Ahmed, Constantin Pape.
"Segment Anything for Microscopy." ResearchGate (2023). [paper] [2023.08] -
Su Myat Noe.
"Efficient Segment-Anything Model for Automatic Mask Region Extraction in Livestock Monitoring." IEEE ICCT (2023). [paper] [2023.08] -
SSM-SAM: Yiming Zhang, Tianang Leng, Kun Han, Xiaohui Xie.
"Self-Sampling Meta SAM: Enhancing Few-shot Medical Image Segmentation with Meta-Learning." ArXiv (2023). [paper] [2023.08] -
SAM-Med2D: Junlong Cheng, Jin Ye, Zhongying Deng, Jianpin Chen, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Junjun He, Shaoting Zhang, Min Zhu, Yu Qiao.
"SAM-Med2D." ArXiv (2023). [paper] [code] [2023.08] -
AutoSAM Adapter: Chengyin Li, Prashant Khanduri, Yao Qiang, Rafi Ibn Sultan, Indrin Chetty, Dongxiao Zhu.
"Auto-Prompting SAM for Mobile Friendly 3D Medical Image Segmentation." ArXiv (2023). [paper] [2023.08] -
Leo Fillioux, Emilie Gontran, Jérôme Cartry, Jacques RR Mathieu, Sabrina Bedja, Alice Boilève, Paul-Henry Cournède, Fanny Jaulin, Stergios Christodoulidis, Maria Vakalopoulou.
"Spatio-Temporal Analysis of Patient-Derived Organoid Videos Using Deep Learning for the Prediction of Drug Efficacy." ICCV Workshop (2023). [paper] [2023.08] -
SAM-PARSER: Zelin Peng, Zhengqin Xu, Zhilin Zeng, Xiaokang Yang, Wei Shen.
"SAM-PARSER: Fine-tuning SAM Efficiently by Parameter Space Reconstruction." ArXiv (2023). [paper] [2023.08] -
Weijia Feng, Lingting Zhu, Lequan Yu.
"Cheap Lunch for Medical Image Segmentation by Fine-tuning SAM on Few Exemplars." MICCAI BrainLes Workshop (2023). [paper] [2023.08] -
Zihan Dong, ZhengDong Zhang.
"Enhancing Bloodstain Analysis Through AI-Based Segmentation: Leveraging Segment Anything Model for Crime Scene Investigation." KDD Workshop (2023). [paper] [code] [2023.08] -
SCESAME: Hiroaki Yamagiwa, Yusuke Takase, Hiroyuki Kambe, Ryosuke Nakamoto.
"Zero-Shot Edge Detection With SCESAME: Spectral Clustering-Based Ensemble for Segment Anything Model Estimation." WACV Workshop (2024). [arXiv] [paper] [code] [2023.08] -
SamDSK: Yizhe Zhang, Tao Zhou, Shuo Wang, Ye Wu, Pengfei Gu, Danny Z. Chen.
"SamDSK: Combining Segment Anything Model with Domain-Specific Knowledge for Semi-Supervised Learning in Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.08] -
RSISeg: Zhe Wang, Shoukun Sun, Xiang Que, Xiaogang Ma.
"Interactive segmentation in aerial images: a new benchmark and an open access web-based tool." ArXiv (2023). [paper] [2023.08] -
DiffSeg: Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar Gonzalez-Franco.
"Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion." ArXiv (2023). [paper] [2023.08] -
SPPNet: Qing Xu, Wenwei Kuang, Zeyu Zhang, Xueyao Bao, Haoran Chen, Wenting Duan.
"SPPNet: A Single-Point Prompt Network for Nuclei Image Segmentation." ArXiv (2023). [paper] [code] [2023.08] -
SAMSNeRF: Ange Lou, Yamin Li, Xing Yao, Yike Zhang, Jack Noble.
"SAMSNeRF: Segment Anything Model (SAM) Guides Dynamic Surgical Scene Reconstruction by Neural Radiance Field (NeRF)." Image-Guided Procedures, Robotic Interventions, and Modeling (2024). [paper] [code] [2023.08] -
SS2V: Xing Yao, Han Liu, Dewei Hu, Daiwei Lu, Ange Lou, Hao Li, Ruining Deng, Gabriel Arenas, Baris Oguz, Nadav Schwartz, Brett C Byram, Ipek Oguz.
"False Negative/Positive Control for SAM on Noisy Medical Images." ArXiv (2023). [paper] [code] [2023.08] -
SurgicalSAM: Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang.
"SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation." AAAI (2024). [paper] [code] [2023.08] -
SAMedOCT: Botond Fazekas, José Morano, Dmitrii Lachinov, Guilherme Aresta, Hrvoje Bogunović.
"SAMedOCT: Adapting Segment Anything Model (SAM) for Retinal OCT." ArXiv (2023). [paper] [2023.08] -
U-SAM: Hantao Zhang, Weidong Guo, Chenyang Qiu, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin.
"CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation." ArXiv (2023). [paper] [2023.08] -
Few-Shot-Self-Prompt-SAM: Qi Wu, Yuyao Zhang, Marawan Elbatel.
"Self-Prompting Large Vision Models for Few-Shot Medical Image Segmentation." MICCAI DART Workshop (2023). [paper] [code] [2023.08] -
Dancing Avatar: Bosheng Qin, Wentao Ye, Qifan Yu, Siliang Tang, Yueting Zhuang.
"Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model." ArXiv (2023). [paper] [2023.08] -
SurgicalSAM: An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren.
"SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation." MICCAI MedAGI Workshop (2023). [paper] [2023.08] -
CEmb-SAM: Dongik Shin, Beomsuk Kim, Seungjun Baek.
"CEmb-SAM: Segment Anything Model with Condition Embedding for Joint Learning from Heterogeneous Datasets." ArXiv (2023). [paper] [2023.08] -
CLE Diffusion: Yuyang Yin, Dejia Xu, Chuangchuang Tan, Ping Liu, Yao Zhao, Yunchao Wei.
"CLE Diffusion: Controllable Light Enhancement Diffusion Model." ACM MM (2023). [paper] [code] [2023.08] -
Polyp-SAM++: Risab Biswas.
"Polyp-SAM++: Can A Text Guided SAM Perform Better for Polyp Segmentation?" ArXiv (2023). [paper] [code] [2023.08] -
TongueSAM: Shan Cao, Qunsheng Ruan, Qingfeng Wu.
"TongueSAM: An Universal Tongue Segmentation Model Based on SAM with Zero-Shot." ArXiv (2023). [paper] [code] [2023.08] -
FoodSAM: Xing Lan, Jiayi Lyu, Hanyu Jiang, Kun Dong, Zehai Niu, Yi Zhang, Jian Xue.
"FoodSAM: Any Food Segmentation." ArXiv (2023). [paper] [code] [2023.08] -
SAM-L: Xueyuan Li, Ruining Deng, Yucheng Tang, Shunxing Bao, Haichun Yang, Yuankai Huo.
"Leverage Weakly Annotation to Pixel-wise Annotation via Zero-shot Segment Anything Model for Molecular-empowered Learning." ArXiv (2023). [paper] [2023.08] -
FAn: Alaa Maalouf, Ninad Jadhav, Krishna Murthy Jatavallabhula, Makram Chahine, Daniel M. Vogt, Robert J. Wood, Antonio Torralba, Daniela Rus.
"Follow Anything: Open-set detection, tracking, and following in real-time." ArXiv (2023). [paper] [code] [demo] [2023.08] -
SSOM: Ruikai Cui, Siyuan He, Shi Qiu.
"Adaptive Low Rank Adaptation of Segment Anything to Salient Object Detection." ArXiv (2023). [paper] [2023.08] -
AquaSAM: Muduo Xu, Jianhao Su, Yutao Liu.
"AquaSAM: Underwater Image Foreground Segmentation." ArXiv (2023). [paper] [2023.08] -
AdaptiveSAM: Jay N. Paranjape, Nithin Gopalakrishnan Nair, Shameema Sikder, S. Swaroop Vedula, Vishal M. Patel.
"AdaptiveSAM: Towards Efficient Tuning of SAM for Surgical Scene Segmentation." ArXiv (2023). [paper] [code] [2023.08] -
Ziyi Huang, Hongshan Liu, Haofeng Zhang, Fuyong Xing, Andrew Laine, Elsa Angelini, Christine Hendon, Yu Gan.
"Push the Boundary of SAM: A Pseudo-label Correction Framework for Medical Segmentation." ArXiv (2023). [paper] [2023.08] -
Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister.
"Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models." ArXiv (2023). [paper] [2023.08] -
DEFT: Aditya Kannan.
"Learning from Human Videos for Robotic Manipulation." ArXiv (2023). [paper] [code] [2023.07] -
FOCUS: Stefano Ferraro, Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt.
"FOCUS: Object-Centric World Models for Robotics Manipulation." ArXiv (2023). [paper] [code] [2023.07] -
DisCo: Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang.
"DisCo: Disentangled Control for Realistic Human Dance Generation." ArXiv (2023). [paper] [code] [2023.07] -
Vaibhav Vavilala, David Forsyth.
"Applying a Color Palette with Local Control using Diffusion Models." ArXiv (2023). [paper] [2023.07] -
SegAnimeChara: Andy Yu-Hsiang Tseng, Wen-Fan Wang, Bing-Yu Chen.
"SegAnimeChara: Segmenting Anime Characters Generated by AI." ACM SIGGRAPH (2023). [paper] [2023.07] -
TASS: Mengqi He, Jing Zhang, Zhaoyuan Yang, Mingyi He, Nick Barnes, Yuchao Dai.
"Transferable Attack for Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.07] -
SAM zero-shot segmentator: Loris Nanni, Carlo Fantozzi, Alberto Pretto, Daniel Fusaro.
"Improving Existing Segmentators Performance with Zero-Shot Segmentators." ArXiv (2023). [paper] [2023.07] -
SAMFlow: Shili Zhou, Ruian He, Weimin Tan, Bo Yan.
"SAMFlow: Eliminating Any Fragmentation in Optical Flow with Segment Anything Model." AAAI (2024). [paper] [2023.07] -
HQTrack: Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li.
"Tracking Anything in High Quality." ArXiv (2023). [paper] [code] [2023.07] -
Fashion Matrix: Zheng Chong, Xujie Zhang, Fuwei Zhao, Zhenyu Xie, Xiaodan Liang.
"Fashion Matrix: Editing Photos by Just Talking." ArXiv (2023). [paper] [homepage] [code] [2023.07] -
RoboChop: Atharva Dikshit, Alison Bartsch, Abraham George, Amir Barati Farimani.
"RoboChop: Autonomous Framework for Fruit and Vegetable Chopping Leveraging Foundational Models." ArXiv (2023). [paper] [2023.07] -
Industrial-SA: Keno Moenck, Arne Wendt, Philipp Prünte, Julian Koch, Arne Sahrhage, Johann Gierecker, Ole Schmedemann, Falko Kähler, Dirk Holst, Martin Gomse, Thorsten Schüppstuhl, Daniel Schoepflin.
"Industrial Segment Anything -- a Case Study in Aircraft Manufacturing, Intralogistics, Maintenance, Repair, and Overhaul." ArXiv (2023). [paper] [2023.07] -
CNOS: Van Nguyen Nguyen, Tomas Hodan, Georgy Ponimatkin, Thibault Groueix, Vincent Lepetit.
"CNOS: A Strong Baseline for CAD-based Novel Object Segmentation." ICCV Workshop (2023). [paper] [code] [2023.07] -
SAM-Path: Jingwei Zhang, Ke Ma, Saarthak Kapse, Joel Saltz, Maria Vakalopoulou, Prateek Prasanna, Dimitris Samaras.
"SAM-Path: A Segment Anything Model for Semantic Segmentation in Digital Pathology." ArXiv (2023). [paper] [2023.07] -
BuboGPT: Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang.
"BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs." ArXiv (2023). [paper] [code] [2023.07] -
OpenSU: Ruiping Liu, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ke Cao, Yufan Chen, Kailun Yang, Rainer Stiefelhagen.
"Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments." ICCV Workshop (2023). [paper] [code] [2023.07] -
OG: Zichao Dong, Hang Ji, Weikun Zhang, Xufeng Huang, Junbo Chen.
"OG: Equip vision occupancy with instance segmentation and visual grounding." ArXiv (2023). [paper] [2023.07] -
$SAM^{Med}$: Chenglong Wang, Dexuan Li, Sucheng Wang, Chengxiu Zhang, Yida Wang, Yun Liu, Guang Yang.
"$SAM^{Med}$: A medical image annotation framework based on large vision model." ArXiv (2023). [paper] [2023.07] -
SAM-U: Guoyao Deng, Ke Zou, Kai Ren, Meng Wang, Xuedong Yuan, Sancong Ying, Huazhu Fu.
"SAM-U: Multi-box prompts triggered uncertainty estimation for reliable SAM in medical image." ArXiv (2023). [paper] [2023.07] -
Semantic-SAM: Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao.
"Semantic-SAM: Segment and Recognize Anything at Any Granularity." ECCV (2024). [paper] [code] [2023.07] -
SAM-IQA: Xinpeng Li, Ting Jiang, Haoqiang Fan, Shuaicheng Liu.
"SAM-IQA: Can Segment Anything Boost Image Quality Assessment?" ArXiv (2023). [paper] [code] [2023.07] -
Cross-SAM: Xiaoyu Bai, Fan Bai, Xiaofei Huo, Jia Ge, Tony C. W. Mok, Zi Li, Minfeng Xu, Jingren Zhou, Le Lu, Dakai Jin, Xianghua Ye, Jingjing Lu, Ke Yan.
"Matching in the Wild: Learning Anatomical Embeddings for Multi-Modality Images." ArXiv (2023). [paper] [2023.07] -
LAM-SC: Feibo Jiang, Yubo Peng, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan, Xiaohu You.
"Large AI Model-Based Semantic Communications." ArXiv (2023). [paper] [2023.07] -
MSDeAOT: Yuanyou Xu, Jiahao Li, Zongxin Yang, Yi Yang, Yueting Zhuang.
"ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking." ArXiv (2023). [paper] [2023.07] -
EM-SAM: Ao Cheng, Guoqiang Zhao, Lirong Wang, Ruobing Zhang.
"AxonCallosumEM Dataset: Axon Semantic Segmentation of Whole Corpus Callosum cross section from EM Images." ArXiv (2023). [paper] [2023.07] -
SAM-PT: Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu.
"Segment Anything Meets Point Tracking." ArXiv (2023). [paper] [code] [2023.07] -
SAMAug: Haixing Dai, Chong Ma, Zhengliang Liu, Yiwei Li, Peng Shu, Xiaozheng Wei, Lin Zhao, Zihao Wu, Dajiang Zhu, Wei Liu, Quanzheng Li, Tianming Liu, Xiang Li.
"SAMAug: Point Prompt Augmentation for Segment Anything Model." ArXiv (2023). [paper] [2023.07] -
SAM-DA: Liangliang Yao, Haobo Zuo, Guangze Zheng, Changhong Fu, Jia Pan.
"SAM-DA: UAV Tracks Anything at Night with SAM-Powered Domain Adaptation." ArXiv (2023). [paper] [code] [2023.07] -
RefSAM: Yonglin Li, Jing Zhang, Xiao Teng, Long Lan, Xinwang Liu.
"RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation." ArXiv (2023). [paper] [code] [2023.07] -
All-in-SAM: Can Cui, Ruining Deng, Quan Liu, Tianyuan Yao, Shunxing Bao, Lucas W. Remedios, Yucheng Tang, Yuankai Huo.
"All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning." ArXiv (2023). [paper] [2023.07] -
Zenglin Shi, Ying Sun, Mengmi Zhang.
"Training-free Object Counting with Prompts." ArXiv (2023). [paper] [code] [2023.07] -
Xiaoyu Shi, Shurong Chai, Yinhao Li, Jingliang Cheng, Jie Bai, Guohua Zhao, Yen-Wei Chen.
"Cross-modality Attention Adapter: A Glioma Segmentation Fine-tuning Method for SAM Using Multimodal Brain MR Images." ArXiv (2023). [paper] [2023.07] -
TDA: Ruben Glatt, Shusen Liu.
"Topological Data Analysis Guided Segment Anything Model Prompt Optimization for Zero-Shot Segmentation in Biological Imaging." ArXiv (2023). [paper] [2023.06] -
Xavier F. Cadet, Ranya Aloufi, Alain Miranville, Sara Ahmadi-Abhari, Hamed Haddadi.
"Evaluating The Robustness of Self-Supervised Representations to Background/Foreground Removal." ArXiv (2023). [paper] [2023.06] -
3D Shape Match: Ahmed Abdelreheem, Abdelrahman Eldesokey, Maks Ovsjanikov, Peter Wonka.
"Zero-Shot 3D Shape Correspondence." SIGGRAPH ASIA (2023). [paper] [code] [2023.06] -
Siddharth Shankar, Leigh A. Stearns, Cornelis J. van der Veen.
"Segment Anything in Glaciology: An initial study implementing the Segment Anything Model (SAM)." ArXiv (2023). [paper] [2023.06] -
Xiang Li, Lu Zhang, Zihao Wu, Zhengliang Liu, Lin Zhao, Yixuan Yuan, Jun Liu, Gang Li, Dajiang Zhu, Pingkun Yan, Quanzheng Li, Wei Liu, Tianming Liu, Dinggang Shen.
"Artificial General Intelligence for Medical Imaging." ArXiv (2023). [paper] [2023.06] -
ViDA: Jiaming Liu, Senqiao Yang, Peidong Jia, Renrui Zhang, Ming Lu, Yandong Guo, Wei Xue, Shanghang Zhang.
"ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation." ArXiv (2023). [paper] [code] [2023.06] -
FGVP: Lingfeng Yang, Yueze Wang, Xiang Li, Xinlong Wang, Jian Yang.
"Fine-Grained Visual Prompting." ArXiv (2023). [paper] [2023.06] -
AssistGPT: Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou.
"AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn." ArXiv (2023). [paper] [code] [2023.06] -
Matthew Baugh, James Batten, Johanna P. Müller, Bernhard Kainz.
"Zero-Shot Anomaly Detection with Pre-trained Segmentation Models." ArXiv (2023). [paper] [2023.06] -
Guochen Ning, Hanyin Liang, Zhongliang Jiang, Hui Zhang, Hongen Liao.
"The potential of 'Segment Anything' (SAM) for universal intelligent ultrasound image guidance." BioScience Trends (2023). [paper] [2023.06] -
SeaDronesSee-3D and BOArienT: Benjamin Kiefer, Timon Höfer, Andreas Zell.
"Stable Yaw Estimation of Boats from the Viewpoint of UAVs and USVs." ECMR (2023). [paper] [2023.06] -
DADF: Yingxin Lai, Zhiming Luo, Zitong Yu.
"Detect Any Deepfakes: Segment Anything Meets Face Forgery Detection and Localization." Chinese Conference on Biometric Recognition (2023). [paper] [code] [2023.06] -
Lucas Prado Osco, Qiusheng Wu, Eduardo Lopes de Lemos, Wesley Nunes Gonçalves, Ana Paula Marques Ramos, Jonathan Li, José Marcato Junior.
"The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot." ArXiv (2023). [paper] [2023.06] -
RSPrompter: Keyan Chen, Chenyang Liu, Hao Chen, Haotian Zhang, Wenyuan Li, Zhengxia Zou, Zhenwei Shi.
"RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model." ArXiv (2023). [paper] [code] [2023.06] -
Zhewei Chen, Wai Keung Wong, Zuofeng Zhong, Jinpiao Liao, Ying Qu.
"Effective Transfer of Pretrained Large Visual Model for Fabric Defect Segmentation via Specifc Knowledge Injection." ArXiv (2023). [paper] [2023.06] -
Zheyan Jin, Shiqi Chen, Yueting Chen, Zhihai Xu, Huajun Feng.
"Let Segment Anything Help Image Dehaze." ArXiv (2023). [paper] [2023.06] -
CLIP-SAM: Evan Kellener, Ihina Nath, An Ngo, Thomas Nguyen, Joshua Schuman, Coen Adler, Arnav Kartikeya.
"Utilizing Segment Anything Model For Assessing Localization of GRAD-CAM in Medical Imaging." ArXiv (2023). [paper] [2023.06] -
MESS: Benedikt Blumenstiel, Johannes Jakubik, Hilde Kühne, Michael Vössing.
"What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.06] -
MMPM: Jiange Yang, Wenhui Tan, Chuhao Jin, Bei Liu, Jianlong Fu, Ruihua Song, Limin Wang.
"Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots." ArXiv (2023). [paper] [YouTube] [Bilibili] [2023.06] -
CellViT: Fabian Hörst, Moritz Rempe, Lukas Heine, Constantin Seibold, Julius Keyl, Giulia Baldini, Selma Ugurel, Jens Siveke, Barbara Grünwald, Jan Egger, Jens Kleesiek.
"CellViT: Vision Transformers for Precise Cell Segmentation and Classification." ArXiv (2023). [paper] [code] [2023.06] -
MedLSAM: Wenhui Lei, Xu Wei, Xiaofan Zhang, Kang Li, Shaoting Zhang.
"MedLSAM: Localize and Segment Anything Model for 3D Medical Images." ArXiv (2023). [paper] [code] [2023.06] -
MobileSAM: Chaoning Zhang, Dongshen Han, Yu Qiao, Jung Uk Kim, Sung-Ho Bae, Seungkyu Lee, Choong Seon Hong.
"Faster Segment Anything: Towards Lightweight SAM for Mobile Applications." ArXiv (2023). [paper] [code] [2023.06] -
SonarSAM: Lin Wang, Xiufen Ye, Liqiang Zhu, Weijie Wu, Jianguo Zhang, Huiming Xing, Chao Hu.
"When SAM Meets Sonar Images." ArXiv (2023). [paper] [code] [2023.06] -
AutoSAM: Xinrong Hu, Xiaowei Xu, Yiyu Shi.
"How to Efficiently Adapt Large Segmentation Model(SAM) to Medical Images." ArXiv (2023). [paper] [code] [2023.06] -
3DSAM-adapter: Shizhan Gong, Yuan Zhong, Wenao Ma, Jinpeng Li, Zhao Wang, Jingyang Zhang, Pheng-Ann Heng, Qi Dou.
"3DSAM-adapter: Holistic Adaptation of SAM from 2D to 3D for Promptable Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.06] -
Xinru Shan, Chaoning Zhang.
"Robustness of Segment Anything Model (SAM) for Autonomous Driving in Adverse Weather Conditions." ArXiv (2023). [paper] [2023.06] -
SAM-LST: Shurong Chai, Rahul Kumar Jain, Shiyu Teng, Jiaqing Liu, Yinhao Li, Tomoko Tateyama, Yen-wei Chen.
"Ladder Fine-tuning approach for SAM integrating complementary network." ArXiv (2023). [paper] [code] [2023.06] -
Mohsen Ahmadi, Masoumeh Farhadi Nia, Sara Asgarian, Kasra Danesh, Elyas Irankhah, Ahmad Gholizadeh Lonbar, Abbas Sharifi.
"Comparative Analysis of Segment Anything Model and U-Net for Breast Tumor Detection in Ultrasound and Mammography Images." ArXiv (2023). [paper] [2023.06] -
FastSAM: Xu Zhao, Wenchao Ding, Yongqi An, Yinglong Du, Tao Yu, Min Li, Ming Tang, Jinqiao Wang.
"Fast Segment Anything." ArXiv (2023). [paper] [code] [2023.06] -
Seal: Youquan Liu, Lingdong Kong, Jun Cen, Runnan Chen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu.
"Segment Any Point Cloud Sequences by Distilling Vision Foundation Models." ArXiv (2023). [paper] [code] [homepage] [2023.06] -
Lian Zhang, Zhengliang Liu, Lu Zhang, Zihao Wu, Xiaowei Yu, Jason Holmes, Hongying Feng, Haixing Dai, Xiang Li, Quanzheng Li, Dajiang Zhu, Tianming Liu, Wei Liu.
"Segment Anything Model (SAM) for Radiation Oncology." ArXiv (2023). [paper] [2023.06] -
Enlighten-Anything: Qihan Zhao, Xiaofeng Zhang, Hao Tang, Chaochen Gu, Shanying Zhu.
"Enlighten-anything: When Segment Anything Model Meets Low-light Image Enhancement." ArXiv (2023). [paper] [code] [2023.06] -
SAA+: Yunkang Cao, Xiaohao Xu, Chen Sun, Yuqi Cheng, Liang Gao, Weiming Shen.
"Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection." CVPR Workshop (2023). [paper] [code] [2023.06] -
TEPO: Chuyun Shen, Wenhao Li, Ya Zhang, Xiangfeng Wang.
"Temporally-Extended Prompts Optimization for SAM in Interactive Medical Image Segmentation." ArXiv (2023). [paper] [2023.06] -
TomoSAM: Federico Semeraro, Alexandre Quintart, Sergio Fraile Izquierdo, Joseph C. Ferguson.
"TomoSAM: a 3D Slicer extension using SAM for tomography segmentation." ArXiv (2023). [paper] [code] [2023.06] -
Madeline Chantry Schiappa, Sachidanand VS, Yunhao Ge, Ondrej Miksik, Yogesh S. Rawat, Vibhav Vineet.
"Robustness Analysis on Foundational Segmentation Models." ArXiv (2023). [paper] [code] [2023.06] -
Yu Qiao, Chaoning Zhang, Taegoo Kang, Donghun Kim, Shehbaz Tariq, Chenshuang Zhang, Choong Seon Hong.
"Robustness of SAM: Segment Anything Under Corruptions and Beyond." ArXiv (2023). [paper] [2023.06] -
AutoSAM: Tal Shaharabany, Aviad Dahan, Raja Giryes, Lior Wolf.
"AutoSAM: Adapting SAM to Medical Images by Overloading the Prompt Encoder." ArXiv (2023). [paper] [2023.06] -
SAM-shadow: Xiaofeng Zhang, Chaochen Gu, Shanying Zhu.
"SAM-helps-Shadow: When Segment Anything Model meet shadow removal." ArXiv (2023). [paper] [code] [2023.06] -
Chaoning Zhang, Sheng Zheng, Chenghao Li, Yu Qiao, Taegoo Kang, Xinru Shan, Chenshuang Zhang, Caiyan Qin, Francois Rameau, Sung-Ho Bae, Choong Seon Hong.
"A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering." ArXiv (2023). [paper] [2023.06] -
MAM: Jiachen Li, Jitesh Jain, Humphrey Shi.
"Matting Anything." ArXiv (2023). [paper] [code] [2023.06] -
Haochen Xue, Mingyu Jin, Chong Zhang, Yuxuan Huang, Qian Weng, Xiaobo Jin.
"Automatic Image Blending Algorithm Based on SAM and DINO." ArXiv (2023). [paper] [2023.06] -
MatAny: Jingfeng Yao, Xinggang Wang, Lang Ye, Wenyu Liu.
"Matte Anything: Interactive Natural Image Matting with Segment Anything Models." ArXiv (2023). [paper] [code] [2023.06] -
CNS: Runnan Chen, Youquan Liu, Lingdong Kong, Nenglun Chen, Xinge Zhu, Yuexin Ma, Tongliang Liu, Wenping Wang.
"Towards Label-free Scene Understanding by Vision Foundation Models." ArXiv (2023). [paper] [code] [2023.06] -
SAM3D: Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu.
"SAM3D: Segment Anything in 3D Scenes." ArXiv (2023). [paper] [code] [2023.06] -
Calib-Anything: Zhaotong Luo, Guohang Yan, Yikang Li.
"Calib-Anything: Zero-training LiDAR-Camera Extrinsic Calibration Method Using Segment Anything." ArXiv (2023). [paper] [code] [2023.06] -
Shijie Chang, Zeqi Hao, Ben Kang, Xiaoqi Zhao, Jiawen Zhu, Zhenyu Chen, Lihe Zhang, Lu Zhang, Huchuan Lu.
"3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW." ArXiv (2023). [paper] [2023.06] -
USD: Yulin He, Wei Chen, Yusong Tan, Siqi Wang.
"USD: Unknown Sensitive Detector Empowered by Decoupled Objectness and Segment Anything Model." ArXiv (2023). [paper] [2023.06] -
SAM3D: Dingyuan Zhang, Dingkang Liang, Hongcheng Yang, Zhikang Zou, Xiaoqing Ye, Zhe Liu, Xiang Bai.
"SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model." ArXiv (2023). [paper] [code] [2023.06] -
Shehbaz Tariq, Brian Estadimas Arfeto, Chaoning Zhang, Hyundong Shin.
"Segment Anything Meets Semantic Communication." ArXiv (2023). [paper] [2023.06] -
HQ-SAM: Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu.
"Segment Anything in High Quality." NeurIPS (2023). [paper] [code] [2023.06] -
DeSAM: Yifan Gao, Wei Xia, Dingdu Hu, Xin Gao.
"DeSAM: Decoupling Segment Anything Model for Generalizable Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.06] -
SAM-Track: Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang.
"Segment and Track Anything." ArXiv (2023). [paper] [code] [2023.05] -
SAMText: Haibin He, Jing Zhang, Mengyang Xu, Juhua Liu, Bo Du, Dacheng Tao.
"Scalable Mask Annotation for Video Text Spotting." ArXiv (2023). [paper] [code] [2023.05] -
NP-SAM: Rasmus Larsen, Torben L. Villadsen, Jette K. Mathiesen, Kirsten M. Ø. Jensen, Espen D. Bøjesen.
"NP-SAM: Implementing the Segment Anything Model for Easy Nanoparticle Segmentation in Electron Microscopy Images." ArXiv (2023). [paper] [code] [2023.05] -
EfficientViT: Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han.
"EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction." ICCV (2023). [paper] [code] [2023.05] -
POPE: Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Dejia Xu, Hanwen Jiang, Zhangyang Wang.
"POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference." ArXiv (2023). [paper] [code] [2023.05] -
Bridge3D: Zhimin Chen, Bing Li.
"Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models." ArXiv (2023). [paper] [2023.05] -
Make-A-Protagonist: Yuyang Zhao, Enze Xie, Lanqing Hong, Zhenguo Li, Gim Hee Lee.
"Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts." ArXiv (2023). [paper] [code] [2023.05] -
ZeroPose: Jianqiu Chen, Mingshan Sun, Tianpeng Bao, Rui Zhao, Liwei Wu, Zhenyu He.
"ZeroPose: CAD-Model-based Zero-Shot Pose Estimation." ArXiv (2023). [paper] [2023.05] -
IIR-Net: Zhongping Zhang, Jian Zheng, Jacob Zhiyuan Fang, Bryan A. Plummer.
"Text-to-image Editing by Image Information Removal." ArXiv (2023). [paper] [2023.05] -
Chaoning Zhang, Yu Qiao, Shehbaz Tariq, Sheng Zheng, Chenshuang Zhang, Chenghao Li, Hyundong Shin, Choong Seon Hong.
"Understanding segment anything model: Sam is biased towards texture rather than shape." ArXiv (2023). [paper] [2023.05] -
FineRewards: Guian Fang, Zutao Jiang, Jianhua Han, Guangsong Lu, Hang Xu, Xiaodan Liang.
"Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards." ArXiv (2023). [paper] [code] [2023.05] -
InstructEdit: Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka.
"InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions." ArXiv (2023). [paper] [code] [2023.05] -
AIMS: Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang.
"AIMS: All-Inclusive Multi-Level Segmentation." ArXiv (2023). [paper] [code] [2023.05] -
ShadowSAM: Yonghui Wang, Wengang Zhou, Yunyao Mao, Houqiang Li.
"Detect Any Shadow: Segment Anything for Video Shadow Detection." TCSVT (2023). [paper] [code] [2023.05] -
ISA-NeRF: Xiaokang Chen, Jiaxiang Tang, Diwen Wan, Jingbo Wang, Gang Zeng.
"Interactive Segment Anything NeRF with Feature Imitation." ArXiv (2023). [paper] [homepage] [2023.05] -
Yihao Huang, Yue Cao, Tianlin Li, Felix Juefei-Xu, Di Lin, Ivor W. Tsang, Yang Liu, Qing Guo.
"On the Robustness of Segment Anything." ArXiv (2023). [paper] [2023.05] -
SAMScore: Yunxiang Li, Meixu Chen, Wenxuan Yang, Kai Wang, Jun Ma, Alan C. Bovik, You Zhang.
"SAMScore: A Semantic Structural Similarity Metric for Image Translation Evaluation." ArXiv (2023). [paper] [code] [2023.05] -
SAD: Jun Cen, Yizheng Wu, Kewei Wang, Xingyi Li, Jingkang Yang, Yixuan Pei, Lingdong Kong, Ziwei Liu, Qifeng Chen.
"SAD: Segment Any RGBD." ArXiv (2023). [paper] [code] [2023.05] -
SPT: Zeyu Xiao, Jiawang Bai, Zhihe Lu, Zhiwei Xiong.
"A Dive into SAM Prior in Image Restoration." ArXiv (2023). [paper] [2023.05] -
Matcher: Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, Chunhua Shen.
"Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching." ArXiv (2023). [paper] [code] [2023.05] -
RAP: Jiaxi Jiang, Christian Holz.
"Restore Anything Pipeline: Segment Anything Meets Image Restoration." ArXiv (2023). [paper] [code] [2023.05] -
UVOSAM: Zhenghao Zhang, Zhichao Wei, Shengfan Zhang, Zuozhuo Dai, Siyu Zhu.
"UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model." ArXiv (2023). [paper] [2023.05] -
BreastSAM: Mingzhe Hu, Yuheng Li, Xiaofeng Yang.
"BreastSAM: A Study of Segment Anything Model for Breast Tumor Detection in Ultrasound Images." ArXiv (2023). [paper] [2023.05] -
SAMSh: Leiping Jie, Hui Zhang.
"When SAM Meets Shadow Detection." ArXiv (2023). [paper] [code] [2023.05] -
Instruct2Act: Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li.
"Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model." ArXiv (2023). [paper] [code] [2023.05] -
WS-SAM: Chunming He, Kai Li, Yachao Zhang, Guoxia Xu, Longxiang Tang, Yulun Zhang, Zhenhua Guo, Xiu Li.
"Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping." NeurIPS (2023). [paper] [2023.05] -
SAA+: Yunkang Cao, Xiaohao Xu, Chen Sun, Yuqi Cheng, Zongwei Du, Liang Gao, Weiming Shen.
"Segment Any Anomaly without Training via Hybrid Prompt Regularization." ArXiv (2023). [paper] [code] [2023.05] -
OR-NeRF: Youtan Yin, Zhoujie Fu, Fan Yang, Guosheng Lin.
"OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields." ArXiv (2023). [paper] [2023.05] -
PromptUNet: Junde Wu.
"PromptUNet: Toward Interactive Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.05] -
EAC: Ao Sun, Pingchuan Ma, Yuanyuan Yuan, Shuai Wang.
"Explain Any Concept: Segment Anything Meets Concept-Based Explanation." NeurIPS (2023). [paper] [2023.05] -
Xiao Yang, Haixing Dai, Zihao Wu, Ramesh Bist, Sachin Subedi, Jin Sun, Guoyu Lu, Changying Li, Tianming Liu, Lilong Chai.
"SAM for Poultry Science." ArXiv (2023). [paper] [2023.05] -
Leaf Only SAM: Dominic Williams, Fraser MacFarlane, Avril Britten.
"Leaf Only SAM: A Segment Anything Pipeline for Zero-Shot Automated Leaf Segmentation." ArXiv (2023). [paper] [2023.05] -
KD-SAM: Sahib Julka, Michael Granitzer.
"Knowledge distillation with Segment Anything (SAM) model for Planetary Geological Mapping." ArXiv (2023). [paper] [2023.05] -
SEEM: Zhihe Lu, Zeyu Xiao, Jiawang Bai, Zhiwei Xiong, Xinchao Wang.
"Can SAM Boost Video Super-Resolution?" ArXiv (2023). [paper] [2023.05] -
Yuqing Wang, Yun Zhao, Linda Petzold.
"An Empirical Study on the Robustness of the Segment Anything Model (SAM)." ArXiv (2023). [paper] [2023.05] -
SAM-WSSS: Tianle Chen, Zheda Mai, Ruiwen Li, Wei-lun Chao.
"Segment Anything Model (SAM) Enhanced Pseudo Labels for Weakly Supervised Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.05] -
SAM4MIS: Yichi Zhang, Rushi Jiao.
"How Segment Anything Model (SAM) Boost Medical Image Segmentation?" ArXiv (2023). [paper] [code] [2023.05] -
BadSAM: Zihan Guan, Mengxuan Hu, Zhongliang Zhou, Jielu Zhang, Sheng Li, Ninghao Liu.
"BadSAM: Exploring Security Vulnerabilities of SAM via Backdoor Attacks." ArXiv (2023). [paper] [2023.05] -
PerSAM: Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Hao Dong, Peng Gao, Hongsheng Li.
"Personalize Segment Anything Model with One Shot." ArXiv (2023). [paper] [code] [2023.05] -
CAT: Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao.
"Caption Anything: Interactive Image Description with Diverse Multimodal Controls." ArXiv (2023). [paper] [code] [2023.05] -
SAMRS: Di Wang, Jing Zhang, Bo Du, Dacheng Tao, Liangpei Zhang.
"Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model." NeurIPS 2023 Datasets and Benchmarks Track (2023). [paper] [code] [2023.05] -
AV-SAM: Shentong Mo, Yapeng Tian.
"AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation." ArXiv (2023). [paper] [2023.05] -
SAMA-AVS: Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie.
"Annotation-free Audio-Visual Segmentation." WACV (2024). [paper] [code] [2023.05] -
WSSS: Weixuan Sun, Zheyuan Liu, Yanhao Zhang, Yiran Zhong, Nick Barnes.
"An Alternative to WSSS? An Empirical Study of the Segment Anything Model (SAM) on Weakly-Supervised Semantic Segmentation Problems." ArXiv (2023). [paper] [2023.05] -
PLG-SAM: Peng-Tao Jiang, Yuqi Yang.
"Segment Anything is A Good Pseudo-label Generator for Weakly Supervised Semantic Segmentation." ArXiv (2023). [paper] [2023.05] -
Attack-SAM: Chenshuang Zhang, Chaoning Zhang, Taegoo Kang, Donghun Kim, Sung-Ho Bae, In So Kweon.
"Attack-SAM: Towards Attacking Segment Anything Model With Adversarial Examples." ArXiv (2023). [paper] [2023.05] -
Polyp-SAM: Yuheng Li, Mingzhe Hu, Xiaofeng Yang.
"Polyp-SAM: Transfer SAM for Polyp Segmentation." ArXiv (2023). [paper] [code] [2023.05] -
Dongsheng Han, Chaoning Zhang, Yu Qiao, Maryam Qamar, Yuna Jung, SeungKyu Lee, Sung-Ho Bae, Choong Seon Hong.
"Segment Anything Model (SAM) Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected." ArXiv (2023). [paper] [2023.05] -
DSEC-MOS: Zhuyun Zhou, Zongwei Wu, Rémi Boutteau, Fan Yang, Dominique Ginhac.
"DSEC-MOS: Segment Any Moving Object with Moving Ego Vehicle." ArXiv (2023). [paper] [code] [2023.05] -
Christian Mattjie, Luis Vinicius de Moura, Rafaela Cappelari Ravazio, Lucas Silveira Kupssinskü, Otávio Parraga, Marcelo Mussi Delucis, Rodrigo Coelho Barros.
"Zero-shot performance of the Segment Anything Model (SAM) in 2D medical imaging: A comprehensive evaluation and practical guidelines." ArXiv (2023). [paper] [code] [2023.05] -
Dongjie Cheng, Ziyuan Qin, Zekun Jiang, Shaoting Zhang, Qicheng Lao, Kang Li.
"SAM on Medical Images: A Comprehensive Study on Three Prompt Modes." ArXiv (2023). [paper] [2023.05] -
Expedit-SAM: Weicong Liang, Yuhui Yuan, Henghui Ding, Xiao Luo, Weihong Lin, Ding Jia, Zheng Zhang, Chao Zhang, Han Hu.
"Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning." NeurIPS (2022). [paper] [code] [2023.04] -
An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren.
"SAM Meets Robotic Surgery: An Empirical Study in Robustness Perspective." ArXiv (2023). [paper] [2023.04] -
Yuhao Huang, Xin Yang, Lian Liu, Han Zhou, Ao Chang, Xinrui Zhou, Rusi Chen, Junxuan Yu, Jiongquan Chen, Chaoyu Chen, Haozhe Chi, Xindi Hu, Deng-Ping Fan, Fajin Dong, Dong Ni.
"Segment Anything Model for Medical Images?" ArXiv (2023). [paper] [2023.04] -
Edit Everything: Defeng Xie, Ruichen Wang, Jian Ma, Chen Chen, Haonan Lu, Dong Yang, Fobo Shi, Xiaodong Lin.
"Edit Everything: A Text-Guided Generative System for Images Editing." ArXiv (2023). [paper] [code] [2023.04] -
SkinSAM: Mingzhe Hu, Yuheng Li, Xiaofeng Yang.
"SkinSAM: Empowering Skin Cancer Segmentation with Segment Anything Model." ArXiv (2023). [paper] [2023.04] -
GazeSAM: Bin Wang, Armstrong Aboah, Zheyuan Zhang, Ulas Bagci.
"GazeSAM: What You See is What You Segment." ArXiv (2023). [paper] [code] [2023.04] -
SAMed: Kaidong Zhang, Dong Liu.
"Customized Segment Anything Model for Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.04] -
LearnablePromptSAM: Zhongxi Qiu, Yan Hu, Heng Li, Jiang Liu.
"Learnable Ophthalmology SAM." ArXiv (2023). [paper] [code] [2023.04] -
Simiao Ren, Francesco Luzi, Saad Lahrichi, Kaleb Kassaw, Leslie M. Collins, Kyle Bradbury, Jordan M. Malof.
"Segment anything, from space?" WACV (2024). [paper] [2023.04] -
Peilun Shi, Jianing Qiu, Sai Mu Dalike Abaxi, Hao Wei, Frank P.-W. Lo, Wu Yuan.
"Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation." ArXiv (2023). [paper] [2023.04] -
MSA: Junde Wu, Yu Zhang, Rao Fu, Huihui Fang, Yuanpei Liu, Zhaowei Wang, Yanwu Xu, Yueming Jin.
"Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.04] -
Mohsen Ahmadi, Ahmad Gholizadeh Lonbar, Abbas Sharifi, Ali Tarlani Beris, Mohammadsadegh Nouri, Amir Sharifzadeh Javidi.
"Application of Segment Anything Model for Civil Infrastructure Defect Assessment." ArXiv (2023). [paper] [2023.04] -
SA3D: Jiazhong Cen, Zanwei Zhou, Jiemin Fang, Wei Shen, Lingxi Xie, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian.
"Segment Anything in 3D with NeRFs." NeurIPS (2023). [paper] [code] [2023.04] -
MedSAM: Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, Bo Wang.
"Segment Anything in Medical Images." Nature Communications (2024). [paper] [code] [2023.04] -
TAM: Jinyu Yang, Mingqi Gao, Zhe Li, Shang Gao, Fangjing Wang, Feng Zheng.
"Track Anything: Segment Anything Meets Videos." ArXiv (2023). [paper] [code] [2023.04] -
HFGFA: Rongsheng Wang, Yaofei Duan, YuKun Li.
"Segment anything also detect anything." ArXiv (2023). [paper] [2023.04] -
SNA: Yongcheng Jing, Xinchao Wang, Dacheng Tao.
"Segment Anything in Non-Euclidean Domains: Challenges and Opportunities." ArXiv (2023). [paper] [2023.04] -
SAMAug: Yizhe Zhang, Tao Zhou, Peixian Liang, Danny Z. Chen.
"Input Augmentation with SAM: Boosting Medical Image Segmentation with Segmentation Foundation Model." ArXiv (2023). [paper] [2023.04] -
Count-Anything: Zhiheng Ma, Xiaopeng Hong, Qinnan Shangguan.
"Can SAM Count Anything? An Empirical Study on SAM Counting." ArXiv (2023). [paper] [code] [2023.04] -
Text2Seg: Jielu Zhang, Zhongliang Zhou, Gengchen Mai, Lan Mu, Mengxuan Hu, Sheng Li.
"Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models." ArXiv (2023). [paper] [code] [2023.04] -
Maciej A. Mazurowski, Haoyu Dong, Hanxue Gu, Jichen Yang, Nicholas Konz, Yixin Zhang.
"Segment Anything Model for Medical Image Analysis: an Experimental Study." MIA (2023). [paper] [2023.04] -
Anything-3D: Qiuhong Shen, Xingyi Yang, Xinchao Wang.
"Anything-3D: Towards Single-view Anything Reconstruction in the Wild." ArXiv (2023). [paper] [code] [2023.04] -
Any-to-Any Transfer: Songhua Liu, Jingwen Ye, Xinchao Wang.
"Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate." ArXiv (2023). [paper] [code] [2023.04] -
Sheng He, Rina Bao, Jingpeng Li, Jeffrey Stout, Atle Bjornerud, P. Ellen Grant, Yangming Ou.
"Computer-Vision Benchmark Segment-Anything Model (SAM) in Medical Images: Accuracy in 12 Datasets." ArXiv (2023). [paper] [2023.04] -
SAM-Adapter: Tianrun Chen, Lanyun Zhu, Chaotao Ding, Runlong Cao, Yan Wang, Zejian Li, Lingyun Sun, Papa Mao, Ying Zang.
"SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More." ICCVW (2023). [paper] [2023.04] -
Chuanfei Hu, Tianyi Xia, Shenghong Ju, Xinde Li.
"When SAM Meets Medical Images: An Investigation of Segment Anything Model (SAM) on Multi-phase Liver Tumor Segmentation." ArXiv (2023). [paper] [2023.04] -
SATIR: Junzhang Chen, Xiangzhi Bai.
"Learning to 'Segment Anything' in Thermal Infrared Images through Knowledge Distillation with a Large Scale Dataset SATIR." ArXiv (2023). [paper] [code] [2023.04] -
Florian Putz, Johanna Grigo, Thomas Weissmann, Philipp Schubert, Daniel Hoefler, Ahmed Gomaa, Hassen Ben Tkhayat, Amr Hagag, Sebastian Lettmaier, Benjamin Frey, Udo S. Gaipl, Luitpold V. Distel, Sabine Semrau, Christoph Bert, Rainer Fietkau, Yixing Huang.
"The Segment Anything foundation model achieves favorable brain tumor autosegmentation accuracy on MRI to support radiotherapy treatment planning." ArXiv (2023). [paper] [2023.04] -
Iraklis Giannakis, Anshuman Bhardwaj, Lydia Sam, Georgios Leontidis.
"Deep learning universal crater detection using Segment Anything Model (SAM)." ArXiv (2023). [paper] [2023.04] -
SAMPolyp: Tao Zhou, Yizhe Zhang, Yi Zhou, Ye Wu, Chen Gong.
"Can SAM Segment Polyps?" ArXiv (2023). [paper] [code] [2023.04] -
Inpaint-Anything: Tao Yu, Runseng Feng, Ruoyu Feng, Jinming Liu, Xin Jin, Wenjun Zeng, Zhibo Chen.
"Inpaint Anything: Segment Anything Meets Image Inpainting." ArXiv (2023). [paper] [code] [2023.04] -
Ge-Peng Ji, Deng-Ping Fan, Peng Xu, Ming-Ming Cheng, Bowen Zhou, Luc Van Gool.
"SAM Struggles in Concealed Scenes -- Empirical Study on 'Segment Anything'." ArXiv (2023). [paper] [2023.04] -
Wei Ji, Jingjing Li, Qi Bi, Tingwei Liu, Wenbo Li, Li Cheng.
"Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications." Machine Intelligence Research (2024). [paper] [code] [2024.04] -
Wei Ji, Jingjing Li, Qi Bi, Wenbo Li, Li Cheng.
"Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications." CVPRW Oral (2023). [paper] [code] [2023.04] -
CLIP Surgery: Yi Li, Hualiang Wang, Yiqun Duan, Xiaomeng Li.
"CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks." ArXiv (2023). [paper] [code] [2023.04] -
SAMM: Yihao Liu, Jiaming Zhang, Zhangcong She, Amir Kheradmand, Mehran Armand.
"SAMM (Segment Any Medical Model): A 3D Slicer Integration to SAM." ArXiv (2023). [paper] [code] [2023.04] -
SAM.MD: Saikat Roy, Tassilo Wald, Gregor Koehler, Maximilian R. Rokuss, Nico Disch, Julius Holzschuh, David Zimmerer, Klaus H. Maier-Hein.
"SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model." ArXiv (2023). [paper] [2023.04] -
SAM vs BET: Sovesh Mohapatra, Advait Gosai, Gottfried Schlaug.
"SAM vs BET: A Comparative Study for Brain Extraction and Segmentation of Magnetic Resonance Images using Deep Learning." ArXiv (2023). [paper] [2023.04] -
SAMCOD: Lv Tang, Haoke Xiao, Bo Li.
"Can SAM Segment Anything? When SAM Meets Camouflaged Object Detection." ArXiv (2023). [paper] [code] [2023.04]
No. | Project | Title | Project page | Code base | Affiliation | Description |
---|---|---|---|---|---|---|
001 | SAM | Segment Anything | Project page | Code | Meta | A foundation model for general segmentation. |
002 | SAM-Track | Segment and Track Anything | Colab | Code | Zhejiang University | A project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. |
003 | Grounded-SAM | Grounded-Segment-Anything | Colab | Code | IDEA-Research | A project combining Grounding DINO and SAM that aims to detect and segment anything with text inputs. |
004 | MMDet-SAM | - | - | Code | OpenMMLab | A new way of instance segmentation by combining SAM with Closed-Set Object Detection, Open-Vocabulary Object Detection, Grounding Object Detection. |
005 | MMRotate-SAM | Zero-shot Oriented Object Detection with SAM | - | Code | OpenMMLab | A project that joins SAM with weakly supervised horizontal box detection to achieve rotated box detection. |
006 | MMOCR-SAM | - | - | Code | OpenMMLab | A solution of Text Detection/Recognition and SAM that segments every text character, with striking text removal and text inpainting demos driven by diffusion models and Gradio. |
007 | MMEditing-SAM | - | - | Code | OpenMMLab | A project that joins SAM with image generation to create awesome images and edit any part of them. |
008 | Label-Studio-SAM | OpenMMLab PlayGround: Semi-Automated Annotation with Label-Studio and SAM | - | Code | OpenMMLab | A project combining Label-Studio and SAM to achieve semi-automated annotation. |
009 | PaddleSeg | Segment Anything with PaddleSeg | - | Code | PaddlePaddle | Pretrained SAM model parameters in PaddlePaddle format. |
010 | SegGPT | Segmenting Everything In Context | Hugging Face | Code | BAAI-Vision | SAM In Context based on Painter. |
011 | SEEM | Segment Everything Everywhere All at Once | Hugging Face | Code | Microsoft | A project that can segment everything everywhere all at once with multi-modal prompts. |
012 | CLIP Surgery | CLIP Surgery for Better Explainability with Enhancement in Open Vocabulary Tasks | Project page | Code | HKUST | A work that builds on CLIP's explainability to drive SAM from text to masks without manual points. |
013 | SAMCOD | Can SAM Segment Anything? When SAM Meets Camouflaged Object Detection | - | Code | - | SAM + camouflaged object detection (COD) task. |
014 | Inpaint Anything | Segment Anything Meets Image Inpainting | Hugging Face | Code | USTC and EIT | Combines SAM with inpainting to remove objects smoothly. |
015 | PerSAM | Personalize Segment Anything Model with One Shot | Hugging Face | Code | - | SAM with specific concepts. |
016 | MedSAM | Segment Anything in Medical Images | - | Code | - | A step-by-step tutorial with a small dataset to help you quickly utilize SAM. |
017 | Segment-Any-Anomaly | GroundedSAM Anomaly Detection | Colab | Code | HUST | Grounding DINO + SAM to segment any anomaly. |
018 | SSA | Semantic Segment Anything | - | Code | Fudan University | A dense category annotation engine. |
019 | Magic Copy | - | - | Code | - | Magic Copy is a Chrome extension that uses SAM to extract a foreground object from an image and copy it to the clipboard. |
020 | Segment Anything with Clip | Segment Anything with Clip | Hugging Face | Code | - | SAM combined with CLIP. |
021 | MetaSeg | Segment Anything Video | Hugging Face | Code | - | A packaged version of SAM. |
022 | SAM in Napari | Segment Anything Model (SAM) in Napari | Project page | Code | Applied Computer Vision Lab and German Cancer Research Center | Extends SAM's click-based foreground separation to full click-based semantic segmentation and instance segmentation. |
023 | SAM Medical Imaging | SAM Medical Imaging | - | Code | - | SAM for Medical Imaging. |
024 | 3D-Box | 3D-Box via Segment Anything | - | Code | - | SAM is extended to 3D perception by combining it with VoxelNeXt. |
025 | Anything-3D | - | - | Code | - | Anything-3DNovel-View, Anything-NeRF, Any-3DFace. |
026 | L2SET | Learning to Segment EveryThing | - | Code | UC Berkeley, FAIR | A new partially supervised training paradigm for instance segmentation. |
027 | Edit Anything | Edit Anything by Segment-Anything | - | Code | - | Edit anything in images powered by SAM, ControlNet, Stable Diffusion, etc. |
028 | Image Edit Anything | IEA: Image Editing Anything | - | Code | - | Using stable diffusion and SAM for image editing. |
029 | SAM for Stable Diffusion Webui | Segment Anything for Stable Diffusion WebUI | - | Code | - | This extension aims to connect AUTOMATIC1111 Stable Diffusion WebUI and the Mikubill ControlNet extension with SAM and GroundingDINO to enhance Stable Diffusion/ControlNet inpainting. |
030 | Earth Observation Tools | Segment Anything EO tools | Colab | Code | - | Earth observation tools for SAM. |
031 | Moving Object Detection | Towards Segmenting Anything That Moves | - | Code | - | A project about SAM + Moving Object Detection. |
032 | OCR-SAM | Optical Character Recognition with Segment Anything | Project page | Code | - | Combining MMOCR with SAM and Stable Diffusion. |
033 | SALT | Segment Anything Labelling Tool | - | Code | - | A project that uses SAM and adds a barebones interface to label images and save the masks in COCO format. |
034 | Prompt Segment Anything | Prompt Segment Anything | - | Code | - | An implementation of zero-shot instance segmentation using SAM. |
035 | SAM-RBox | - | - | Code | - | A project that uses SAM to generate rotated bounding boxes with MMRotate; it serves as a comparison method for H2RBox-v2. |
036 | VISAM | MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors | - | Code | - | Combines SAM with MOT, creating the era of "MOTS". |
037 | SegEO | Segment Anything EO tools | - | Code | - | The tools are developed to ease the processing of spatial data (GeoTIFF and TMS) with SAM, using a sliding-window algorithm for big files. |
038 | Napari Segment Anything | Napari Segment Anything | Project page | Code | - | A native Qt UI for SAM. |
039 | Segment-Anything-U-Specify | Segment-Anything-U-Specify | - | Code | - | Using CLIP and SAM to segment any instance you specify with text prompt of any instance names. |
040 | SegDrawer | Simple static web-based mask drawer | Colab | Code | - | Simple static web-based mask drawer, supporting semantic segmentation with SAM. |
041 | Track Anything | Segment Anything Meets Videos | Hugging Face | Code | SUSTech | Track-Anything is a flexible and interactive tool for video object tracking and segmentation. |
042 | Count Anything | - | - | Code | - | A method uses SAM and CLIP to ground and count any object that matches a custom text prompt, without requiring any point or box annotation. |
043 | RAM | Relate Anything Model | Hugging Face | Code | MMLab, NTU and VisCom Lab, KCL/TongJi | Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image. |
044 | Segment Any RGBD | Segment Any RGBD | Project page | Code | - | Segment AnyRGBD is a toolbox to segment rendered depth images based on SAM. |
045 | Show Anything | Show Anything | Hugging Face | Code | Showlab, NUS | Some applications that are compatible with both SAM and generation. |
046 | Transfer Any Style | Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate | - | Code | LV-lab, NUS | An interactive demo based on Segment-Anything for style transfer, which enables different content regions to apply different styles. |
047 | Caption Anything | - | Colab | Code | VIP lab, SUSTech | Caption-Anything is a versatile image processing tool that combines the capabilities of SAM, Visual Captioning, and ChatGPT. |
048 | Image2Paragraph | Transform Image Into Unique Paragraph | Project page | Code | - | Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet. |
049 | LIME SAM | Local Interpretable Model-agnostic Explanations Segment Anything | Colab | Code | - | LIME-SAM aims to create an Explainable Artificial Intelligence (XAI) framework for image classification using LIME (Local Interpretable Model-agnostic Explanations) as the base algorithm, with the super-pixel method replaced by SAM. |
050 | Paint Anything | - | - | Code | - | An interactive demo based on SAM for stroke-based painting which enables human-like painting. |
051 | SAMed | Customized Segment Anything Model for Medical Image Segmentation | Colab | Code | USTC | SAMed is built upon the large-scale image segmentation model, SAM, to explore the new research paradigm of customizing large-scale models for medical image segmentation. |
052 | Personalize SAM | Personalize Segment Anything with 1 Shot in 10 Seconds | Hugging Face | Code | MMLab, CUHK | A training-free personalization approach for SAM, termed PerSAM. Given only a single image with a reference mask, PerSAM can segment specific visual concepts. |
053 | Open-vocabulary-Segment-Anything | Open-vocabulary-Segment-Anything | - | Code | - | Combining OwlViT with Segment Anything - Open-vocabulary Detection and Segmentation (Text-conditioned, and Image-conditioned). |
054 | Label-Anything-Pipeline | Label-Anything-Pipeline | - | Code | ZJU | Annotate anything in visual tasks, all in one pipeline with GPT-4 and SAM. |
055 | Grounded-Segment-Any-Parts | Grounded Segment Anything: From Objects to Parts | Project page | Code | HKU | Expands the Segment Anything Model (SAM) to support text prompt input. The text prompt can be object-level (e.g., dog) or part-level (e.g., dog head). |
056 | AnyLabeling | AnyLabeling | YouTube page | Code | - | Effortless AI-assisted data labeling with support from Segment Anything and YOLO. |
057 | SSA | Semantic-Segment-Anything | Project page | Code | - | Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B). |
058 | RefSAM | Label Data with Segment Anything in Roboflow | Project page | Code | - | Referring Image Segmentation Benchmarking with Segment Anything Model (SAM). |
059 | Roboflow Annotate | Launch: Label Data with Segment Anything in Roboflow | Project page | APP | Roboflow | SAM-assisted labeling for training computer vision models. |
060 | ImageBind SAM | - | - | Code | IDEA-Research | An experimental demo that aims to combine ImageBind and SAM to generate masks from different modalities. |
061 | X-AnyLabeling | X-AnyLabeling | - | Code | CVHub | A new interactive automatic labeling tool based on AnyLabeling. |
062 | Segment Anything + NNCF | - | - | Code | - | OpenVINO™ NNCF for Segment Anything encoder quantization acceleration. |
063 | YOLOv8 + SAM | - | - | - | - | Use SAM in YOLOv8. |
064 | SearchAnything | SearchAnything | Zhihu blog, Twitter | Code | CAS and MSRA | A semantic local search engine powered by various AI models. |
065 | SAM Meets Stable Diffusion | - | - | Code | PaddlePaddle | Segment and generate anything. |
066 | Language Segment-Anything | - | - | Code | - | SAM with text prompts generates masks for specific objects in images. |
067 | Expedit-SAM | - | - | Code | - | Expediting SAM without Fine-tuning. |
068 | Segment-Anything-Fast | Accelerating Generative AI with PyTorch: Segment Anything, Fast | Project page | Code | Team PyTorch | A batched offline inference oriented version of segment-anything. |
069 | YOLOv9+SAM | YOLOv9+SAM | Project page | Code | - | Dynamic Detection and Segmentation with YOLOv9+SAM. |
070 | LiteMedSAM | LiteMedSAM | Project page | Code | - | A lightweight version of MedSAM for fast training and inference. |
- VainF/Awesome-Anything
- Hedlen/Awesome Segment Anything
- Vision-Intelligence-and-Robots-Group/Awesome-Segment-Anything
- JerryX1110/Awesome-segment-anything-extensions
- dk-liang/Awesome-Segment-Anything
This project is released under the MIT license. Please see the LICENSE file for more information.
Alternative AI tools for Awesome-Segment-Anything
Similar Open Source Tools
awesome-large-audio-models
This repository is a curated list of awesome large AI models in audio signal processing, focusing on the application of large language models to audio tasks. It includes survey papers, popular large audio models, automatic speech recognition, neural speech synthesis, speech translation, other speech applications, large audio models in music, and audio datasets. The repository aims to provide a comprehensive overview of recent advancements and challenges in applying large language models to audio signal processing, showcasing the efficacy of transformer-based architectures in various audio tasks.
Awesome-LLM-Preference-Learning
The repository 'Awesome-LLM-Preference-Learning' is the official repository of a survey paper titled 'Towards a Unified View of Preference Learning for Large Language Models: A Survey'. It contains a curated list of papers related to preference learning for Large Language Models (LLMs). The repository covers various aspects of preference learning, including on-policy and off-policy methods, feedback mechanisms, reward models, algorithms, evaluation techniques, and more. The papers included in the repository explore different approaches to aligning LLMs with human preferences, improving mathematical reasoning in LLMs, enhancing code generation, and optimizing language model performance.
awesome-open-ended
A curated list of open-ended learning AI resources focusing on algorithms that invent new and complex tasks endlessly, inspired by human advancements. The repository includes papers, safety considerations, surveys, perspectives, and blog posts related to open-ended AI research.
Awesome-Story-Generation
Awesome-Story-Generation is a repository that curates a comprehensive list of papers related to Story Generation and Storytelling, focusing on the era of Large Language Models (LLMs). The repository includes papers on various topics such as Literature Review, Large Language Model, Plot Development, Better Storytelling, Story Character, Writing Style, Story Planning, Controllable Story, Reasonable Story, and Benchmark. It aims to provide a chronological collection of influential papers in the field, with a focus on citation counts for LLMs-era papers and some earlier influential papers. The repository also encourages contributions and feedback from the community to improve the collection.
awesome-generative-information-retrieval
This repository contains a curated list of resources on generative information retrieval, including research papers, datasets, tools, and applications. Generative information retrieval is a subfield of information retrieval that uses generative models to generate new documents or passages of text that are relevant to a given query. This can be useful for a variety of tasks, such as question answering, summarization, and document generation. The resources in this repository are intended to help researchers and practitioners stay up-to-date on the latest advances in generative information retrieval.
Awesome-Quantization-Papers
This repo contains a comprehensive paper list of **Model Quantization** for efficient deep learning on AI conferences/journals/arXiv. As a highlight, we categorize the papers in terms of model structures and application scenarios, and label the quantization methods with keywords.
Awesome-LLM-Reasoning-Openai-o1-Survey
The repository 'Awesome LLM Reasoning Openai-o1 Survey' provides a collection of survey papers and related works on OpenAI o1, focusing on topics such as LLM reasoning, self-play reinforcement learning, complex logic reasoning, and scaling law. It includes papers from various institutions and researchers, showcasing advancements in reasoning bootstrapping, reasoning scaling law, self-play learning, step-wise and process-based optimization, and applications beyond math. The repository serves as a valuable resource for researchers interested in exploring the intersection of language models and reasoning techniques.
Awesome-LLM-RAG
This repository, Awesome-LLM-RAG, aims to record advanced papers on Retrieval Augmented Generation (RAG) in Large Language Models (LLMs). It serves as a resource hub for researchers interested in promoting their work related to LLM RAG by updating paper information through pull requests. The repository covers various topics such as workshops, tutorials, papers, surveys, benchmarks, retrieval-enhanced LLMs, RAG instruction tuning, RAG in-context learning, RAG embeddings, RAG simulators, RAG search, RAG long-text and memory, RAG evaluation, RAG optimization, and RAG applications.
Awesome-LLM4RS-Papers
This paper list covers Large Language Model-enhanced Recommender Systems, along with some related works. Keywords: recommendation system, large language models.
LLM4DB
LLM4DB is a repository focused on the intersection of Large Language Models (LLMs) and Database technologies. It covers various aspects such as data processing, data analysis, database optimization, and data management for LLMs. The repository includes research papers, tools, and techniques related to leveraging LLMs for tasks like data cleaning, entity matching, schema matching, data discovery, NL2SQL, data exploration, data visualization, knob tuning, query optimization, and database diagnosis.
awesome-llm-security
Awesome LLM Security is a curated collection of tools, documents, and projects related to Large Language Model (LLM) security. It covers various aspects of LLM security including white-box, black-box, and backdoor attacks, defense mechanisms, platform security, and surveys. The repository provides resources for researchers and practitioners interested in understanding and safeguarding LLMs against adversarial attacks. It also includes a list of tools specifically designed for testing and enhancing LLM security.
awesome-llm-role-playing-with-persona
Awesome-llm-role-playing-with-persona is a curated list of resources for large language models for role-playing with assigned personas. It includes papers and resources related to persona-based dialogue systems, personalized response generation, psychology of LLMs, biases in LLMs, and more. The repository aims to provide a comprehensive collection of research papers and tools for exploring role-playing abilities of large language models in various contexts.
ai4math-papers
The 'ai4math-papers' repository contains a collection of research papers related to AI applications in mathematics, including automated theorem proving, synthetic theorem generation, autoformalization, proof refactoring, premise selection, benchmarks, human-in-the-loop interactions, and constructing examples/counterexamples. The papers cover various topics such as neural theorem proving, reinforcement learning for theorem proving, generative language modeling, formal mathematics statement curriculum learning, and more. The repository serves as a valuable resource for researchers and practitioners interested in the intersection of AI and mathematics.
glossAPI
The glossAPI project aims to develop a Greek language model as open-source software, with code licensed under EUPL and data under Creative Commons BY-SA. The project focuses on collecting and evaluating open text sources in Greek, with efforts to prioritize and gather textual data sets. The project encourages contributions through the CONTRIBUTING.md file and provides resources in the wiki for viewing and modifying recorded sources. It also welcomes ideas and corrections through issue submissions. The project emphasizes the importance of open standards, ethically secured data, privacy protection, and addressing digital divides in the context of artificial intelligence and advanced language technologies.
For similar tasks
Time-LLM
Time-LLM is a reprogramming framework that repurposes large language models (LLMs) for time series forecasting. It allows users to treat time series analysis as a 'language task' and effectively leverage pre-trained LLMs for forecasting. The framework involves reprogramming time series data into text representations and providing declarative prompts to guide the LLM reasoning process. Time-LLM supports various backbone models such as Llama-7B, GPT-2, and BERT, offering flexibility in model selection. The tool provides a general framework for repurposing language models for time series forecasting tasks.
crewAI
CrewAI is a cutting-edge framework designed to orchestrate role-playing autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks. It enables AI agents to assume roles, share goals, and operate in a cohesive unit, much like a well-oiled crew. Whether you're building a smart assistant platform, an automated customer service ensemble, or a multi-agent research team, CrewAI provides the backbone for sophisticated multi-agent interactions. With features like role-based agent design, autonomous inter-agent delegation, flexible task management, and support for various LLMs, CrewAI offers a dynamic and adaptable solution for both development and production workflows.
Transformers_And_LLM_Are_What_You_Dont_Need
Transformers_And_LLM_Are_What_You_Dont_Need is a repository that explores the limitations of transformers in time series forecasting. It contains a collection of papers, articles, and theses discussing the effectiveness of transformers and LLMs in this domain. The repository aims to provide insights into why transformers may not be the best choice for time series forecasting tasks.
pytorch-forecasting
PyTorch Forecasting is a PyTorch-based package for state-of-the-art time series forecasting with deep learning architectures. It offers a high-level API for training networks on pandas data frames and builds on PyTorch Lightning, so training on CPUs, single GPUs, and multiple GPUs works out of the box with automatic logging. The package aims to simplify time series forecasting with neural networks by providing a flexible API for professionals and sensible defaults for beginners. It includes a time series dataset class that handles data transformations, missing values, and subsampling; several neural network architectures optimized for real-world deployment; multi-horizon time series metrics; and hyperparameter tuning with Optuna.
spider
Spider is a high-performance web crawler and indexer designed to handle data curation workloads efficiently. It offers features such as concurrency, streaming, decentralization, headless Chrome rendering, HTTP proxies, cron jobs, subscriptions, smart mode, blacklisting, whitelisting, budgeting depth, dynamic AI prompt scripting, CSS scraping, and more. Users can easily get started with the Spider Cloud hosted service or set up local installations with spider-cli. The tool supports integration with Node.js and Python for additional flexibility. With a focus on speed and scalability, Spider is ideal for extracting and organizing data from the web.
AI_for_Science_paper_collection
AI for Science paper collection is an initiative by AI for Science Community to collect and categorize papers in AI for Science areas by subjects, years, venues, and keywords. The repository contains `.csv` files with paper lists labeled by keys such as `Title`, `Conference`, `Type`, `Application`, `MLTech`, `OpenReviewLink`. It covers top conferences like ICML, NeurIPS, and ICLR. Volunteers can contribute by updating existing `.csv` files or adding new ones for uncovered conferences/years. The initiative aims to track the increasing trend of AI for Science papers and analyze trends in different applications.
For similar jobs
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
daily-poetry-image
Daily Chinese ancient poetry and AI-generated images powered by Bing DALL-E-3. GitHub Action triggers the process automatically. Poetry is provided by Today's Poem API. The website is built with Astro.
exif-photo-blog
EXIF Photo Blog is a full-stack photo blog application built with Next.js, Vercel, and Postgres. It features built-in authentication, photo upload with EXIF extraction, photo organization by tag, infinite scroll, light/dark mode, automatic OG image generation, a CMD-K menu with photo search, experimental support for AI-generated descriptions, and support for Fujifilm simulations. The application is easy to deploy to Vercel with just a few clicks and can be customized with a variety of environment variables.
SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.
Twitter-Insight-LLM
This project enables you to fetch liked tweets from Twitter (using Selenium), save them to JSON and Excel files, and perform initial data analysis and image captioning. It is one of the initial steps of a larger personal project involving Large Language Models (LLMs).
AISuperDomain
Aila Desktop Application is a powerful tool that integrates multiple leading AI models into a single desktop application. It allows users to interact with various AI models simultaneously, providing diverse responses and insights to their inquiries. With its user-friendly interface and customizable features, Aila empowers users to engage with AI seamlessly and efficiently. Whether you're a researcher, student, or professional, Aila can enhance your AI interactions and streamline your workflow.
ChatGPT-On-CS
This project is an intelligent customer service dialogue tool based on large models. It supports platforms such as WeChat, Qianniu, Bilibili, Douyin Enterprise, Douyin, Doudian, Weibo chat, Xiaohongshu professional account operation, Xiaohongshu, and Zhihu. You can choose GPT-3.5, GPT-4.0, or Lazy Treasure Box as the model backend (more will be supported in the future). The tool can process text, voice, and images, access external resources such as operating systems and the Internet through plug-ins, and support enterprise AI applications customized with a company's own knowledge base.
obs-localvocal
LocalVocal is a live-streaming AI assistant plugin for OBS that allows you to transcribe audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). It's privacy-first, with all data staying on your machine, and requires no GPU, cloud costs, network, or downtime.