Touchstone

[NeurIPS 2024] Touchstone - Benchmarking AI on 5,172 o.o.d. CT volumes and 9 anatomical structures

Touchstone Benchmark

Subscribe to us: https://groups.google.com/u/2/g/bodymaps

We present Touchstone, a large-scale medical segmentation benchmark based on 5,195 annotated CT volumes from 76 hospitals for training and 6,933 CT volumes from 8 additional hospitals for testing. We invite AI inventors to train their models on AbdomenAtlas, and we independently evaluate their algorithms. We have already collaborated with 14 influential research teams, and we are still accepting new submissions.

Paper

Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation?
Pedro R. A. S. Bassi¹, Wenxuan Li¹, Yucheng Tang², Fabian Isensee³, ..., Alan Yuille¹, Zongwei Zhou¹
¹Johns Hopkins University, ²NVIDIA, ³DKFZ
NeurIPS 2024

Touchstone 1.0 Leaderboard

rank	model	organization	average DSC
🏆	MedNeXt	DKFZ	89.2
🏆	STU-Net-B	Shanghai AI Lab	89.0
🏆	MedFormer	Rutgers	89.0
🏆	nnU-Net ResEncL	DKFZ	88.8
🏆	UniSeg	NPU	88.8
🏆	Diff-UNet	HKUST	88.5
🏆	LHU-Net	UR	88.0
🏆	NexToU	HIT	87.8
9	SegVol	BAAI	87.1
10	U-Net & CLIP	CityU	87.1
11	Swin UNETR & CLIP	CityU	86.7
12	Swin UNETR	NVIDIA	80.1
13	UNesT	NVIDIA	79.1
14	SAM-Adapter	Duke	73.4
15	UNETR	NVIDIA	64.4
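
DSC in these tables is the Dice Similarity Coefficient, reported in percent. As a reminder of what the metric measures, here is a minimal pure-Python sketch for a single structure. It is not the benchmark's evaluation code: the function name `dice_percent`, the flattened label-map input, and the choice to return NaN when the structure is absent from both masks are all assumptions made for illustration.

```python
import math
from typing import Sequence


def dice_percent(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """Dice Similarity Coefficient (in percent) for one label.

    `pred` and `truth` are flattened voxel label maps of equal length.
    """
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have the same number of voxels")
    pred_size = sum(1 for p in pred if p == label)
    truth_size = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if pred_size + truth_size == 0:
        # Assumption: treat the score as undefined when neither mask contains the label.
        return math.nan
    # DSC = 2 * |P ∩ T| / (|P| + |T|), scaled to percent.
    return 100.0 * 2 * overlap / (pred_size + truth_size)


# Toy example: 5 voxels, label 1 marks the structure of interest.
prediction = [0, 1, 1, 0, 1]
ground_truth = [0, 1, 0, 0, 1]
print(dice_percent(prediction, ground_truth, label=1))  # 80.0
```

In the toy example the prediction marks 3 voxels, the ground truth marks 2, and they share 2, so the score is 2 × 2 / (3 + 2) = 0.8.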

Aorta - NexToU 🏆

rank	model	organization	DSC
🏆	NexToU	HIT	86.4
2	MedNeXt	DKFZ	83.1
3	UniSeg	NPU	82.3
4	STU-Net-B	Shanghai AI Lab	82.1
5	nnU-Net ResEncL	DKFZ	81.4
6	Diff-UNet	HKUST	81.2
7	Swin UNETR	NVIDIA	81.1
8	SegVol	BAAI	80.2
9	UNesT	NVIDIA	78.6
10	Swin UNETR & CLIP	CityU	78.1
11	U-Net & CLIP	CityU	77.1
12	SAM-Adapter	Duke	62.8
13	UNETR	NVIDIA	52.1

Gallbladder - STU-Net-B & MedNeXt 🏆

rank	model	organization	DSC
🏆	STU-Net-B	Shanghai AI Lab	85.5
🏆	MedNeXt	DKFZ	85.3
3	nnU-Net ResEncL	DKFZ	84.9
4	UniSeg	NPU	84.7
5	Diff-UNet	HKUST	83.8
6	NexToU	HIT	82.3
7	U-Net & CLIP	CityU	82.1
8	Swin UNETR & CLIP	CityU	80.2
9	SegVol	BAAI	79.3
10	Swin UNETR	NVIDIA	69.2
11	UNesT	NVIDIA	62.1
12	SAM-Adapter	Duke	49.4
13	UNETR	NVIDIA	43.8

KidneyL - Diff-UNet 🏆

rank	model	organization	DSC
🏆	Diff-UNet	HKUST	91.9
2	nnU-Net ResEncL	DKFZ	91.9
3	STU-Net-B	Shanghai AI Lab	91.9
4	MedNeXt	DKFZ	91.8
5	SegVol	BAAI	91.8
6	UniSeg	NPU	91.5
7	U-Net & CLIP	CityU	91.1
8	Swin UNETR & CLIP	CityU	91.0
9	NexToU	HIT	89.6
10	SAM-Adapter	Duke	87.3
11	Swin UNETR	NVIDIA	85.5
12	UNesT	NVIDIA	85.4
13	UNETR	NVIDIA	63.7

KidneyR - Diff-UNet 🏆

rank	model	organization	DSC
🏆	Diff-UNet	HKUST	92.8
2	MedNeXt	DKFZ	92.6
3	nnU-Net ResEncL	DKFZ	92.6
4	STU-Net-B	Shanghai AI Lab	92.5
5	SegVol	BAAI	92.5
6	UniSeg	NPU	92.2
7	U-Net & CLIP	CityU	91.9
8	Swin UNETR & CLIP	CityU	91.7
9	SAM-Adapter	Duke	90.4
10	NexToU	HIT	90.1
11	UNesT	NVIDIA	83.6
12	Swin UNETR	NVIDIA	81.7
13	UNETR	NVIDIA	69.6

Liver - MedNeXt 🏆

rank	model	organization	DSC
🏆	MedNeXt	DKFZ	96.3
2	nnU-Net ResEncL	DKFZ	96.3
3	Diff-UNet	HKUST	96.2
4	STU-Net-B	Shanghai AI Lab	96.2
5	UniSeg	NPU	96.1
6	U-Net & CLIP	CityU	96.0
7	SegVol	BAAI	96.0
8	Swin UNETR & CLIP	CityU	95.8
9	NexToU	HIT	95.7
10	SAM-Adapter	Duke	94.1
11	UNesT	NVIDIA	93.6
12	Swin UNETR	NVIDIA	93.5
13	UNETR	NVIDIA	90.5

Pancreas - MedNeXt 🏆

rank	model	organization	DSC
🏆	MedNeXt	DKFZ	83.3
2	STU-Net-B	Shanghai AI Lab	83.2
3	nnU-Net ResEncL	DKFZ	82.9
4	UniSeg	NPU	82.7
5	Diff-UNet	HKUST	81.9
6	U-Net & CLIP	CityU	80.8
7	Swin UNETR & CLIP	CityU	80.2
8	NexToU	HIT	80.2
9	SegVol	BAAI	79.1
10	Swin UNETR	NVIDIA	68.5
11	UNesT	NVIDIA	68.3
12	UNETR	NVIDIA	55.1
13	SAM-Adapter	Duke	50.2

Postcava - STU-Net-B & MedNeXt 🏆

rank	model	organization	DSC
🏆	STU-Net-B	Shanghai AI Lab	81.3
🏆	MedNeXt	DKFZ	81.3
3	UniSeg	NPU	81.2
4	Diff-UNet	HKUST	80.8
5	nnU-Net ResEncL	DKFZ	80.5
6	U-Net & CLIP	CityU	78.5
7	NexToU	HIT	78.1
8	SegVol	BAAI	77.8
9	Swin UNETR & CLIP	CityU	76.8
10	Swin UNETR	NVIDIA	69.9
11	UNesT	NVIDIA	66.2
12	UNETR	NVIDIA	53.9
13	SAM-Adapter	Duke	48.0

Spleen - nnU-Net ResEncL 🏆

rank	model	organization	DSC
🏆	nnU-Net ResEncL	DKFZ	95.2
2	MedNeXt	DKFZ	95.2
3	STU-Net-B	Shanghai AI Lab	95.1
4	Diff-UNet	HKUST	95.0
5	UniSeg	NPU	94.9
6	SegVol	BAAI	94.5
7	NexToU	HIT	94.7
8	U-Net & CLIP	CityU	94.3
9	Swin UNETR & CLIP	CityU	94.1
10	SAM-Adapter	Duke	90.5
11	Swin UNETR	NVIDIA	87.9
12	UNesT	NVIDIA	86.7
13	UNETR	NVIDIA	76.5

Stomach - STU-Net-B & MedNeXt & nnU-Net ResEncL 🏆

rank	model	organization	DSC
🏆	STU-Net-B	Shanghai AI Lab	93.5
🏆	MedNeXt	DKFZ	93.5
🏆	nnU-Net ResEncL	DKFZ	93.4
4	UniSeg	NPU	93.3
5	Diff-UNet	HKUST	93.1
6	NexToU	HIT	92.7
7	SegVol	BAAI	92.5
8	U-Net & CLIP	CityU	92.4
9	Swin UNETR & CLIP	CityU	92.2
10	SAM-Adapter	Duke	88.0
11	UNesT	NVIDIA	87.6
12	Swin UNETR	NVIDIA	84.1
13	UNETR	NVIDIA	74.2
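
The per-structure tables above can be cross-checked against the overall leaderboard. The sketch below assumes that "average DSC" is the unweighted mean of the nine per-structure scores. The README does not state this, but the numbers are consistent with it for the three models checked here. The dictionary and function names are illustrative only.

```python
from statistics import mean

# Per-structure DSC copied from the tables above, in the order:
# aorta, gallbladder, left kidney, right kidney, liver,
# pancreas, postcava, spleen, stomach.
PER_STRUCTURE_DSC = {
    "MedNeXt": [83.1, 85.3, 91.8, 92.6, 96.3, 83.3, 81.3, 95.2, 93.5],
    "STU-Net-B": [82.1, 85.5, 91.9, 92.5, 96.2, 83.2, 81.3, 95.1, 93.5],
    "nnU-Net ResEncL": [81.4, 84.9, 91.9, 92.6, 96.3, 82.9, 80.5, 95.2, 93.4],
}


def average_dsc(scores):
    """Unweighted mean over structures, rounded to one decimal like the leaderboard."""
    return round(mean(scores), 1)


for model, scores in PER_STRUCTURE_DSC.items():
    print(f"{model}\t{average_dsc(scores)}")
# MedNeXt          89.2
# STU-Net-B        89.0
# nnU-Net ResEncL  88.8
```

These values match the average DSC column of the Touchstone 1.0 leaderboard for the same models.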

Touchstone 1.0 Dataset

Training set

Touchstone 1.0: AbdomenAtlas1.0Mini (N=5,195)
Touchstone 2.0: AbdomenAtlas1.1Mini (N=9,262)

Test set

Proprietary JHH dataset (N=5,172)
Public TotalSegmentator V2 dataset (N=1,228)

Figure 1. Metadata distribution in the test set.

Touchstone 1.0 Model

[!NOTE] We are releasing the trained AI models evaluated in Touchstone right here. Stay tuned!

rank	model	average DSC	parameter	infer. speed
🏆	MedNeXt	89.2	61.8M	★☆☆☆☆
🏆	STU-Net-B	89.0	58.3M	★★☆☆☆
🏆	MedFormer	89.0	38.5M	★★★☆☆
🏆	nnU-Net ResEncL	88.8	102.0M	★★★★☆
🏆	UniSeg	88.8	31.0M	☆☆☆☆☆
🏆	Diff-UNet	88.5	434.0M	★★★☆☆
🏆	LHU-Net	88.0	8.6M	★★★★★
🏆	NexToU	87.8	81.9M	★★★★☆
9	SegVol	87.1	181.0M	★★★★☆
10	U-Net & CLIP	87.1	19.1M	★★★☆☆
11	Swin UNETR & CLIP	86.7	62.2M	★★★☆☆
12	Swin UNETR	80.1	72.8M	★★★★★
13	UNesT	79.1	87.2M	★★★★★
14	SAM-Adapter	73.4	11.6M	★★★★☆
15	UNETR	64.4	101.8M	★★★★★

Citation

Please cite the following papers if you find our study helpful.

@article{bassi2024touchstone,
  title={Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation?},
  author={Bassi, Pedro RAS and Li, Wenxuan and Tang, Yucheng and Isensee, Fabian and Wang, Zifu and Chen, Jieneng and Chou, Yu-Cheng and Kirchhoff, Yannick and Rokuss, Maximilian and Huang, Ziyan and Ye, Jin and He, Junjun and Wald, Tassilo and Ulrich, Constantin and Baumgartner, Michael and Roy, Saikat and Maier-Hein, Klaus H. and Jaeger, Paul and Ye, Yiwen and Xie, Yutong and Zhang, Jianpeng and Chen, Ziyang and Xia, Yong and Xing, Zhaohu and Zhu, Lei and Sadegheih, Yousef and Bozorgpour, Afshin and Kumari, Pratibha and Azad, Reza and Merhof, Dorit and Shi, Pengcheng and Ma, Ting and Du, Yuxin and Bai, Fan and Huang, Tiejun and Zhao, Bo and Wang, Haonan and Li, Xiaomeng and Gu, Hanxue and Dong, Haoyu and Yang, Jichen and Mazurowski, Maciej A. and Gupta, Saumya and Wu, Linshan and Zhuang, Jiaxin and Chen, Hao and Roth, Holger and Xu, Daguang and Blaschko, Matthew B. and Decherchi, Sergio and Cavalli, Andrea and Yuille, Alan L. and Zhou, Zongwei},
  journal={Conference on Neural Information Processing Systems},
  year={2024},
  url={https://github.com/MrGiovanni/Touchstone}
}

@article{li2024abdomenatlas,
  title={AbdomenAtlas: A large-scale, detailed-annotated, \& multi-center dataset for efficient transfer learning and open algorithmic benchmarking},
  author={Li, Wenxuan and Qu, Chongyu and Chen, Xiaoxi and Bassi, Pedro RAS and Shi, Yijia and Lai, Yuxiang and Yu, Qian and Xue, Huimin and Chen, Yixiong and Lin, Xiaorui and others},
  journal={Medical Image Analysis},
  pages={103285},
  year={2024},
  publisher={Elsevier}
}

@inproceedings{li2024well,
  title={How Well Do Supervised Models Transfer to 3D Image Segmentation?},
  author={Li, Wenxuan and Yuille, Alan and Zhou, Zongwei},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}

@article{qu2023abdomenatlas,
  title={Abdomenatlas-8k: Annotating 8,000 CT volumes for multi-organ segmentation in three weeks},
  author={Qu, Chongyu and Zhang, Tiezheng and Qiao, Hualin and Tang, Yucheng and Yuille, Alan L and Zhou, Zongwei and others},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2023}
}

Acknowledgement

This work was supported by the Lustgarten Foundation for Pancreatic Cancer Research and the McGovern Foundation. Paper content is covered by patents pending.
