Awesome-LLM-3D

Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world

This repository is a curated list of papers related to 3D tasks empowered by Large Language Models (LLMs). It covers tasks such as 3D understanding, reasoning, generation, and embodied agents. The repository also includes other Foundation Models like CLIP and SAM to provide a comprehensive view of the area. It is actively maintained and updated to showcase the latest advances in the field. Users can find a variety of research papers and projects related to 3D tasks and LLMs in this repository.

🏠 About

Here is a curated list of papers about 3D-Related Tasks empowered by Large Language Models (LLMs). It contains various tasks including 3D understanding, reasoning, generation, and embodied agents. Also, we include other Foundation Models (CLIP, SAM) for the whole picture of this area.

This is an active repository; watch it to follow the latest advances. If you find it useful, please star ⭐ this repo and cite the paper.

🔥 News

[2024-05-16] 📢 Check out the first survey paper in the 3D-LLM domain: When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
[2024-01-06] Runsen Xu added chronological information and Xianzheng Ma reorganized the list in Z-A order to make the latest advances easier to follow.
[2023-12-16] Xianzheng Ma and Yash Bhalgat curated this list and published the first version.

Table of Contents

Awesome-LLM-3D

3D Understanding via LLM

Date	Keywords	Institute (first)	Paper	Publication	Others
2025-02-02	LSceneLLM	SCUT	LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences	CVPR '25	project
2025-01-02	GPT4Scene	HKU	GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models	Arxiv	project
2024-12-03	Video-3D LLM	CUHK	Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding	Arxiv	project
2024-10-12	Situation3D	UIUC	Situational Awareness Matters in 3D Vision Language Reasoning	CVPR '24	project
2024-09-28	LLaVA-3D	HKU	LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness	Arxiv	project
2024-09-08	MSR3D	BIGAI	Multi-modal Situated Reasoning in 3D Scenes	NeurIPS '24	project
2024-08-28	GreenPLM	HUST	More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding	Arxiv	github
2024-06-17	LLaNA	UniBO	LLaNA: Large Language and NeRF Assistant	NeurIPS '24	project
2024-06-07	SpatialPIN	Oxford	SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors	NeurIPS '24	project
2024-06-03	SpatialRGPT	UCSD	SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models	NeurIPS '24	github
2024-05-02	MiniGPT-3D	HUST	MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors	ACM MM '24	project
2024-02-27	ShapeLLM	XJTU	ShapeLLM: Universal 3D Object Understanding for Embodied Interaction	Arxiv	project
2024-01-22	SpatialVLM	Google DeepMind	SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities	CVPR '24	project
2023-12-21	LiDAR-LLM	PKU	LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding	Arxiv	project
2023-12-15	3DAP	Shanghai AI Lab	3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V	Arxiv	project
2023-12-13	Chat-Scene	ZJU	Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers	NeurIPS '24	github
2023-12-05	GPT4Point	HKU	GPT4Point: A Unified Framework for Point-Language Understanding and Generation	Arxiv	github
2023-11-30	LL3DA	Fudan University	LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning	Arxiv	github
2023-11-26	ZSVG3D	CUHK(SZ)	Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding	Arxiv	project
2023-11-18	LEO	BIGAI	An Embodied Generalist Agent in 3D World	ICML '24	github
2023-10-14	JM3D-LLM	Xiamen University	JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues	ACM MM '23	github
2023-10-10	Uni3D	BAAI	Uni3D: Exploring Unified 3D Representation at Scale	ICLR '24	project
2023-09-27	-	KAUST	Zero-Shot 3D Shape Correspondence	SIGGRAPH Asia '23	-
2023-09-21	LLM-Grounder	U-Mich	LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent	ICRA '24	github
2023-09-01	Point-Bind	CUHK	Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following	Arxiv	github
2023-08-31	PointLLM	CUHK	PointLLM: Empowering Large Language Models to Understand Point Clouds	ECCV '24	github
2023-08-17	Chat-3D	ZJU	Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes	Arxiv	github
2023-08-08	3D-VisTA	BIGAI	3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment	ICCV '23	github
2023-07-24	3D-LLM	UCLA	3D-LLM: Injecting the 3D World into Large Language Models	NeurIPS '23	github
2023-03-29	ViewRefer	CUHK	ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding	ICCV '23	github
2022-09-12	-	MIT	Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding	Arxiv	github

3D Understanding via other Foundation Models

Date	Keywords	Institute (first)	Paper	Publication	Others
2024-10-12	Lexicon3D	UIUC	Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding	NeurIPS '24	project
2024-10-07	Diff2Scene	CMU	Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models	ECCV '24	project
2024-04-07	Any2Point	Shanghai AI Lab	Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding	ECCV '24	github
2024-03-16	N2F2	Oxford-VGG	N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields	Arxiv	-
2023-12-17	SAI3D	PKU	SAI3D: Segment Any Instance in 3D Scenes	Arxiv	project
2023-12-17	Open3DIS	VinAI	Open3DIS: Open-vocabulary 3D Instance Segmentation with 2D Mask Guidance	Arxiv	project
2023-11-06	OVIR-3D	Rutgers University	OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data	CoRL '23	github
2023-10-29	OpenMask3D	ETH	OpenMask3D: Open-Vocabulary 3D Instance Segmentation	NeurIPS '23	project
2023-10-05	Open-Fusion	-	Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation	Arxiv	github
2023-09-22	OV-3DDet	HKUST	CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection	NeurIPS '23	github
2023-09-19	LAMP	-	From Language to 3D Worlds: Adapting Language Model for Point Cloud Perception	OpenReview	-
2023-09-15	OpenNerf	-	OpenNerf: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views	OpenReview	github
2023-09-01	OpenIns3D	Cambridge	OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation	Arxiv	project
2023-06-07	Contrastive Lift	Oxford-VGG	Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion	NeurIPS '23	github
2023-06-04	Multi-CLIP	ETH	Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes	Arxiv	-
2023-05-23	3D-OVS	NTU	Weakly Supervised 3D Open-vocabulary Segmentation	NeurIPS '23	github
2023-05-21	VL-Fields	University of Edinburgh	VL-Fields: Towards Language-Grounded Neural Implicit Spatial Representations	ICRA '23	project
2023-05-08	CLIP-FO3D	Tsinghua University	CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP	ICCVW '23	-
2023-04-12	3D-VQA	ETH	CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes	CVPRW '23	github
2023-04-03	RegionPLC	HKU	RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding	Arxiv	project
2023-03-20	CG3D	JHU	CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition	Arxiv	github
2023-03-16	LERF	UC Berkeley	LERF: Language Embedded Radiance Fields	ICCV '23	github
2023-02-14	ConceptFusion	MIT	ConceptFusion: Open-set Multimodal 3D Mapping	RSS '23	project
2023-01-12	CLIP2Scene	HKU	CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP	CVPR '23	github
2022-12-01	UniT3D	TUM	UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding	ICCV '23	github
2022-11-29	PLA	HKU	PLA: Language-Driven Open-Vocabulary 3D Scene Understanding	CVPR '23	github
2022-11-28	OpenScene	ETHz	OpenScene: 3D Scene Understanding with Open Vocabularies	CVPR '23	github
2022-10-11	CLIP-Fields	NYU	CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory	Arxiv	project
2022-07-23	Semantic Abstraction	Columbia	Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models	CoRL '22	project
2022-04-26	ScanNet200	TUM	Language-Grounded Indoor 3D Semantic Segmentation in the Wild	ECCV '22	project

3D Reasoning

Date	keywords	Institute (first)	Paper	Publication	Others
2024-09-08	MSR3D	BIGAI	Multi-modal Situated Reasoning in 3D Scenes	NeurIPS '24	project
2023-05-20	3D-CLR	UCLA	3D Concept Learning and Reasoning from Multi-View Images	CVPR '23	github
-	Transcribe3D	TTI, Chicago	Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning	CoRL '23	github

3D Generation

Date	keywords	Institute	Paper	Publication	Others
2023-11-29	ShapeGPT	Fudan University	ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model	Arxiv	github
2023-11-27	MeshGPT	TUM	MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers	Arxiv	project
2023-10-19	3D-GPT	ANU	3D-GPT: Procedural 3D Modeling with Large Language Models	Arxiv	github
2023-09-21	LLMR	MIT	LLMR: Real-time Prompting of Interactive Worlds using Large Language Models	Arxiv	-
2023-09-20	DreamLLM	MEGVII	DreamLLM: Synergistic Multimodal Comprehension and Creation	Arxiv	github
2023-04-01	ChatAvatar	Deemos Tech	DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance	ACM TOG	website

3D Embodied Agent

Date	keywords	Institute	Paper	Publication	Others
2024-01-22	SpatialVLM	Google DeepMind	SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities	CVPR '24	project
2023-12-05	NaviLLM	CUHK	Towards Learning a Generalist Model for Embodied Navigation	CVPR '24	project
2023-11-27	Dobb-E	NYU	On Bringing Robots Home	Arxiv	github
2023-11-26	STEVE	ZJU	See and Think: Embodied Agent in Virtual Environment	Arxiv	github
2023-11-18	LEO	BIGAI	An Embodied Generalist Agent in 3D World	ICML '24	github
2023-09-14	UniHSI	Shanghai AI Lab	Unified Human-Scene Interaction via Prompted Chain-of-Contacts	Arxiv	github
2023-07-28	RT-2	Google DeepMind	RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control	Arxiv	github
2023-07-12	SayPlan	QUT Centre for Robotics	SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning	CoRL '23	github
2023-07-12	VoxPoser	Stanford	VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models	Arxiv	github
2022-12-13	RT-1	Google	RT-1: Robotics Transformer for Real-World Control at Scale	Arxiv	github
2022-12-08	LLM-Planner	The Ohio State University	LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models	ICCV '23	github
2022-10-11	CLIP-Fields	NYU, Meta	CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory	RSS '23	github
2022-09-20	NLMap-SayCan	Google	Open-vocabulary Queryable Scene Representations for Real World Planning	ICRA '23	github

3D Benchmarks

Date	keywords	Institute	Paper	Publication	Others
2025-03-28	Beacon3D	BIGAI	Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis	CVPR '25	project
2025-03-08	3D-CoT	PolyU, EIT	Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning	Arxiv	dataset
2024-09-08	MSQA / MSNN	BIGAI	Multi-modal Situated Reasoning in 3D Scenes	NeurIPS '24	project
2024-06-10	3D-GRAND / 3D-POPE	UMich	3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination	Arxiv	project
2024-06-03	SpatialRGPT-Bench	UCSD	SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models	NeurIPS '24	github
2024-01-18	SceneVerse	BIGAI	SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding	ECCV '24	github
2023-12-26	EmbodiedScan	Shanghai AI Lab	EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI	Arxiv	github
2023-12-17	M3DBench	Fudan University	M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts	Arxiv	github
2023-11-29	-	DeepMind	Leveraging VLM-Based Pipelines to Annotate 3D Objects	ICML '24	github
2023-09-14	CrossCoherence	UniBO	Looking at words and points with attention: a benchmark for text-to-shape coherence	ICCV '23	github
2022-10-14	SQA3D	BIGAI	SQA3D: Situated Question Answering in 3D Scenes	ICLR '23	github
2021-12-20	ScanQA	RIKEN AIP	ScanQA: 3D Question Answering for Spatial Scene Understanding	CVPR '22	github
2020-12-03	Scan2Cap	TUM	Scan2Cap: Context-aware Dense Captioning in RGB-D Scans	CVPR '21	github
2020-08-23	ReferIt3D	Stanford	ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes	ECCV '20	github
2019-12-18	ScanRefer	TUM	ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language	ECCV '20	github

Contributing

Your contributions are always welcome!

I will keep some pull requests open if I'm not sure whether they are awesome for 3D LLMs; you can vote for them by adding a 👍.

If you have any questions about this opinionated list, please get in touch at [email protected] or via WeChat (ID: mxz1997112).

Citation

If you find this repository useful, please consider citing this paper:

@misc{ma2024llmsstep3dworld,
      title={When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models}, 
      author={Xianzheng Ma and Yash Bhalgat and Brandon Smart and Shuai Chen and Xinghui Li and Jian Ding and Jindong Gu and Dave Zhenyu Chen and Songyou Peng and Jia-Wang Bian and Philip H Torr and Marc Pollefeys and Matthias Nießner and Ian D Reid and Angel X. Chang and Iro Laina and Victor Adrian Prisacariu},
      year={2024},
      journal={arXiv preprint arXiv:2405.10255},
}

Acknowledgement

This repo is inspired by Awesome-LLM.
