llms-tools

A list of LLMs Tools & Projects

The 'llms-tools' repository is a comprehensive collection of AI tools, open-source projects, and research related to Large Language Models (LLMs) and Chatbots. It covers a wide range of topics such as AI in various domains, open-source models, chats & assistants, visual language models, evaluation tools, libraries, devices, income models, text-to-image, computer vision, audio & speech, code & math, games, robotics, typography, bio & med, military, climate, finance, and presentation. The repository provides valuable resources for researchers, developers, and enthusiasts interested in exploring the capabilities of LLMs and related technologies.

README:

LLMs Tools & Research Projects

The repository contains a list of ready-to-use AI tools, open-source projects, and research projects.
Apart from LLMs, you can also find new AI research from other areas, such as computer vision.
Contributions are welcome.

Nobel Prize

The Nobel Prize in Physics 2024 was awarded to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks”.

The Nobel Prize in Chemistry 2024 was awarded with one half to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction”.

Jürgen Schmidhuber's Post: The #NobelPrizeinPhysics2024 for Hopfield & Hinton rewards plagiarism and incorrect attribution in computer science

Turing Award

In 2025, the Turing Award recognized Andrew Barto and Richard Sutton as pioneers of reinforcement learning

Large Language Models (LLMs) and Chatbots

Courses

Video

Andrej Karpathy: State of GPT, [1hr Talk] Intro to Large Language Models, How I use LLMs
Andrew Ng: Opportunities in AI - 2023, AI Dev 25: Code, learn, connect
Sequoia Capital: AI Ascent 2024, AI Ascent 2025
Greg Brockman | The Inside Story of ChatGPT’s Astonishing Potential
Sam Altman | GPT-4 Turbo | OpenAI DevDay, Opening Keynote
Gemma Developer Day Paris | Gemma 3 | Google for Developers
3Blue1Brown | Neural networks

Reading

Visualization

The Rise and Rise of A.I. LLMs & their associated bots like ChatGPT
Opening up ChatGPT: tracking openness of instruction-tuned LLMs
Generative AI exists because of the transformer
Can an AI make a data-driven, visual story?

Competition

Red‑Teaming Challenge - OpenAI gpt-oss-20b - find any previously undetected vulnerabilities and harmful behaviors — from lying and deceptive alignment to reward‑hacking exploits (Deadline: Aug, 2025)
AI Mathematical Olympiad - Progress Prize 2 - solve national-level math challenges using artificial intelligence models (Deadline: Mar, 2025)
Google - Unlock Global Communication with Gemma - create Gemma model variants for a specific language or unique cultural aspect (Deadline: Jan, 2025)
Google - Gemini Long Context - demonstrate interesting use cases for Gemini's long context window (Deadline: Dec, 2024)
Gemini API Developer Competition - build incredible apps with the Gemini API, $1 million in cash prizes (Deadline: Sep, 2024)

Models

Company	2021-22	2023	2024	2025
Google, DeepMind	LaMDA, GLaM, PaLM, Chinchilla	Bard, PaLM-2, Gemini	Gemini 1.5, Gemini 1.5 Flash, Gemma, Gemma 2, Gemini 2.0	Gemini 2.0 Flash Thinking, Gemma 3, Gemini 2.5, Gemini 2.5 Flash, Gemma 3n, Gemini Diffusion, Gemma 3 270M
OpenAI	ChatGPT	GPT-4, GPT-4 Turbo	GPT-4o, GPT-4o mini, CriticGPT, o1-preview, o1-mini, o1	o3-mini, Deep Research, GPT-4.5, GPT-4.1, o3 and o4-mini, gpt-oss, GPT-5
MetaAI	Galactica	LLaMA, LLaMA2: HF, Purple Llama	Llama 3, Llama 3.1, Llama 3.2, quantized Llama, Llama 3.3 70B	Llama 4
Mistral AI		Mistral 7B, Mixtral of experts	Mistral Large, Mistral Large 2, Mistral NeMo, Pixtral 12B, Pixtral Large, Ministral 3B and Ministral 8B	Mistral Small 3, Mistral Small 3.1, Mistral Medium 3, Magistral
Anthropic	RL-CAI	Claude, Claude 2, Claude 2.1	Claude 3, Claude 3.5 Sonnet	Claude 3.7 Sonnet, Claude for Education, Claude 4, Claude Opus 4.1
Microsoft		phi-1, phi-1.5, phi-2	phi-3, phi-3.5, phi-4	phi-4-multimodal, Phi-4-reasoning
Stability AI		Stable Vicuna, StableLM, Stable LM 3B, Stable Beluga, Stable Chat, Stable LM Zephyr 3B	Stable LM 2 1.6B, Stable LM 2 12B
Inflection AI		Inflection-2	Inflection-2.5
TII		Falcon	Falcon Mamba 7B, Falcon 3	Falcon-Edge, Falcon-H1, Falcon-Arabic
Cohere		Aya	Command R+, Rerank 3, Aya Expanse	Aya Vision, Command A, Command A Reasoning
xAI			Grok-1, Grok-1.5, Grok-2	Grok-3, Grok-4
NVIDIA			Nemotron-4 340B, Minitron-4B-Base, NVLM 1.0, Llama-3.1-Nemotron-70B-Instruct	NVIDIA Llama Nemotron
AI2			Molmo, Tulu3, OLMo 2	Tülu 3 405B, OLMo 2 32B
AI21Lab			Jamba, Jamba 1.5	Jamba 1.6
Abacus.AI		Giraffe	Smaug-72B-v0.1
Alibaba Cloud			Qwen, Qwen2, Qwen2.5	Qwen2.5-Max, QwQ-32B, Qwen2.5 Omni, Qwen3, Qwen3-Next
AWS		Titan	Nova	Nova Act, Nova Premier
DeepSeek			DeepSeek-V2.5, DeepSeek-V3, DeepSeek-R1-Lite-Preview	DeepSeek-R1, DeepSeek-V3.1

Open Source Models

Model	Company	Date	Notes
gpt-oss	OpenAI	2025-08-05	20b and 120b
MiniMax Family	MiniMax	2025-01-15	language and visual
OLMo 2	AI2	2024-11-26
Qwen2.5 Family	Alibaba Cloud	2024-09-19	some versions
phi-3	Microsoft	2024-05-21
Qwen2 Family	Alibaba Cloud	2024-06-07	some versions
Llama Family	MetaAI
DBRX	Databricks	2024-03-27	a general purpose LLM
Gemma	Google	2024-02-21
phi-2	Microsoft	2023-12-12
Mistral 7B	Mistral	2023-09-27	Apache 2.0

Cogito v1 Preview, Cogito v2 Preview - from inference-time search to self-improvement (4 hybrid reasoning models)
GLM-4.5 - Reasoning, Coding, and Agentic Abilities
Kimi k1.5 - Scaling Reinforcement Learning with LLMs, Kimi K2 - Open Agentic Intelligence
MiniMax-M1 - the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism
BAGEL - the open-source Unified Multimodal Model you can fine-tune, distill and deploy anywhere, offering comparable functionality to proprietary systems like GPT-4o and Gemini 2.0 in an open form
Open-R1, updates - a fully open reproduction of DeepSeek-R1
Mercury - diffusion LLMs that are up to 10x faster and cheaper than current LLMs, pushing the frontier of intelligence and speed for language models, by Inception
Granite, Granite 3.0, Granite 3.2 - a family of open, performant, and trusted AI models, tailored for business and optimized to scale your AI applications, by IBM
SmolLM, SmolVLM2 - a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset, by Hugging Face
LLaDA - Large Language Diffusion with mAsking - a diffusion model with an unprecedented 8B scale, trained entirely from scratch, rivaling LLaMA3 8B in performance
R1 1776 - a version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information
s1 - minimal recipe for test-time scaling and strong reasoning performance matching o1-preview with just 1,000 examples & budget forcing
Sky-T1 - reasoning model that performs on par with o1-preview on popular reasoning and coding benchmarks
Transformer² - an ML system that dynamically adjusts its weights for various tasks, by Sakana
SmallThinker-3B-preview - a 3 billion parameter o1-like language model designed to excel at reasoning tasks (fine-tuned from the Qwen2.5-3b-Instruct model)
voyage-3 & voyage-3-lite - a new generation of small yet mighty general-purpose embedding models
Tencent-Hunyuan-Large - the largest open-source Transformer-based MoE model in the industry, featuring a total of 389 billion parameters and 52 billion active parameters
Paperguide - an AI research assistant, reference manager, and writing assistant that helps you understand papers, manage references, annotate and take notes, and supercharge your writing
Gemma Scope Demo - a beginner-friendly introduction to interpretability that explores an AI model called Gemma 2 2B. It also contains interesting and relevant content even for those already familiar with the topic
Hermes 3 - the latest version in Nous Research's Hermes series, available in three sizes: 8B, 70B, and 405B parameters
SearchGPT - a temporary prototype of new AI search features that give you fast and timely answers with clear and relevant sources
InternLM 2.5 - outstanding reasoning capability, 1M context window, stronger tool use
FILM - a repository for reproducing the results of FILM-7B, a 32K-context LLM that overcomes the lost-in-the-middle problem. FILM-7B is trained from Mistral-7B-Instruct-v0.2 by applying Information-Intensive (In2) Training, by Microsoft
gpt2-chatbots (aka GPT-4o)
Snowflake Arctic - an enterprise-focused LLM designed to provide cost-effective training and openness
Reka Core - Multimodal LLM
ChatFlow - a no-code platform that lets you set up an OpenAI-powered chatbot for your website
Ferret - an end-to-end MLLM that accepts any-form referring and grounds anything in its response, by Apple
NotebookLM - a powerful new interface that lets you shift effortlessly from reading to asking questions to writing, with an AI thought partner helping you at every turn
LLM360 - enables community-owned AGI through open-source large model research and development (K2-65B, CrystalCoder-7B, Amber-7B)
Mirasol - a multimodal model for learning across audio, video, & text that decouples the modeling into separate autoregressive models to process the inputs according to the characteristics of their modalities, for state-of-the-art performance, by Google
UniIR - Universal Multimodal Information Retrievers, framework to learn a single retriever to accomplish (possibly) any retrieval task
Tulu-2-DPO model - shows that the RLHF method DPO scales to 70B parameters and compares PEFT fine-tuning with full-parameter fine-tuning
Phind, Phind-70B - model that matches and exceeds GPT-4's coding abilities while running 5x faster
Vicuna-13B - an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT, by Vicuna Team
Alpaca 7B - a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations
Koala - a chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web, by Berkeley-BAIR
FacTool - a tool-augmented framework for detecting factual errors in texts generated by LLMs. FacTool now supports 4 tasks: knowledge-based QA, code generation, mathematical reasoning, and scientific literature review
Nougat - Neural Optical Understanding for Academic Documents, a Visual Transformer model that performs Optical Character Recognition (OCR) to convert scientific documents into a markup language, evaluated on a new dataset of scientific documents, by MetaAI
TextFX - AI-powered tools for rappers, writers and wordsmiths
Prompt2Model - a system that takes a natural language task description (like the prompts used for LLMs such as ChatGPT) and trains a small special-purpose model that is suitable for deployment
ToolBench - open-source, large-scale, high-quality instruction tuning SFT data to facilitate the construction of powerful LLMs with general tool-use capability
Platypus - a family of fine-tuned and merged LLMs that ranked first on Hugging Face's Open LLM Leaderboard at the time of its release
OpenFlamingo V2 - an open-source effort to replicate DeepMind's Flamingo models
MetaGPT - a framework involving LLM-based multi-agents that encodes human standardized operating procedures (SOPs) to extend complex problem-solving capabilities that mimic efficient human workflows
Universal and Transferable Adversarial Attacks on Aligned Language Models
FlashAttention - an algorithm that speeds up attention and reduces its memory footprint without any approximation
Quivr - utilizes the power of Generative AI to store and retrieve unstructured information
LongLLaMA - an LLM capable of handling long contexts of 256k tokens or even more
OpenLLaMA - open source reproduction of MetaAI’s LLaMA
BuboGPT - an advanced LLM that incorporates multi-modal inputs including text, image and audio, with a unique ability to ground its responses to visual objects
LAION - Large-scale Artificial Intelligence Open Network
GPT4All, Code - an open-source assistant-style LLM that runs locally on your CPU
SdkVercelAI - input a prompt, pick different LLMs, and compare two side by side
Open Assistant - a completely open-source ChatGPT alternative
Baize - an open-source chat model trained with LoRA. It uses 100k dialogs generated by letting ChatGPT chat with itself
Chameleon - a compositional reasoning framework designed to enhance LLMs and overcome their inherent limitations, such as outdated information and lack of precise reasoning
Bloom - an autoregressive LLM, trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources
Pythia - the hub for EleutherAI's work on interpretability and learning dynamics
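The FlashAttention entry above says the speed-up comes without any approximation. What makes blockwise processing exact is the online (streaming) softmax: keys and values can be consumed block by block while keeping only a running maximum, a running normalizer, and a running weighted sum, and the final result equals ordinary softmax attention. The sketch below is a minimal, hypothetical pure-Python illustration for a single query vector; the names `naive_attention` and `blockwise_attention` are made up for this example, and it omits everything that makes the real library fast (GPU tiling, fused kernels, the backward pass).

```python
import math

def naive_attention(q, keys, values):
    """Reference: softmax(q . k / sqrt(d)) applied as weights over the value vectors."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for j, vj in enumerate(v):
            out[j] += w * vj
    return [o / total for o in out]

def blockwise_attention(q, keys, values, block_size=2):
    """Same result, but keys/values are visited one block at a time (online softmax)."""
    d = len(q)
    running_max = -math.inf
    running_sum = 0.0             # sum of exp(score - running_max) seen so far
    acc = [0.0] * len(values[0])  # unnormalized weighted sum of values seen so far
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier accumulators to the new maximum (exp(-inf) == 0.0 on the first block)
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j, vj in enumerate(v):
                acc[j] += w * vj
        running_max = new_max
    return [a / running_sum for a in acc]

q = [0.5, -1.0, 0.25]
keys = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [0.3, 0.3, 0.3], [-1.0, 2.0, 0.5], [2.0, -0.5, 0.0]]
values = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5], [-2.0, 1.0], [0.5, 0.5]]
exact = naive_attention(q, keys, values)
tiled = blockwise_attention(q, keys, values, block_size=2)
print(all(abs(a - b) < 1e-12 for a, b in zip(exact, tiled)))  # prints True
```

Because each new block only rescales the accumulators by `exp(old_max - new_max)`, the state kept per query does not grow with sequence length; this is the property that lets FlashAttention keep tiles in fast on-chip memory instead of materializing the full attention matrix.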

Chats & Assistants

Chat	Company	Notes
Andrew Ng	DeeplearningAI	Chat with Andrew Ng
Playground	The Allen Institute for AI (AI2)
Stable Assistant	Stability AI	latest text and image generation technology featuring Stable Diffusion 3, Stable Video, Stable Image Services and Stable LM 2 12B
Moshi	Kyutai	engaging conversations limited to 5 minutes, thinks and speaks at the same time
MetaAI	MetaAI
character.ai	Character.AI	talk with fictional AI characters
POE	Quora	talk to ChatGPT, GPT-4, Claude 3 Opus, DALL·E 3, and millions of others
Hume	Hume	empathic AI voice chat
Pi	Inflection AI
Gemini	Google
ChatRTX	Nvidia	runs locally on your PC
Le Chat	Mistral AI
Copilot	Microsoft
Perplexity	PerplexityAI
ChatGPT	OpenAI

AI Chat - get ChatGPT, Gemini, Claude, Grok & Husky AI - without jumping between apps
SciSummary - use AI to summarize scientific articles and research papers
PdfGPT - summarize any PDF; get instant, accurate answers from long research papers, legal documents, and manuals
ChatPDF - chat with any PDF
Dmwithme - a virtual companion with realistic emotions that roleplays as an AI friend

Offline-Mode

Google AI Edge Gallery - an experimental app that puts the power of cutting-edge Generative AI models directly into your hands, running entirely on your Android (available now) and iOS (coming soon) devices
msty - the easiest way to use local and online AI models
aider - AI pair programming in your terminal
Open Interpreter - an open-source, locally running implementation of OpenAI's Code Interpreter
ollama - get up and running with Llama 3, Mistral, Gemma, and other LLMs
Dalai - run LLaMA and Alpaca on your computer
LLaMAChat - allows you to chat with LLaMA, Alpaca and GPT4All models, all running locally on your CPU
OpenLLM - an open-source platform designed to facilitate the deployment and operation of LLMs in real-world applications
LM Studio - an easy way to run open-source LLMs locally
Jan - open-source ChatGPT alternative that runs 100% offline on your computer
Pinokio - a browser that lets you install, run, and programmatically control ANY application, automatically

Large Visual Language Models (LVLMs)

QVQ-Max - visual reasoning model that can not only “understand” the content in images and videos but also analyze and reason with this information to provide solutions
PaliGemma, PaliGemma 2, PaliGemma 2 Mix - fine-tuned on a mix of vision language tasks, including OCR, long and short captioning and more, by Google
Eagle Family - Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs, by NVIDIA
Qwen-VL, Qwen2-VL, Qwen2.5-VL - Qwen2.5-VL is a significant leap over the previous Qwen2-VL: it understands things visually, acts as an agent, understands long videos and captures events, performs visual localization in different formats, and generates structured outputs
Janus - Unified Multimodal Understanding and Generation Models, by DeepSeek
Moondream 1.9B - a tiny open-source vision AI that brings powerful image understanding to your applications and runs everywhere
Idefics2 - it can answer questions about images, describe visual content, create stories grounded in multiple images, extract information from documents, and perform basic arithmetic operations
Grok-1.5 Vision - can process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs, by xAI
CogVLM & CogAgent - an 18 billion parameter visual language model specializing in GUI understanding and navigation; supports high-resolution inputs (1120x1120) and shows abilities in tasks such as visual Q&A, visual grounding, and GUI Agent
AnyText - Multilingual Visual Text Generation And Editing
AnomalyGPT - an LVLM-based Industrial Anomaly Detection (IAD) method that can detect anomalies in industrial images without the need for manually specified thresholds
IDEFICS - an open-access VLM based on Flamingo. The model accepts arbitrary sequences of image and text inputs and produces text outputs, aiming to bring transparency to AI systems and serve as a foundation for open research in multimodal AI systems
Prismer - a data- and parameter-efficient VLM that leverages an ensemble of diverse, pre-trained domain experts
MiniGPT-4 - upload an image, and then use chat to identify what's in the picture and learn more about it
MultiModal-GPT - a vision and language model for multi-round dialogue with humans; the model is fine-tuned from OpenFlamingo, with LoRA added in the cross-attention and self-attention parts of the language model
MoE-LLaVA - Mixture-of-Experts for Large Vision-Language Models
LLaVA - a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding
TaskMatrix - connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting

Evaluation

Humanity's Last Exam - a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage
LMEval - a large model evaluation framework, to help others accurately and efficiently compare how models from various providers perform across benchmark datasets, by Google
SuperGPQA - a comprehensive benchmark designed to evaluate the knowledge and reasoning abilities of LLMs across 285 graduate-level disciplines
FACTS Grounding - a new benchmark for evaluating the factuality of LLMs, by DeepMind
MLE-bench - a benchmark for measuring how well AI agents perform at machine learning engineering, by OpenAI
JailbreakBench - Repository of jailbreak artifacts, Standardized evaluation framework, Leaderboard, Dataset
SWE-bench Verified - a benchmark for evaluating LLMs’ abilities to solve real-world software issues sourced from GitHub, by OpenAI
SWE-bench - Can Language Models Resolve Real-world GitHub Issues?
promptbench - a unified library for evaluating and understanding LLMs
Vibe-Eval - evaluation suite for measuring progress of multimodal language models, by Reka
FACET - FAirness in Computer Vision EvaluaTion - a new comprehensive benchmark for evaluating the fairness of computer vision models across classification, detection, instance segmentation, and visual grounding tasks
Arthur Bench - an open-source evaluation tool for comparing LLMs, prompts, and hyperparameters for generative text models
AgentBench - the first benchmark designed to evaluate LLM-as-Agent across a diverse spectrum of different environments
L-Eval - a comprehensive long-context language models evaluation suite with 18 long document tasks across multiple domains that require reasoning over long texts, including summarization, question answering, in-context learning with long CoT examples, topic retrieval, and paper writing assistance
OpenICL - an open-source toolkit for in-context learning and LLM evaluation; supports various state-of-the-art retrieval and inference methods, tasks, and zero-/few-shot evaluation of LLMs
OpenAGI - an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models

Leaderboards

Embedding Leaderboard - compares 100+ text and image embedding models across 1000+ languages
Game Arena - a new benchmarking platform where AI models and agents compete head-to-head in a variety of strategic games to help chart new frontiers for trustworthy AI evaluation, by Kaggle
OpenRouter - a unified interface for LLMs
MathArena - a platform for evaluation of LLMs on the latest math competitions and olympiads
EU AI Act Compliance Leaderboard - translates the high-level regulatory requirements of the EU AI Act into concrete technical requirements and evaluates models against them
AgentBoard - a benchmark designed for multi-turn LLM agents, complemented by an analytical evaluation board for detailed model assessment beyond final success rates
LLM Hallucination Index - A Ranking & Evaluation Framework For LLM Hallucinations
Artificial Analysis - Text to Image AI Model & Provider Leaderboard across quality, generation time, and price
SEAL Leaderboards - Safety, Evaluations and Alignment Lab: (i) generate code, (ii) work on Spanish-language inputs and outputs, (iii) follow detailed instructions, and (iv) solve fifth-grade math problems, by Scale AI
HELM - Holistic Evaluation of Language Models project - leaderboards with many scenarios, metrics, and models, with support for multimodality and model-graded evaluation, by Stanford
vals.ai - an independent model testing service that has developed benchmarks ranking LLM performance on tasks associated with income taxes, corporate finance, and contract law; it also maintains a pre-existing legal benchmark, by Vals AI
TrustLLM - a comprehensive study of Trustworthiness in LLMs
LMSYS Chatbot Arena - an open platform to evaluate LLMs by human preference in the real-world
Open LLM Leaderboard - evaluate models on 6 key benchmarks using the Eleuther AI Language Model Evaluation Harness, a unified framework to test generative language models on a large number of different evaluation tasks
LLM-Perf Leaderboard - benchmarks the performance (latency, throughput, memory & energy) of LLMs with different hardware, backends, and optimizations using Optimum-Benchmark
Hallucinations Leaderboard - evaluates the propensity for hallucination in LLMs across a diverse array of tasks, including Closed-book Open-domain QA, Summarization, Reading Comprehension, Instruction Following, Fact-Checking, and Hallucination Detection
NPHardEval leaderboard - a benchmark for assessing the reasoning abilities of LLMs through the lens of computational complexity classes
LLM Safety Leaderboard - evaluates LLM safety to help researchers and practitioners better understand the capabilities, limitations, and potential risks of LLMs
The Open Medical-LLM Leaderboard - aims to track, rank and evaluate the performance of LLMs on medical question answering tasks
TheFastest.AI - site that provides reliable measurements for the performance of popular models
GAIA Leaderboard - evaluates next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.)

Datasets

Instruction tuning datasets - open-source instruction tuning datasets, models, papers, repositories
InfiMM-WebMath-40B Dataset - large-scale, open-source multimodal dataset specifically designed for mathematical reasoning tasks
MMMLU - Multilingual Massive Multitask Language Understanding
Natural Questions - contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question

Libraries

aisuite - simple, unified interface to multiple Generative AI providers, by Andrew Ng Team
LangChain, Tutorials - a framework for developing applications powered by language models
Open Agent Platform - a no-code agent building platform, by LangChain
Gemini Fullstack LangGraph - an application that serves as an example of building research-augmented conversational AI using LangGraph and Google's Gemini models
LlamaIndex, docs - a “data framework” to help you build LLM apps
LLaMA2-Accessory - an open-source toolkit for pre-training, fine-tuning and deployment of LLMs and multimodal LLMs
LLaMA-Adapter - a lightweight adaption method for fine-tuning Instruction-following and Multi-modal LLaMA models
streaming-llm - Efficient Streaming Language Models with Attention Sinks
llamafile - run LLMs with a single file
outlines, docs - a library to write reliable programs for interactions with generative models: language models, diffusers, multimodal models, classifiers, etc
OneLLM - One Framework to Align All Modalities with Language
guidance - interleave generation, prompting, and logical control into a single continuous flow matching how the language model actually processes the text
nanoGPT - the simplest, fastest repository for training/finetuning medium-sized GPTs
TorchScale - a PyTorch library that allows researchers and developers to scale up Transformers efficiently and effectively
InvokeAI - an implementation of Stable Diffusion, the open source text-to-image and image-to-image generator
ComfyUI - a powerful and modular Stable Diffusion GUI and backend. This UI will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart based interface
StableSwarmUI - a modular Stable Diffusion web user interface, with an emphasis on easy access to power tools, high performance, and extensibility
Wanda - Pruning LLMs by Weights and Activations: removes weights on a per-output basis, ranked by the product of weight magnitudes and input activation norms
LOMO: LOw-Memory Optimization - a new optimizer, which fuses the gradient computation and the parameter update in one step to reduce memory usage
LMFlow - an extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community
Heron - a library that seamlessly integrates multiple vision-and-language models, as well as video-and-language models, and provides pretrained weights trained on various datasets
Curated Transformers - a transformer library for PyTorch. It provides state-of-the-art models that are composed from a set of reusable components, by Explosion
spacy-llm - integrates LLMs into spaCy, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various NLP tasks, no training data required, by Explosion
Medusa - a simple framework that democratizes the acceleration techniques for LLM generation with multiple decoding heads
Self-RAG - a new framework to train an arbitrary LM to learn to retrieve, generate, and critique to enhance the factuality and quality of generations, without hurting the versatility of LLMs
Mirascope, docs - a toolkit for developing production-ready LLM-powered tools using Python and Pydantic
gateway - route to 100+ open & closed source models with a unified API. It is also production-ready, with support for caching, fallbacks, retries, timeouts, and load balancing, and can be edge-deployed for minimum latency
corenet - a library for training deep neural networks for a variety of tasks, including foundation models (e.g., CLIP and LLM), object classification, object detection, and semantic segmentation
MONSTER API - a platform for no code LLM fine tuning and deployments
Lamini Platform - a LLM platform that seamlessly integrates every step of the model refinement and deployment process – making model selection, model tuning and inference usage incredibly straightforward for your dev team
PowerInfer - a CPU/GPU LLM inference engine leveraging activation locality for your device
mixtral-offloading - efficient inference of Mixtral-8x7B models
bitnet.cpp - the official inference framework for 1-bit LLMs (e.g., BitNet b1.58). It offers a suite of optimized kernels that support fast and lossless inference of 1.58-bit models on CPU (with NPU and GPU support coming next)
LayerSkip - an end-to-end solution that promises to accelerate LLM generation without the need for specialized hardware, by MetaAI
Lingua - a lean, efficient, and easy-to-hack codebase to research LLMs, by MetaAI
fairchem - FAIR Chemistry's centralized repository of all its data, models, demos, and application efforts for materials science and quantum chemistry, by MetaAI
LangExtract - a Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization, by Google
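The Wanda entry above states its pruning rule: each weight is scored by the product of its magnitude and the norm of the corresponding input activation, and the lowest-scoring weights are removed separately within each output row. Below is a minimal pure-Python sketch of that scoring rule, assuming a weight matrix stored as `(out_features, in_features)` rows, a small batch of calibration activations, and L2 norms over the calibration tokens; `wanda_prune` is a hypothetical helper for illustration, not the official Wanda code, which operates on PyTorch layers.

```python
import math

def wanda_prune(weight, activations, sparsity=0.5):
    """Zero out the lowest-scoring weights within each output row (hypothetical helper).

    weight:      list of rows, shape (out_features, in_features)
    activations: list of calibration input vectors, shape (num_tokens, in_features)
    """
    in_features = len(weight[0])
    # L2 norm of each input feature across all calibration tokens
    norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(in_features)]
    n_prune = int(in_features * sparsity)
    pruned = []
    for row in weight:
        # Wanda score: |W_ij| * ||X_j||
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        # Compare weights only within this output row (per-output pruning)
        drop = set(sorted(range(in_features), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

weight = [[0.5, -2.0, 0.1, 1.0],
          [3.0, 0.2, -0.4, 0.05]]
activations = [[10.0, 0.1, 1.0, 0.5],
               [10.0, 0.1, 1.0, 0.5]]
print(wanda_prune(weight, activations))  # prints [[0.5, 0.0, 0.0, 1.0], [3.0, 0.0, -0.4, 0.0]]
```

In this toy example, plain magnitude pruning would keep `-2.0` in the first row, but the activation-aware score drops it because its input feature barely activates, and keeps the smaller `0.5`, whose input is large.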

Agents

rowboat - AI-powered multi-agent builder, powered by OpenAI's Agents SDK
UI-TARS-1.5 - an open-source multimodal agent built upon a powerful vision-language model. It is capable of effectively performing diverse tasks within virtual worlds
A2A - an open protocol enabling communication and interoperability between opaque agentic applications, by Google
Browser-Use - enable AI to control your browser
smolagents - a very simple library that unlocks agentic capabilities for language models
OpenHands - a platform for software development agents powered by AI
Multi-Agent Orchestrator - framework for managing multiple AI agents and handling complex conversations, by AWS
swarm - educational framework exploring ergonomic, lightweight multi-agent orchestration, by OpenAI
Agent-S - an open agentic framework that uses computers like a human
TEN-Agent - a real-time multimodal agent integrated with the OpenAI Realtime API, RTC, and features weather checks, web search, vision, and RAG capabilities
bee-agent-framework - open-source framework for building, deploying, and serving powerful agentic workflows at scale
agent.exe - the easiest way to let Claude's new computer use capabilities take over your computer
Pearl - a production-ready RL AI Agent Library, by MetaAI
OpenAgents - an open platform for using and hosting language agents in the wild of everyday life
agents - an open-source library/framework for building autonomous language agents
ChatDev - highly customizable and extendable framework, which is based on LLMs and serves as an ideal scenario for studying collective intelligence
JARVIS-1 - Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models; it generates sophisticated plans and performs embodied control within the open-world Minecraft universe
AppAgent - Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps

AI Code Editors

Cursor -
Kiro -
Windsurf Editor -
Replit -
Kilo Code - AI coding agent for VS Code
Recurse - custom semantic code checks for PRs
Jules - asynchronous coding agent, by Google
Claude Code - code onboarding, turn issues into PRs, make powerful edits
Claude Code Security Reviewer - an AI-powered security review GitHub Action using Claude to analyze code changes for security vulnerabilities

Devices

Reachy Mini - open-source robot designed for human-robot interaction, creative coding, and AI experimentation, by Hugging Face
Omi - AI wearables that revolutionize how you capture and manage conversations
NotePin - wearable AI memory capsule, by Plaud
biped.ai - an AI wearable vest that helps blind and visually impaired people avoid obstacles, follow GPS instructions, and find crosswalks or doors
LPU Inference Engine - Language Processing Units, by Groq
FigureAI - AI robotics company bringing a general purpose humanoid to life
SanctuaryAI - company on a mission to create the world’s first human-like intelligence in general-purpose robots
Mytra - warehouse robotics
friend - AI-Powered Necklace companion designed not to help you get things done but to be there for you—anytime, anywhere
Limitless - personalized AI powered by what you’ve seen, said, and heard
rabbit r1 - a personalized operating system through a natural language interface
01 Project - the open-source language model computer, by Open Interpreter

Glasses

Halo X - smart glasses powered by Google Gemini and Perplexity AI that listen to, record, and transcribe every conversation around you
Aria Gen 2 - a wearable device that combines the latest advancements in computer vision, machine learning, and sensor technology, by MetaAI
G1 - by Even Realities
AirGo Vision - Audio Smartglasses powered by ChatGPT, by Solosglasses
Ray-Ban Meta Smart Glasses - smart glasses with a 12 MP camera and a five-mic system, by Ray-Ban & MetaAI
Frame - AI glasses designed to be worn as a pair of glasses with a suite of AI capabilities out of the box, by Brilliant Labs
air2 - by xreal
TCL RayNeo X2 - AR Glasses, by RayNeo

Income

Poe - price-per-message revenue model for AI bot creators
GPTs Store - create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills
Voice Library - share your voice in the Voice Library today and earn cash rewards when it's used
HuggingChat - making the community's best AI chat models available to everyone

Tools

Text-to-Image	Text-to-Music	Text-to-Video	Games	Brand	Prompt Generator
Midjourney	Mubert	fal.ai	Leonardo.Ai - Assets	Flair	G-prompter
Adobe Firefly	Waveformer	PIKA LABS	Dreamlab - Animated Sprites	Logolivery	Prompt Builder
Catbird	Morph Studio	Kaiber	Didimo		Midjourney PromptHelper1
BlueWillow		Invideo	Scenario - Assets		Midjourney PromptHelper2
Lexica		Moonvalley	Skybox - World-building		FlowGPT
Imgcreator		ilumine AI	Bezi - 3D Assets		Anthropic
Craiyon		LTX Studio	Charmed - 3D Assets

Text-to-image

	Models
Google	Muse, Imagen, Parti, HyperDreamBooth, DreamBooth, StyleDrop, Imagen 2, ImageFX, Imagen 3
OpenAI	CLIP, DALL·E, DALL·E 2, DALL·E 3, 4o Image Generation
MetaAI	CM3leon, Emu Video, Emu Edit, Imagine
Stability.ai	Stable Diffusion XL, DreamStudio, Clipdrop, DeepFloyd IF (Code, Demo: HF), SDXL Turbo, Stable Cascade, Stable Diffusion 3, Stable Diffusion 3 Medium, Adversarial Diffusion Distillation, Stable Diffusion 3.5, Stable Diffusion 3.5 Large
Black Forest Labs	FLUX.1, FLUX1.1 [pro], FLUX1.1 [pro] Ultra, FLUX.1 Tools, FLUX Pro Finetuning API, FLUX.1 Kontext, FLUX.1 Krea [dev]
Playground	Playground v2, Playground v3

UniDisc - a model capable of jointly processing text and images for a variety of downstream tasks
Reve Image 1.0 - a new model trained from the ground up to excel at prompt adherence, aesthetics, and typography
Frames - an image generation model offering unprecedented stylistic control, by Runway
Ideogram - AI tools that will make creative expression more accessible, fun, and efficient
Kolors - a large-scale text-to-image generation model based on latent diffusion, by the Kuaishou Kolors team
StoryDiffusion - Consistent Self-Attention for Long-Range Image and Video Generation
Ilus AI - AI illustration generator
Improving Diffusion Models for Authentic Virtual Try-on in the Wild - image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively
Distribution Matching Distillation - a one-step generator that achieves image quality comparable to Stable Diffusion v1.5 while being 30x faster
Generative Powers of Ten - a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches
Delta Denoising Score - a novel scoring function for text-based image editing that guides minimal modifications of an input image towards the content described in a target prompt
Prompt-to-Prompt - an editing framework in which edits are controlled by text only
OpenCLIP - an open source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training)
LEDITS - a lightweight approach to real-image editing that combines the Edit Friendly DDPM inversion technique with Semantic Guidance, extending Semantic Guidance to real-image editing while harnessing the editing capabilities of DDPM inversion
Würstchen - Fast Diffusion for Image Generation
ExactlyAI - create images in seconds with an AI that understands your style
ConceptLab - creative concept generation with diffusion models: generates novel concepts within a broad category (e.g., a new type of pet that differs from all existing pets)
IP-Adapter - Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
MatchAI - a powerful web app that can copy the color grading from images so you can apply it to your own, by color.io
Picogen - an unofficial API for Midjourney, Stability AI, and DALL·E 2
FABRIC (Feedback via Attention-Based Reference Image Conditioning) - a technique to incorporate iterative feedback into the generative process of diffusion models based on StableDiffusion
Controlling Text-to-Image Diffusion by Orthogonal Finetuning (OFT) - for adapting text-to-image diffusion models to downstream tasks
InstructPix2Pix: Learning to Follow Image Editing Instructions - a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, the model follows the instruction to edit the image
Composer - a large (5 billion parameters) controllable diffusion model trained on billions of (text, image) pairs. It can exponentially expand the control space through composition, leading to an enormous number of ways to generate and manipulate images, i.e., making the infinite use of finite means
GigaGAN: Large-scale GAN for Text-to-Image Synthesis - changing texture with prompting, changing style with prompting, by Adobe Research
AI Image Generator - a free AI-powered text-to-image generator

Images

Qwen-Image-Edit - the image editing version of Qwen-Image
EvoVLM-JP - image models released to generate Japan’s traditional ukiyo-e artwork, by Sakana
PaintsUndo - A Base Model of Drawing Behaviors in Digital Paintings
SkyReels - generate comics from stories or files you upload
PhotoMaker - Customizing Realistic Human Photos via Stacked ID Embedding
NSF - Neural Spline Fields for Burst Image Fusion and Layer Separation
Material Palette - a method to extract Physically-Based-Rendering (PBR) materials from a single real-world image
DiffusionLight - a simple yet effective technique to estimate lighting in a single input image
Magnific - an AI image upscaler and enhancer
wasitai - check if an image was generated by a machine
Textify - a tool for replacing the gibberish in AI-generated images with your desired text
Interpolating between Images with Diffusion Models - a method for zero-shot controllable interpolation using latent diffusion models
AnyDoor: Zero-shot Object-level Image Customization - a diffusion-based image generator with the power to move target objects to new scenes at user-specified locations in a harmonious way
Matting Anything, Code, Demo: HF - an efficient and versatile framework for estimating the alpha matte of any instance in an image with user-prompt guidance
Plug-and-Play, Code - plug-and-play diffusion features for text-driven image-to-image translation: uses a pre-trained text-to-image diffusion model to translate a guidance image so that it matches a target text prompt while preserving its structure
Real-Time Neural Appearance Models - a complete system for real-time rendering of scenes with complex appearance previously reserved for offline use, by NVIDIA
Designer - generate stunning designs and original images just by typing what you want. Get writing assistance and automatic layout suggestions for anything you add. Designer expands preview with new AI design features, by Microsoft
Scribble Diffusion - turn your sketch into a refined image using AI
StudioGPT - a tool for reimagining an existing image
UnblurImage - remove blur from photos and achieve stunning clarity online for free

Watermarks

SynthID, SynthID Text - identifies AI-generated content by embedding digital watermarks directly into AI-generated images, audio, text, or video, by Google DeepMind and Hugging Face
Stable Signature - a new method for watermarking images, by MetaAI
ClixMagicAI - a professional AI-powered watermark removal tool
DeWatermark - remove watermarks from photos online for free with AI
UnwatermarkAI - remove watermarks from images and videos with AI for free, no sign-up, no ads

Computer Vision

DINOv2, DINOv3 - self-supervised learning for vision at unprecedented scale, by MetaAI
Depth-Anything - a depth estimation solution that can deal with any images under any circumstance
TAO-Amodal - a benchmark dataset that includes amodal and modal bounding boxes for visible and occluded objects
OMG-Seg - One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open vocabulary settings, prompt-driven, interactive segmentation like SAM, and video object segmentation
PUG (Photorealistic Unreal Graphics) - 3 datasets for representation learning research
Tracking Anything in High Quality - a framework for high performance video object tracking and segmentation
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data - a new benchmark of synthetic image triplets that span a wide range of mid-level variations, labeled with human similarity judgments
CoTracker, CoTracker3 - an architecture that jointly tracks multiple points throughout an entire video, by MetaAI
TAPIR - a model for Tracking Any Point (TAP) that effectively tracks a query point in a video sequence, by Google DeepMind
DreamTeacher - a self-supervised feature representation learning framework that utilizes generative networks for pre-training downstream image backbones, by NVIDIA
ImageBind, Demo, Code - Image->Audio, Audio->Image, Text->Image&Audio, Audio&Image->Image, Audio->Generated Image, by MetaAI
I-JEPA, Code - Image Joint Embedding Predictive Architecture is a method for self-supervised learning. At a high level, I-JEPA predicts the representations of part of an image from the representations of other parts of the same image, by MetaAI
Visual Prompting - an innovative approach that takes text prompting, used in applications such as ChatGPT, to computer vision
Tracking Everything Everywhere All at Once - a new test-time optimization method for estimating dense and long-range motion from a video sequence
Track-Anything - a flexible and interactive tool for video object tracking and segmentation. Built on Segment Anything, it can track and segment anything specified via user clicks only
EdgeSAM - an accelerated variant of SAM, optimized for efficient execution on edge devices with minimal compromise in performance
EfficientSAM - light-weight SAM models that exhibit decent performance with largely reduced complexity, by MetaAI
SAM, SAM2 - Segment Anything Model is a new AI model that can "cut out" any object, in any image, with a single click. SAM is a promptable segmentation system with zero-shot generalization to unfamiliar objects and images, without the need for additional training, by MetaAI
Behind the Scenes: Density Fields for Single View Reconstruction - a neural network that predicts an implicit density field from a single image

Video & Animation

Matrix-Game 2.0 - an interactive world model generates long videos on-the-fly via few-step auto-regressive diffusion (high-quality minute-level videos across diverse scenes at an ultra-fast speed of 25 FPS)
SkyReels V2 - model weights and inference code for infinite-length film generative models
Wan2.1, Wan2.2 - Effective MoE Architecture; Cinematic-level Aesthetics; Complex Motion Generation; Efficient High-Definition Hybrid TI2V
Veo - generates high-quality 1080p resolution videos in a wide range of cinematic and visual styles that can go beyond a minute; Veo2 - creates videos with realistic motion and high quality output, up to 4K; VideoFX - a new experimental tool designed to help support creatives through the storytelling journey, by Google
GEN-1 & Research, GEN-2 & Research, GEN-3-alpha & Research, Gen-4 & Research - a new frontier for high-fidelity, controllable video generation. It is a major improvement in fidelity, consistency, and motion over Gen-2, and a step towards building General World Models, by Runway
HunyuanVideo - A Systematic Framework For Large Video Generation Model
LTX-Video - real-time AI video generation, open-source model, by Lightricks
AutoVFX - Physically Realistic Video Editing from Natural Language Instructions
FacePoke - real-time facial animation
X-Portrait 2 - highly expressive portrait animation
Meta Movie Gen - research showing how simple text inputs can produce custom videos and sounds, edit existing videos, or transform a personal image into a unique video
Mochi 1 - an open-source model for generating high-quality videos from text prompts, by genmo
Haiper - simplifies video creation with text-to-video, image-to-video, and video enhancement options
Hailuo AI - Image-to-Video
Krea - generate images and videos (Luma, Runway, Kling, Hailuo, Pika) with a delightful AI-powered design tool
Pyramid Flow - a training-efficient Autoregressive Video Generation model based on Flow Matching
Videolulu - create engaging content in popular formats for TikTok, Instagram, and YouTube
GoVidify - an AI-powered tool that turns your written content into short-form videos for TikTok, YouTube, and Instagram
hotshot - a large-scale diffusion transformer model that serves as the foundation for an upcoming consumer product
ClipAnything - the first-ever multimodal AI clipping that lets you clip any moment from any video using visual, audio, and sentiment cues, by Opus
Text2Infographic - converts your written content into eye-catching infographics without any need for design skills
Flow Studio - uses AI to transform your text prompts into visually captivating short films and videos
LivePortrait - Efficient Portrait Animation with Stitching and Retargeting Control
Odyssey - Hollywood-grade visual AI
VideoPoet - a large language model for zero-shot video generation, by Google Research
Character-1 - a model that creates lip-synced videos to any audio from a still image; imagine worlds, characters, and stories with complete creative control, by Hedra
Showrunner - AI platform designed to let you create an animated TV episode with just a prompt
Luma Dream Machine - an AI model that makes high quality, realistic videos fast from text and images, by Luma
Kling - video generation with enhanced features and quality
ToonCrafter - interpolate two cartoon images by leveraging the pre-trained image-to-video diffusion priors
VideoGigaGAN: Towards Detail-rich Video Super-Resolution - a generative VSR model that can produce videos with high-frequency details and temporal consistency, by Adobe Research
VASA-1 - Lifelike Audio-Driven Talking Faces Generated in Real Time, by Microsoft
MagicTime - Time-lapse Video Generation Models as Metamorphic Simulators
Stable Video Diffusion - a foundation model for generative video based on the image model Stable Diffusion
EMO (Emote Portrait Alive) - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
SORA - a model (a latent diffusion model that learned to transform noise into videos using an encoder-decoder and transformer) that can create realistic and imaginative scenes from text instructions, by OpenAI
LUMIERE - A Space-Time Diffusion Model for Video Generation: Text-to-Video, Image-to-Video, Stylized Generation, Video Stylization, Cinemagraphs, Video Inpainting
ActAnywhere - Subject-Aware Video Background Generation
MagicVideo-V2 - integrates the text-to-image model, video motion generator, reference image embedding module and frame interpolation module into an end-to-end video generation pipeline
I2VGen-XL - High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
StreamDiffusion - an innovative diffusion pipeline designed for real-time interactive generation
WALT (Window Attention Latent Transformer) - a transformer-based method for latent video diffusion models (LVDMs)
Hotshot - GIF generator
Unscreen - remove video background
Motrica - technologies and tools for advanced character animation
CoDeF - Content Deformation Fields for Temporally Consistent Video Processing
MagicEdit - supports various editing applications, including video stylization, local editing, video-MagicMix and video outpainting
To Infinity and Beyond - an approach to generating high-quality episodic content for IPs (Intellectual Property) using LLMs, custom state-of-the-art diffusion models, and a multi-agent simulation for contextualization, story progression, and behavioral control
PlazmaPunk - create your own music video with the power of AI
Video-LLaMA, Code, Demo: HF - a multimodal LLM that achieves video-grounded conversations between humans and computers by connecting a language decoder with off-the-shelf unimodal pre-trained models
AnimateDiff prompt travel - AnimateDiff with prompt travel + ControlNet + IP-Adapter
AnimateDiff, Code - Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
Animate-A-Story - a video storytelling approach which can synthesize high-quality, structure-controlled, and character-controlled videos
Zeroscope - a watermark-free Modelscope-based video model optimized for producing high-quality 16:9 compositions and a smooth video output
Klap - a tool that analyzes the video and finds short clips
Lalamu - low-quality video lip sync with preselected videos/video templates (take clips from videos, give the video new audio, and then the lips will sync up to that new audio within the video)
D-ID - uses generative AI to create customized videos featuring talking avatars at the touch of a button for businesses and creators
Rooms.xyz - create & remix interactive rooms from your browser
Wonder Dynamics - an AI tool that automatically animates, lights, and composes CG characters into a live-action scene
REVELxyz - a tool for creating Animated Avatars from a single photo
ANIMATED DRAWINGS - a tool that brings children's drawings to life by animating the characters they draw, by MetaAI
RERENDER A VIDEO, Demo: HF - a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos
Roop, Code - take a video and replace the face in it with a face of your choice. You only need one image of the desired face
Text2Performer - Text-Driven Human Video Generation, where a video sequence is synthesized from texts describing the appearance and motions of a target performer
DragGAN, Code, Demo: HF - a way of controlling GANs by "dragging" any points of an image to precisely reach target points in a user-interactive manner. With DragGAN, anyone can deform an image with precise control over where pixels go, manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, and landscapes
DragDiffusion - Harnessing Diffusion Models for Interactive Point-based Image Editing
In-N-Out: Face Video Inversion and Editing with Volumetric Decomposition - represents the face in a video using two neural radiance fields, one for in-distribution and the other for out-of-distribution data, and composes them together for reconstruction
High-Resolution Video Synthesis with Latent Diffusion Models - Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space, by NVIDIA

3D

Stable Point Aware 3D - real-time editing and complete structure generation of a 3D object from a single image, by Stability AI
backflip - AI 3D design tools for the physical world
cadwithai - a tool that allows users to create and edit CAD models using an AI chatbot to enhance efficiency and creativity in design work
Meshy - create stunning 3D models with AI
Generative 3D API Toolkit - generate 3D models, materials, and HDRIs at the speed of your imagination. Supercharge your 3D workflow with our groundbreaking Gen3D toolkit from Shutterstock powered by NVIDIA
Stable Fast 3D - generates high-quality 3D assets from a single image, by Stability AI
Stable Video 4D - transforms a single object video into multiple novel-view videos from eight different angles/views, by Stability AI
VGGHeads - A Large-Scale Synthetic Dataset for 3D Human Heads
CharacterGen - Efficient 3D Character Generation from Single Images with Multi-View Pose Calibration
3D Gen - a fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute, by MetaAI
InstantMesh - Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Spline - generate 3D objects from text prompts and images
SIMA - a Scalable Instructable Multiworld Agent (SIMA) that can follow natural-language instructions to carry out tasks in a variety of video game settings
Stable Video 3D - Quality Novel View Synthesis and 3D Generation from Single Images, by Stability AI
TripoSR - Fast 3D Object Generation from Single Images, by Stability AI
BlendNeRF - 3D-aware Blending with Generative NeRFs
4DGen - Grounded 4D Content Generation with Spatial-temporal Consistency
MobileBrick - Building LEGO for 3D Reconstruction on Mobile Devices. A novel data capturing and 3D annotation pipeline to obtain precise 3D ground-truth shapes without relying on expensive 3D scanners
PoseGPT - Chatting about 3D Human Pose
ProlificDreamer - High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation
Stable Zero123 - 3D Object Generation from Single Images, by Stability AI
SMERF - Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration
DreamCraft3D - a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects
Genie - a 3D foundation model, by Lumalabs
Masterpiece X - the generative text-to-3D app that allows users to create 3D objects and characters complete with mesh, texture, and animations
GAUSSIAN SPLAT - a rasterization technique for 3D reconstruction and rendering
SyncDreamer - generating multiview-consistent images from a single-view image
MAV3D (Make-A-Video3D) - a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model
HiFA - High-fidelity Text-to-3D with Advanced Diffusion Guidance
AutoRecon - a framework for the automated discovery and reconstruction of an object from multi-view images
BITE - enables 3D shape and pose estimation of dogs from a single input image. The model handles a wide range of shapes and breeds, as well as challenging postures far from the available training poses, like sitting or lying on the ground
CSM (Common Sense Machines) - generate your own textured 3D assets
MotionGPT: Human Motion as Foreign Language - a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° - the first 3D-aware generative model that enables high-quality view-consistent image synthesis of full heads in 360° with diverse appearance and detailed geometry using only in-the-wild unstructured images for training
AvatarBooth - a text-to-3D model that creates an animatable 3D model from your text description. It can also generate a customized model from 4-6 photos taken with your phone or from a character design generated by a diffusion model
Infinigen, Code - a procedural generator of 3D scenes, creating depth maps and labeling every aspect of the world it generates, by Princeton Vision & Learning Lab
USD - Universal Scene Description - an open and extensible framework and ecosystem for describing, composing, simulating and collaborating within 3D worlds, originally developed by Pixar Animation Studios
Shap-E: Demo, Code - a conditional generative model for 3D assets, by OpenAI
Neural Kernel Surface Reconstruction, Code - a novel method for reconstructing a 3D implicit surface from a large-scale, sparse, and noisy point cloud, by NVIDIA
Neuralangelo - a framework for high-fidelity 3D surface reconstruction from RGB video captures. Using ubiquitous mobile devices, we enable users to create digital twins of both object-centric and large-scale real-world scenes with highly detailed 3D geometry, by NVIDIA
Rodin Diffusion - a Generative Model for Sculpting 3D Digital Avatars, by Microsoft
3D Gaussian Splatting for Real-Time Radiance Field Rendering - introduces three key elements that achieve state-of-the-art visual quality with competitive training times and enable high-quality real-time (≥ 100 fps) novel-view synthesis at 1080p resolution
ConsistentNeRF - a method that leverages depth information to regularize both multi-view and single-view 3D consistency among pixels
Text2NeRF - a text-driven 3D scene generation framework that combines a neural radiance field (NeRF) with a pre-trained text-to-image diffusion model to generate diverse, view-consistent indoor and outdoor 3D scenes from natural language descriptions
Zip-NeRF - a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP
S-NeRF - a new street-view NeRF (S-NeRF) that considers novel view synthesis of both the large-scale background scenes and the foreground moving vehicles jointly
Mip-NeRF 360 - Unbounded Anti-Aliased Neural Radiance Fields, an extension of mip-NeRF that uses a non-linear scene parameterization, online distillation, and a novel distortion-based regularizer to overcome the challenges presented by unbounded scenes
3D-aware Conditional Image Synthesis - a 3D-aware conditional generative model for controllable photorealistic image synthesis. Given a 2D label map, such as a segmentation or edge map, the model synthesizes a photo from different viewpoints
Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior - can create high-fidelity 3D content from only a single image
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models - generates textured 3D meshes from a given text prompt using 2D text-to-image models
Objaverse-XL - an open dataset of over 10 million 3D objects
OmniObject3D - a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects to facilitate the development of 3D perception, reconstruction, and generation in the real world

Audio & Speech & Music

MetaAI

Spirit LM - a foundation multimodal language model that freely mixes text and speech
Audiobox - generate voices and sound effects using a combination of voice inputs and natural language text prompts, making it easy to create custom audio for a wide range of use cases
Seamless - a system that enables expressive cross-lingual communication in real time
SeamlessM4T - a foundational multilingual and multitask model that seamlessly translates and transcribes across speech and text: automatic speech recognition, speech-to-text and speech-to-speech translation, text-to-text and text-to-speech translation
AudioCraft - simple framework that generates high-quality, realistic audio and music from text-based user inputs after training on raw audio signals as opposed to MIDI or piano rolls
- MusicGen, Demo: HF, Code - a simple and controllable model for music generation
- AudioGen - an auto-regressive generative model that generates audio samples conditioned on text inputs
- EnCodec - a neural network that is trained end to end to reconstruct the input signal
MuAViC - a Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Voicebox - Text-Guided Multilingual Universal Speech Generation at Scale

Google

Music AI Sandbox - an app that generates and modifies music according to text prompts, now accepts lyrics to generate songs as well as instrumental music, powered by Lyria2
Lyria, Lyria2 - AI music generation model that delivers high-fidelity music and professional-grade audio, capturing subtle nuances across a range of genres and intricate compositions
V2A - video-to-audio research uses video pixels and text prompts to generate rich soundtracks
MusicFX - a new experimental tool that enables users to generate their own music using AI
SingSong - a system which generates instrumental music to accompany input vocals
Translatotron 3 - unsupervised speech-to-speech translation from monolingual data
AudioPaLM - a LLM for speech understanding and generation
MusicLM, Demo - a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff"
Universal Speech Model (USM) - a state-of-the-art speech AI for 100+ languages

Eleven Labs

SFX v2 - generate any sound imaginable from a text prompt
Eleven Music - generate studio-quality tracks in any genre and style, with vocals or instrumental, from simple text prompts
11ai - a personal AI voice assistant, built with ElevenLabs Conversational AI
Flash - the newest model, which generates speech in 75 ms (plus application and network latency)
xtovoice - analyze your X profile to generate a unique voice using ElevenLabs
Sound Effects - create distinctive sound effects directly from text descriptions, streamlining your audio production process
Dubbing Studio - a tool that enables automatic, end-to-end video translation across 29 languages, with hands-on control over transcript, translation, timing, and more
Speech to Speech - a tool that lets you turn the recording of one voice to sound as if spoken by another
Eleven Multilingual v2 - a Foundational AI Speech Model for Nearly 30 Languages
Eleven Multilingual v1, Demo - generate top-quality spoken audio in any voice and style with the most advanced and multipurpose AI speech tool out there
AI Speech Classifier, Demo - detect whether an audio clip was created using ElevenLabs

Other

Stable Audio, Stable Audio 2.0, Stable Audio 2.5 - audio generation model designed specifically for enterprise-grade sound production, by Stability AI
Stable Audio Open - an open source text-to-audio model for generating up to 47 seconds of samples and sound effects, by Stability AI
Voxtral - state-of-the-art speech understanding models available in two sizes: a 24B variant for production-scale applications and a 3B variant for local and edge deployments, by Mistral
chatterbox - SoTA open-source TTS, by Resemble AI
dia - a 1.6B-parameter text-to-speech model capable of generating ultra-realistic dialogue in one pass, by Nari Labs
Amazon Nova Sonic - a state-of-the-art speech-to-speech model that delivers real-time, human-like voice conversations with industry-leading price performance and low latency
MoshiVis - an open-source Vision Speech Model (VSM) with the same low-latency and natural conversation skills as Moshi, with the additional ability to discuss visual inputs
Conversational Speech Model (CSM) - an end-to-end multimodal learning system designed to generate more natural and contextually appropriate AI speech, by Sesame
Di♪♪Rhythm - Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
YuE - Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
Riffusion - an AI-powered music generator that turns text into sound
GitPodcast - turn any GitHub repository into an engaging podcast in seconds
Qwen2-Audio - capable of accepting audio and text inputs and generating text outputs
Neutone Morpho - pre-trained AI models that transform any incoming audio into the characteristics, or “style”, of the sounds the model was trained on
Lazybird - AI-powered voice over generator – perfect for videos, podcasts, audiobooks, and educational content
AI Jukebox - a free in-browser text-to-music generation tool
Chatter - an interactive podcast, by Hume
OpenVoice, OpenVoice2 - a versatile instant voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages
Voice Engine - a model for creating custom voices, by OpenAI
Udio - discover, create, and share music with the world
Image to SFX - compare sound-effect generation models that work from image captions
DubbingAI - an AI tool that can convert your voice into high-quality cloned voices, from celebrities to your favorite gaming characters, in real time
StockMusic - a platform for AI-generated tunes that allows you to generate up to 10 minutes of copyright-free music
RIFFUSION - a model that generates spectrogram images, which can then be converted to audio clips
CLAP - extract a latent representation of any given audio and text for your own model or for different downstream tasks
Vscoped - effortlessly transcribe your video content to boost click-through rates and watch time
MERT, Code, Demo: HF - an Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Ecoute - a live transcription tool that provides real-time transcripts for both the user's microphone input (You) and the user's speakers output (Speaker) in a textbox. It also generates a suggested response using OpenAI's GPT-3.5 for the user to say based on the live transcription of the conversation
SadTalker: Demo - Stylized Audio-Driven Single Image Talking Face Animation
Recast - turn your want-to-read articles into rich audio summaries
AudioGPT, Demo: HuggingFace, Code - Understanding and Generating Speech, Music, Sound, and Talking Head
Chirp - a music model that generates realistic audio, including speech, music and sound effects
Bark - a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise and simple sound effects. The model can also produce nonverbal communication like laughing, sighing and crying
Whisper - an automatic speech recognition (ASR) system that approaches human-level robustness and accuracy on English speech recognition
Musicfy - music like you've never heard. Create and discover AI covers of your favorite songs
Jukebox - a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles, by OpenAI
Koe Recast - transform your voice using AI

Code & Math

	Code	Math
Mistral AI	Codestral, Codestral Mamba, Codestral 25.01, Devstral, Mistral Code	MathΣtral
Stability AI	StableCode, Stable Code 3B, Stable Code Instruct 3B
Google DeepMind		FunSearch, AlphaGeometry
Salesforce	CodeT5 & CodeT5+, CodeGen2.5
Alibaba Cloud	CodeQwen1.5, Qwen2.5-Coder, Qwen3-Coder	Qwen2-Math, Qwen2.5-Math
DeepSeek		DeepSeek-Prover-V2

Opal - an experimental tool that lets you build and share powerful AI mini apps that chain together prompts, models, and tools, by Google
Codex - a cloud-based software engineering agent, powered by codex-1, that can work on many tasks in parallel; codex-cli - a lightweight coding agent that runs in your terminal, by OpenAI
DeepCoder, GitHub - a code reasoning model fine-tuned from DeepSeek-R1-Distill-Qwen-14B via distributed RL, by the Agentica team and Together AI
AlphaEvolve - a Gemini-powered agentic system that uses LLMs to generate code in an evolutionary process, by Google DeepMind
SWE-1 - a family of models optimized for the entire software engineering process, not just the task of coding, by Windsurf
DeepWiki - understand unfamiliar codebases by automatically generating architecture diagrams, documentation, and source code links for public GitHub repositories, by Devin
Devin, Devin 2.0 - a new agent-native IDE experience for working with Devin
Sleek - a tool that generates a sleek landing page with AI in minutes
MathGPT - simplifies math problems with step-by-step solutions
neo - a fully autonomous Machine Learning Engineer
bolt.new - prompt, run, edit, and deploy full-stack web apps
Genie - an AI software engineer that achieves a 30% score on SWE-Bench, the industry-standard benchmark. Genie is a fine-tuned version of GPT-4o with a larger context window of undisclosed size. It can fix bugs, build features, refactor code, and everything in between, either fully autonomously or paired with the user, like working with a colleague rather than just a copilot
The AI Scientist - Towards Fully Automated Open-Ended Scientific Discovery
Dracarys - a new family of open LLMs for coding, by Abacus.AI
MathPile - a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens
magicoder - a model family empowered by OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets for generating low-bias and high-quality instruction data for code
LearnLM - a family of models fine-tuned for learning, and grounded in educational research to make teaching and learning experiences more active, personal and engaging, by Google
Llemma - an open language model for mathematics (repository also contains submodules related to the overlap, fine-tuning, and theorem proving experiments described in the paper)
AlphaCodium - a test-based, multi-stage, code-oriented iterative flow that improves the performance of LLMs on code problems
sketch-2-app - generates code from a sketch
GPT Pilot - a true AI developer that writes code, debugs it, talks to you when it needs help, etc
MAmmoTH - a series of open-source LLMs specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, a meticulously curated instruction-tuning dataset
WrenAI - an open-source Text-to-SQL solution for data teams to get results and insights faster by asking business questions without writing SQL
Defog - a state-of-the-art LLM for converting natural language questions to SQL queries, which outperforms major open-source models and slightly outperforms gpt-3
v0 - a generative user interface system. It generates copy-and-paste friendly React code based on Shadcn UI and Tailwind CSS that people can use in their projects, by Vercel Labs
SafeCoder - a code assistant solution built for the enterprise. In marketing speak: “your own on-prem GitHub copilot”, by Hugging Face
Code Llama - a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts, by MetaAI
Teaching Arithmetic to Small Transformers - small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective
InterCode - a framework that casts interactive coding as a standard reinforcement learning (RL) environment, with code as actions and execution feedback as observations
LeanDojo - a set of open-source LLM-based theorem provers built without any proprietary datasets and released under a permissive MIT license to facilitate further research
GPT Engineer - designed to be easy to adapt and extend, and to learn how you want your code to look. It generates an entire codebase from a prompt
CodeTF - a one-stop Python transformer-based library for code large language models (Code LLMs) and code intelligence, provides a seamless interface for training and inferencing on code intelligence tasks like code summarization, translation, code generation and so on. It aims to facilitate easy integration of SOTA CodeLLMs into real-world applications
Let’s Verify Step by Step - a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of simply rewarding the correct final answer (“outcome supervision”), by OpenAI
🦍 Gorilla: LLM Connected with Massive APIs - a finetuned LLaMA-based model that surpasses GPT-4 on writing API calls
Framer - a tool that constructs a completely unique website for you based on a text prompt
Pico - a tool that uses GPT-4 to instantly build simple, shareable web apps
dropbase - build and prototype web apps faster with AI
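The difference between outcome supervision and process supervision described in the Let's Verify Step by Step entry can be made concrete with a toy sketch. It assumes a solution is a list of reasoning steps, each with a hypothetical correctness probability from a step-level reward model. Aggregating those probabilities as a product (the probability that every step is correct) follows the scoring described in that paper as we understand it, but the function names and numbers here are illustrative, not OpenAI's code.

```python
from math import prod
from typing import Optional, Sequence


def outcome_reward(final_answer: str, reference: str) -> float:
    """Outcome supervision: a single reward that looks only at the final answer."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def process_reward(step_probs: Sequence[float]) -> float:
    """Process supervision: every reasoning step is scored.

    The solution score is the probability that all steps are correct,
    computed as the product of per-step correctness probabilities
    (assumed to come from a hypothetical step-level reward model).
    """
    if any(not 0.0 <= p <= 1.0 for p in step_probs):
        raise ValueError("step probabilities must lie in [0, 1]")
    return prod(step_probs)


def first_bad_step(step_probs: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the first step scored below `threshold`, or None."""
    for i, p in enumerate(step_probs):
        if p < threshold:
            return i
    return None


if __name__ == "__main__":
    # Toy solution: the final answer is right, but the second step is flawed.
    step_scores = [0.95, 0.2, 0.9]  # hypothetical step-level scores
    print(outcome_reward("42", "42"))   # outcome supervision gives full credit
    print(process_reward(step_scores))  # roughly 0.171: the flawed step drags the score down
    print(first_bad_step(step_scores))  # points at step index 1
```

The point of the contrast: outcome supervision rewards a lucky answer reached through a wrong step, while a step-level score both penalizes the solution and localizes the error.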

Games

Genie, Genie 2, Genie 3 - a general-purpose world model that can generate an unprecedented diversity of interactive environments, by Google DeepMind
Muse - the first World and Human Action Model (WHAM), a generative AI model of a video game that can generate game visuals, controller actions, or both, by Microsoft
GenChess - turns your ideas into playable art pieces using Google’s Imagen 3 model
ExistAI - games from text
PokemonRedExperiments - train RL agents to play Pokemon Red
BitMagic - game creation
AI Town - a deployable starter kit for building and customizing your own version of AI town - a virtual town where AI characters live, chat and socialize
Generative Agents: Interactive Simulacra of Human Behavior - contains the core simulation module for generative agents (computational agents that simulate believable human behaviors) and their game environment
STEVE-1 - a Generative Model for Text-to-Behavior in Minecraft
Mastering Stratego - DeepNash, an AI agent that learned the game from scratch to a human expert level by playing against itself
Voyager: An Open-Ended Embodied Agent with LLMs - the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention

Robotics

V-JEPA 2, V-JEPA - a new world model that achieves state-of-the-art visual understanding and prediction in the physical world, improving the physical reasoning of AI agents, by MetaAI
Gemini Robotics - a Gemini 2.0-based model designed for robotics, by Google DeepMind
Helix - a generalist Vision-Language-Action (VLA) model that unifies perception, language understanding, and learned control to overcome multiple longstanding challenges in robotics
ASAP - Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills
π0, Open Sourcing π0 - a machine learning system that enables robots to perform housekeeping tasks that require high coordination and dexterity, like folding clothes and cleaning tables, by Physical Intelligence (π)
Genesis - a comprehensive physics simulation platform designed for general purpose Robotics, Embodied AI, & Physical AI applications
Open-TeleVision - an open-sourced immersive teleoperation system with stereo visual feedback. Robots executing highly precise, extremely long-horizon tasks with high success rate, autonomously
unitree_il_lerobot - an open-source project that modifies the LeRobot training framework to enable training and testing on data collected with the dual-arm dexterous hands of Unitree's G1 robot
LeRobot - aims to provide models, datasets, and tools for real-world robotics in PyTorch
DrEureka - Language Model Guided Sim-To-Real Transfer
UniSim - a learned real-world simulator whose applications range from controllable content creation in games and movies to training embodied agents purely in simulation that can be deployed directly in the real world
JAT (Jack of All Trades) - a transformer-based agent capable of playing video games, controlling a robot to perform a wide variety of tasks, understanding and executing commands in a simple navigation environment
Dobb·E - an open-source, general framework for learning household robotic manipulation
OpenEQA - from word models to world models, by MetaAI
Mobile ALOHA - Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, by Stanford
AutoRT, SARA-RT and RT-Trajectory - by Google DeepMind
Robot Parkour Learning - a system for learning a single end-to-end vision-based parkour policy of diverse parkour skills using a simple reward without any reference motion data
Open X-Embodiment - Robotic Learning Datasets and RT-X Models
Eureka - a human-level reward design algorithm powered by LLMs, by NVIDIA
Language to rewards for robotic skill synthesis - an approach to teaching robots novel actions through natural-language input, using reward functions as an interface to bridge the gap between language and low-level robot actions
VIMA - General Robot Manipulation with Multimodal Prompts
RT-2 - a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalised instructions for robotic control, by Google DeepMind
Robots That Ask For Help - a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed
ViNT: A Foundation Model for Visual Navigation - a goal-conditioned navigation policy trained on diverse, cross-embodiment training data, and can control many different robots in zero-shot
Navigating to Objects in the Real World -
RVT: Robotic View Transformer - a multi-view transformer for 3D manipulation that is both scalable and accurate. RVT takes camera images and task language description as inputs and predicts the gripper pose action, by NVIDIA
TidyBot - personalized Robot Assistance with Large Language Models
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning - by the OP3 Soccer Team, Google DeepMind
PaLM-E: An Embodied Multimodal Language Model - embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts
Scaling Robot Learning with Semantically Imagined Experience -
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware - low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface

Typography

GenType - make an alphabet out of anything, by Google
Fontjoy - uses deep learning algorithms to suggest font pairings that balance style and readability
ControlNet, Demo: HF, How to make a QR code with Stable Diffusion - QR Code Conditioned ControlNet Models for Stable Diffusion. They provide a solid foundation for generating QR code-based artwork that is aesthetically pleasing, while still maintaining the integral QR code shape
Word-As-Image for Semantic Typography - automatically reshapes the letters of a word so that they illustrate its meaning, across various fonts and textual concepts. The semantically adjusted letters are created fully automatically and can then be used for further creative design
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion - a method that automatically generates artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while keeping the output readable

Bio & Med

AlphaGenome - an AI tool that more comprehensively and accurately predicts how single variants or mutations in human DNA sequences impact a wide range of biological processes regulating genes, by Google
TxGemma - an LLM designed to improve the efficiency of therapeutic development, from identifying promising targets to helping predict clinical trial outcomes, by Google
DolphinGemma - an LLM that is helping scientists study how dolphins communicate, and hopefully find out what they're saying, by Google
AI co-scientist - a multi-agent AI system built with Gemini 2.0 as a virtual scientific collaborator to help scientists generate novel hypotheses and research proposals, and to accelerate the clock speed of scientific and biomedical discoveries, by Google
BioEmu-1 - exploring the structural changes driving protein function
AlphaFold 3, Code - an AI model that predicts the structure of proteins, DNA, RNA, ligands and more, and how they interact, by Google DeepMind and Isomorphic Labs
AMIE - a research AI system for diagnostic medical reasoning and conversations, by Google
MentalLLaMA - mental health analysis with LLMs
AlphaMissense - an AI model classifying missense variants to help pinpoint the cause of diseases, by Google DeepMind
meditron - a suite of open-source medical LLMs adapted to the medical domain from Llama-2 through continued pretraining on a comprehensively curated medical corpus, including selected PubMed papers and abstracts, a new dataset of internationally recognized medical guidelines, and a general-domain corpus
evodiff - combines evolutionary-scale data with diffusion models for controllable protein sequence generation
SAM-Med2D - applying the Segment Anything Model (SAM) to medical 2D images
Med-Flamingo - a medical vision-language model with multimodal in-context learning abilities
Brain2Music - Reconstructing Music from Human Brain Activity
Seeing the World through Your Eyes - reconstruct a 3D scene beyond the camera's line-of-sight using portrait images containing eye reflections
Mind-Video - High-quality Video Reconstruction from Brain Activity
Med-PaLM - a large language model (LLM) designed to provide high-quality answers to medical questions
PMC-LLaMA - the official codes for "PMC-LLaMA: Continue Training LLaMA on Medical Papers"

Science

AlphaEarth Foundations - an artificial intelligence (AI) model that functions like a virtual satellite, by Google DeepMind
MatterGen - a generative AI tool that tackles materials discovery from a different angle. Instead of screening the candidates, it directly generates novel materials given prompts of the design requirements for an application, by Microsoft
GNoME - DL tool that dramatically increases the speed and efficiency of discovery by predicting the stability of new materials, by Google DeepMind
AlphaQubit - an AI system that accurately identifies errors inside quantum computers, helping to make this new technology more reliable, by Google DeepMind

Climate

Weather Lab, app - experimental cyclone predictions, by Google DeepMind
Planet Parasol - simulating the impacts of Stratospheric Aerosol Injection (SAI) deployment
GraphCast - AI model for faster and more accurate global weather forecasting, by Google DeepMind
OpenDAC - a research project aimed at significantly reducing the cost of Direct Air Capture (DAC), by FAIR at Meta and Georgia Tech
MetNet-3 - the first AI weather model to learn from sparse observations and outperform the top operational systems up to 24 hours ahead at high resolutions. A portion of its forecasts are now available across various Google products, by Google
ClimaX: A foundation model for weather and climate - a flexible and generalizable deep learning model for weather and climate science, introduced as the first foundation model for weather and climate

Military

AIP Pillars - activate LLMs and other AI on your private network, subject to full control
GeoSpy - upload satellite or aerial images, and GeoSpy’s AI examines visual details like landmarks, terrain features, and vegetation patterns to provide precise location predictions

Other: Fin, Presentation

AI Sheets - open-source tool for building, enriching, and transforming datasets using AI models with no code, by Hugging Face
AI Spreadsheet - analyze data and complete tasks in seconds with Sourcetable's Excel & data analyst
TacticAI - an AI assistant for football tactics, by Google DeepMind
FactSnap - a reliable fact-checking companion. Verify information while browsing the web with the Chrome extension
Bricks - an AI-powered tool that generates reports, visuals, and presentations from your data
Learn About - generates tailored educational experiences based on user questions and uploaded materials
Hautech AI - an AI platform that turns simple clothing images into professional-grade fashion photos
Atlas - a school AI assistant that provides personalized help by studying your specific class materials
Ollie - delivers new and favorite recipes written just for you each week
Food Mood - a fusion recipe generator, by Google
FinGPT - open-source financial LLMs
guidde - create documentation/presentation/FAQ from captured video
Gamma - create visually appealing presentations
Tome - create a compelling starting point for your presentation in minutes

For Tasks:


generate music from text create 3d objects from images train rl agents for games solve mathematical problems assist in medical diagnostics

For Jobs:

ai researcher software developer data scientist machine learning engineer natural language processing specialist

Alternative AI tools for llms-tools

Similar Open Source Tools

llms-tools

Stars: 278

awesome-openvino

Awesome OpenVINO is a curated list of AI projects based on the OpenVINO toolkit, offering a rich assortment of projects, libraries, and tutorials covering various topics like model optimization, deployment, and real-world applications across industries. It serves as a valuable resource continuously updated to maximize the potential of OpenVINO in projects, featuring projects like Stable Diffusion web UI, Visioncom, FastSD CPU, OpenVINO AI Plugins for GIMP, and more.

Stars: 87

AI4Animation

AI4Animation is a comprehensive framework for data-driven character animation, including data processing, neural network training, and runtime control, developed in Unity3D/PyTorch. It explores deep learning opportunities for character animation, covering biped and quadruped locomotion, character-scene interactions, sports and fighting games, and embodied avatar motions in AR/VR. The research focuses on generative frameworks, codebook matching, periodic autoencoders, animation layering, local motion phases, and neural state machines for character control and animation.

Stars: 7.5k

daily-ai-papers

Stars: 87

awesome-RLAIF

Reinforcement Learning from AI Feedback (RLAIF) is a concept that describes a type of machine learning approach where **an AI agent learns by receiving feedback or guidance from another AI system**. This concept is closely related to the field of Reinforcement Learning (RL), which is a type of machine learning where an agent learns to make a sequence of decisions in an environment to maximize a cumulative reward.

In traditional RL, an agent interacts with an environment and receives feedback in the form of rewards or penalties based on the actions it takes. It learns to improve its decision-making over time to achieve its goals.

In the context of Reinforcement Learning from AI Feedback, the AI agent still aims to learn optimal behavior through interactions, but **the feedback comes from another AI system rather than from the environment or human evaluators**. This can be **particularly useful in situations where it may be challenging to define clear reward functions or when it is more efficient to use another AI system to provide guidance**. The feedback from the AI system can take various forms, such as:

- **Demonstrations**: The AI system provides demonstrations of desired behavior, and the learning agent tries to imitate these demonstrations.
- **Comparison Data**: The AI system ranks or compares different actions taken by the learning agent, helping it to understand which actions are better or worse.
- **Reward Shaping**: The AI system provides additional reward signals to guide the learning agent's behavior, supplementing the rewards from the environment.

This approach is often used in scenarios where the RL agent needs to learn from **limited human or expert feedback or when the reward signal from the environment is sparse or unclear**. It can also be used to **accelerate the learning process and make RL more sample-efficient**.
Reinforcement Learning from AI Feedback is an area of ongoing research and has applications in various domains, including robotics, autonomous vehicles, and game playing, among others.
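The comparison-data and reward-shaping forms of AI feedback can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not code from the awesome-RLAIF list or any specific RLAIF system: it assumes a hypothetical AI judge that emits (preferred, rejected) action pairs, fits one score per action with the Bradley-Terry model (a common choice for pairwise preference data, swapped in here for illustration), and adds a scaled version of that score to a sparse environment reward. The function names, the `beta` weight, and the judgments are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (preferred action, rejected action), as judged by a hypothetical AI system
Comparison = Tuple[str, str]


def fit_bradley_terry(comparisons: List[Comparison], n_iters: int = 500, lr: float = 0.1) -> Dict[str, float]:
    """Turn pairwise AI preferences into one scalar score per action.

    Bradley-Terry model: P(a preferred over b) = sigmoid(s_a - s_b).
    The log-likelihood of the observed comparisons is maximized by plain
    gradient ascent. Scores are only meaningful relative to each other.
    """
    scores: Dict[str, float] = {}
    for winner, loser in comparisons:
        scores.setdefault(winner, 0.0)
        scores.setdefault(loser, 0.0)

    for _ in range(n_iters):
        grads = {action: 0.0 for action in scores}
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # Gradient of log sigmoid(s_w - s_l): +(1 - p_win) for the winner, -(1 - p_win) for the loser
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        for action in scores:
            scores[action] += lr * grads[action]
    return scores


def shaped_reward(env_reward: float, ai_score: float, beta: float = 0.1) -> float:
    """Reward shaping: supplement the environment reward with a scaled AI-feedback term."""
    return env_reward + beta * ai_score


if __name__ == "__main__":
    # Hypothetical judgments from an AI "teacher" comparing three actions.
    judgments = [("left", "right"), ("left", "right"), ("right", "left"),
                 ("left", "stay"), ("right", "stay")]
    scores = fit_bradley_terry(judgments)
    for action, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{action}: {s:+.3f}")
    # Sparse environment reward (0.0) plus AI guidance for the chosen action.
    print(shaped_reward(env_reward=0.0, ai_score=scores["left"]))
```

The additive bonus is the simplest possible shaping term; a real system might prefer potential-based shaping or a learned reward model, and would need to guard against the agent exploiting the judge's biases.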

Stars: 64

foundations-of-gen-ai

This repository contains code for the O'Reilly Live Online Training for 'Transformer Architectures for Generative AI'. The course provides a deep understanding of transformer architectures and their impact on natural language processing (NLP) and vision tasks. Participants learn to harness transformers to tackle problems in text, image, and multimodal AI through theory and practical exercises.

Stars: 74

awesome-llms-fine-tuning

This repository is a curated collection of resources for fine-tuning Large Language Models (LLMs) like GPT, BERT, RoBERTa, and their variants. It includes tutorials, papers, tools, frameworks, and best practices to aid researchers, data scientists, and machine learning practitioners in adapting pre-trained models to specific tasks and domains. The resources cover a wide range of topics related to fine-tuning LLMs, providing valuable insights and guidelines to streamline the process and enhance model performance.

Stars: 119

awesome-generative-ai-guide

This repository serves as a comprehensive hub for updates on generative AI research, interview materials, notebooks, and more. It includes monthly best GenAI papers list, interview resources, free courses, and code repositories/notebooks for developing generative AI applications. The repository is regularly updated with the latest additions to keep users informed and engaged in the field of generative AI.

Stars: 4.5k

llm-course

The LLM course is divided into three parts:

1. 🧩 **LLM Fundamentals** covers essential knowledge about mathematics, Python, and neural networks.
2. 🧑‍🔬 **The LLM Scientist** focuses on building the best possible LLMs using the latest techniques.
3. 👷 **The LLM Engineer** focuses on creating LLM-based applications and deploying them.

For an interactive version of this course, I created two **LLM assistants** that will answer questions and test your knowledge in a personalized way:

* 🤗 **HuggingChat Assistant**: Free version using Mixtral-8x7B.
* 🤖 **ChatGPT Assistant**: Requires a premium account.

## 📝 Notebooks

A list of notebooks and articles related to large language models.

### Tools

| Notebook | Description | Notebook |
|----------|-------------|----------|
| 🧐 LLM AutoEval | Automatically evaluate your LLMs using RunPod | ![Open In Colab](img/colab.svg) |
| 🥱 LazyMergekit | Easily merge models using MergeKit in one click. | ![Open In Colab](img/colab.svg) |
| 🦎 LazyAxolotl | Fine-tune models in the cloud using Axolotl in one click. | ![Open In Colab](img/colab.svg) |
| ⚡ AutoQuant | Quantize LLMs in GGUF, GPTQ, EXL2, AWQ, and HQQ formats in one click. | ![Open In Colab](img/colab.svg) |
| 🌳 Model Family Tree | Visualize the family tree of merged models. | ![Open In Colab](img/colab.svg) |
| 🚀 ZeroSpace | Automatically create a Gradio chat interface using a free ZeroGPU. | ![Open In Colab](img/colab.svg) |

Stars: 42.1k

Here-Comes-the-AI-Worm

Large Language Models (LLMs) are now embedded in everyday tools like email assistants, chat apps, and productivity software. This project introduces DonkeyRail, a lightweight guardrail that detects and blocks malicious self-replicating prompts known as RAGworm within GenAI-powered applications. The guardrail is fast, accurate, and practical for real-world GenAI systems, preventing activities like spam, phishing campaigns, and data leaks.

Stars: 205

ManipVQA

ManipVQA is a framework that enhances Multimodal Large Language Models (MLLMs) with manipulation-centric knowledge through a Visual Question-Answering (VQA) format. It addresses the deficiency of conventional MLLMs in understanding affordances and physical concepts crucial for manipulation tasks. By infusing robotics-specific knowledge, including tool detection, affordance recognition, and physical concept comprehension, ManipVQA improves the performance of robots in manipulation tasks. The framework involves fine-tuning MLLMs with a curated dataset of interactive objects, enabling robots to understand and execute natural language instructions more effectively.

Stars: 51

nlp-llms-resources

The 'nlp-llms-resources' repository is a comprehensive resource list for Natural Language Processing (NLP) and Large Language Models (LLMs). It covers a wide range of topics including traditional NLP datasets, data acquisition, libraries for NLP, neural networks, sentiment analysis, optical character recognition, information extraction, semantics, topic modeling, multilingual NLP, domain-specific LLMs, vector databases, ethics, costing, books, courses, surveys, aggregators, newsletters, papers, conferences, and societies. The repository provides valuable information and resources for individuals interested in NLP and LLMs.

Stars: 82

ML-news-of-the-week

Stars: 129

CodeFuse-muAgent

CodeFuse-muAgent is a Multi-Agent framework designed to streamline Standard Operating Procedure (SOP) orchestration for agents. It integrates toolkits, code libraries, knowledge bases, and sandbox environments for rapid construction of complex Multi-Agent interactive applications. The framework enables efficient execution and handling of multi-layered and multi-dimensional tasks.

Stars: 181

oreilly-hands-on-gpt-llm

This repository contains code for the O'Reilly Live Online Training for Deploying GPT & LLMs. Learn how to use GPT-4, ChatGPT, OpenAI embeddings, and other large language models to build applications for experimenting and production. Gain practical experience in building applications like text generation, summarization, question answering, and more. Explore alternative generative models such as Cohere and GPT-J. Understand prompt engineering, context stuffing, and few-shot learning to maximize the potential of GPT-like models. Focus on deploying models in production with best practices and debugging techniques. By the end of the training, you will have the skills to start building applications with GPT and other large language models.

Stars: 113

SuperKnowa

SuperKnowa is a fast framework to build Enterprise RAG (Retrieval-Augmented Generation) Pipelines at Scale, powered by watsonx. It accelerates Enterprise Generative AI applications to get prod-ready solutions quickly on private data. The framework provides pluggable components for tackling various Generative AI use cases using Large Language Models (LLMs), allowing users to assemble building blocks to address challenges in AI-driven text generation. SuperKnowa is battle-tested from 1M to 200M private knowledge base & scaled to billions of retriever tokens.

Stars: 98

For similar tasks

llms-tools

Stars: 278

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

Stars: 1.1k

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

Stars: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models, from images to sound.

Stars: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

Stars: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

Stars: 2.9k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:

* Self-contained, with no need for a DBMS or cloud service.
* OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
* Supports consumer-grade GPUs.

Stars: 32.9k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Stars: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

Stars: 675