models

CLI and TUI for browsing AI models, benchmarks, and coding agents. Compare 2000+ models across 85+ providers, explore ~400 benchmark entries from Artificial Analysis, and track 12+ coding agents with version detection.

Stars: 63

A fast CLI and TUI for browsing AI models, benchmarks, and coding agents. Browse 2000+ models across 85+ providers from models.dev, track AI coding assistants with version detection and GitHub integration, and compare model performance across 15+ benchmarks from Artificial Analysis. Features include CLI commands, an interactive TUI, cross-provider search, copy to clipboard, and JSON output, along with a curated catalog of AI coding assistants, auto-updating benchmark data, per-model open weights detection, and a benchmark detail panel. Tracked agents are customizable, and benchmarks can be quick-sorted. Data comes from models.dev, Artificial Analysis, the curated catalog in data/agents.json, and the GitHub API.

README:

models

A fast CLI and TUI for browsing AI models, benchmarks, and coding agents.

Models Tab: Browse 2000+ models across 85+ providers from models.dev, categorized by type (Origin, Cloud, Inference, Gateway, Dev Tool)
Agents Tab: Track AI coding assistants (Claude Code, Aider, Cursor, etc.) with version detection and GitHub integration
Benchmarks Tab: Compare model performance across 15+ benchmarks from Artificial Analysis, with creator filtering by source, region, and type

What's New (v0.8.8)

Improved Open Weights Matching

91% match rate — three-stage matching pipeline using Jaro-Winkler similarity (strsim) to determine open/closed status per model
Global fallback — when creator-scoped matching fails, searches all models.dev providers for the best slug match
Known creator overrides — hardcoded open/closed status for 12 well-known creators absent from models.dev (IBM, AI2, TII, etc.)
No more "Mixed" labels — removed CreatorOpenness fallback; unmatched models show an em dash instead of misleading labels
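
The matching stages described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's Rust code: the stage order, the 0.9 threshold, and the names `open_weights_status`, `catalog`, and `overrides` are all assumptions, and the hand-written Jaro-Winkler stands in for the strsim crate.

```python
from typing import Dict, Iterable, Optional


def jaro(a: str, b: str) -> float:
    """Plain Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [a[i] for i in range(len(a)) if a_hit[i]]
    b_seq = [b[j] for j in range(len(b)) if b_hit[j]]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to four characters."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * scaling * (1.0 - score)


def best_match(slug: str, candidates: Iterable[str], threshold: float) -> Optional[str]:
    """Return the candidate most similar to slug, or None if none reaches threshold."""
    best, best_score = None, threshold
    for candidate in candidates:
        score = jaro_winkler(slug, candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best


def open_weights_status(slug: str, creator: str,
                        catalog: Dict[str, Dict[str, bool]],
                        overrides: Dict[str, bool],
                        threshold: float = 0.9) -> Optional[bool]:
    """True = open, False = closed, None = unmatched (rendered as a dash)."""
    # Stage 1 (assumed order): match only against the creator's own models.
    scoped = catalog.get(creator, {})
    hit = best_match(slug, scoped, threshold)
    if hit is not None:
        return scoped[hit]
    # Stage 2: global fallback across every provider's slugs.
    everything = {s: is_open for models in catalog.values() for s, is_open in models.items()}
    hit = best_match(slug, everything, threshold)
    if hit is not None:
        return everything[hit]
    # Stage 3: hardcoded status for creators absent from the catalog.
    return overrides.get(creator)
```

Returning `None` rather than a guessed label mirrors the removal of the "Mixed" fallback: an unmatched model carries no open/closed claim at all.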

v0.8.7: Benchmark Data Freshness

jsDelivr cache purging — GitHub Action now purges CDN cache after committing new data for faster propagation
No disk cache — benchmark data fetched fresh from CDN on every launch for simplicity

v0.8.6: Cost Sorting & Open Weights

Price sort columns — sort benchmarks by input, output, or blended price per million tokens via [s] cycle
Per-model source detection — runtime matching of AA entries against models.dev data
Source filter — [4] cycles through All / Open / Closed
Region and type grouping — [5] and [6] toggle grouped layout with colored section headers
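
For reference, a blended price is a weighted average of the input and output prices. The sketch below assumes the 3:1 input-to-output weighting that Artificial Analysis describes for its blended figure; whether `models` recomputes this value or displays the published one is not stated here, and the function name is hypothetical.

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per million tokens (3:1 weighting is an assumption)."""
    total = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total


# Example: $5.00 input and $25.00 output per million tokens.
example = blended_price(5.00, 25.00)  # (3 * 5 + 25) / 4 = 10.0
```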

v0.8.5: Release Profile

Optimized release binary — strip, LTO, single codegen unit, panic=abort (~6MB, down from ~11MB)

v0.8.0–0.8.4: Benchmarks Tab

Dedicated Benchmarks tab — browse ~400 model entries from Artificial Analysis with quality, speed, and pricing data
Creator sidebar with 40+ creators, classified by region and type with grouping toggles
Quick-sort keys — [1] Intelligence, [2] Date, [3] Speed — press again to flip direction
Dynamic columns, detail panel, TTFAT, AIME benchmarks and more

Other

Provider categories — filter and group providers by type (Origin, Cloud, Inference, Gateway, Dev Tool)
OpenClaw agent added to the agents catalog
Responsive layouts — models tab detail panel scales with terminal height

Features

Models Tab

CLI commands for scripting and quick lookups
Interactive TUI for browsing and comparing models
Provider categories — filter and group providers by type (Origin, Cloud, Inference, Gateway, Dev Tool)
Cross-provider search to compare the same model across different providers
Copy to clipboard with a single keypress
JSON output for scripting and automation

Agents Tab

Curated catalog of 12+ AI coding assistants
Version detection — automatically detects installed agents
GitHub integration — stars, releases, changelogs, update availability
Persistent cache — instant startup with ETag-based conditional fetching
Customizable tracking — choose which agents to monitor
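
ETag-based conditional fetching generally works like this: the client stores the `ETag` from the last response, sends it back in `If-None-Match`, and reuses its cached body when the server answers 304 Not Modified. The following is a minimal Python sketch of that pattern, not the project's implementation; `fetch_with_etag`, the in-memory `cache` dict, and the injectable `opener` are assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def fetch_with_etag(url: str, cache: Dict[str, dict],
                    opener: Callable = urllib.request.urlopen) -> bytes:
    """Return the body for url, reusing the cached copy on a 304 response."""
    request = urllib.request.Request(url)
    cached = cache.get(url)
    if cached is not None:
        # Ask the server to skip the body if our copy is still current.
        request.add_header("If-None-Match", cached["etag"])
    try:
        with opener(request) as response:
            body = response.read()
            etag = response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; treat it as a cache hit.
        if err.code == 304 and cached is not None:
            return cached["body"]
        raise
    if etag:
        cache[url] = {"etag": etag, "body": body}
    return body
```

Persisting `cache` to disk between runs is what would make startup feel instant: unchanged resources cost one small round trip instead of a full download.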

Benchmarks Tab

~400 benchmark entries from Artificial Analysis with quality, speed, and pricing scores
Auto-updating — data fetched fresh from CDN on every launch; GitHub Action refreshes source data every 6 hours
Creator sidebar with 40+ creators — group by region or type with colored section headers
Per-model open weights detection — runtime matching against models.dev, with source filter toggle
Quick-sort keys — instantly sort by Intelligence, Date, or Speed
Dynamic columns — list columns adapt to show the most relevant benchmarks for the active sort
Detail panel — full benchmark breakdown with indexes, scores, performance, and pricing

Installation

Cargo (from crates.io)

cargo install modelsdev

Homebrew (macOS/Linux)

brew install arimxyer/tap/models

Scoop (Windows)

scoop bucket add arimxyer https://github.com/arimxyer/scoop-bucket
scoop install models

Pre-built binaries

Download the latest release for your platform from GitHub Releases.

Build from source

git clone https://github.com/arimxyer/models
cd models
cargo build --release
./target/release/models

Usage

TUI (Interactive Browser)

Run `models` with no arguments to launch the interactive browser:

models

TUI Keybindings

Global

Key	Action
`]` / `[`	Switch tabs (Models / Agents / Benchmarks)
`?`	Show context-aware help
`q`	Quit

Navigation

Key	Action
`j` / `↓`	Move down
`k` / `↑`	Move up
`g`	Jump to first item
`G`	Jump to last item
`Ctrl+d` / `PageDown`	Page down
`Ctrl+u` / `PageUp`	Page up
`Tab` / `Shift+Tab`	Switch panels
`←` / `→`	Switch panels

Search

Key	Action
`/`	Enter search mode
`Enter` / `Esc`	Exit search mode
`Esc`	Clear search (in normal mode)

Models Tab

Filters & Sort

Key	Action
`s`	Cycle sort (name → date → cost → context)
`1`	Toggle reasoning filter
`2`	Toggle tools filter
`3`	Toggle open weights filter
`4`	Cycle provider category filter (All → Origin → Cloud → Inference → Gateway → Tool)
`5`	Toggle category grouping

Copy & Open

Key	Action
`c`	Copy `provider/model-id`
`C`	Copy `model-id` only
`o`	Open provider docs in browser
`D`	Copy provider docs URL
`A`	Copy provider API URL

Agents Tab

Filters & Sort

Key	Action
`s`	Cycle sort (name → updated → stars → status)
`1`	Toggle installed filter
`2`	Toggle CLI tools filter
`3`	Toggle open source filter

Actions

Key	Action
`a`	Open tracked agents picker
`o`	Open docs in browser
`r`	Open GitHub repo
`c`	Copy agent name

Customizing Tracked Agents

By default, `models` tracks four popular agents: Claude Code, Codex, Gemini CLI, and OpenCode.

Press `a` in the Agents tab to open the picker and customize which agents you track. Your preferences are saved to `~/.config/models/config.toml`.

You can also add custom agents not in the catalog:

# ~/.config/models/config.toml
[[agents.custom]]
name = "My Agent"
repo = "owner/repo"
binary = "my-agent"
version_command = ["--version"]

See Custom Agents for the full reference.
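
To show how the `binary` and `version_command` fields could fit together, here is a rough Python sketch of version detection: run the binary with the configured arguments and pull the first semver-like string out of its output. This is an assumption about the general approach, not the tool's actual logic; `extract_version`, `detect_version`, and the regular expression are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Sequence

# Matches versions such as 1.2.3 or 1.2.3-beta.1 (pattern is an assumption).
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?")


def extract_version(output: str) -> Optional[str]:
    """Return the first semver-like substring, or None if there is none."""
    match = VERSION_PATTERN.search(output)
    return match.group(0) if match else None


def detect_version(binary: str, version_command: Sequence[str]) -> Optional[str]:
    """Run `binary` with `version_command`; None means not installed or unparseable."""
    path = shutil.which(binary)
    if path is None:
        return None
    try:
        result = subprocess.run([path, *version_command], capture_output=True,
                                text=True, timeout=5, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version to stderr, so check both streams.
    return extract_version(result.stdout + result.stderr)
```

With the config above, the equivalent call would be `detect_version("my-agent", ["--version"])`.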

Benchmarks Tab

Quick Sort (press again to toggle direction)

Key	Action
`1`	Sort by Intelligence index
`2`	Sort by Release date
`3`	Sort by Speed (tok/s)
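
The "press again to toggle direction" behavior can be modeled as a small piece of state. The sketch below is illustrative only: the column identifiers and the choice to start each newly selected column in descending order are assumptions, not taken from the source.

```python
from dataclasses import dataclass

# Quick-sort keys from the table above; column identifiers are hypothetical.
QUICK_SORT_KEYS = {"1": "intelligence", "2": "release_date", "3": "speed"}


@dataclass
class SortState:
    column: str = "intelligence"
    descending: bool = True

    def press(self, key: str) -> None:
        """Same key flips direction; a different key switches column."""
        column = QUICK_SORT_KEYS.get(key)
        if column is None:
            return  # not a quick-sort key
        if column == self.column:
            self.descending = not self.descending
        else:
            self.column = column
            self.descending = True  # assumed default for a new column
```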

Filters & Grouping

Key	Action
`4`	Cycle source filter (All / Open / Closed)
`5`	Toggle region grouping
`6`	Toggle type grouping

Sort (full cycle)

Key	Action
`s`	Cycle through all 20 sort columns
`S`	Toggle sort direction (asc/desc)

Actions

Key	Action
`c`	Copy benchmark name
`o`	Open Artificial Analysis page

CLI Commands

List providers

models list providers

List models

# All models
models list models

# Models from a specific provider
models list models anthropic

Show model details

models show claude-opus-4-5-20251101

Claude Opus 4.5
===============

ID:          claude-opus-4-5-20251101
Provider:    Anthropic (anthropic)
Family:      claude-opus

Limits
------
Context:     200k tokens
Max Output:  64k tokens

Pricing (per million tokens)
----------------------------
Input:       $5.00
Output:      $25.00
Cache Read:  $0.50
Cache Write: $6.25

Capabilities
------------
Reasoning:   Yes
Tool Use:    Yes
Attachments: Yes
Modalities:  text, image, pdf -> text

Metadata
--------
Released:    2025-11-01
Updated:     2025-11-01
Knowledge:   2025-03-31
Open Weights: No

Search models

models search "gpt-4"
models search "claude opus"

JSON output

All commands support --json for scripting:

models list providers --json
models show claude-opus-4-5 --json
models search "llama" --json
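
A script can consume the `--json` output by running the command and parsing stdout. The Python sketch below assumes only that the command prints a single JSON document; the helper name `models_json` is hypothetical, and the shape of the parsed data is deliberately left unspecified because the schema is not documented here.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def models_json(args: Sequence[str], runner: Callable[..., Any] = subprocess.run) -> Any:
    """Run `models <args> --json` and return the parsed JSON document."""
    command = ["models", *args, "--json"]
    # check=True raises CalledProcessError if the CLI exits with an error.
    result = runner(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `models_json(["search", "llama"])` mirrors `models search "llama" --json`. Inspect the returned structure before relying on specific field names.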

Data Sources

Many thanks to the projects below. This application could not have been made without them doing the legwork. Shout out to the sources:

Model data: Fetched from models.dev, an open-source database of AI models maintained by SST
Benchmark data: Fetched from Artificial Analysis — quality indexes, benchmark scores, speed, and pricing for ~400 model entries
Agent data: Curated catalog in data/agents.json — contributions welcome!
GitHub data: Fetched from GitHub API (stars, releases, changelogs)

License

MIT
