tt-metal

TT-NN operator library and TT-Metalium low-level kernel programming model.

Stars: 659

TT-NN is a Python and C++ neural network operator (OP) library. The repository also provides TT-Metalium, a low-level programming model that enables kernel development for Tenstorrent hardware.

README:

Buy hardware | Install | Discord | Join Us

TT-NN is a Python & C++ Neural Network OP library.

API Reference | Model Demos


LLMs

| Model | Batch | Hardware | ttft (ms) | t/s/u | Target t/s/u | t/s | TT-Metalium Release | vLLM Tenstorrent Repo Release |
|---|---|---|---|---|---|---|---|---|
| QwQ 32B (TP=8) | 32 | QuietBox | 133 | 25.2 | | 464.0 | v0.56.0-rc51 | e2e0002 |
| DeepSeek R1 Distill Llama 3.3 70B (TP=8) | 32 | QuietBox | 180 | 15.2 | 20 | 486.4 | v0.56.0-rc47 | e2e0002 |
| Llama 3.1 70B (TP=8) | 32 | QuietBox | 180 | 15.2 | 20 | 486.4 | v0.56.0-rc47 | e2e0002 |
| Llama 3.2 11B Vision (TP=2) | 16 | n300 | 2550 | 15.8 | 17 | 252.8 | v0.56.0-rc6 | e2e0002 |
| Qwen 2.5 7B (TP=2) | 32 | n300 | 126 | 32.5 | 38 | 1040.0 | v0.56.0-rc33 | e2e0002 |
| Qwen 2.5 72B (TP=8) | 32 | QuietBox | 333 | 14.5 | 20 | 464.0 | v0.56.0-rc33 | e2e0002 |
| Falcon 7B | 32 | n150 | 70 | 18.3 | 26 | 585.6 | v0.56.0-rc47 | |
| Falcon 7B (DP=8) | 256 | QuietBox | 88 | 15.5 | 26 | 3968.0 | v0.56.0-rc47 | |
| Falcon 7B (DP=32) | 1024 | Galaxy | 223 | 4.8 | 26 | 4915.2 | v0.56.0-rc6 | |
| Falcon 40B (TP=8) | 32 | QuietBox | | 5.3 | 36 | 169.6 | v0.56.0-rc45 | |
| Llama 3.1 8B | 32 | n150 | 141 | 24.6 | 23 | 787.2 | v0.56.0-rc47 | e2e0002 |
| Llama 3.2 1B | 32 | n150 | 50 | 67.6 | 160 | 2163.2 | v0.56.0-rc47 | e2e0002 |
| Llama 3.2 3B | 32 | n150 | 78 | 43.5 | 60 | 1392.0 | v0.56.0-rc47 | e2e0002 |
| Mamba 2.8B | 32 | n150 | 48 | 12.3 | 41 | 393.6 | v0.51.0-rc26 | |
| Mistral 7B | 32 | n150 | | 9.9 | 25 | 316.8 | v0.51.0-rc28 | |
| Mixtral 8x7B (TP=8) | 32 | QuietBox | 227 | 15.2 | 33 | 486.4 | v0.56.0-rc47 | |

Last Update: March 10, 2025

Notes:

  • ttft = time to first token | t/s/u = tokens/second/user | t/s = tokens/second; where t/s = t/s/u * batch.
  • TP = tensor parallel, DP = data parallel; the number after each gives the parallelization factor across multiple devices.
  • The reported LLM performance is for an input sequence length (number of rows filled in the KV cache) of 128 for all models except Mamba (which can accept any sequence length).
  • The reported t/s/u is the throughput of the first token generated after prefill, i.e. 1 / inter-token latency.
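
The relations in the notes above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `tokens_per_second` and `inter_token_latency_ms` are hypothetical and not part of TT-NN, and the sample rows are copied from the LLM table.

```python
# Hypothetical helpers illustrating the metric definitions in the notes;
# they are not part of the TT-NN or TT-Metalium APIs.

def tokens_per_second(t_s_u: float, batch: int) -> float:
    """Aggregate throughput: t/s = t/s/u * batch."""
    return t_s_u * batch


def inter_token_latency_ms(t_s_u: float) -> float:
    """Inter-token latency in milliseconds, from t/s/u = 1 / latency."""
    return 1000.0 / t_s_u


# (model, batch, t/s/u) taken from the LLM table
rows = [
    ("Llama 3.1 8B", 32, 24.6),
    ("Falcon 7B (DP=8)", 256, 15.5),
]

for model, batch, t_s_u in rows:
    total = tokens_per_second(t_s_u, batch)
    latency = inter_token_latency_ms(t_s_u)
    print(f"{model}: {total:.1f} t/s, {latency:.1f} ms between tokens")
```

For the Llama 3.1 8B row this gives 787.2 t/s, which matches the t/s column of the table.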

CNNs

| Model | Batch | Hardware | fps | Target fps | Release |
|---|---|---|---|---|---|
| ResNet-50 (224x224) | 16 | n150 | 4,700 | 7,000 | |
| ResNet-50 (224x224) (DP=2) | 32 | n300 | 9,200 | 14,000 | |
| ResNet-50 (224x224) (DP=8) | 128 | QuietBox | 35,800 | 56,000 | |
| ResNet-50 (224x224) (DP=32) | 512 | Galaxy | 96,800 | 224,000 | |
| ResNet-50 (224x224) (DP=64) | 1024 | Two Galaxies | 145,000 | 448,000 | |
| ViT (224x224) | 8 | n150 | 912 | 1,600 | |
| Stable Diffusion 1.4 (512x512) | 1 | n150 | 0.167 | 0.3 | |
| YOLOv4 (320x320) | 1 | n150 | 120 | 300 | |
| SegFormer Semantic Segmentation (512x512) | 1 | n150 | 90 | 300 | |
| Stable Diffusion 3.5 medium (512x512) | 1 | n150 | 0.06 | 0.3 | |

NLPs

| Model | Batch | Hardware | sen/sec | Target sen/sec | Release |
|---|---|---|---|---|---|
| BERT-Large | 8 | n150 | 270 | 400 | |

Model Updates

For the latest model updates and features, please see MODEL_UPDATES.md

Model Bring-Up and Testing

For information on initial model procedures, please see Model Bring-Up and Testing

TT-NN Tech Reports

Benchmarks


TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

Programming Guide | API Reference

Getting started

Get started with simple kernels.

TT-Metalium Tech Reports

TT-Metalium Programming Examples

Hello World

Add Integers

Simple Tensor Manipulation

DRAM Data Movement

Eltwise

Matmul

Tenstorrent Bounty Program Terms and Conditions

This repo is part of Tenstorrent’s bounty program. If you are interested in helping to improve tt-metal, please read the Tenstorrent Bounty Program Terms and Conditions before heading to the issues tab. Look for issues tagged with both “bounty” and a difficulty level.
