tt-metal

TT-NN operator library and TT-Metalium low-level kernel programming model.

TT-NN is a Python & C++ Neural Network OP library. The repository also contains TT-Metalium, a low-level programming model that enables kernel development for Tenstorrent hardware.

Buy hardware | Install | Discord | Join Us

TT-NN is a Python & C++ Neural Network OP library.

API Reference | Model Demos


LLMs

Model | Batch | Hardware | ttft (s) | t/s/u | Target t/s/u | t/s | Release
Falcon7B-decode | 32 | e150 | - | 4.2 | 4.4 | 134.4 | -
Falcon7B | 32 | n150 | 0.07 | 16.7 | 26 | 534.4 | v0.52.0-rc31
Mistral-7B | 32 | n150 | - | 9.9 | 25 | 316.8 | v0.51.0-rc28
Mamba-2.8B | 32 | n150 | 0.04 | 12.3 | 41 | 393.6 | v0.51.0-rc26
LLaMA-3.1-8B | 1 | n150 | 0.20 | 21.4 | 23 | 21.4 | v0.52.0-rc31
Falcon7B (DP=8) | 256 | QuietBox | 0.10 | 14.4 | 26 | 3686.4 | v0.52.0-rc31
LLaMA-2-70B (TP=8) | 32 | QuietBox | 0.19 | 15.1 | 20 | 483.2 | v0.52.0-rc31
LLaMA-3.1-70B (TP=8) | 32 | QuietBox | 0.19 | 15.1 | 20 | 483.2 | v0.52.0-rc31
Falcon40B (TP=8) | 32 | QuietBox | - | 5.3 | 36 | 169.6 | v0.52.0-rc31
Mixtral7Bx8 (TP=8) | 32 | QuietBox | 0.23 | 14.2 | 33 | 454.4 | v0.52.0-rc31
Falcon7B (DP=32) | 1024 | Galaxy | 0.24 | 4.4 | 26 | 4505.6 | v0.52.0-rc31
LLaMA-3.1-70B (DP=4, TP=8) | 128 | Galaxy | 0.19 | 14.3 | 20 | 1835.5 | v0.52.0-rc31

Last Update: October 7, 2024

Notes:

  • TP = Tensor Parallel and DP = Data Parallel; the number after each gives the parallelization factor across devices (e.g. TP=8 splits the model across 8 devices).
  • ttft = time to first token, in seconds; t/s/u = tokens per second per user; t/s = total tokens per second.
  • The reported LLM performance is for an input sequence length (number of rows filled in the KV cache) of 128 for all models except Mamba, which can accept any sequence length.
  • The reported t/s/u is the throughput of the first token generated after prefill, i.e. 1 / inter-token latency.
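The notes define t/s/u as 1 / inter-token latency. The table also appears to follow t/s = t/s/u × batch for most rows (for example, Falcon7B on n150: 16.7 × 32 = 534.4); that second relationship is inferred from the numbers, not stated in the notes. The following Python sketch makes both conversions explicit. The helper names are hypothetical and not part of TT-NN, and it assumes every user in the batch receives one token per decode step.

```python
# Hypothetical helpers (not part of TT-NN) for reading the LLM table.
#   t/s/u = 1 / inter-token latency   (stated in the notes)
#   t/s   = t/s/u * batch             (inferred from most table rows)


def inter_token_latency_ms(tokens_per_sec_per_user: float) -> float:
    """Inter-token latency in milliseconds implied by a t/s/u value."""
    return 1000.0 / tokens_per_sec_per_user


def aggregate_throughput(tokens_per_sec_per_user: float, batch: int) -> float:
    """Total tokens/second, assuming each user gets one token per decode step."""
    return tokens_per_sec_per_user * batch


if __name__ == "__main__":
    # (model, batch, t/s/u) taken from the LLM table
    rows = [
        ("Falcon7B (n150)", 32, 16.7),
        ("Mistral-7B (n150)", 32, 9.9),
        ("Falcon7B (DP=8, QuietBox)", 256, 14.4),
    ]
    for name, batch, tsu in rows:
        print(f"{name}: {aggregate_throughput(tsu, batch):.1f} t/s, "
              f"{inter_token_latency_ms(tsu):.1f} ms between tokens")
```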

CNNs

Model | Batch | Hardware | fps | Target fps | Release
ResNet-50 (224x224) | 20 | e150 | 5,100 | 10,000 | -
ResNet-50 (224x224) | 16 | n150 | 4,100 | 7,000 | -
ResNet-50 (224x224) (DP=2) | 32 | n300 | 8,200 | 14,000 | -
ResNet-50 (224x224) (DP=8) | 128 | QuietBox | 32,250 | 56,000 | -
ResNet-50 (224x224) (DP=32) | 512 | Galaxy | 95,900 | 224,000 | -
ResNet-50 (224x224) (DP=64) | 1024 | Two Galaxies | 145,000 | 448,000 | -
ViT | 9 | e150 | 1,360 | 2,000 | -
ViT | 8 | n150 | 912 | 1,600 | -
Stable Diffusion 1.4 (512x512) | 1 | n150 | 0.167 | 0.3 | -
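The data-parallel ResNet-50 rows follow a simple pattern: the global batch equals the single-n150 batch of 16 times the DP factor, and each target fps is the n150 target of 7,000 times the DP factor. The Python sketch below, with hypothetical helper names that are not part of TT-NN, reproduces those two columns and adds a scaling-efficiency figure measured against the single-n150 result. Treating one DP replica as equivalent to one n150 is an assumption of this sketch, not a claim made by the table.

```python
# Hypothetical helpers (not part of TT-NN) for the ResNet-50 (224x224) rows.
# Assumption: each DP replica behaves like the single-device n150 entry.

N150_BATCH = 16          # ResNet-50 batch on one n150 (from the table)
N150_FPS = 4_100         # measured fps on one n150 (from the table)
N150_TARGET_FPS = 7_000  # target fps on one n150 (from the table)


def global_batch(dp: int) -> int:
    """Total batch when each of `dp` replicas runs the n150 batch."""
    return N150_BATCH * dp


def linear_target_fps(dp: int) -> int:
    """Target fps if throughput scaled perfectly with the DP factor."""
    return N150_TARGET_FPS * dp


def scaling_efficiency(measured_fps: float, dp: int) -> float:
    """Measured fps as a fraction of `dp` times the single-n150 measurement."""
    return measured_fps / (N150_FPS * dp)


if __name__ == "__main__":
    # (hardware, DP factor, measured fps) from the ResNet-50 rows
    for hw, dp, fps in [("n300", 2, 8_200), ("QuietBox", 8, 32_250),
                        ("Galaxy", 32, 95_900), ("Two Galaxies", 64, 145_000)]:
        print(f"{hw}: batch {global_batch(dp)}, "
              f"target {linear_target_fps(dp):,} fps, "
              f"efficiency {scaling_efficiency(fps, dp):.0%}")
```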

NLPs

Model | Batch | Hardware | sen/sec | Target sen/sec | Release
BERT-Large | 12 | e150 | 370 | 410 | -
BERT-Large | 8 | n150 | 270 | 400 | -
T5 small | - | e150 | 140 | - | -
Bloom | - | e150 | 70 | - | -

Model Updates

For the latest model updates and features, please see MODEL_UPDATES.md

TT-NN Tech Reports


TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

Programming Guide | API Reference

Getting started

Get started with simple kernels.

TT-Metalium Tech Reports

TT-Metalium Programming Examples

Hello World

Add Integers

Simple Tensor Manipulation

DRAM Data Movement

Eltwise

Matmul
