
tt-forge
Tenstorrent's MLIR-based compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware through an open-source, general, and performant compiler.
Stars: 112

TT-Forge is Tenstorrent's MLIR-based compiler that integrates into various compiler technologies from AI/ML frameworks to enable running models and custom kernel generation. It aims to provide abstraction of different frontend frameworks, compile various model architectures with good performance, and abstract all Tenstorrent device architectures. The repository serves as the central hub for the tt-forge compiler project, bringing together sub-projects into a cohesive product with releases, demos, model support, roadmaps, and key resources. Users can explore the documentation for individual front ends to get started running tests and demos.
README:
TT-Forge is Tenstorrent's MLIR-based compiler. It integrates with various compiler technologies from AI/ML frameworks to enable both running models and generating custom kernels.
- Getting Started / How to Run a Model
- Interactive Tenstorrent Software Diagram
- TT-Forge-FE
- TT-Torch
- TT-XLA
- TT-MLIR
- TT-Metal
- TT-TVM
This repository is the central hub for the tt-forge compiler project, bringing together its various sub-projects into a cohesive product. Here, you'll find releases, demos, model support, roadmaps, and other key resources as the project evolves. Please file any issues with questions or feedback you may have here.
See the documentation available for individual front ends in the Front End section to get started running some tests. You can also try running a demo using the TT-Forge Getting Started page.
Project goals:
- Provide abstraction of many different frontend frameworks
- Generically compile many kinds of model architectures without modification and with good performance
- Abstract all Tenstorrent device architectures
tt-forge is composed of various projects: front ends that support popular third-party AI frameworks, the MLIR compiler project, performance optimizations, and supporting tools. tt-forge lowers to our tt-metalium project, providing additional functionality to our AI software ecosystem.
Overview of Tenstorrent's open-source AI software ecosystem. Click on components to navigate to their repositories:
flowchart TD
%% Define styles for the diagram with improved contrast and font size
classDef frameworks fill:#f9d6d2,stroke:#e05d44,stroke-width:2px,color:#000000,font-size:14px,font-weight:bold
classDef frontends fill:#fff3cd,stroke:#ffc107,stroke-width:2px,color:#000000,font-size:14px,font-weight:bold
classDef compiler fill:#d1e7dd,stroke:#198754,stroke-width:2px,color:#000000,font-size:14px,font-weight:bold
classDef runtime fill:#cfe2ff,stroke:#0d6efd,stroke-width:2px,color:#000000,font-size:14px,font-weight:bold
classDef system fill:#e2e3e5,stroke:#6c757d,stroke-width:2px,color:#000000,font-size:14px,font-weight:bold
classDef software fill:#d3d3ff,stroke:#6610f2,stroke-width:2px,color:#000000,font-size:14px,font-weight:bold
classDef hardware fill:#f8f9fa,stroke:#212529,stroke-width:2px,color:#000000,font-size:14px,font-weight:bold
classDef invisible opacity:0,fill:none,stroke:none
%% Top level layout with invisible container to center frameworks
subgraph TopLevel[" "]
direction LR
%% Left spacer (invisible)
LeftSpacer[" "]:::invisible
%% Center container for frameworks
subgraph FrameworksContainer[" "]
direction TB
%% Top level frameworks
subgraph Frameworks["<span style='font-size:16px;font-weight:bold'>Frameworks</span>"]
direction LR
JAX("<span style='font-size:14px;font-weight:bold'>JAX</span>")
ONX("<span style='font-size:14px;font-weight:bold'>ONNX</span>")
PYTORCH("<span style='font-size:14px;font-weight:bold'>PyTorch</span>")
TF("<span style='font-size:14px;font-weight:bold'>TensorFlow</span>")
end
%% Front-ends
subgraph FrontEnds["<span style='font-size:16px;font-weight:bold'>Front Ends</span>"]
direction LR
%% Add extra spacing between frontend components
TT_TORCH("<span style='font-size:14px;font-weight:bold'>tt-torch</span>")
TT_FORGE_FE("<span style='font-size:14px;font-weight:bold'>tt-forge-FE</span>")
TT_XLA("<span style='font-size:14px;font-weight:bold'>tt-xla</span>")
end
end
%% Right spacer (invisible)
RightSpacer[" "]:::invisible
end
%% Style invisible containers
TopLevel:::invisible
FrameworksContainer:::invisible
%% Compiler sections side by side
subgraph CompilerLayer["<span style='font-size:16px;font-weight:bold'>Compiler Layer</span>"]
%% tt-MLIR Compiler section
subgraph TTMLIR["<span style='font-size:16px;font-weight:bold'>tt-MLIR Compiler</span>"]
TTIR("<span style='font-size:14px;font-weight:bold'>TT-IR</span>")
STABLEHLO("<span style='font-size:14px;font-weight:bold'>StableHLO-IR</span>")
PYKERNEL("<span style='font-size:14px;font-weight:bold'>PyKernel</span>")
%% Graph Passes - using hexagon shape
GRAPH_PASSES{{"<span style='font-size:14px;font-weight:bold'>Graph Passes</span>"}}
TTMETAL_IR("<span style='font-size:14px;font-weight:bold'>TT-Metal-IR</span>")
TTNN("<span style='font-size:14px;font-weight:bold'>TTNN-IR</span>")
TTKERNEL("<span style='font-size:14px;font-weight:bold'>TTKernel-IR</span>")
%% Connect PyKernel to Graph Passes
PYKERNEL --> GRAPH_PASSES
%% Connect Graph Passes to IRs
GRAPH_PASSES --> TTKERNEL
GRAPH_PASSES --> TTNN
GRAPH_PASSES --> TTMETAL_IR
end
%% Compiler Tools section with vertical layout
subgraph CompilerTools["<span style='font-size:16px;font-weight:bold'>Compiler Tools</span>"]
direction TB
TTMLIROPT("<span style='font-size:14px;font-weight:bold'>ttmlir-opt</span>")
TTNNSTANDALONE("<span style='font-size:14px;font-weight:bold'>ttnn-standalone</span>")
TTEXPLORER("<span style='font-size:14px;font-weight:bold'>tt-explorer</span>")
end
end
%% Set direction for compiler sections to be side by side
CompilerLayer:::none
TTMLIR --- CompilerTools
%% TT-Metalium section with Tools
subgraph MetaliumLayer["<span style='font-size:16px;font-weight:bold'>Metalium Layer</span>"]
%% TT-Metalium section
subgraph TTMETALIUM["<span style='font-size:16px;font-weight:bold'>TT-Metalium</span>"]
TTNN_HW("<span style='font-size:14px;font-weight:bold'>TTNN</span>")
TTMETAL("<span style='font-size:14px;font-weight:bold'>TTMetal</span>")
%% Connect TTNN to TTMetal within TT-Metalium
TTNN_HW --> TTMETAL
end
%% Metalium Tools section with vertical layout
subgraph MetaliumTools["<span style='font-size:16px;font-weight:bold'>Metalium Tools</span>"]
direction TB
TRACY("<span style='font-size:14px;font-weight:bold'>tracy</span>")
TTNPE("<span style='font-size:14px;font-weight:bold'>tt-npe</span>")
TTNNVISUALIZER("<span style='font-size:14px;font-weight:bold'>ttnn-visualizer</span>")
end
end
%% Set direction for Metalium sections to be side by side
MetaliumLayer:::none
TTMETALIUM --- MetaliumTools
%% LLK outside of TT-Metalium
LLK("<span style='font-size:14px;font-weight:bold'>LLK</span>")
%% System Tools and System Software sections side by side
subgraph SystemLayer["<span style='font-size:16px;font-weight:bold'>System Layer</span>"]
%% System Tools section
subgraph SystemTools["<span style='font-size:16px;font-weight:bold'>System Tools</span>"]
TTSMI("<span style='font-size:14px;font-weight:bold'>tt-smi</span>")
LUWEN("<span style='font-size:14px;font-weight:bold'>luwen</span>")
TTTOPOLOGY("<span style='font-size:14px;font-weight:bold'>tt-topology</span>")
end
%% System Software section
subgraph SystemSoftware["<span style='font-size:16px;font-weight:bold'>System Software</span>"]
UMD("<span style='font-size:14px;font-weight:bold'>UMD</span>")
KMD("<span style='font-size:14px;font-weight:bold'>KMD</span>")
end
end
%% Set direction for system sections to be side by side
SystemLayer:::none
%% Hardware section
subgraph Hardware["<span style='font-size:16px;font-weight:bold'>Hardware</span>"]
WORMHOLE("<span style='font-size:14px;font-weight:bold'>Wormhole</span>")
BLACKHOLE("<span style='font-size:14px;font-weight:bold'>Blackhole</span>")
end
%% Connect TTMetal to LLK, LLK to System Software, and System Layer to Hardware
TTMETAL --> LLK
LLK --> SystemSoftware
SystemLayer --> Hardware
%% Connect frameworks to front-ends with longer arrows
ONX -.-> TT_TORCH
ONX -.-> TT_FORGE_FE
JAX -.-> TT_XLA
PYTORCH -.-> TT_TORCH
PYTORCH -.-> TT_FORGE_FE
TF -.-> TT_FORGE_FE
%% Connect front-ends to tt-MLIR Compiler
TT_TORCH --> STABLEHLO
TT_XLA --> STABLEHLO
TT_FORGE_FE --> TTIR
%% Connect tt-MLIR Compiler components
STABLEHLO --> TTIR
TTIR --> GRAPH_PASSES
%% Connect IRs to hardware
TTNN --> TTNN_HW
TTMETAL_IR --> TTMETAL
TTKERNEL --> TTMETALIUM
%% Apply styles
class ONX,JAX,PYTORCH,TF frameworks
class TT_TORCH,TT_XLA,TT_FORGE_FE frontends
class TTIR,TTKERNEL,TTNN,TTMETAL_IR,GRAPH_PASSES,PYKERNEL,TTMLIROPT,TTNNSTANDALONE,TTEXPLORER compiler
class TTMETAL,TTNN_HW,LLK,TRACY,TTNPE,TTNNVISUALIZER runtime
class TTSMI,LUWEN,TTTOPOLOGY system
class UMD,KMD software
class WORMHOLE,BLACKHOLE hardware
classDef none opacity:0,fill:none,stroke:none
class LeftSpacer,RightSpacer,TopLevel,FrameworksContainer invisible
%% Add clickable URLs to frontend components
click TT_XLA "https://github.com/tenstorrent/tt-xla" "tt-xla GitHub Repository" _blank
click TT_TORCH "https://github.com/tenstorrent/tt-torch" "tt-torch GitHub Repository" _blank
click TT_FORGE_FE "https://github.com/tenstorrent/tt-forge-fe" "tt-forge-fe GitHub Repository" _blank
%% Add clickable URLs to IR components
click TTKERNEL "https://github.com/tenstorrent/tt-mlir/tree/main/lib/Dialect/TTKernel/IR" "TTKernel-IR GitHub Repository" _blank
click TTIR "https://github.com/tenstorrent/tt-mlir/tree/main/lib/Dialect/TTIR/IR" "TT-IR GitHub Repository" _blank
click TTMETAL_IR "https://github.com/tenstorrent/tt-mlir/tree/main/lib/Dialect/TTMetal/IR" "TT-Metal-IR GitHub Repository" _blank
click TTNN "https://github.com/tenstorrent/tt-mlir/tree/main/lib/Dialect/TTNN/IR" "TTNN-IR GitHub Repository" _blank
click PYKERNEL "https://github.com/tenstorrent/tt-mlir/tree/main/python/pykernel" "PyKernel GitHub Repository" _blank
click STABLEHLO "https://openxla.org/stablehlo/spec" "StableHLO Specification" _blank
%% Add clickable URLs to System Software components
click UMD "https://github.com/tenstorrent/tt-umd" "UMD GitHub Repository" _blank
click KMD "https://github.com/tenstorrent/tt-kmd" "KMD GitHub Repository" _blank
%% Add clickable URLs to System Tools components
click TTSMI "https://github.com/tenstorrent/tt-smi" "tt-smi GitHub Repository" _blank
click LUWEN "https://github.com/tenstorrent/luwen" "luwen GitHub Repository" _blank
click TTTOPOLOGY "https://github.com/tenstorrent/tt-topology" "tt-topology GitHub Repository" _blank
%% Add clickable URLs to TT-Metalium components
click TTMETAL "https://github.com/tenstorrent/tt-metal" "TTMetal GitHub Repository" _blank
click TTNN_HW "https://github.com/tenstorrent/tt-metal/tree/main/ttnn" "TTNN GitHub Repository" _blank
click LLK "https://github.com/tenstorrent/tt-llk" "LLK GitHub Repository" _blank
%% Add clickable URLs to Metalium Tools components
click TRACY "https://github.com/tenstorrent/tt-metal/tree/main/ttnn/tracy" "tracy GitHub Repository" _blank
click TTNPE "https://github.com/tenstorrent/tt-npe" "tt-npe GitHub Repository" _blank
click TTNNVISUALIZER "https://github.com/tenstorrent/ttnn-visualizer" "ttnn-visualizer GitHub Repository" _blank
%% Add clickable URLs to Compiler Tools components
click TTEXPLORER "https://github.com/tenstorrent/tt-mlir/tree/main/tools/explorer" "tt-explorer GitHub Repository" _blank
click TTNNSTANDALONE "https://github.com/tenstorrent/tt-mlir/tree/main/tools/ttnn-standalone" "ttnn-standalone GitHub Repository" _blank
click TTMLIROPT "https://github.com/tenstorrent/tt-mlir/tree/main/tools/ttmlir-opt" "ttmlir-opt GitHub Repository" _blank
%% Add clickable URLs to Hardware components
click WORMHOLE "https://tenstorrent.com/hardware/wormhole" "Wormhole Hardware Product Page" _blank
click BLACKHOLE "https://tenstorrent.com/hardware/blackhole" "Blackhole Hardware Product Page" _blank
TT-Forge-FE
- A TVM-based graph compiler designed to optimize and transform computational graphs for deep learning models. Supports ingestion of PyTorch, ONNX, TensorFlow, PaddlePaddle, and similar ML frameworks via TVM (TT-TVM); a usage sketch follows below.
- See the TT-Forge-FE docs pages for an overview and getting started guide.
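As a rough illustration of this front end's flow, here is a minimal sketch of compiling a small PyTorch module through a Forge-style API. The `forge` package name and the `forge.compile` signature are assumptions for illustration; the TT-Forge-FE docs define the real interface.
```python
# Hypothetical sketch of compiling a small PyTorch model with a Forge-style
# front end. The `forge` package name and `forge.compile` signature are
# assumptions for illustration; see the TT-Forge-FE docs for the real API.
import torch
import forge  # assumed package name

class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(32, 64)
        self.fc2 = torch.nn.Linear(64, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyMLP().eval()
sample = torch.randn(1, 32)

# Compilation traces the graph via TVM, lowers it to TTIR, and yields a
# callable that runs on Tenstorrent hardware.
compiled = forge.compile(model, sample_inputs=[sample])  # assumed signature
print(compiled(sample))
```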
TT-Torch
- An MLIR-native, open-source, PyTorch 2.X and torch-mlir based front end. It provides StableHLO (SHLO) graphs to TT-MLIR. Supports ingestion of PyTorch models via PT2.X compile and ONNX models via torch-mlir (ONNX->SHLO); a sketch of the compile path follows below.
- See the TT-Torch docs pages for an overview and getting started guide.
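Since tt-torch plugs into the PyTorch 2.x compile path, its usage can be sketched roughly as follows. The backend name "tt" is an assumption for illustration; the actual registration is described in the tt-torch docs.
```python
# Hypothetical sketch of the PyTorch 2.x compile path that tt-torch hooks
# into. The backend name "tt" is an assumption for illustration.
import torch

def f(x, w):
    return torch.nn.functional.relu(x @ w)

# torch.compile captures an FX graph via TorchDynamo; a tt-torch-style
# backend would lower that graph to StableHLO and hand it to TT-MLIR.
compiled_f = torch.compile(f, backend="tt")  # assumed backend name

x = torch.randn(8, 16)
w = torch.randn(16, 4)
y = compiled_f(x, w)
```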
TT-XLA
- Leverages a PJRT interface to integrate JAX (and, in the future, other frameworks), TT-MLIR, and Tenstorrent hardware. Supports ingestion of JAX models via jit compile, providing a StableHLO (SHLO) graph to the TT-MLIR compiler; the hand-off is sketched below.
- See the TT-XLA docs pages for an overview and getting started guide.
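The StableHLO hand-off can be seen with stock JAX APIs: jax.jit lowers a function to a StableHLO module, which is the representation a PJRT plugin such as tt-xla would receive. A minimal sketch (the tt-xla plugin registration itself is omitted):
```python
# Jit-compile a function and inspect the StableHLO it lowers to. This uses
# standard JAX APIs; the tt-xla PJRT plugin registration is omitted here.
import jax
import jax.numpy as jnp

def f(x, w):
    return jax.nn.relu(x @ w)

x = jnp.ones((8, 16))
w = jnp.ones((16, 4))

# .lower() produces the StableHLO module a PJRT backend consumes.
lowered = jax.jit(f).lower(x, w)
print(lowered.as_text())  # prints the StableHLO (MLIR) module
```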
TT-MLIR Project
At its core, TT-MLIR is our compiler, which interfaces with tt-metalium, our open-source, low-level AI hardware SDK. TT-MLIR provides a solution for optimizing machine learning and other compute workloads for all Tenstorrent hardware, bridging the gap between the different ML frameworks and Tenstorrent hardware. TT-MLIR is broken into different dialects:
- TTIR Dialect: Our common IR that can then be lowered into multiple different backends
- TTNN Dialect: Our entry point into the TTNN library of ops
- TTMetalium Dialect: Our entry point into directly accessing tt-metalium kernels.
The compiler employs various optimization passes, including layout transformation, operation fusing, decomposition, and sharding, ensuring efficient lowering to the target dialect.
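As a toy illustration of the kind of rewriting such a graph pass performs (not TT-MLIR's actual implementation, which operates on MLIR operations), consider fusing a matmul followed by an add into a single fused op over a simple list-of-ops graph:
```python
# Toy illustration of an operation-fusing graph pass, in the spirit of the
# passes described above. This is not TT-MLIR code; real passes work over
# MLIR operations, but the rewrite-pattern idea is the same.
from dataclasses import dataclass

@dataclass
class Op:
    name: str      # e.g. "matmul", "add", "matmul_add"
    inputs: list   # names of the value inputs
    output: str    # name of the produced value

def fuse_matmul_add(graph: list) -> list:
    """Rewrite a matmul followed by an add on its result into one fused op."""
    fused, i = [], 0
    while i < len(graph):
        op = graph[i]
        nxt = graph[i + 1] if i + 1 < len(graph) else None
        if (op.name == "matmul" and nxt is not None
                and nxt.name == "add" and op.output in nxt.inputs):
            bias = [v for v in nxt.inputs if v != op.output]
            fused.append(Op("matmul_add", op.inputs + bias, nxt.output))
            i += 2  # consume both ops
        else:
            fused.append(op)
            i += 1
    return fused

g = [Op("matmul", ["x", "w"], "t0"), Op("add", ["t0", "b"], "y")]
print(fuse_matmul_add(g))  # one fused matmul_add op producing "y"
```
The TT-MLIR project also ships several tools: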
- ttmlir-opt: This tool is used to run the TT-MLIR compiler passes on .mlir source files and is central to developing and testing the compiler.
- ttmlir-translate: Allows us to ingest something (e.g., code) into the MLIR compiler and produce something (e.g., an executable binary, or even code again) from it.
- ttrt: A standalone runtime tool that can inspect and run compiler executable files without a front end.
- tt-explorer: Provides a “human-in-the-loop” interface so that compiler results can be actively tuned and understood by the person compiling the model.
- ttnn-standalone: A post-compile tuning/debugging tool for C++ TTNN-generated code.
This repo is a part of Tenstorrent’s bounty program. If you are interested in helping to improve TT-Forge, please make sure to read the Tenstorrent Bounty Program Terms and Conditions before heading to the issues tab. Look for issues tagged with both “bounty” and a difficulty level!
Similar Open Source Tools


galah
Galah is an LLM-powered web honeypot designed to mimic various applications and dynamically respond to arbitrary HTTP requests. It supports multiple LLM providers, including OpenAI. Unlike traditional web honeypots, Galah dynamically crafts responses for any HTTP request, caching them to reduce repetitive generation and API costs. The honeypot's configuration is crucial, directing the LLM to produce responses in a specified JSON format. Note that Galah is a weekend project exploring LLM capabilities and not intended for production use, as it may be identifiable through network fingerprinting and non-standard responses.

llmware
LLMWare is a framework for quickly developing LLM-based applications including Retrieval Augmented Generation (RAG) and Multi-Step Orchestration of Agent Workflows. This project provides a comprehensive set of tools that anyone can use - from a beginner to the most sophisticated AI developer - to rapidly build industrial-grade, knowledge-based enterprise LLM applications. Our specific focus is on making it easy to integrate open source small specialized models and connecting enterprise knowledge safely and securely.

mutahunter
Mutahunter is an open-source language-agnostic mutation testing tool maintained by CodeIntegrity. It leverages LLMs to inject context-aware faults into the codebase, ensuring comprehensive testing. The tool aims to empower companies and developers to enhance test suites and improve software quality by verifying the effectiveness of test cases through creating mutants in the code and checking if the test cases can catch these changes. Mutahunter provides detailed reports on mutation coverage, killed mutants, and survived mutants, enabling users to identify potential weaknesses in their test suites.

gptel
GPTel is a simple Large Language Model chat client for Emacs, with support for multiple models and backends. It's async and fast, streams responses, and interacts with LLMs from anywhere in Emacs. LLM responses are in Markdown or Org markup. Supports conversations and multiple independent sessions. Chats can be saved as regular Markdown/Org/Text files and resumed later. You can go back and edit your previous prompts or LLM responses when continuing a conversation. These will be fed back to the model. Don't like gptel's workflow? Use it to create your own for any supported model/backend with a simple API.

smile
Smile (Statistical Machine Intelligence and Learning Engine) is a comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. It covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc. Smile implements major machine learning algorithms and provides interactive shells for Java, Scala, and Kotlin. It supports model serialization, data visualization using SmilePlot and a declarative approach, and offers a gallery showcasing various algorithms and visualizations.

go-utcp
The Universal Tool Calling Protocol (UTCP) is a modern, flexible, and scalable standard for defining and interacting with tools across various communication protocols. It emphasizes scalability, interoperability, and ease of use. It provides built-in transports for HTTP, CLI, Server-Sent Events, streaming HTTP, GraphQL, MCP, and UDP. Users can use the library to construct a client and call tools using the available transports. The library also includes utilities for variable substitution, in-memory repository for storing providers and tools, and OpenAPI conversion to UTCP manuals.

evedel
Evedel is an Emacs package designed to streamline the interaction with LLMs during programming. It aims to reduce manual code writing by creating detailed instruction annotations in the source files for LLM models. The tool leverages overlays to track instructions, categorize references with tags, and provide a seamless workflow for managing and processing directives. Evedel offers features like saving instruction overlays, complex query expressions for directives, and easy navigation through instruction overlays across all buffers. It is versatile and can be used in various types of buffers beyond just programming buffers.

R-Judge
R-Judge is a benchmarking tool designed to evaluate the proficiency of Large Language Models (LLMs) in judging and identifying safety risks within diverse environments. It comprises 569 records of multi-turn agent interactions, covering 27 key risk scenarios across 5 application categories and 10 risk types. The tool provides high-quality curation with annotated safety labels and risk descriptions. Evaluation of 11 LLMs on R-Judge reveals the need for enhancing risk awareness in LLMs, especially in open agent scenarios. Fine-tuning on safety judgment is found to significantly improve model performance.

chronos-forecasting
Chronos is a family of pretrained time series forecasting models based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. Chronos models have been trained on a large corpus of publicly available time series data, as well as synthetic data generated using Gaussian processes.
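The scaling-and-quantization step described above is easy to sketch. The following toy tokenizer (mean scaling plus uniform binning) illustrates the idea only; Chronos's actual quantization scheme differs in detail.
```python
# Simplified illustration of turning a time series into tokens via scaling
# and quantization, as described above. Not Chronos's exact scheme.
import numpy as np

def tokenize(series: np.ndarray, n_bins: int = 1024, clip: float = 15.0):
    scale = np.mean(np.abs(series)) or 1.0       # mean scaling
    scaled = np.clip(series / scale, -clip, clip)
    edges = np.linspace(-clip, clip, n_bins - 1)
    tokens = np.digitize(scaled, edges)          # map each value to a bin id
    return tokens, scale

series = np.array([10.0, 12.0, 9.5, 11.0, 13.2])
tokens, scale = tokenize(series)
print(tokens)  # one token id per time step; a language model trains on these
```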

LLM-Fine-Tuning-Azure
A fine-tuning guide for both OpenAI and Open-Source Large Language Models on Azure. Fine-Tuning retrains an existing pre-trained LLM using example data, resulting in a new 'custom' fine-tuned LLM optimized for task-specific examples. Use cases include improving LLM performance on specific tasks and introducing information not well represented by the base LLM model. Suitable for cases where latency is critical, high accuracy is required, and clear evaluation metrics are available. Learning path includes labs for fine-tuning GPT and Llama2 models via Dashboards and Python SDK.

aimet
AIMET is a library that provides advanced model quantization and compression techniques for trained neural network models. It provides features that have been proven to improve run-time performance of deep learning neural network models with lower compute and memory requirements and minimal impact to task accuracy. AIMET is designed to work with PyTorch, TensorFlow and ONNX models. We also host the AIMET Model Zoo - a collection of popular neural network models optimized for 8-bit inference. We also provide recipes for users to quantize floating point models using AIMET.

Adaptive-MT-LLM-Fine-tuning
The repository Adaptive-MT-LLM-Fine-tuning contains code and data for the paper 'Fine-tuning Large Language Models for Adaptive Machine Translation'. It focuses on enhancing Mistral 7B, a large language model, for real-time adaptive machine translation in the medical domain. The fine-tuning process involves using zero-shot and one-shot translation prompts to improve terminology and style adherence. The repository includes training and test data, data processing code, fuzzy match retrieval techniques, fine-tuning methods, conversion to CTranslate2 format, tokenizers, translation codes, and evaluation metrics.

ALMA
ALMA (Advanced Language Model-based Translator) is a many-to-many LLM-based translation model that utilizes a two-step fine-tuning process on monolingual and parallel data to achieve strong translation performance. ALMA-R builds upon ALMA models with LoRA fine-tuning and Contrastive Preference Optimization (CPO) for even better performance, surpassing GPT-4 and WMT winners. The repository provides ALMA and ALMA-R models, datasets, environment setup, evaluation scripts, training guides, and data information for users to leverage these models for translation tasks.

awesome-deliberative-prompting
The 'awesome-deliberative-prompting' repository focuses on how to ask Large Language Models (LLMs) to produce reliable reasoning and make reason-responsive decisions through deliberative prompting. It includes success stories, prompting patterns and strategies, multi-agent deliberation, reflection and meta-cognition, text generation techniques, self-correction methods, reasoning analytics, limitations, failures, puzzles, datasets, tools, and other resources related to deliberative prompting. The repository provides a comprehensive overview of research, techniques, and tools for enhancing reasoning capabilities of LLMs.

unitxt
Unitxt is a customizable library for textual data preparation and evaluation tailored to generative language models. It natively integrates with common libraries like HuggingFace and LM-eval-harness and deconstructs processing flows into modular components, enabling easy customization and sharing between practitioners. These components encompass model-specific formats, task prompts, and many other comprehensive dataset processing definitions. The Unitxt-Catalog centralizes these components, fostering collaboration and exploration in modern textual data workflows. Beyond being a tool, Unitxt is a community-driven platform, empowering users to build, share, and advance their pipelines collaboratively.
For similar tasks


tt-metal
TT-NN is a Python & C++ neural network op library. It provides a low-level programming model, TT-Metalium, enabling kernel development for Tenstorrent hardware.

mscclpp
MSCCL++ is a GPU-driven communication stack for scalable AI applications. It redefines inter-GPU communication interfaces, delivering a highly efficient and customizable communication stack for distributed GPU applications. Its design is specifically tailored to accommodate diverse performance optimization scenarios often encountered in state-of-the-art AI applications. MSCCL++ provides communication abstractions both at the lowest level, close to hardware, and at the highest level, close to the application API. The lowest level of abstraction is ultra lightweight, enabling a user to implement the data-movement logic of a collective operation such as AllReduce inside a GPU kernel extremely efficiently, without worrying about the memory ordering of different ops. The modularity of MSCCL++ lets a user construct its building blocks in a high-level abstraction in Python and feed them to a CUDA kernel, improving productivity. MSCCL++ provides fine-grained synchronous and asynchronous 0-copy 1-sided abstractions for communication primitives such as `put()`, `get()`, `signal()`, `flush()`, and `wait()`. The 1-sided abstractions allow a user to asynchronously `put()` their data on the remote GPU as soon as it is ready, without requiring the remote side to issue any receive instruction. This enables users to easily implement flexible communication logic, such as overlapping communication with computation, or implementing customized collective communication algorithms without worrying about potential deadlocks. Additionally, the 0-copy capability enables MSCCL++ to transfer data directly between users' buffers without intermediate internal buffers, saving GPU bandwidth and memory capacity. MSCCL++ provides consistent abstractions regardless of the location of the remote GPU (on the local node or on a remote node) and of the underlying link (NVLink/xGMI or InfiniBand). This simplifies the code for inter-GPU communication, which is often complex due to the memory ordering of GPU/CPU reads and writes and is therefore error-prone.

mlir-air
This repository contains tools and libraries for building AIR platforms, runtimes and compilers.

free-for-life
A massive list including a huge amount of products and services that are completely free! Categories include APIs, data & ML, artificial intelligence, BaaS, code editors, code generation, DNS, databases, design & UI, domains, email, fonts, resources for students, forms, Linux distributions, messaging & streaming, PaaS, payments & billing, and SSL.

AIMr
AIMr is an AI aimbot tool written in Python that leverages modern technologies to achieve an undetected system with a pleasing appearance. It works on any game that uses human-shaped models. To optimize its performance, users should build OpenCV with CUDA. For Valorant, additional perks in the Discord and an Arduino Leonardo R3 are required.

aika
AIKA (Artificial Intelligence for Knowledge Acquisition) is a new type of artificial neural network designed to mimic the behavior of a biological brain more closely and bridge the gap to classical AI. The network conceptually separates activations from neurons, creating two separate graphs to represent acquired knowledge and inferred information. It uses different types of neurons and synapses to propagate activation values, binding signals, causal relations, and training gradients. The network structure allows for flexible topology and supports the gradual population of neurons and synapses during training.

nextpy
Nextpy is a cutting-edge software development framework optimized for AI-based code generation. It provides guardrails for defining AI system boundaries, structured outputs for prompt engineering, a powerful prompt engine for efficient processing, better AI generations with precise output control, modularity for multiplatform and extensible usage, developer-first approach for transferable knowledge, and containerized & scalable deployment options. It offers 4-10x faster performance compared to Streamlit apps, with a focus on cooperation within the open-source community and integration of key components from various projects.
For similar jobs

Qwen-TensorRT-LLM
Qwen-TensorRT-LLM is a project developed for the NVIDIA TensorRT Hackathon 2023, focusing on accelerating inference for the Qwen-7B-Chat model using TRT-LLM. The project offers various functionalities such as FP16/BF16 support, INT8 and INT4 quantization options, Tensor Parallel for multi-GPU parallelism, web demo setup with gradio, Triton API deployment for maximum throughput/concurrency, fastapi integration for openai requests, CLI interaction, and langchain support. It supports models like qwen2, qwen, and qwen-vl for both base and chat models. The project also provides tutorials on Bilibili and blogs for adapting Qwen models in NVIDIA TensorRT-LLM, along with hardware requirements and quick start guides for different model types and quantization methods.

dl_model_infer
This project is a C++ AI inference library that supports inference of TensorRT models. It provides accelerated deployment examples of popular deep learning CV models and supports dynamic-batch image processing, inference, decoding, and NMS. The project has been updated with various models and provides tutorials for model exports. It also includes a producer-consumer inference model for specific tasks. The project directory includes implementations for model inference applications, backend inference classes, post-processing, pre-processing, and target detection and tracking. Speed tests have been conducted on various models, and ONNX downloads are available for different models.

joliGEN
JoliGEN is an integrated framework for training custom generative AI image-to-image models. It implements GAN, Diffusion, and Consistency models for various image translation tasks, including domain and style adaptation with conservation of semantics. The tool is designed for real-world applications such as Controlled Image Generation, Augmented Reality, Dataset Smart Augmentation, and Synthetic to Real transforms. JoliGEN allows for fast and stable training with a REST API server for simplified deployment. It offers a wide range of options and parameters with detailed documentation available for models, dataset formats, and data augmentation.

ai-edge-torch
AI Edge Torch is a Python library that supports converting PyTorch models into a .tflite format for on-device applications on Android, iOS, and IoT devices. It offers broad CPU coverage with initial GPU and NPU support, closely integrating with PyTorch and providing good coverage of Core ATen operators. The library includes a PyTorch converter for model conversion and a Generative API for authoring mobile-optimized PyTorch Transformer models, enabling easy deployment of Large Language Models (LLMs) on mobile devices.

awesome-RK3588
RK3588 is a flagship 8K SoC chip by Rockchip, integrating Cortex-A76 and Cortex-A55 cores with NEON coprocessor for 8K video codec. This repository curates resources for developing with RK3588, including official resources, RKNN models, projects, development boards, documentation, tools, and sample code.

cl-waffe2
cl-waffe2 is an experimental deep learning framework in Common Lisp, providing fast, systematic, and customizable matrix operations, reverse mode tape-based Automatic Differentiation, and neural network model building and training features accelerated by a JIT Compiler. It offers abstraction layers, extensibility, inlining, graph-level optimization, visualization, debugging, systematic nodes, and symbolic differentiation. Users can easily write extensions and optimize their networks without overheads. The framework is designed to eliminate barriers between users and developers, allowing for easy customization and extension.

TensorRT-Model-Optimizer
The NVIDIA TensorRT Model Optimizer is a library designed to quantize and compress deep learning models for optimized inference on GPUs. It offers state-of-the-art model optimization techniques including quantization and sparsity to reduce inference costs for generative AI models. Users can easily stack different optimization techniques to produce quantized checkpoints from torch or ONNX models. The quantized checkpoints are ready for deployment in inference frameworks like TensorRT-LLM or TensorRT, with planned integrations for NVIDIA NeMo and Megatron-LM. The tool also supports 8-bit quantization with Stable Diffusion for enterprise users on NVIDIA NIM. Model Optimizer is available for free on NVIDIA PyPI, and this repository serves as a platform for sharing examples, GPU-optimized recipes, and collecting community feedback.

depthai
This repository contains a demo application for DepthAI, a tool that can load different networks, create pipelines, record video, and more. It provides documentation for installation and usage, including running programs through Docker. Users can explore DepthAI features via command line arguments or a clickable QT interface. Supported models include various AI models for tasks like face detection, human pose estimation, and object detection. The tool collects anonymous usage statistics by default, which can be disabled. Users can report issues to the development team for support and troubleshooting.