executorch
On-device AI across mobile, embedded and edge for PyTorch
Stars: 1929
ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices. Key value propositions of ExecuTorch are: * **Portability:** Compatibility with a wide variety of computing platforms, from high-end mobile phones to highly constrained embedded systems and microcontrollers. * **Productivity:** Enabling developers to use the same toolchains and SDK from PyTorch model authoring and conversion, to debugging and deployment to a wide variety of platforms. * **Performance:** Providing end users with a seamless and high-performance experience due to a lightweight runtime and utilizing full hardware capabilities such as CPUs, NPUs, and DSPs.
README:
ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices.
Key value propositions of ExecuTorch are:
- Portability: Compatibility with a wide variety of computing platforms, from high-end mobile phones to highly constrained embedded systems and microcontrollers.
- Productivity: Enabling developers to use the same toolchains and Developer Tools from PyTorch model authoring and conversion, to debugging and deployment to a wide variety of platforms.
- Performance: Providing end users with a seamless and high-performance experience due to a lightweight runtime and utilizing full hardware capabilities such as CPUs, NPUs, and DSPs.
For a comprehensive technical overview of ExecuTorch and step-by-step tutorials, please visit our documentation website for the latest release (or the main branch).
Check out the Getting Started page for a quick spin.
Check out the examples of Llama, Llava and other models running on edge devices using ExecuTorch.
[UPDATE - 09/25] We have added support for running Llama 3.2 1B/3B models via ExecuTorch.
We welcome any feedback, suggestions, and bug reports from the community to help us improve our technology. Please use the PyTorch Forums for discussion and feedback about ExecuTorch using the ExecuTorch category, and our GitHub repository for bug reporting.
We recommend using the latest release tag from the Releases page when developing.
See CONTRIBUTING.md for details about issues, PRs, code style, CI jobs, and other development topics.
executorch
├── backends # Backend delegate implementations.
├── build # Utilities for managing the build system.
├── codegen # Tooling to autogenerate bindings between kernels and the runtime.
├── configurations
├── docs # Static docs tooling.
├── examples # Examples of various user flows, such as model export, delegates, and runtime execution.
├── exir # Ahead-of-time library: model capture and lowering APIs.
| ├── _serialize # Serialize final export artifact.
| ├── backend # Backend delegate ahead of time APIs
| ├── capture # Program capture.
| ├── dialects # Op sets for various dialects in the export process.
| ├── emit # Conversion from ExportedProgram to ExecuTorch execution instructions.
| ├── operator # Operator node manipulation utilities.
| ├── passes # Built-in compiler passes.
| ├── program # Export artifacts.
| ├── serde # Graph module
serialization/deserialization.
| ├── verification # IR verification.
├── extension # Extensions built on top of the runtime.
| ├── android # ExecuTorch wrappers for Android apps.
| ├── apple # ExecuTorch wrappers for iOS apps.
| ├── aten_util # Converts to and from PyTorch ATen types.
| ├── data_loader # 1st party data loader implementations.
| ├── evalue_util # Helpers for working with EValue objects.
| ├── gguf_util # Tools to convert from the GGUF format.
| ├── kernel_util # Helpers for registering kernels.
| ├── memory_allocator # 1st party memory allocator implementations.
| ├── module # A simplified C++ wrapper for the runtime.
| ├── parallel # C++ threadpool integration.
| ├── pybindings # Python API for executorch runtime.
| ├── pytree # C++ and Python flattening and unflattening lib for pytrees.
| ├── runner_util # Helpers for writing C++ PTE-execution
tools.
| ├── testing_util # Helpers for writing C++ tests.
| ├── training # Experimental libraries for on-device training
├── kernels # 1st party kernel implementations.
| ├── aten
| ├── optimized
| ├── portable # Reference implementations of ATen operators.
| ├── prim_ops # Special ops used in executorch runtime for control flow and symbolic primitives.
| ├── quantized
├── profiler # Utilities for profiling runtime execution.
├── runtime # Core C++ runtime.
| ├── backend # Backend delegate runtime APIs.
| ├── core # Core structures used across all levels of the runtime.
| ├── executor # Model loading, initalization, and execution.
| ├── kernel # Kernel registration and management.
| ├── platform # Layer between architecture specific code and portable C++.
├── schema # ExecuTorch PTE file format flatbuffer
schemas.
├── scripts # Utility scripts for size management, dependency management, etc.
├── devtools # Model profiling, debugging, and introspection.
├── shim # Compatibility layer between OSS and Internal builds
├── test # Broad scoped end-to-end tests.
├── third-party # Third-party dependencies.
├── util # Various helpers and scripts.
ExecuTorch is BSD licensed, as found in the LICENSE file.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for executorch
Similar Open Source Tools
executorch
ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices. Key value propositions of ExecuTorch are: * **Portability:** Compatibility with a wide variety of computing platforms, from high-end mobile phones to highly constrained embedded systems and microcontrollers. * **Productivity:** Enabling developers to use the same toolchains and SDK from PyTorch model authoring and conversion, to debugging and deployment to a wide variety of platforms. * **Performance:** Providing end users with a seamless and high-performance experience due to a lightweight runtime and utilizing full hardware capabilities such as CPUs, NPUs, and DSPs.
awesome-algorand
Awesome Algorand is a curated list of resources related to the Algorand Blockchain, including official resources, wallets, blockchain explorers, portfolio trackers, learning resources, development tools, DeFi platforms, nodes & consensus participation, subscription management, security auditing services, blockchain bridges, oracles, name services, community resources, Algorand Request for Comments, metrics and analytics services, decentralized voting tools, and NFT marketplaces. The repository provides a comprehensive collection of tools, tutorials, protocols, and platforms for developers, users, and enthusiasts interested in the Algorand ecosystem.
InvokeAI
InvokeAI is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies. InvokeAI offers an industry leading Web Interface, interactive Command Line Interface, and also serves as the foundation for multiple commercial products.
agentUniverse
agentUniverse is a framework for developing applications powered by multi-agent based on large language model. It provides essential components for building single agent and multi-agent collaboration mechanism for customizing collaboration patterns. Developers can easily construct multi-agent applications and share pattern practices from different fields. The framework includes pre-installed collaboration patterns like PEER and DOE for complex task breakdown and data-intensive tasks.
GenAI_Agents
GenAI Agents is a comprehensive repository for developing and implementing Generative AI (GenAI) agents, ranging from simple conversational bots to complex multi-agent systems. It serves as a valuable resource for learning, building, and sharing GenAI agents, offering tutorials, implementations, and a platform for showcasing innovative agent creations. The repository covers a wide range of agent architectures and applications, providing step-by-step tutorials, ready-to-use implementations, and regular updates on advancements in GenAI technology.
swirl-search
Swirl is an open-source software that allows users to simultaneously search multiple content sources and receive AI-ranked results. It connects to various data sources, including databases, public data services, and enterprise sources, and utilizes AI and LLMs to generate insights and answers based on the user's data. Swirl is easy to use, requiring only the download of a YML file, starting in Docker, and searching with Swirl. Users can add credentials to preloaded SearchProviders to access more sources. Swirl also offers integration with ChatGPT as a configured AI model. It adapts and distributes user queries to anything with a search API, re-ranking the unified results using Large Language Models without extracting or indexing anything. Swirl includes five Google Programmable Search Engines (PSEs) to get users up and running quickly. Key features of Swirl include Microsoft 365 integration, SearchProvider configurations, query adaptation, synchronous or asynchronous search federation, optional subscribe feature, pipelining of Processor stages, results stored in SQLite3 or PostgreSQL, built-in Query Transformation support, matching on word stems and handling of stopwords, duplicate detection, re-ranking of unified results using Cosine Vector Similarity, result mixers, page through all results requested, sample data sets, optional spell correction, optional search/result expiration service, easily extensible Connector and Mixer objects, and a welcoming community for collaboration and support.
awesome-gpt-security
Awesome GPT + Security is a curated list of awesome security tools, experimental case or other interesting things with LLM or GPT. It includes tools for integrated security, auditing, reconnaissance, offensive security, detecting security issues, preventing security breaches, social engineering, reverse engineering, investigating security incidents, fixing security vulnerabilities, assessing security posture, and more. The list also includes experimental cases, academic research, blogs, and fun projects related to GPT security. Additionally, it provides resources on GPT security standards, bypassing security policies, bug bounty programs, cracking GPT APIs, and plugin security.
magpie
This is the official repository for 'Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing'. Magpie is a tool designed to synthesize high-quality instruction data at scale by extracting it directly from an aligned Large Language Models (LLMs). It aims to democratize AI by generating large-scale alignment data and enhancing the transparency of model alignment processes. Magpie has been tested on various model families and can be used to fine-tune models for improved performance on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench.
qdrant
Qdrant is a vector similarity search engine and vector database. It is written in Rust, which makes it fast and reliable even under high load. Qdrant can be used for a variety of applications, including: * Semantic search * Image search * Product recommendations * Chatbots * Anomaly detection Qdrant offers a variety of features, including: * Payload storage and filtering * Hybrid search with sparse vectors * Vector quantization and on-disk storage * Distributed deployment * Highlighted features such as query planning, payload indexes, SIMD hardware acceleration, async I/O, and write-ahead logging Qdrant is available as a fully managed cloud service or as an open-source software that can be deployed on-premises.
raycast-g4f
Raycast-G4F is a free extension that allows users to leverage powerful AI models such as GPT-4 and Llama-3 within the Raycast app without the need for an API key. The extension offers features like streaming support, diverse commands, chat interaction with AI, web search capabilities, file upload functionality, image generation, and custom AI commands. Users can easily install the extension from the source code and benefit from frequent updates and a user-friendly interface. Raycast-G4F supports various providers and models, each with different capabilities and performance ratings, ensuring a versatile AI experience for users.
pluto
Pluto is a development tool dedicated to helping developers **build cloud and AI applications more conveniently** , resolving issues such as the challenging deployment of AI applications and open-source models. Developers are able to write applications in familiar programming languages like **Python and TypeScript** , **directly defining and utilizing the cloud resources necessary for the application within their code base** , such as AWS SageMaker, DynamoDB, and more. Pluto automatically deduces the infrastructure resource needs of the app through **static program analysis** and proceeds to create these resources on the specified cloud platform, **simplifying the resources creation and application deployment process**.
twinny
Twinny is a free and open-source AI code completion plugin for Visual Studio Code and compatible editors. It integrates with various tools and frameworks, including Ollama, llama.cpp, oobabooga/text-generation-webui, LM Studio, LiteLLM, and Open WebUI. Twinny offers features such as fill-in-the-middle code completion, chat with AI about your code, customizable API endpoints, and support for single or multiline fill-in-middle completions. It is easy to install via the Visual Studio Code extensions marketplace and provides a range of customization options. Twinny supports both online and offline operation and conforms to the OpenAI API standard.
generative-bi-using-rag
Generative BI using RAG on AWS is a comprehensive framework designed to enable Generative BI capabilities on customized data sources hosted on AWS. It offers features such as Text-to-SQL functionality for querying data sources using natural language, user-friendly interface for managing data sources, performance enhancement through historical question-answer ranking, and entity recognition. It also allows customization of business information, handling complex attribution analysis problems, and provides an intuitive question-answering UI with a conversational approach for complex queries.
CogVideo
CogVideo is an open-source repository that provides pretrained text-to-video models for generating videos based on input text. It includes models like CogVideoX-2B and CogVideo, offering powerful video generation capabilities. The repository offers tools for inference, fine-tuning, and model conversion, along with demos showcasing the model's capabilities through CLI, web UI, and online experiences. CogVideo aims to facilitate the creation of high-quality videos from textual descriptions, catering to a wide range of applications.
obsidian-pieces
Pieces for Developers is a closed-source Obsidian plugin designed to revolutionize coding workflows by incorporating key capabilities and favorite features directly into the Obsidian environment. The plugin, Pieces Copilot for Obsidian, enhances coding and problem-solving experiences by providing insights on code snippets, generating samples, and facilitating navigation through PRs. Users can capture, manage, share, and discover code snippets and developer materials with ease, bringing efficiency and organization to their coding experience.
twinny
Twinny is a free and private AI extension for Visual Studio Code that offers AI-based code completion and code discussion features. It provides real-time code suggestions, function explanations, test generation, refactoring requests, and more. Twinny operates both online and offline, supports customizable API endpoints, conforms to OpenAI API standards, and offers various customization options for prompt templates, API providers, model names, and more. It is compatible with multiple APIs and allows users to accept code solutions directly in the editor, create new documents from code blocks, and copy generated code solution blocks. Twinny is open-source under the MIT license and welcomes contributions from the community.
For similar tasks
executorch
ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices. Key value propositions of ExecuTorch are: * **Portability:** Compatibility with a wide variety of computing platforms, from high-end mobile phones to highly constrained embedded systems and microcontrollers. * **Productivity:** Enabling developers to use the same toolchains and SDK from PyTorch model authoring and conversion, to debugging and deployment to a wide variety of platforms. * **Performance:** Providing end users with a seamless and high-performance experience due to a lightweight runtime and utilizing full hardware capabilities such as CPUs, NPUs, and DSPs.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.
oss-fuzz-gen
This framework generates fuzz targets for real-world `C`/`C++` projects with various Large Language Models (LLM) and benchmarks them via the `OSS-Fuzz` platform. It manages to successfully leverage LLMs to generate valid fuzz targets (which generate non-zero coverage increase) for 160 C/C++ projects. The maximum line coverage increase is 29% from the existing human-written targets.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (Containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer’s subscription using the CAPE tool within a matter of few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.