nncase
Open deep learning compiler stack for Kendryte AI accelerators ✨
Stars: 859
nncase is a neural network compiler for AI accelerators that supports multiple inputs and outputs, static memory allocation, operator fusion and optimization, float and quantized uint8 inference, post-training quantization from a float model with a calibration dataset, and flat models with zero-copy loading. It can be installed via pip and supports TFLite, Caffe, and ONNX ops. Users can compile nncase from source using Ninja or make. The tool is suitable for tasks like image classification, object detection, image segmentation, pose estimation, and more.
README:
nncase is a neural network compiler for AI accelerators.
Telegram: nncase community. Technical discussion QQ group: 790699378 (join answer: 人工智能, "artificial intelligence").
- Usage
- FAQ
- Example
- Colab run
- Version relationship between `nncase` and `K230_SDK`
- Update nncase runtime library in SDK
- Linux:
  - `pip install nncase nncase-kpu`
- Windows:
  1. `pip install nncase`
  2. Download `nncase_kpu-2.x.x-py2.py3-none-win_amd64.whl` from the link below.
  3. `pip install nncase_kpu-2.x.x-py2.py3-none-win_amd64.whl`
All versions of nncase and nncase-kpu are available in Release.
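Once the packages are installed, a model is compiled to a `.kmodel` through nncase's Python API. The snippet below is a minimal sketch following the pattern in the public nncase examples; the target name and file paths are placeholders, and attribute names may differ between nncase versions.

```python
import nncase

# Compilation options; "k230" is used here as an example target name.
compile_options = nncase.CompileOptions()
compile_options.target = "k230"

compiler = nncase.Compiler(compile_options)

# Import a TFLite model (an import_onnx counterpart exists for ONNX models).
import_options = nncase.ImportOptions()
with open("mobilenetv2.tflite", "rb") as f:  # placeholder model path
    compiler.import_tflite(f.read(), import_options)

# Compile and serialize the flat .kmodel.
compiler.compile()
with open("mobilenetv2.kmodel", "wb") as f:
    f.write(compiler.gencode_tobytes())
```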
| kind | model | shape | quant_type(If/W) | nncase_fps | tflite_onnx_result | accuracy | info |
|---|---|---|---|---|---|---|---|
| Image Classification | mobilenetv2 | [1,224,224,3] | u8/u8 | 600.24 | top-1 = 71.3%<br/>top-5 = 90.1% | top-1 = 71.1%<br/>top-5 = 90.0% | dataset(ImageNet 2012, 50000 images)<br/>tflite |
| | resnet50V2 | [1,3,224,224] | u8/u8 | 86.17 | top-1 = 75.44%<br/>top-5 = 92.56% | top-1 = 75.11%<br/>top-5 = 92.36% | dataset(ImageNet 2012, 50000 images)<br/>onnx |
| | yolov8s_cls | [1,3,224,224] | u8/u8 | 130.497 | top-1 = 72.2%<br/>top-5 = 90.9% | top-1 = 72.2%<br/>top-5 = 90.8% | dataset(ImageNet 2012, 50000 images)<br/>yolov8s_cls(v8.0.207) |
| Object Detection | yolov5s_det | [1,3,640,640] | u8/u8 | 23.645 | bbox mAP50-90 = 0.374 mAP50 = 0.567 | bbox mAP50-90 = 0.369 mAP50 = 0.566 | dataset(coco val2017, 5000 images)<br/>yolov5s_det(v7.0 tag, rect=False, conf=0.001, iou=0.65) |
| | yolov8s_det | [1,3,640,640] | u8/u8 | 9.373 | bbox mAP50-90 = 0.446 mAP50 = 0.612 mAP75 = 0.484 | bbox mAP50-90 = 0.404 mAP50 = 0.593 mAP75 = 0.45 | dataset(coco val2017, 5000 images)<br/>yolov8s_det(v8.0.207, rect = False) |
| Image Segmentation | yolov8s_seg | [1,3,640,640] | u8/u8 | 7.845 | bbox mAP50-90 = 0.444 mAP50 = 0.606 mAP75 = 0.484<br/>segm mAP50-90 = 0.371 mAP50 = 0.578 mAP75 = 0.396 | bbox mAP50-90 = 0.444 mAP50 = 0.606 mAP75 = 0.484<br/>segm mAP50-90 = 0.371 mAP50 = 0.579 mAP75 = 0.397 | dataset(coco val2017, 5000 images)<br/>yolov8s_seg(v8.0.207, rect = False, conf_thres = 0.0008) |
| Pose Estimation | yolov8n_pose_320 | [1,3,320,320] | u8/u8 | 36.066 | bbox mAP50-90 = 0.6 mAP50 = 0.843 mAP75 = 0.654<br/>keypoints mAP50-90 = 0.358 mAP50 = 0.646 mAP75 = 0.353 | bbox mAP50-90 = 0.6 mAP50 = 0.841 mAP75 = 0.656<br/>keypoints mAP50-90 = 0.359 mAP50 = 0.648 mAP75 = 0.357 | dataset(coco val2017, 2346 images)<br/>yolov8n_pose(v8.0.207, rect = False) |
| | yolov8n_pose_640 | [1,3,640,640] | u8/u8 | 10.88 | bbox mAP50-90 = 0.694 mAP50 = 0.909 mAP75 = 0.776<br/>keypoints mAP50-90 = 0.509 mAP50 = 0.798 mAP75 = 0.544 | bbox mAP50-90 = 0.694 mAP50 = 0.909 mAP75 = 0.777<br/>keypoints mAP50-90 = 0.508 mAP50 = 0.798 mAP75 = 0.54 | dataset(coco val2017, 2346 images)<br/>yolov8n_pose(v8.0.207, rect = False) |
| | yolov8s_pose | [1,3,640,640] | u8/u8 | 5.568 | bbox mAP50-90 = 0.733 mAP50 = 0.925 mAP75 = 0.818<br/>keypoints mAP50-90 = 0.605 mAP50 = 0.857 mAP75 = 0.666 | bbox mAP50-90 = 0.734 mAP50 = 0.925 mAP75 = 0.819<br/>keypoints mAP50-90 = 0.604 mAP50 = 0.859 mAP75 = 0.669 | dataset(coco val2017, 2346 images)<br/>yolov8s_pose(v8.0.207, rect = False) |
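The result columns compare the original tflite/onnx model against the compiled kmodel on the same validation set. On a host PC, a compiled `.kmodel` can be executed with the nncase simulator to reproduce this kind of check. The following is a minimal sketch following the public nncase examples; the file name and input shape are placeholders, and exact API details may vary across versions.

```python
import numpy as np
import nncase

# Load a compiled model into the host-side simulator.
sim = nncase.Simulator()
with open("mobilenetv2.kmodel", "rb") as f:  # placeholder path
    sim.load_model(f.read())

# Feed one input tensor (placeholder NHWC float input).
data = np.random.rand(1, 224, 224, 3).astype(np.float32)
sim.set_input_tensor(0, nncase.RuntimeTensor.from_numpy(data))

sim.run()

# Read back the first output for comparison against the reference model.
output = sim.get_output_tensor(0).to_numpy()
print(output.shape)
```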
Demos: eye gaze, space_resize, face pose (demo images not included here).
Features:
- Supports multiple inputs and outputs and multi-branch structures
- Static memory allocation, no heap memory allocated
- Operator fusion and optimizations
- Supports float and quantized uint8 inference
- Supports post-training quantization from a float model with a calibration dataset (see the sketch after this list)
- Flat model format with zero-copy loading
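For the post-training quantization feature above, a calibration dataset is handed to the compiler before compilation. The snippet below is a hedged sketch of how this is typically wired up with the Python API; the sample count, random calibration data, and exact option names are illustrative assumptions and may differ between nncase versions.

```python
import numpy as np
import nncase

compile_options = nncase.CompileOptions()
compile_options.target = "k230"              # example target name
compiler = nncase.Compiler(compile_options)

with open("model.tflite", "rb") as f:        # placeholder float model
    compiler.import_tflite(f.read(), nncase.ImportOptions())

# Post-training quantization: hand the compiler a small calibration set.
ptq_options = nncase.PTQTensorOptions()
ptq_options.samples_count = 10               # illustrative sample count
calib_data = np.random.rand(10, 224, 224, 3).astype(np.float32)
# Data layout follows the older examples (raw bytes); newer versions may
# expect a list of arrays instead.
ptq_options.set_tensor_data(calib_data.tobytes())
compiler.use_ptq(ptq_options)

compiler.compile()
with open("model.kmodel", "wb") as f:
    f.write(compiler.gencode_tobytes())
```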
It is recommended to install nncase directly through pip. At present, the source code related to the K510 and K230 chips is not open source, so nncase-K510 and nncase-kpu (K230) cannot be built from source.
If your model contains operators that nncase does not yet support, you can request them in an issue or implement them yourself and submit a PR. They will be integrated in a later release; alternatively, contact us for a temporary version.
Here are the steps to compile nncase from source:

```bash
git clone https://github.com/kendryte/nncase.git
cd nncase
mkdir build && cd build

# Use Ninja
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./install
ninja && ninja install

# Use make
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./install
make && make install
```

The Canaan developer community contains all resources related to the K210, K510, and K230:
- 资料下载 (Downloads) --> Pre-compiled images for the development boards of the three chips.
- 文档 (Documentation) --> Documents for the three chips.
- 模型库 (Model Zoo) --> Examples and code for industrial, security, educational, and other scenarios that run on the K210 and K230.
- 模型训练 (Model Training) --> The model training platform for the K210 and K230, supporting training for various scenarios.
SDK:
- C: K230_SDK
- MicroPython: Canmv_k230
Similar Open Source Tools
LLaVA-OneVision-1.5
LLaVA-OneVision 1.5 is a fully open framework for democratized multimodal training, introducing a novel family of large multimodal models achieving state-of-the-art performance at lower cost through training on native resolution images. It offers superior performance across multiple benchmarks, high-quality data at scale with concept-balanced and diverse caption data, and an ultra-efficient training framework with support for MoE, FP8, and long sequence parallelization. The framework is fully open for community access and reproducibility, providing high-quality pre-training & SFT data, complete training framework & code, training recipes & configurations, and comprehensive training logs & metrics.
ryoma
Ryoma is an AI Powered Data Agent framework that offers a comprehensive solution for data analysis, engineering, and visualization. It leverages cutting-edge technologies like Langchain, Reflex, Apache Arrow, Jupyter Ai Magics, Amundsen, Ibis, and Feast to provide seamless integration of language models, build interactive web applications, handle in-memory data efficiently, work with AI models, and manage machine learning features in production. Ryoma also supports various data sources like Snowflake, Sqlite, BigQuery, Postgres, MySQL, and different engines like Apache Spark and Apache Flink. The tool enables users to connect to databases, run SQL queries, and interact with data and AI models through a user-friendly UI called Ryoma Lab.
GraphGen
GraphGen is a framework for synthetic data generation guided by knowledge graphs. It enhances supervised fine-tuning for large language models (LLMs) by generating synthetic data based on a fine-grained knowledge graph. The tool identifies knowledge gaps in LLMs, prioritizes generating QA pairs targeting high-value knowledge, incorporates multi-hop neighborhood sampling, and employs style-controlled generation to diversify QA data. Users can use LLaMA-Factory and xtuner for fine-tuning LLMs after data generation.
retinify
Retinify is an advanced AI-powered stereo vision library designed for robotics, enabling real-time, high-precision 3D perception by leveraging GPU and NPU acceleration. It is open source under Apache-2.0 license, offers high precision 3D mapping and object recognition, runs computations on GPU for fast performance, accepts stereo images from any rectified camera setup, is cost-efficient using minimal hardware, and has minimal dependencies on CUDA Toolkit, cuDNN, and TensorRT. The tool provides a pipeline for stereo matching and supports various image data types independently of OpenCV.
cipher
Cipher is a versatile encryption and decryption tool designed to secure sensitive information. It offers a user-friendly interface with various encryption algorithms to choose from, ensuring data confidentiality and integrity. With Cipher, users can easily encrypt text or files using strong encryption methods, making it suitable for protecting personal data, confidential documents, and communication. The tool also supports decryption of encrypted data, providing a seamless experience for users to access their secured information. Cipher is a reliable solution for individuals and organizations looking to enhance their data security measures.
NornicDB
NornicDB is a high-performance graph database designed for AI agents and knowledge systems. It is Neo4j-compatible, GPU-accelerated, and features memory that evolves. The database automatically discovers and manages relationships in the data, allowing meaning to emerge from the knowledge graph. NornicDB is suitable for AI agent memory, knowledge graphs, RAG systems, session context, and research tools. It offers features like intelligent memory, auto-relationships, performance benchmarks, vector search, Heimdall AI assistant, APOC functions, and various Docker images for different platforms. The tool is built with Neo4j Bolt protocol, Cypher query engine, memory decay system, GPU acceleration, vector search, auto-relationship engine, and more.
Liger-Kernel
Liger Kernel is a collection of Triton kernels designed for LLM training, increasing training throughput by 20% and reducing memory usage by 60%. It includes Hugging Face Compatible modules like RMSNorm, RoPE, SwiGLU, CrossEntropy, and FusedLinearCrossEntropy. The tool works with Flash Attention, PyTorch FSDP, and Microsoft DeepSpeed, aiming to enhance model efficiency and performance for researchers, ML practitioners, and curious novices.
env-doctor
Env-Doctor is a tool designed to diagnose and fix mismatched CUDA versions between NVIDIA driver, system toolkit, cuDNN, and Python libraries, providing a quick solution to the common frustration in GPU computing. It offers one-command diagnosis, safe install commands, extension library support, AI model compatibility checks, WSL2 GPU support, deep CUDA analysis, container validation, MCP server integration, and CI/CD readiness. The tool helps users identify and resolve environment issues efficiently, ensuring smooth operation of AI libraries on their GPUs.
turing
Viglet Turing is an enterprise search platform that combines semantic navigation, chatbots, and generative artificial intelligence. It offers integrations for authentication APIs, OCR, content indexing, CMS connectors, web crawling, database connectors, and file system indexing.
chatglm.cpp
ChatGLM.cpp is a C++ implementation of ChatGLM-6B, ChatGLM2-6B, ChatGLM3-6B and more LLMs for real-time chatting on your MacBook. It is based on ggml, working in the same way as llama.cpp. ChatGLM.cpp features accelerated memory-efficient CPU inference with int4/int8 quantization, optimized KV cache and parallel computing. It also supports P-Tuning v2 and LoRA finetuned models, streaming generation with typewriter effect, Python binding, web demo, api servers and more possibilities.
unsloth
Unsloth is a tool that allows users to fine-tune large language models (LLMs) 2-5x faster with 80% less memory. It is a free and open-source tool that can be used to fine-tune LLMs such as Gemma, Mistral, Llama 2-5, TinyLlama, and CodeLlama 34b. Unsloth supports 4-bit and 16-bit QLoRA / LoRA fine-tuning via bitsandbytes. It also supports DPO (Direct Preference Optimization), PPO, and Reward Modelling. Unsloth is compatible with Hugging Face's TRL, Trainer, Seq2SeqTrainer, and Pytorch code. It is also compatible with NVIDIA GPUs since 2018+ (minimum CUDA Capability 7.0).
PraisonAI
Praison AI is a low-code, centralised framework that simplifies the creation and orchestration of multi-agent systems for various LLM applications. It emphasizes ease of use, customization, and human-agent interaction. The tool leverages AutoGen and CrewAI frameworks to facilitate the development of AI-generated scripts and movie concepts. Users can easily create, run, test, and deploy agents for scriptwriting and movie concept development. Praison AI also provides options for full automatic mode and integration with OpenAI models for enhanced AI capabilities.
mindnlp
MindNLP is an open-source NLP library based on MindSpore. It provides a platform for solving natural language processing tasks, containing many common approaches in NLP. It can help researchers and developers to construct and train models more conveniently and rapidly. Key features of MindNLP include: * Comprehensive data processing: Several classical NLP datasets are packaged into a friendly module for easy use, such as Multi30k, SQuAD, CoNLL, etc. * Friendly NLP model toolset: MindNLP provides various configurable components. It is friendly to customize models using MindNLP. * Easy-to-use engine: MindNLP simplified complicated training process in MindSpore. It supports Trainer and Evaluator interfaces to train and evaluate models easily. MindNLP supports a wide range of NLP tasks, including: * Language modeling * Machine translation * Question answering * Sentiment analysis * Sequence labeling * Summarization MindNLP also supports industry-leading Large Language Models (LLMs), including Llama, GLM, RWKV, etc. For support related to large language models, including pre-training, fine-tuning, and inference demo examples, you can find them in the "llm" directory. To install MindNLP, you can either install it from Pypi, download the daily build wheel, or install it from source. The installation instructions are provided in the documentation. MindNLP is released under the Apache 2.0 license. If you find this project useful in your research, please consider citing the following paper: @misc{mindnlp2022, title={{MindNLP}: a MindSpore NLP library}, author={MindNLP Contributors}, howpublished = {\url{https://github.com/mindlab-ai/mindnlp}}, year={2022} }
gitmesh
GitMesh is an AI-powered Git collaboration network designed to address contributor dropout in open source projects. It offers real-time branch-level insights, intelligent contributor-task matching, and automated workflows. The platform transforms complex codebases into clear contribution journeys, fostering engagement through gamified rewards and integration with open source support programs. GitMesh's mascot, Meshy/Mesh Wolf, symbolizes agility, resilience, and teamwork, reflecting the platform's ethos of efficiency and power through collaboration.
new-api
New API is a next-generation large model gateway and AI asset management system that provides a wide range of features, including a new UI interface, multi-language support, online recharge function, key query for usage quota, compatibility with the original One API database, model charging by usage count, channel weighted randomization, data dashboard, token grouping and model restrictions, support for various authorization login methods, support for Rerank models, OpenAI Realtime API, Claude Messages format, reasoning effort setting, content reasoning, user-specific model rate limiting, request format conversion, cache billing support, and various model support such as gpts, Midjourney-Proxy, Suno API, custom channels, Rerank models, Claude Messages format, Dify, and more.
For similar tasks
AiTreasureBox
AiTreasureBox is a versatile AI tool that provides a collection of pre-trained models and algorithms for various machine learning tasks. It simplifies the process of implementing AI solutions by offering ready-to-use components that can be easily integrated into projects. With AiTreasureBox, users can quickly prototype and deploy AI applications without the need for extensive knowledge in machine learning or deep learning. The tool covers a wide range of tasks such as image classification, text generation, sentiment analysis, object detection, and more. It is designed to be user-friendly and accessible to both beginners and experienced developers, making AI development more efficient and accessible to a wider audience.
InternVL
InternVL scales up the ViT to _**6B parameters**_ and aligns it with LLM. It is a vision-language foundation model that can perform various tasks, including: **Visual Perception** - Linear-Probe Image Classification - Semantic Segmentation - Zero-Shot Image Classification - Multilingual Zero-Shot Image Classification - Zero-Shot Video Classification **Cross-Modal Retrieval** - English Zero-Shot Image-Text Retrieval - Chinese Zero-Shot Image-Text Retrieval - Multilingual Zero-Shot Image-Text Retrieval on XTD **Multimodal Dialogue** - Zero-Shot Image Captioning - Multimodal Benchmarks with Frozen LLM - Multimodal Benchmarks with Trainable LLM - Tiny LVLM InternVL has been shown to achieve state-of-the-art results on a variety of benchmarks. For example, on the MMMU image classification benchmark, InternVL achieves a top-1 accuracy of 51.6%, which is higher than GPT-4V and Gemini Pro. On the DocVQA question answering benchmark, InternVL achieves a score of 82.2%, which is also higher than GPT-4V and Gemini Pro. InternVL is open-sourced and available on Hugging Face. It can be used for a variety of applications, including image classification, object detection, semantic segmentation, image captioning, and question answering.
clarifai-python
The Clarifai Python SDK offers a comprehensive set of tools to integrate Clarifai's AI platform into your applications, leveraging computer vision capabilities like classification, detection, and segmentation, and natural language capabilities like classification, summarisation, generation, and Q&A. With just a few lines of code, you can leverage cutting-edge artificial intelligence to unlock valuable insights from visual and textual content.
X-AnyLabeling
X-AnyLabeling is a robust annotation tool that seamlessly incorporates an AI inference engine alongside an array of sophisticated features. Tailored for practical applications, it is committed to delivering comprehensive, industrial-grade solutions for image data engineers. This tool excels in swiftly and automatically executing annotations across diverse and intricate tasks.
ailia-models
A collection of pre-trained, state-of-the-art AI models. ailia SDK is a self-contained, cross-platform, high-speed inference SDK for AI. The ailia SDK provides a consistent C++ API across Windows, Mac, Linux, iOS, Android, Jetson, and Raspberry Pi platforms. It also supports Unity (C#), Python, Rust, Flutter (Dart), and JNI for efficient AI implementation. The ailia SDK makes extensive use of the GPU through Vulkan and Metal to enable accelerated computing. Supported models: 323 models as of April 8th, 2024.
edenai-apis
Eden AI aims to simplify the use and deployment of AI technologies by providing a unique API that connects to all the best AI engines. With the rise of **AI as a Service**, a lot of companies provide off-the-shelf trained models that you can access directly through an API. These companies are either the tech giants (Google, Microsoft, Amazon) or other smaller, more specialized companies, and there are hundreds of them. Some of the best known are DeepL (translation), OpenAI (text and image analysis), and AssemblyAI (speech analysis). There are **hundreds of companies** doing that. We're regrouping the best ones **in one place**!
NanoLLM
NanoLLM is a tool designed for optimized local inference for Large Language Models (LLMs) using HuggingFace-like APIs. It supports quantization, vision/language models, multimodal agents, speech, vector DB, and RAG. The tool aims to provide efficient and effective processing for LLMs on local devices, enhancing performance and usability for various AI applications.
open-ai
Open AI is a powerful tool for artificial intelligence research and development. It provides a wide range of machine learning models and algorithms, making it easier for developers to create innovative AI applications. With Open AI, users can explore cutting-edge technologies such as natural language processing, computer vision, and reinforcement learning. The platform offers a user-friendly interface and comprehensive documentation to support users in building and deploying AI solutions. Whether you are a beginner or an experienced AI practitioner, Open AI offers the tools and resources you need to accelerate your AI projects and stay ahead in the rapidly evolving field of artificial intelligence.
For similar jobs
Qwen-TensorRT-LLM
Qwen-TensorRT-LLM is a project developed for the NVIDIA TensorRT Hackathon 2023, focusing on accelerating inference for the Qwen-7B-Chat model using TRT-LLM. The project offers various functionalities such as FP16/BF16 support, INT8 and INT4 quantization options, Tensor Parallel for multi-GPU parallelism, web demo setup with gradio, Triton API deployment for maximum throughput/concurrency, fastapi integration for openai requests, CLI interaction, and langchain support. It supports models like qwen2, qwen, and qwen-vl for both base and chat models. The project also provides tutorials on Bilibili and blogs for adapting Qwen models in NVIDIA TensorRT-LLM, along with hardware requirements and quick start guides for different model types and quantization methods.
dl_model_infer
This project is a c++ version of the AI reasoning library that supports the reasoning of tensorrt models. It provides accelerated deployment cases of deep learning CV popular models and supports dynamic-batch image processing, inference, decode, and NMS. The project has been updated with various models and provides tutorials for model exports. It also includes a producer-consumer inference model for specific tasks. The project directory includes implementations for model inference applications, backend reasoning classes, post-processing, pre-processing, and target detection and tracking. Speed tests have been conducted on various models, and onnx downloads are available for different models.
joliGEN
JoliGEN is an integrated framework for training custom generative AI image-to-image models. It implements GAN, Diffusion, and Consistency models for various image translation tasks, including domain and style adaptation with conservation of semantics. The tool is designed for real-world applications such as Controlled Image Generation, Augmented Reality, Dataset Smart Augmentation, and Synthetic to Real transforms. JoliGEN allows for fast and stable training with a REST API server for simplified deployment. It offers a wide range of options and parameters with detailed documentation available for models, dataset formats, and data augmentation.
ai-edge-torch
AI Edge Torch is a Python library that supports converting PyTorch models into a .tflite format for on-device applications on Android, iOS, and IoT devices. It offers broad CPU coverage with initial GPU and NPU support, closely integrating with PyTorch and providing good coverage of Core ATen operators. The library includes a PyTorch converter for model conversion and a Generative API for authoring mobile-optimized PyTorch Transformer models, enabling easy deployment of Large Language Models (LLMs) on mobile devices.
awesome-RK3588
RK3588 is a flagship 8K SoC chip by Rockchip, integrating Cortex-A76 and Cortex-A55 cores with NEON coprocessor for 8K video codec. This repository curates resources for developing with RK3588, including official resources, RKNN models, projects, development boards, documentation, tools, and sample code.
cl-waffe2
cl-waffe2 is an experimental deep learning framework in Common Lisp, providing fast, systematic, and customizable matrix operations, reverse mode tape-based Automatic Differentiation, and neural network model building and training features accelerated by a JIT Compiler. It offers abstraction layers, extensibility, inlining, graph-level optimization, visualization, debugging, systematic nodes, and symbolic differentiation. Users can easily write extensions and optimize their networks without overheads. The framework is designed to eliminate barriers between users and developers, allowing for easy customization and extension.
TensorRT-Model-Optimizer
The NVIDIA TensorRT Model Optimizer is a library designed to quantize and compress deep learning models for optimized inference on GPUs. It offers state-of-the-art model optimization techniques including quantization and sparsity to reduce inference costs for generative AI models. Users can easily stack different optimization techniques to produce quantized checkpoints from torch or ONNX models. The quantized checkpoints are ready for deployment in inference frameworks like TensorRT-LLM or TensorRT, with planned integrations for NVIDIA NeMo and Megatron-LM. The tool also supports 8-bit quantization with Stable Diffusion for enterprise users on NVIDIA NIM. Model Optimizer is available for free on NVIDIA PyPI, and this repository serves as a platform for sharing examples, GPU-optimized recipes, and collecting community feedback.
depthai
This repository contains a demo application for DepthAI, a tool that can load different networks, create pipelines, record video, and more. It provides documentation for installation and usage, including running programs through Docker. Users can explore DepthAI features via command line arguments or a clickable QT interface. Supported models include various AI models for tasks like face detection, human pose estimation, and object detection. The tool collects anonymous usage statistics by default, which can be disabled. Users can report issues to the development team for support and troubleshooting.




