dl_model_infer

🚀🚀🚀This is an AI high-performance reasoning C++ library, Currently supports the deployment of yolov5, yolov7, yolov7-pose, yolov8, yolov8-seg, yolov8-pose, yolov8-obb, yolox, RTDETR, DETR, depth-anything, yolop, yolopv2, SMOKE, yolov9 and other models.🚀🚀🚀

Stars: 87

Visit

This project is a c++ version of the AI reasoning library that supports the reasoning of tensorrt models. It provides accelerated deployment cases of deep learning CV popular models and supports dynamic-batch image processing, inference, decode, and NMS. The project has been updated with various models and provides tutorials for model exports. It also includes a producer-consumer inference model for specific tasks. The project directory includes implementations for model inference applications, backend reasoning classes, post-processing, pre-processing, and target detection and tracking. Speed tests have been conducted on various models, and onnx downloads are available for different models.

README:

dl_model_infer

Introduce

This project is modified based on the AIInfer, Tanks for this project.

This is a c++ version of the AI reasoning library. Currently, it only supports the reasoning of the tensorrt model. The follow-up plan supports the c++ reasoning of frameworks such as Openvino, NCNN, and MNN. There are two versions for pre- and post-processing, c++ version and cuda version. It is recommended to use the cuda version., This repository provides accelerated deployment cases of deep learning CV popular models, and cuda c supports dynamic-batch image process, infer, decode, NMS.

Update

2023.05.27 update yolov5、yolov7、yolov8、yolox
2023.05.28 update rt_detr
2023.06.01 update yolov8_seg、yolov8_pose
2023.06.09 update yolov7_cutoff
2023.06.14 update yolov7-pose
2023.06.15 Adding Producer-Consumer Inference Model for yolov8-det
2023.06.24 update 3D objection detection algorithm smoke
2023.09.06 update deploy for detr in mmdetection
2024.01.26 update yolov8-obb
2024.02.06 update depth-anything
2024.02.12 update yolop & yolopv2
2024.02.26 update yolov9

Environment

The following environments have been tested：

ubuntu16.04
cuda11.1
cudnn8.6.0
TensorRT-8.5.1.7
gcc5.4.0
cmake-3.24.0
opencv-4.5.5
Eigen3
yaml

You can also use docker, How to use it is as follows:

docker pull longxiaowyh/dl_model_infer:v1.0
nvidia-docker run -itu root:root --name dl_model_infer --gpus all -v /your_path:/target_path -v /tmp/.X11-unix/:/tmp/.X11-unix/ -e DISPLAY=unix$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE  -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility --shm-size=64g longxiaowyh/dl_model_infer:v1.0 /bin/bash

Model Export Tutorial

RT-DETR model export tutorial
- https://zhuanlan.zhihu.com/p/623794029
Yolov8 model export tutorial
- https://github.com/yhwang-hub/dl_model_infer/tree/master/application/yolov8_app/README.md
Yolov5 model export tutorial
- https://github.com/ultralytics/yolov5
Yolov7 model export tutorial
- https://github.com/WongKinYiu/yolov7
yolov7_cutoff model export tutorial
- https://github.com/yhwang-hub/dl_model_deploy/tree/master/yolov7_cutoff_TensorRT
yolov7-pose model export tutorial
- https://github.com/yhwang-hub/dl_model_infer/tree/master/application/yolov7_pose_app
smoke model export tutorial
- https://github.com/yhwang-hub/dl_model_infer/blob/dev/application/smoke_det_app/README.md
DETR model export tutorial
- Put the workspaces/detr_pytorch2onnx.py file under the mmdetection path.
- Modify the config_file and checkpoint_file paths in the detr_pytorch2onnx.py file.
- Use the detr_pytorch2onnx.py file to generate onnx file.
- Use trtexec to generate engine files.
DepthAnything model export tutorial
- https://github.com/spacewalk01/depth-anything-tensorrt
- https://github.com/daniel89710/trt-depth-anything
YOLOP model export tutorial
- Refer to the workspace/yolop_model_compile.sh file
Yolov9 model export tutorial
- https://github.com/yhwang-hub/dl_model_infer/tree/master/application/yolov9_app/README.md

Use of CPM (wrapping the inference as producer-consumer)

cpm.hpp Producer-consumer model
- For direct inference tasks, cpm.hpp can be turned into an automatic multi-batch producer-consumer model

cpm::Instance<BoxArray, Image, yolov8_detector> cpmi;
auto result_futures = cpmi.commits(yoloimages);
for (int ib = 0; ib < result_futures.size(); ++ib)
{
    auto objs = result_futures[ib].get();
    auto image = images[ib].clone();
    for (auto& obj : objs)
    {
        process....
    }
}

Quick Start

Take yolov8 target detection as an example，modify CMakeLists.txt and run the command below：

git clone [email protected]:yhwang-hub/dl_model_infer.git
cd dl_model_infer
mkdir build && cd build
cmake .. && make
cd ../workspaces
./infer -f yolov8n.transd.trt -i res/dog.jpg -b 10 -c 10 -o cuda_res -t yolov8_det

You can also use a script to execute，The above instructions are written in the compile_and_run.sh script，for example：

rm -rf build && mkdir -p build && cd build && cmake ..  && make -j9 && cd ..
# mkdir -p build && cd build && cmake ..  && make -j48 && cd ..
cd workspaces

rm -rf cuda_res/*

# ./infer -f yolov8n.transd.trt -i res/dog.jpg -b 10 -c 10 -o cuda_res -t yolov8_det

cd ..

Then execute the following command to run

cd dl_model_infer
bash compile_and_run.sh

Project directory introduction

AiInfer
   |--application # Implementation of model inference application, your own model inference can be implemented in this directory
     |--yolov8_det_app # Example: A yolov8 detection implemented
     |--xxxx
   |--utils # tools directory
     |--backend # here implements the reasoning class of backend
     |--common # There are some commonly used tools in it
       |--arg_parsing.h # Command line parsing class, similar to python's argparse
       |--cuda_utils.h # There are some common tool functions of cuda in it
       |--cv_cpp_utils.h # There are some cv-related utility functions in it
       |--memory.h # Tools related to cpu and gpu memory application and release
       |--model_info.h # Commonly used parameter definitions for pre- and post-processing of the model, such as mean variance, nms threshold, etc.
       |--utils.h # Commonly used tool functions in cpp, timing, mkdir, etc.
       |--cpm.h # Producer-Consumer Inference Model
     |--post_process # Post-processing implementation directory, cuda post-processing acceleration, if you have custom post-processing, you can also write it here
     |--pre_process # pre-processing implementation directory, cuda pre-processing acceleration, if you have custom pre-processing can also be written here
     |--tracker # This is the implementation of the target detection and tracking library, which has been decoupled and can be deleted directly if you don’t want to use it
   |--workspaces # Working directory, where you can put some test pictures/videos, models, and then directly use the relative path in main.cpp
   |--mains # This is the collection of main.cpp, where each app corresponds to a main file, which is easy to understand, and it is too redundant to write together
   |--main.cpp # Project entry

Speed Test

Tested on Jetson Orin, the test includes the entire process (image preprocessing + model inference + post-processing decoding)、

Model	Precision	Resolution	FPS(bs=1)
rtdetr_r50	FP16	640x640	19
yolov8n	FP16	640x640	126
yolov8n-seg	FP16	640x640	92
yolov8s-pose	FP16	640x640	58
yolov8s-obb	FP16	1024x1024	38
yolov5s	FP16	640x640	92
yolov7	FP16	640x640	34
yolov7_cutoff	FP16	640x640	32
yolov7-w6-pose	FP16	960x960	22
yolox_s	FP16	640x640	91
detr	FP16	800x1190	18
depth_anything_vits14	FP16	518x518	19
yolop	FP16	640x640	40
yolopv2	FP16	480x640	26

onnx downloads

model	baiduyun
yolov5	链接: https://pan.baidu.com/s/1Bwwo8--JS8Vkw6METz2dIw 提取码: 47ax
yolov7	链接: https://pan.baidu.com/s/1gb0W177xhnrseJF6CfdMEA 提取码: rvg5
yolov7_cutoff	链接: https://pan.baidu.com/s/16bKgt_DWNmk26q-utLyCfA 提取码: q7kf
yolov8	链接: https://pan.baidu.com/s/18Cm-tN21cus3XyirqLE_eg 提取码: j8br
yolov8-seg	链接: https://pan.baidu.com/s/1s2Gp_Jedhi9-p_Z2utJV-Q 提取码: wr5t
yolov8-pose	链接: https://pan.baidu.com/s/1lP8kiKu2a6h_FAZSSkhUgg 提取码: 7p6a
yolox	链接: https://pan.baidu.com/s/1U0gzW_YMbvNMtKzo4_cluA 提取码: 4xct
rt-detr	链接: https://pan.baidu.com/s/1Ft0-ewuCTK2BxTS1q1Vdtw 提取码: ekms
yolov7-pose	链接: https://pan.baidu.com/s/1uI6u5oKDrnroluQufIWF7w 提取码: ed9r
DETR	链接: https://pan.baidu.com/s/1_PQVPKy0QiFWJaB7HhyTSg 提取码: j7fs
yolov8-obb	链接: https://pan.baidu.com/s/1bMZuZPtTNjo5tl5heOdJRw 提取码: fudj
Depth-Anything	链接: https://pan.baidu.com/s/1ZiE1AVvpB6owND5wEqHwMw 提取码: cp97
YOLOP	链接: https://pan.baidu.com/s/1Q0itb-TMoYpx27x0stwUag 提取码: rc3p
YOLOPV2	链接: https://pan.baidu.com/s/19sNBrwIx2TAD7iOJtZrDuA 提取码: 15vm
YOLOV9	链接: https://pan.baidu.com/s/1Cotnig9BgeJt7gAW-Vy8Pg 提取码: n4hm

Reference

Thanks for the following items

For Tasks:

Click tags to check more tools for each tasks

infer models deploy deep learning models process images decode models track targets

For Jobs:

machine learning engineer computer vision engineer deep learning researcher ai software developer data scientist

Alternative AI tools for dl_model_infer

Similar Open Source Tools

dl_model_infer

github

: 87

Groma

Groma is a grounded multimodal assistant that excels in region understanding and visual grounding. It can process user-defined region inputs and generate contextually grounded long-form responses. The tool presents a unique paradigm for multimodal large language models, focusing on visual tokenization for localization. Groma achieves state-of-the-art performance in referring expression comprehension benchmarks. The tool provides pretrained model weights and instructions for data preparation, training, inference, and evaluation. Users can customize training by starting from intermediate checkpoints. Groma is designed to handle tasks related to detection pretraining, alignment pretraining, instruction finetuning, instruction following, and more.

github

: 374

SemanticFinder

SemanticFinder is a frontend-only live semantic search tool that calculates embeddings and cosine similarity client-side using transformers.js and SOTA embedding models from Huggingface. It allows users to search through large texts like books with pre-indexed examples, customize search parameters, and offers data privacy by keeping input text in the browser. The tool can be used for basic search tasks, analyzing texts for recurring themes, and has potential integrations with various applications like wikis, chat apps, and personal history search. It also provides options for building browser extensions and future ideas for further enhancements and integrations.

github

: 204

imodels

Python package for concise, transparent, and accurate predictive modeling. All sklearn-compatible and easy to use. _For interpretability in NLP, check out our new package:imodelsX _

github

: 1.3k

RobustVLM

This repository contains code for the paper 'Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models'. It focuses on fine-tuning CLIP in an unsupervised manner to enhance its robustness against visual adversarial attacks. By replacing the vision encoder of large vision-language models with the fine-tuned CLIP models, it achieves state-of-the-art adversarial robustness on various vision-language tasks. The repository provides adversarially fine-tuned ViT-L/14 CLIP models and offers insights into zero-shot classification settings and clean accuracy improvements.

github

: 58

pipeline

Pipeline is a Python library designed for constructing computational flows for AI/ML models. It supports both development and production environments, offering capabilities for inference, training, and finetuning. The library serves as an interface to Mystic, enabling the execution of pipelines at scale and on enterprise GPUs. Users can also utilize this SDK with Pipeline Core on a private hosted cluster. The syntax for defining AI/ML pipelines is reminiscent of sessions in Tensorflow v1 and Flows in Prefect.

github

: 121

pr-agent

PR-Agent is a tool that helps to efficiently review and handle pull requests by providing AI feedbacks and suggestions. It supports various commands such as generating PR descriptions, providing code suggestions, answering questions about the PR, and updating the CHANGELOG.md file. PR-Agent can be used via CLI, GitHub Action, GitHub App, Docker, and supports multiple git providers and models. It emphasizes real-life practical usage, with each tool having a single GPT-4 call for quick and affordable responses. The PR Compression strategy enables effective handling of both short and long PRs, while the JSON prompting strategy allows for modular and customizable tools. PR-Agent Pro, the hosted version by CodiumAI, provides additional benefits such as full management, improved privacy, priority support, and extra features.

github

: 5.6k

floneum

Floneum is a graph editor that makes it easy to develop your own AI workflows. It uses large language models (LLMs) to run AI models locally, without any external dependencies or even a GPU. This makes it easy to use LLMs with your own data, without worrying about privacy. Floneum also has a plugin system that allows you to improve the performance of LLMs and make them work better for your specific use case. Plugins can be used in any language that supports web assembly, and they can control the output of LLMs with a process similar to JSONformer or guidance.

github

: 1.1k

COLD-Attack

COLD-Attack is a framework designed for controllable jailbreaks on large language models (LLMs). It formulates the controllable attack generation problem and utilizes the Energy-based Constrained Decoding with Langevin Dynamics (COLD) algorithm to automate the search of adversarial LLM attacks with control over fluency, stealthiness, sentiment, and left-right-coherence. The framework includes steps for energy function formulation, Langevin dynamics sampling, and decoding process to generate discrete text attacks. It offers diverse jailbreak scenarios such as fluent suffix attacks, paraphrase attacks, and attacks with left-right-coherence.

github

: 84

llm-datasets

LLM Datasets is a repository containing high-quality datasets, tools, and concepts for LLM fine-tuning. It provides datasets with characteristics like accuracy, diversity, and complexity to train large language models for various tasks. The repository includes datasets for general-purpose, math & logic, code, conversation & role-play, and agent & function calling domains. It also offers guidance on creating high-quality datasets through data deduplication, data quality assessment, data exploration, and data generation techniques.

github

: 1.5k

amber-train

Amber is the first model in the LLM360 family, an initiative for comprehensive and fully open-sourced LLMs. It is a 7B English language model with the LLaMA architecture. The model type is a language model with the same architecture as LLaMA-7B. It is licensed under Apache 2.0. The resources available include training code, data preparation, metrics, and fully processed Amber pretraining data. The model has been trained on various datasets like Arxiv, Book, C4, Refined-Web, StarCoder, StackExchange, and Wikipedia. The hyperparameters include a total of 6.7B parameters, hidden size of 4096, intermediate size of 11008, 32 attention heads, 32 hidden layers, RMSNorm ε of 1e^-6, max sequence length of 2048, and a vocabulary size of 32000.

github

: 136

rubra

Rubra is a collection of open-weight large language models enhanced with tool-calling capability. It allows users to call user-defined external tools in a deterministic manner while reasoning and chatting, making it ideal for agentic use cases. The models are further post-trained to teach instruct-tuned models new skills and mitigate catastrophic forgetting. Rubra extends popular inferencing projects for easy use, enabling users to run the models easily.

github

: 135

inference

Xorbits Inference (Xinference) is a powerful and versatile library designed to serve language, speech recognition, and multimodal models. With Xorbits Inference, you can effortlessly deploy and serve your or state-of-the-art built-in models using just a single command. Whether you are a researcher, developer, or data scientist, Xorbits Inference empowers you to unleash the full potential of cutting-edge AI models.

github

: 4.8k

Awesome-LLM-Large-Language-Models-Notes

Awesome-LLM-Large-Language-Models-Notes is a repository that provides a comprehensive collection of information on various Large Language Models (LLMs) classified by year, size, and name. It includes details on known LLM models, their papers, implementations, and specific characteristics. The repository also covers LLM models classified by architecture, must-read papers, blog articles, tutorials, and implementations from scratch. It serves as a valuable resource for individuals interested in understanding and working with LLMs in the field of Natural Language Processing (NLP).

github

: 144

dora

Dataflow-oriented robotic application (dora-rs) is a framework that makes creation of robotic applications fast and simple. Building a robotic application can be summed up as bringing together hardwares, algorithms, and AI models, and make them communicate with each others. At dora-rs, we try to: make integration of hardware and software easy by supporting Python, C, C++, and also ROS2. make communication low latency by using zero-copy Arrow messages. dora-rs is still experimental and you might experience bugs, but we're working very hard to make it stable as possible.

github

: 1.5k

langchain_dart

LangChain.dart is a Dart port of the popular LangChain Python framework created by Harrison Chase. LangChain provides a set of ready-to-use components for working with language models and a standard interface for chaining them together to formulate more advanced use cases (e.g. chatbots, Q&A with RAG, agents, summarization, extraction, etc.). The components can be grouped into a few core modules: * **Model I/O:** LangChain offers a unified API for interacting with various LLM providers (e.g. OpenAI, Google, Mistral, Ollama, etc.), allowing developers to switch between them with ease. Additionally, it provides tools for managing model inputs (prompt templates and example selectors) and parsing the resulting model outputs (output parsers). * **Retrieval:** assists in loading user data (via document loaders), transforming it (with text splitters), extracting its meaning (using embedding models), storing (in vector stores) and retrieving it (through retrievers) so that it can be used to ground the model's responses (i.e. Retrieval-Augmented Generation or RAG). * **Agents:** "bots" that leverage LLMs to make informed decisions about which available tools (such as web search, calculators, database lookup, etc.) to use to accomplish the designated task. The different components can be composed together using the LangChain Expression Language (LCEL).

github

: 391

For similar tasks

paxml

Pax is a framework to configure and run machine learning experiments on top of Jax.

github

: 441

dl_model_infer

github

: 87

local_multimodal_ai_chat

Local Multimodal AI Chat is a hands-on project that teaches you how to build a multimodal chat application. It integrates different AI models to handle audio, images, and PDFs in a single chat interface. This project is perfect for anyone interested in AI and software development who wants to gain practical experience with these technologies.

github

: 111

spandrel

Spandrel is a library for loading and running pre-trained PyTorch models. It automatically detects the model architecture and hyperparameters from model files, and provides a unified interface for running models.

github

: 107

openai-kotlin

OpenAI Kotlin API client is a Kotlin client for OpenAI's API with multiplatform and coroutines capabilities. It allows users to interact with OpenAI's API using Kotlin programming language. The client supports various features such as models, chat, images, embeddings, files, fine-tuning, moderations, audio, assistants, threads, messages, and runs. It also provides guides on getting started, chat & function call, file source guide, and assistants. Sample apps are available for reference, and troubleshooting guides are provided for common issues. The project is open-source and licensed under the MIT license, allowing contributions from the community.

github

: 1.4k

ai-devices

AI Devices Template is a project that serves as an AI-powered voice assistant utilizing various AI models and services to provide intelligent responses to user queries. It supports voice input, transcription, text-to-speech, image processing, and function calling with conditionally rendered UI components. The project includes customizable UI settings, optional rate limiting using Upstash, and optional tracing with Langchain's LangSmith for function execution. Users can clone the repository, install dependencies, add API keys, start the development server, and deploy the application. Configuration settings can be modified in `app/config.tsx` to adjust settings and configurations for the AI-powered voice assistant.

github

: 116

ComfyUI-BRIA_AI-RMBG

ComfyUI-BRIA_AI-RMBG is an unofficial implementation of the BRIA Background Removal v1.4 model for ComfyUI. The tool supports batch processing, including video background removal, and introduces a new mask output feature. Users can install the tool using ComfyUI Manager or manually by cloning the repository. The tool includes nodes for automatically loading the Removal v1.4 model and removing backgrounds. Updates include support for batch processing and the addition of a mask output feature.

github

: 519

easyAi

EasyAi is a lightweight, beginner-friendly Java artificial intelligence algorithm framework. It can be seamlessly integrated into Java projects with Maven, requiring no additional environment configuration or dependencies. The framework provides pre-packaged modules for image object detection and AI customer service, as well as various low-level algorithm tools for deep learning, machine learning, reinforcement learning, heuristic learning, and matrix operations. Developers can easily develop custom micro-models tailored to their business needs.

github

: 52

For similar jobs

Qwen-TensorRT-LLM

Qwen-TensorRT-LLM is a project developed for the NVIDIA TensorRT Hackathon 2023, focusing on accelerating inference for the Qwen-7B-Chat model using TRT-LLM. The project offers various functionalities such as FP16/BF16 support, INT8 and INT4 quantization options, Tensor Parallel for multi-GPU parallelism, web demo setup with gradio, Triton API deployment for maximum throughput/concurrency, fastapi integration for openai requests, CLI interaction, and langchain support. It supports models like qwen2, qwen, and qwen-vl for both base and chat models. The project also provides tutorials on Bilibili and blogs for adapting Qwen models in NVIDIA TensorRT-LLM, along with hardware requirements and quick start guides for different model types and quantization methods.

github

: 484

dl_model_infer

github

: 87

joliGEN

JoliGEN is an integrated framework for training custom generative AI image-to-image models. It implements GAN, Diffusion, and Consistency models for various image translation tasks, including domain and style adaptation with conservation of semantics. The tool is designed for real-world applications such as Controlled Image Generation, Augmented Reality, Dataset Smart Augmentation, and Synthetic to Real transforms. JoliGEN allows for fast and stable training with a REST API server for simplified deployment. It offers a wide range of options and parameters with detailed documentation available for models, dataset formats, and data augmentation.

github

: 238

ai-edge-torch

AI Edge Torch is a Python library that supports converting PyTorch models into a .tflite format for on-device applications on Android, iOS, and IoT devices. It offers broad CPU coverage with initial GPU and NPU support, closely integrating with PyTorch and providing good coverage of Core ATen operators. The library includes a PyTorch converter for model conversion and a Generative API for authoring mobile-optimized PyTorch Transformer models, enabling easy deployment of Large Language Models (LLMs) on mobile devices.

github

: 279

awesome-RK3588

RK3588 is a flagship 8K SoC chip by Rockchip, integrating Cortex-A76 and Cortex-A55 cores with NEON coprocessor for 8K video codec. This repository curates resources for developing with RK3588, including official resources, RKNN models, projects, development boards, documentation, tools, and sample code.

github

: 106

cl-waffe2

cl-waffe2 is an experimental deep learning framework in Common Lisp, providing fast, systematic, and customizable matrix operations, reverse mode tape-based Automatic Differentiation, and neural network model building and training features accelerated by a JIT Compiler. It offers abstraction layers, extensibility, inlining, graph-level optimization, visualization, debugging, systematic nodes, and symbolic differentiation. Users can easily write extensions and optimize their networks without overheads. The framework is designed to eliminate barriers between users and developers, allowing for easy customization and extension.

github

: 119

TensorRT-Model-Optimizer

The NVIDIA TensorRT Model Optimizer is a library designed to quantize and compress deep learning models for optimized inference on GPUs. It offers state-of-the-art model optimization techniques including quantization and sparsity to reduce inference costs for generative AI models. Users can easily stack different optimization techniques to produce quantized checkpoints from torch or ONNX models. The quantized checkpoints are ready for deployment in inference frameworks like TensorRT-LLM or TensorRT, with planned integrations for NVIDIA NeMo and Megatron-LM. The tool also supports 8-bit quantization with Stable Diffusion for enterprise users on NVIDIA NIM. Model Optimizer is available for free on NVIDIA PyPI, and this repository serves as a platform for sharing examples, GPU-optimized recipes, and collecting community feedback.

github

: 438

depthai

This repository contains a demo application for DepthAI, a tool that can load different networks, create pipelines, record video, and more. It provides documentation for installation and usage, including running programs through Docker. Users can explore DepthAI features via command line arguments or a clickable QT interface. Supported models include various AI models for tasks like face detection, human pose estimation, and object detection. The tool collects anonymous usage statistics by default, which can be disabled. Users can report issues to the development team for support and troubleshooting.

github

: 876