awesome-hpc-cuda-fpga
🔥🔥🔥 A collection of some awesome public High Performance Computing (HPC), NVIDIA CUDA, cuBLAS, TensorRT, AMD ROCm and FPGA projects.
🔥🔥🔥 This repository lists some awesome public High Performance Computing (HPC), NVIDIA CUDA, cuBLAS, TensorRT, AMD ROCm and FPGA projects.
-
Awesome-HPC-CUDA-FPGA
- Contents
- Awesome List
- Learning Resources
- Frameworks
- Applications
- Blogs
- Videos
- Jobs and Interview
-
-
codingonion/awesome-hpc-cuda-fpga : A collection of some awesome public High Performance Computing (HPC), NVIDIA CUDA, cuBLAS, TensorRT, AMD ROCm and FPGA projects.
-
Erkaman/Awesome-CUDA : This is a list of useful libraries and resources for CUDA development.
-
jslee02/awesome-gpgpu : 😎 A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources.
-
mikeroyal/CUDA-Guide : A guide covering CUDA including the applications and tools that will make you a better and more efficient CUDA developer.
-
tensorush/gpu-toolkit : 🦚 🧰 Collection of basic GPU algorithms implemented in CUDA C++.
-
-
-
drom/awesome-hdl : A curated list of amazingly awesome hardware description language projects.
-
ben-marshall/awesome-open-hardware-verification : A curated list of free and open source hardware verification tools and frameworks.
-
Vitorian/awesome-fpga : A collection of resources on FPGA devices and development in general.
-
kelu124/awesome-latticeFPGAs : 📖 List of FPGA Lattice boards using open tools.
-
FPGA-Systems/fpga-awesome-list : fpga-awesome-list. Useful resources on FPGAs.
-
hdl/awesome : A curated list of awesome resources for HDL design and verification.
-
vhdl/awesome-vhdl : A curated list of awesome VHDL IP cores, frameworks, libraries, software and resources.
-
clin99/awesome-eda : A curated list of EDA open source projects.
-
iDoka/awesome-fpga-boards : List of repurposed FPGA boards that are getting a second life in DIY or hobby projects.
-
TM90/awesome-hwd-tools : A curated list of awesome open source hardware design tools.
-
qninth/awesome-digital-ic : A collection of great digital IC projects, tutorials, websites, etc.
-
emanueledelsozzo/awesome-fpga-programming : A curated list of awesome languages and tools to program FPGAs.
-
fukatani/awesome-hdl : A curated list of awesome HDL, libraries, typical implementation and references.
-
mikeroyal/VHDL-Guide : A guide covering VHDL including the applications, libraries and tools that will make you better and more efficient at VHDL development.
-
mikeroyal/Verilog-SystemVerilog-Guide : Verilog/SystemVerilog Guide. A guide covering Verilog & SystemVerilog including the applications, libraries and tools that will make you a better and more efficient developer by having a better understanding of how hardware works on the lowest level.
-
analogdevicesinc/hdl : HDL libraries and projects. wiki.analog.com/resources/fpga/docs/hdl
-
FPGAwars/apio : 🌱 Open source ecosystem for open FPGA boards. github.com/FPGAwars/apio/wiki
-
-
-
NVIDIA CUDA Toolkit Documentation : CUDA Toolkit Documentation.
-
NVIDIA CUDA C++ Programming Guide : CUDA C++ Programming Guide.
-
NVIDIA CUDA C++ Best Practices Guide : CUDA C++ Best Practices Guide.
-
NVIDIA/cuda-samples : Samples for CUDA Developers which demonstrates features in CUDA Toolkit.
-
NVIDIA/CUDALibrarySamples : CUDA Library Samples.
-
HeKun-NVIDIA/CUDA-Programming-Guide-in-Chinese : A Chinese translation of the CUDA C Programming Guide.
-
cuda-mode/lectures : Material for cuda-mode lectures.
-
cuda-mode/resource-stream : CUDA related news and material links.
-
brucefan1983/CUDA-Programming : Sample codes for my CUDA programming book.
-
YouQixiaowu/CUDA-Programming-with-Python : Python versions of the sample code from the book CUDA Programming, using the pycuda module.
-
QINZHAOYU/CudaSteps : A CUDA learning journey based on the book "CUDA Programming: Fundamentals and Practice" by Fan Zheyong.
-
sangyc10/CUDA-code : Companion code for the Bilibili video tutorial series "CUDA Programming Basics for Beginners (ongoing)".
-
RussWong/CUDATutorial : A CUDA tutorial for learning CUDA programming from scratch.
-
DefTruth/cuda-learn-note : 🎉 CUDA notes / frequently asked interview questions / C++ notes. Personal notes, updated whenever time allows: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
-
PaddleJitLab/CUDATutorial : A self-study tutorial for high-performance CUDA programming, starting from scratch.
-
BBuf/how-to-optim-algorithm-in-cuda : This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
-
Liu-xiandong/How_to_optimize_in_GPU : This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
-
Bruce-Lee-LY/matrix_multiply : Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.
-
Bruce-Lee-LY/cuda_hgemm : Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
-
Bruce-Lee-LY/cuda_hgemv : Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
-
enp1s0/ozIMMU : FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme. arxiv.org/abs/2306.11975
-
Cjkkkk/CUDA_gemm : A simple high performance CUDA GEMM implementation.
-
AyakaGEMM/Hands-on-GEMM : A GEMM tutorial.
-
AyakaGEMM/Hands-on-MLIR : Hands-on-MLIR.
-
zpzim/MSplitGEMM : Large matrix multiplication in CUDA.
-
jundaf2/CUDA-INT8-GEMM : CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API.
-
chanzhennan/cuda_gemm_benchmark : Based on gtest/benchmark; refers to https://github.com/Liu-xiandong/How_to_optimize_in_GPU.
-
YuxueYang1204/CudaDemo : Implement custom operators in PyTorch with cuda/c++.
-
CoffeeBeforeArch/cuda_programming : Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch.
-
rbaygildin/learn-gpgpu : Algorithms implemented in CUDA + resources about GPGPU.
-
godweiyang/NN-CUDA-Example : Several simple examples for popular neural network toolkits calling custom CUDA operators.
-
yhwang-hub/Matrix_Multiplication_Performance_Optimization : Matrix Multiplication Performance Optimization.
-
yao-jiashu/KernelCodeGen : GEMM/Conv2d CUDA/HIP kernel code generation using MLIR.
-
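caiwanxianhust/ClusteringByCUDA : A series of clustering algorithms implemented in CUDA C++.
-
ulrichstern/cuda-convnet : Alex Krizhevsky's original code from Google Code. See the WeChat official account 「人工智能大讲堂」 (AI Lecture Hall) article "Found AlexNet's original source code: no framework, CUDA/C++ written by hand from scratch".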
-
PacktPublishing/Learn-CUDA-Programming : Learn CUDA Programming, published by Packt.
-
PacktPublishing/Hands-On-GPU-Accelerated-Computer-Vision-with-OpenCV-and-CUDA : Hands-On GPU Accelerated Computer Vision with OpenCV and CUDA, published by Packt.
-
PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA : Hands-On GPU Programming with Python and CUDA, published by Packt.
-
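codingonion/cuda-beginner-course-cpp-version : Companion code for the bilibili video series "Introduction to Parallel Programming with CUDA 12.1 (C++ Edition)".
-
codingonion/cuda-beginner-course-python-version : Companion code for the bilibili video series "Introduction to Parallel Programming with CUDA 12.1 (Python Edition)".
-
codingonion/cuda-beginner-course-rust-version : Companion code for the bilibili video series "Introduction to Parallel Programming with CUDA 12.1 (Rust Edition)".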
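Several entries above (cuda-learn-note, how-to-optim-algorithm-in-cuda, How_to_optimize_in_GPU) focus on reduction kernels. In CUDA these are typically built from the warp-shuffle loop `for (int offset = 16; offset > 0; offset /= 2) val += __shfl_down_sync(0xffffffff, val, offset);`.

As a hedged illustration, the C++ sketch below models on the CPU what that loop computes for a full 32-lane warp, and then what a two-stage block reduce computes. It is not taken from any listed repository. `Warp`, `shfl_down`, `warp_reduce_sum`, and `block_reduce_sum` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWarpSize = 32;
using Warp = std::array<int, kWarpSize>;  // one value per lane

// CPU model of __shfl_down_sync(0xffffffff, v, offset) on a full warp:
// lane i reads lane i + offset, or keeps its own value if that lane is out of range.
Warp shfl_down(const Warp& lanes, std::size_t offset) {
    Warp out{};
    for (std::size_t i = 0; i < kWarpSize; ++i)
        out[i] = (i + offset < kWarpSize) ? lanes[i + offset] : lanes[i];
    return out;
}

// Tree reduction with offsets 16, 8, 4, 2, 1; lane 0 ends up holding the warp sum.
int warp_reduce_sum(Warp lanes) {
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        const Warp shifted = shfl_down(lanes, offset);
        for (std::size_t i = 0; i < kWarpSize; ++i)
            lanes[i] += shifted[i];  // every lane executes val += shfl_down(val, offset)
    }
    return lanes[0];  // other lanes hold partial or double-counted values
}

// Block reduce: each warp reduces its slice, the per-warp sums are staged
// (in shared memory on a GPU), and one more warp reduction combines them.
int block_reduce_sum(const std::vector<int>& data) {
    assert(data.size() <= kWarpSize * kWarpSize);  // at most 1024 threads per block
    Warp partials{};  // unused slots stay 0
    for (std::size_t w = 0; w * kWarpSize < data.size(); ++w) {
        Warp lanes{};  // lanes past the end contribute 0
        for (std::size_t i = 0; i < kWarpSize && w * kWarpSize + i < data.size(); ++i)
            lanes[i] = data[w * kWarpSize + i];
        partials[w] = warp_reduce_sum(lanes);
    }
    return warp_reduce_sum(partials);
}
```

The halving schedule means a warp needs only log2(32) = 5 shuffle steps and no shared memory. This is why such kernels keep shared memory for the cross-warp stage only.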
-
-
-
NVIDIA TensorRT Docs : NVIDIA Deep Learning TensorRT Documentation.
-
TensorRT : NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT. developer.nvidia.com/tensorrt
-
TensorRT-LLM : TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines. nvidia.github.io/TensorRT-LLM
-
HeKun-NVIDIA/TensorRT-Developer_Guide_in_Chinese : A Chinese edition of the NVIDIA TensorRT developer guide, translated by the author with their own understanding added.
-
kalfazed/tensorrt_starter : This repository gives a guideline for learning CUDA and TensorRT from the beginning.
-
-
- AMD ROCm Docs : AMD ROCm™ documentation.
-
-
sipeed/TangPrimer-20K-example : AIoT opensource hardware platform. TangPrimer-20K-example project.
-
BrunoLevy/learn-fpga : About Learning FPGA, yosys, nextpnr, and RISC-V
-
WangXuan95/ZedBoard-Tutorial : A tutorial on building a Vivado + PetaLinux system, based on the ZedBoard.
-
WangXuan95/UniPlug-FPGA : A small, low-cost, easy-to-use, and highly extensible FPGA core board.
-
-
-
-
-
CCCL : CUDA C++ Core Libraries. The concept for the CUDA C++ Core Libraries (CCCL) grew organically out of the Thrust, CUB, and libcudacxx projects that were developed independently over the years with a similar goal: to provide high-quality, high-performance, and easy-to-use C++ abstractions for CUDA developers.
-
HIP : HIP: C++ Heterogeneous-Compute Interface for Portability. HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from single source code. rocmdocs.amd.com/projects/HIP/
-
-
-
PyCUDA : PyCUDA: Pythonic Access to CUDA, with Arrays and Algorithms. mathema.tician.de/software/pycuda
-
-
jessfraz/advent-of-cuda : Doing advent of code with CUDA and rust.
-
Bend : A massively parallel, high-level programming language. higherorderco.com
-
HVM : A massively parallel, optimal functional runtime in Rust. higherorderco.com
-
ZLUDA : CUDA on AMD GPUs.
-
Rust-CUDA : Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
-
cudarc : cudarc: minimal and safe api over the cuda toolkit.
-
bindgen_cuda : A crate similar in philosophy to bindgen. It helps automatically generate bindings to CUDA kernel source files, making them easier to use directly from Rust.
-
cuda-driver : A CUDA runtime environment based on the CUDA Driver API.
-
async-cuda : Asynchronous CUDA for Rust.
-
async-tensorrt : Asynchronous TensorRT for Rust.
-
krnl : Safe, portable, high performance compute (GPGPU) kernels.
-
custos : A minimal OpenCL, CUDA, WGPU and host CPU array manipulation engine / framework.
-
spinorml/nvlib : Rust interoperability with NVIDIA CUDA NVRTC and Driver.
-
DoeringChristian/cuda-rs : Cuda Bindings for rust generated with bindgen-cli (similar to cust_raw).
-
romankoblov/rust-nvrtc : NVRTC bindings for RUST.
-
solkitten/astro-cuda : CUDA Driver API bindings for Rust.
-
bokutotu/curs : cuda&cublas&cudnn wrapper for Rust.
-
rust-cuda/cuda-sys : Rust binding to CUDA APIs.
-
bheisler/RustaCUDA : Rusty wrapper for the CUDA Driver API.
-
tmrob2/cuda2rust_sandpit : Minimal examples to get CUDA linear algebra programs working with Rust using CC & FFI.
-
PhDP/rust-cuda-template : Simple template for Rust + CUDA.
-
neka-nat/cuimage : Rust implementation of image processing library with CUDA.
-
yanghaku/cuda-driver-sys : Rust binding to CUDA Driver APIs.
-
Canyon-ml/canyon-sys : Rust Bindings for Cuda, CuDNN.
-
cea-hpc/HARP : Small tool for profiling the performance of hardware-accelerated Rust code using OpenCL and CUDA.
-
Conqueror712/CUDA-Simulator : A self-developed version of the user-mode CUDA emulator project and a learning repository for Rust.
-
cszach/rust-cuda-template : A Rust CUDA template with detailed instructions.
-
exor2008/fluid-simulator : Rust CUDA fluid simulator.
-
chichieinstein/rustycuda : Convenience functions for generic handling of CUDA resources on the Rust side.
-
Jafagervik/cruda : CRUDA - Writing rust with cuda.
-
lennyerik/cutransform : CUDA kernels in any language supported by LLVM.
-
cjordan/hip-sys : Rust bindings for HIP.
-
rust-gpu : 🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧 shader.rs
-
wgpu : Safe and portable GPU abstraction in Rust, implementing WebGPU API. wgpu.rs
-
Vulkano : Safe and rich Rust wrapper around the Vulkan API. Vulkano is a Rust wrapper around the Vulkan graphics API. It follows the Rust philosophy, which is that as long as you don't use unsafe code you shouldn't be able to trigger any undefined behavior. In the case of Vulkan, this means that non-unsafe code should always conform to valid API usage.
-
Ash : Vulkan bindings for Rust.
-
ocl : OpenCL for Rust.
-
opencl3 : A Rust implementation of the Khronos OpenCL 3.0 API.
-
-
-
CUDA.jl : CUDA programming in Julia. juliagpu.org/
-
AMDGPU.jl : AMD GPU (ROCm) programming in Julia.
-
-
-
-
cuBLAS : Basic Linear Algebra on NVIDIA GPUs. NVIDIA cuBLAS is a GPU-accelerated library for accelerating AI and HPC applications. It includes several API extensions for providing drop-in industry standard BLAS APIs and GEMM APIs with support for fusions that are highly optimized for NVIDIA GPUs. The cuBLAS library also contains extensions for batched operations, execution across multiple GPUs, and mixed- and low-precision execution with additional tuning for the best performance.
-
CUTLASS : CUDA Templates for Linear Algebra Subroutines.
-
MatX : MatX - GPU-Accelerated Numerical Computing in Modern C++. An efficient C++17 GPU numerical computing library with Python-like syntax. nvidia.github.io/MatX
-
GenericLinearAlgebra.jl : Generic numerical linear algebra in Julia.
-
custos-math : This crate provides CUDA, OpenCL, CPU (and Stack) based matrix operations using custos.
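cuBLAS follows the BLAS convention of column-major storage. Its GEMM computes C = alpha * op(A) * op(B) + beta * C, with leading dimensions lda, ldb and ldc. A CPU reference of those semantics is handy for checking custom GEMM kernels such as the ones in the learning resources above.

The sketch below shows one minimal version. `Op`, `elem`, and `sgemm_ref` are hypothetical names that only mirror the roles of `CUBLAS_OP_N`/`CUBLAS_OP_T` and `cublasSgemm`. They are not the cuBLAS API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the roles of CUBLAS_OP_N / CUBLAS_OP_T.
enum class Op { N, T };

// Element (row, col) of op(X) for a column-major X with leading dimension ld.
inline float elem(const float* X, std::size_t ld, Op op, std::size_t row, std::size_t col) {
    return op == Op::N ? X[row + col * ld] : X[col + row * ld];
}

// Reference C = alpha * op(A) * op(B) + beta * C, all column-major:
// op(A) is m x k, op(B) is k x n, C is m x n.
void sgemm_ref(Op opA, Op opB, std::size_t m, std::size_t n, std::size_t k,
               float alpha, const float* A, std::size_t lda,
               const float* B, std::size_t ldb,
               float beta, float* C, std::size_t ldc) {
    for (std::size_t j = 0; j < n; ++j) {      // columns of C
        for (std::size_t i = 0; i < m; ++i) {  // rows of C
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p)
                acc += elem(A, lda, opA, i, p) * elem(B, ldb, opB, p, j);
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}
```

Row-major data has a common workaround. Ask a column-major GEMM for C^T = B^T A^T by swapping the A and B arguments (and m with n). The result is the row-major C, with no explicit transposes.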
-
-
-
cuDNN : The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, attention, matmul, pooling, and normalization.
-
PyTorch : Tensors and Dynamic neural networks in Python with strong GPU acceleration. pytorch.org
-
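PaddlePaddle : PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle 『飞桨』 core framework for high-performance single-machine and distributed deep learning and machine learning training, and cross-platform deployment). www.paddlepaddle.org/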
-
flashlight/flashlight : A C++ standalone library for machine learning. fl.readthedocs.io/en/latest/
-
NVlabs/tiny-cuda-nn : Lightning fast C++/CUDA neural network framework.
-
yhwang-hub/dl_model_infer : This is a C++ AI inference library. It currently supports only TensorRT model inference; follow-up plans include C++ inference for frameworks such as OpenVINO, NCNN, and MNN. Pre- and post-processing come in two versions, C++ and CUDA, and the CUDA version is recommended. The repository provides accelerated deployment cases for popular deep learning CV models, with CUDA C support for dynamic-batch image processing, inference, decoding, and NMS.
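Non-maximum suppression (NMS), listed above as part of the post-processing pipeline, is classically done greedily. The C++ sketch below shows that greedy form under assumed conventions: corner-format boxes, a single class, and a strict `>` IoU threshold.

It is a generic illustration, not the code of dl_model_infer or of TensorRT's EfficientNMS plugin. `Box`, `iou`, and `nms` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Box { float x1, y1, x2, y2, score; };  // corners (x1, y1) to (x2, y2)

// Intersection over union of two boxes.
float iou(const Box& a, const Box& b) {
    const float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    const float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    const float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: visit boxes by descending score, keep each box not yet suppressed,
// and suppress every lower-scoring box whose IoU with it exceeds iou_threshold.
std::vector<std::size_t> nms(const std::vector<Box>& boxes, float iou_threshold) {
    std::vector<std::size_t> order(boxes.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return boxes[a].score > boxes[b].score; });
    std::vector<bool> suppressed(boxes.size(), false);
    std::vector<std::size_t> keep;
    for (std::size_t i = 0; i < order.size(); ++i) {
        const std::size_t cur = order[i];
        if (suppressed[cur]) continue;
        keep.push_back(cur);
        for (std::size_t j = i + 1; j < order.size(); ++j) {
            const std::size_t other = order[j];
            if (!suppressed[other] && iou(boxes[cur], boxes[other]) > iou_threshold)
                suppressed[other] = true;
        }
    }
    return keep;  // indices into boxes, highest score first
}
```

The inner loop is O(N^2) in the number of candidates. GPU implementations therefore usually filter by score threshold and top-k first, then compute the IoU comparisons in parallel.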
-
-
-
-
llm.c : LLM training in simple, pure C/CUDA. There is no need for 245MB of PyTorch or 107MB of cPython. For example, training GPT-2 (CPU, fp32) is ~1,000 lines of clean code in a single file. It compiles and runs instantly, and exactly matches the PyTorch reference implementation.
-
llama2.c : Inference Llama 2 in one file of pure C. Train the Llama 2 LLM architecture in PyTorch then inference it with one simple 700-line C file (run.c).
-
-
-
TensorRT : NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT. developer.nvidia.com/tensorrt
-
TensorRT-LLM : TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines. nvidia.github.io/TensorRT-LLM
-
gemma.cpp : gemma.cpp is a lightweight, standalone C++ inference engine for the Gemma foundation models from Google.
-
whisper.cpp : High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model.
-
ChatGLM.cpp : C++ implementation of ChatGLM-6B and ChatGLM2-6B.
-
MegEngine/InferLLM : InferLLM is a lightweight LLM model inference framework that mainly references and borrows from the llama.cpp project.
-
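DeployAI/nndeploy : nndeploy is an end-to-end model deployment framework. Built around multi-backend inference and DAG-based (directed acyclic graph) model deployment, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience. nndeploy-zh.readthedocs.io/zh/latest/
-
zjhellofss/KuiperInfer (a home-made deep learning inference framework) : Walks you through implementing a high-performance deep learning inference library from scratch, with inference support for models such as llama, Unet, Yolov5, and Resnet.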
-
skeskinen/llama-lite : Embeddings focused small version of Llama NLP model.
-
Const-me/Whisper : High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model.
-
wangzhaode/ChatGLM-MNN : Pure C++, Easy Deploy ChatGLM-6B.
-
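ztxz16/fastllm : A large-model library implemented in pure C++ with no third-party dependencies and with CUDA acceleration. It currently supports the Chinese large models ChatGLM-6B and MOSS, and can run ChatGLM-6B smoothly on Android devices.
-
davidar/eigenGPT : Minimal C++ implementation of GPT2.
-
Tlntin/Qwen-TensorRT-LLM : Inference acceleration for Qwen-7B-Chat using TRT-LLM.
-
FeiGeChuanShu/trt2023 : NVIDIA TensorRT Hackathon 2023 final-round topic: building and optimizing Tongyi Qianwen Qwen-7B with TensorRT-LLM.
-
TRT2022/trtllm-llama : ☢️ TensorRT 2023 final round: accelerating and optimizing Llama model inference with TensorRT-LLM.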
-
-
-
llama2.mojo : Inference Llama 2 in one file of pure 🔥
-
dorjeduck/llm.mojo : A port of Andrej Karpathy's llm.c to Mojo.
-
-
-
Candle : Minimalist ML framework for Rust.
-
Safetensors : Simple, safe way to store and distribute tensors. huggingface.co/docs/safetensors
-
Tokenizers : 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production. huggingface.co/docs/tokenizers
-
Burn : Burn - A Flexible and Comprehensive Deep Learning Framework in Rust. burn-rs.github.io/
-
dfdx : Deep learning in Rust, with shape checked tensors and neural networks.
-
luminal : Deep learning at the speed of light. www.luminalai.com/
-
crabml : crabml is focusing on the reimplementation of GGML using the Rust programming language.
-
TensorFlow Rust : Rust language bindings for TensorFlow.
-
tch-rs : Rust bindings for the C++ api of PyTorch.
-
rustai-solutions/candle_demo_openchat_35 : candle_demo_openchat_35.
-
llama2.rs : A fast llama2 decoder in pure Rust.
-
Llama2-burn : Llama2 LLM ported to Rust burn.
-
gaxler/llama2.rs : Inference Llama 2 in one file of pure Rust 🦀
-
whisper-burn : A Rust implementation of OpenAI's Whisper model using the burn framework.
-
stable-diffusion-burn : Stable Diffusion v1.4 ported to Rust's burn framework.
-
coreylowman/llama-dfdx : LLaMa 7b with CUDA acceleration implemented in rust. Minimal GPU memory needed!
-
tazz4843/whisper-rs : Rust bindings to whisper.cpp.
-
rustformers/llm : Run inference for Large Language Models on CPU, with Rust 🦀🚀🦙.
-
Chidori : A reactive runtime for building durable AI agents. docs.thousandbirds.ai.
-
llm-chain : llm-chain is a collection of Rust crates designed to help you work with Large Language Models (LLMs) more effectively. llm-chain.xyz
-
Atome-FE/llama-node : Believe in AI democratization. llama for Node.js backed by llama-rs and llama.cpp, working locally on your laptop CPU. Supports llama/alpaca/gpt4all/vicuna models. www.npmjs.com/package/llama-node
-
Noeda/rllama : Rust+OpenCL+AVX2 implementation of LLaMA inference code.
-
lencx/ChatGPT : 🔮 ChatGPT Desktop Application (Mac, Windows and Linux). NoFWL.
-
Synaptrix/ChatGPT-Desktop : Fuel your productivity with ChatGPT-Desktop - Blazingly fast and supercharged!
-
Poordeveloper/chatgpt-app : A ChatGPT App for all platforms. Built with Rust + Tauri + Vue + Axum.
-
mxismean/chatgpt-app : A Tauri project: ChatGPT App.
-
sonnylazuardi/chat-ai-desktop : Chat AI Desktop App. Unofficial ChatGPT desktop app for Mac & Windows menubar using Tauri & Rust.
-
yetone/openai-translator : The translator that does more than just translation - powered by OpenAI.
-
m1guelpf/browser-agent : A browser AI agent, using GPT-4. docs.rs/browser-agent
-
sigoden/aichat : Using ChatGPT/GPT-3.5/GPT-4 in the terminal.
-
uiuifree/rust-openai-chatgpt-api : "rust-openai-chatgpt-api" is a Rust library for accessing the ChatGPT API, a powerful NLP platform by OpenAI. The library provides a simple and efficient interface for sending requests and receiving responses, including chat. It uses reqwest and serde for HTTP requests and JSON serialization.
-
1595901624/gpt-aggregated-edition : Aggregates multiple platforms, including the official ChatGPT, free ChatGPT, ERNIE Bot (文心一言), Poe, and chatchat, and supports importing custom platforms.
-
Cormanz/smartgpt : A program that provides LLMs with the ability to complete complex tasks using plugins.
-
femtoGPT : femtoGPT is a pure Rust implementation of a minimal Generative Pretrained Transformer. discord.gg/wTJFaDVn45
-
shafishlabs/llmchain-rs : 🦀Rust + Large Language Models - Make AI Services Freely and Easily. Inspired by LangChain.
-
flaneur2020/llama2.rs : A Rust reimplementation of https://github.com/karpathy/llama2.c.
-
Heng30/chatbox : A Chatbot for OpenAI ChatGPT. Based on Slint-ui and Rust.
-
fairjm/dioxus-openai-qa-gui : a simple openai qa desktop app built with dioxus.
-
purton-tech/bionicgpt : Accelerate LLM adoption in your organisation. Chat with your confidential data safely and securely. bionic-gpt.com
-
-
-
llama2.zig : Inference Llama 2 in one file of pure Zig.
-
renerocksai/gpt4all.zig : ZIG build for a terminal-based chat client for an assistant-style large language model with ~800k GPT-3.5-Turbo Generations based on LLaMa.
-
EugenHotaj/zig_inference : Neural Network Inference Engine in Zig.
-
-
-
Ollama : Get up and running with Llama 2, Mistral, Gemma, and other large language models. ollama.com
-
go-skynet/LocalAI : 🤖 Self-hosted, community-driven, local OpenAI-compatible API. Drop-in replacement for OpenAI running LLMs on consumer-grade hardware. Free Open Source OpenAI alternative. No GPU required. LocalAI is an API to run ggml compatible models: llama, gpt4all, rwkv, whisper, vicuna, koala, gpt4all-j, cerebras, falcon, dolly, starcoder, and many other. localai.io
-
-
-
vllm-project/vllm : A high-throughput and memory-efficient inference and serving engine for LLMs. vllm.readthedocs.io
-
MLC LLM : Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. mlc.ai/mlc-llm
-
Lamini : Lamini: The LLM engine for rapidly customizing models 🦙.
-
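datawhalechina/self-llm : "A Guide to Using Open-Source Large Models": tutorials for quickly deploying open-source large models in a Linux environment, tailored for beginners in China.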
-
-
- ninehills/llm-inference-benchmark : LLM Inference benchmark.
-
-
-
NVIDIA/nccl : Optimized primitives for collective multi-GPU communication.
-
wilicc/gpu-burn : Multi-GPU CUDA stress test.
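NCCL chooses among collective algorithms such as ring and tree. The C++ sketch below is a CPU-only simulation of the textbook ring variant of a sum all-reduce: a reduce-scatter phase followed by an all-gather phase.

In the ring scheme each rank sends about 2(N-1)/N of its buffer in total, almost independent of the number of ranks N, which is why it is bandwidth-efficient. `Buffers` and `ring_allreduce` are hypothetical names, and the sketch assumes the buffer length divides evenly by the rank count. It is not NCCL's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Buffers = std::vector<std::vector<float>>;  // buf[rank][element]

// Simulated ring all-reduce (sum). Each buffer is split into one chunk per rank;
// in every step rank r sends one chunk to rank (r + 1) % n.
void ring_allreduce(Buffers& buf) {
    const std::size_t n = buf.size();
    if (n < 2) return;
    const std::size_t len = buf[0].size();
    assert(len % n == 0);
    const std::size_t chunk = len / n;

    // Phase 1, reduce-scatter: after n - 1 steps rank r holds the full sum of chunk (r + 1) % n.
    for (std::size_t step = 0; step + 1 < n; ++step) {
        const Buffers snapshot = buf;  // all ranks send "simultaneously"
        for (std::size_t r = 0; r < n; ++r) {
            const std::size_t c = (r + n - step) % n;  // chunk rank r sends this step
            const std::size_t dst = (r + 1) % n;
            for (std::size_t i = 0; i < chunk; ++i)
                buf[dst][c * chunk + i] += snapshot[r][c * chunk + i];
        }
    }
    // Phase 2, all-gather: pass the reduced chunks around so every rank gets all of them.
    for (std::size_t step = 0; step + 1 < n; ++step) {
        const Buffers snapshot = buf;
        for (std::size_t r = 0; r < n; ++r) {
            const std::size_t c = (r + 1 + n - step) % n;  // fully reduced chunk rank r forwards
            const std::size_t dst = (r + 1) % n;
            for (std::size_t i = 0; i < chunk; ++i)
                buf[dst][c * chunk + i] = snapshot[r][c * chunk + i];
        }
    }
}
```

The snapshot copy only models the fact that all sends in a step happen concurrently. Real implementations pipeline chunks through preallocated buffers instead.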
-
-
- Cupoch : Robotics with GPU computing.
-
-
Tachyon : Modular ZK(Zero Knowledge) backend accelerated by GPU.
-
Blitzar : Zero-knowledge proof acceleration with GPUs for C++ and Rust. www.spaceandtime.io/
-
blitzar-rs : High-Level Rust wrapper for the blitzar-sys crate. www.spaceandtime.io/
-
ICICLE : ICICLE is a library for ZK acceleration using CUDA-enabled GPUs.
-
-
-
-
- LiteX : The LiteX framework provides a convenient and efficient infrastructure to create FPGA Cores/SoCs, to explore various digital design architectures and create full FPGA based systems.
-
-
Chisel : Chisel: A Modern Hardware Design Language. www.chisel-lang.org/
-
SpinalHDL : Scala based HDL.
-
-
-
Veryl : Veryl: A Modern Hardware Description Language.
-
RustHDL : A framework for writing FPGA firmware using the Rust Programming Language.
-
VHDL-LS/rust_hdl : This repository contains a fast VHDL language server and analysis library written in Rust.
-
yupferris/kaze : An HDL embedded in Rust. kaze provides an API to describe Modules composed of Signals, which can then be used to generate Rust simulator code or Verilog modules.
-
dalance/sv-parser : SystemVerilog parser library fully compliant with IEEE 1800-2017.
-
dalance/svls : SystemVerilog language server.
-
dalance/svlint : SystemVerilog linter.
-
vivekmalneedi/veridian : A SystemVerilog Language Server.
-
zachjs/sv2v : SystemVerilog to Verilog conversion.
-
-
-
nMigen : A modern hardware definition language and toolchain based on Python.
-
Migen : A Python toolbox for building complex digital hardware.
-
MyHDL : MyHDL is a free, open-source package for using Python as a hardware description and verification language.
-
Magma : Magma is a hardware design language embedded in python.
-
PyRTL : PyRTL provides a collection of classes for pythonic register-transfer level design, simulation, tracing, and testing suitable for teaching and research.
-
Veriloggen : Veriloggen: A Mixed-Paradigm Hardware Construction Framework.
-
HWT : VHDL/Verilog/SystemC code generator, simulator API written in python/c++.
-
HDL21 : Analog Hardware Description Library in Python.
-
-
-
-
-
laugh12321/TensorRT-YOLO : 🚀 TensorRT-YOLO is an inference acceleration project supporting YOLOv3, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, PP-YOLOE and PP-YOLOE+, optimized with NVIDIA TensorRT. It integrates the EfficientNMS TensorRT plugin for enhanced post-processing and uses CUDA kernels to accelerate pre-processing. TensorRT-YOLO supports both C++ and Python inference, aiming to provide a fast, optimized object detection solution.
-
l-sf/Linfer : A high-performance C++ inference library based on TensorRT: Yolov10, YoloPv2, Yolov5/7/X/8, RT-DETR, and the single-object trackers OSTrack and LightTrack.
-
Melody-Zhou/tensorRT_Pro-YOLOv8 : This repository is based on shouxieai/tensorRT_Pro, with adjustments to support YOLOv8. It currently supports high-performance inference for YOLOv8, YOLOv8-Cls, YOLOv8-Seg, YOLOv8-OBB, YOLOv8-Pose, RT-DETR, ByteTrack, YOLOv9, YOLOv10, and RTMO! 🚀🚀🚀
-
shouxieai/tensorRT_Pro : C++ library based on tensorrt integration.
-
shouxieai/infer : A new tensorrt integrate. Easy to integrate many tasks.
-
kalfazed/tensorrt_starter : This repository gives a guideline for learning CUDA and TensorRT from the beginning.
-
hamdiboukamcha/yolov10-tensorrt : YOLOv10 C++ TensorRT : Real-Time End-to-End Object Detection.
-
triple-Mu/YOLOv8-TensorRT : YOLOv8 accelerated with TensorRT!
-
FeiYull/TensorRT-Alpha : 🔥🔥🔥TensorRT for YOLOv8、YOLOv8-Pose、YOLOv8-Seg、YOLOv8-Cls、YOLOv7、YOLOv6、YOLOv5、YOLONAS......🚀🚀🚀CUDA IS ALL YOU NEED.🍎🍎🍎
-
cyrusbehr/YOLOv8-TensorRT-CPP : YOLOv8 TensorRT C++ Implementation. A C++ implementation of YOLOv8 using TensorRT. Supports object detection, semantic segmentation, and body pose estimation.
-
NVIDIA-AI-IOT/torch2trt : An easy to use PyTorch to TensorRT converter.
-
zhiqwang/yolort : yolort is a runtime stack for yolov5 on specialized accelerators such as tensorrt, libtorch, onnxruntime, tvm and ncnn. zhiqwang.com/yolort
-
Linaom1214/TensorRT-For-YOLO-Series : YOLO Series TensorRT Python/C++. tensorrt for yolo series (YOLOv8, YOLOv7, YOLOv6....), nms plugin support.
-
wang-xinyu/tensorrtx : TensorRTx aims to implement popular deep learning networks with tensorrt network definition APIs.
-
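DefTruth/lite.ai.toolkit : 🛠 A lite C++ toolkit of awesome AI models with ONNXRuntime, NCNN, MNN and TNN. YOLOX, YOLOP, YOLOv6, YOLOR, MODNet, YOLOv7, YOLOv5. MNN, NCNN, TNN, ONNXRuntime. "🛠 Lite.Ai.ToolKit: a lightweight C++ AI model toolbox, user-friendly (reasonably so) and ready to use out of the box. It already includes 100+ popular open-source models. This C++ toolbox was put together out of personal interest and covers object detection, face detection, face recognition, semantic segmentation, matting, and more."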
-
PaddlePaddle/FastDeploy : ⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ main stream scenarios and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.
-
enazoe/yolo-tensorrt : TensorRT 8. Supports YOLOv5n/s/m/l/x. Darknet -> TensorRT: YOLOv4 and YOLOv3 use the raw Darknet *.weights and *.cfg files. If the wrapper is useful to you, please star it.
-
guojianyang/cv-detect-robot : 🔥🔥🔥🔥🔥🔥Docker NVIDIA Docker2 YOLOV5 YOLOX YOLO Deepsort TensorRT ROS Deepstream Jetson Nano TX2 NX for high-performance deployment.
-
BlueMirrors/Yolov5-TensorRT : Yolov5 TensorRT Implementations.
-
lewes6369/TensorRT-Yolov3 : TensorRT for Yolov3.
-
CaoWGG/TensorRT-YOLOv4 : TensorRT 5, YOLOv4, YOLOv3, YOLOv3-tiny, YOLOv3-tiny-prn.
-
isarsoft/yolov4-triton-tensorrt : YOLOv4 on Triton Inference Server with TensorRT.
-
TrojanXu/yolov5-tensorrt : A tensorrt implementation of yolov5.
-
tjuskyzhang/Scaled-YOLOv4-TensorRT : Implement yolov4-tiny-tensorrt, yolov4-csp-tensorrt, yolov4-large-tensorrt(p5, p6, p7) layer by layer using TensorRT API.
-
Syencil/tensorRT : TensorRT-7 Network Lib, including common object detection, keypoint detection, face detection, OCR, etc.; can be trained on your own data.
-
SeanAvery/yolov5-tensorrt : YOLOv5 in TensorRT.
-
Monday-Leo/YOLOv7_Tensorrt : A simple implementation of Tensorrt YOLOv7.
-
ibaiGorordo/ONNX-YOLOv6-Object-Detection : Python scripts performing object detection using the YOLOv6 model in ONNX.
-
ibaiGorordo/ONNX-YOLOv7-Object-Detection : Python scripts performing object detection using the YOLOv7 model in ONNX.
-
triple-Mu/yolov7 : End2end TensorRT YOLOv7.
-
hewen0901/yolov7_trt : C++ TensorRT deployment code for the YOLOv7 object detection algorithm.
-
tsutof/tiny_yolov2_onnx_cam : Tiny YOLO v2 Inference Application with NVIDIA TensorRT.
-
Monday-Leo/Yolov5_Tensorrt_Win10 : A simple implementation of tensorrt yolov5 python/c++🔥
-
Wulingtian/yolov5_tensorrt_int8 : TensorRT INT8 quantized deployment of the yolov5s model, measured at 3.3 ms per frame!
-
Wulingtian/yolov5_tensorrt_int8_tools : TensorRT INT8 quantization of YOLOv5 ONNX models.
-
MadaoFY/yolov5_TensorRT_inference : YOLOv5 TensorRT quantization and inference code, tested to run on the Jetson platform.
-
ibaiGorordo/ONNX-YOLOv8-Object-Detection : Python scripts performing object detection using the YOLOv8 model in ONNX.
-
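we0091234/yolov8-tensorrt : YOLOv8 TensorRT acceleration.
-
FeiYull/yolov8-tensorrt : TensorRT + CUDA accelerated deployment of YOLOv8; the code runs on Windows and Linux.
-
cvdong/YOLO_TRT_SIM : 🐇 One codebase supporting TRT inference for YOLO X, V5, V6, V7, and V8 ™️ 🔝, with both pre- and post-processing implemented as CUDA kernels. CPP/CUDA🚀
-
cvdong/YOLO_TRT_PY : 🐰 One codebase supporting TRT inference for YOLOV5, V6, V7, and V8 ™️ PYTHON ✈️
-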
Psynosaur/Jetson-SecVision : Person detection for Hikvision DVR with AlarmIO ports, uses TensorRT and yolov4.
-
tatsuya-fukuoka/yolov7-onnx-infer : Inference with yolov7's onnx model.
-
ervgan/yolov5_tensorrt_inference : TensorRT cpp inference for Yolov5 model. Supports yolov5 v1.0, v2.0, v3.0, v3.1, v4.0, v5.0, v6.0, v6.2, v7.0.
-
AlbinZhu/easy-trt : TensorRT for YOLOv10 with CUDA.
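Several entries above (yolov5_tensorrt_int8 and its tools) rely on INT8 quantization. The C++ sketch below shows the basic arithmetic under stated assumptions:

- symmetric, per-tensor scales from max-abs calibration;
- values clamped to [-127, 127];
- int8 products accumulated in int32 before rescaling.

TensorRT's own calibrators (for example entropy calibration) choose scales differently. `int8_scale`, `quantize`, and `int8_dot` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Symmetric per-tensor scale from max-abs calibration: [-amax, amax] maps onto [-127, 127].
float int8_scale(const std::vector<float>& x) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    return amax > 0.0f ? amax / 127.0f : 1.0f;  // avoid a zero scale for all-zero tensors
}

// q = clamp(round(x / scale), -127, 127)
std::vector<std::int8_t> quantize(const std::vector<float>& x, float scale) {
    std::vector<std::int8_t> q(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float r = std::nearbyint(x[i] / scale);  // round to nearest
        q[i] = static_cast<std::int8_t>(std::clamp(r, -127.0f, 127.0f));
    }
    return q;
}

// Dot product of two quantized vectors: int8 x int8 products accumulated in int32,
// then mapped back to real units with the product of the two scales.
float int8_dot(const std::vector<std::int8_t>& a, float sa,
               const std::vector<std::int8_t>& b, float sb) {
    assert(a.size() == b.size());
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    return static_cast<float>(acc) * sa * sb;
}
```

With a = {1, -2, 0.5, 3} and b = {0.25, 1, -1, 2}, the exact dot product is 3.75. The INT8 path lands within a few hundredths of it.

Max-abs calibration is simple but sensitive to outliers. A single large value stretches the scale and wastes resolution for everything else, which is the motivation for the more elaborate calibrators.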
-
-
-
-
-
-
-
XiangShan (香山) : XiangShan (香山) is an open-source high-performance RISC-V processor project. "Towards Developing High Performance RISC-V Processors Using Agile Methodology". (MICRO 2022)
-
Rocket Chip : Rocket Chip Generator 🚀. This repository contains the Rocket chip generator necessary to instantiate the RISC-V Rocket Core.
-
MoonbaseOtago/vroom : VRoom! RISC-V CPU. A new high-end RISC-V implementation.
-
SpinalHDL/VexRiscv : SpinalHDL/VexRiscv.
-
DarkRISCV : An open-source RISC-V CPU core implemented in Verilog from scratch in one night!
-
stnolting/neorv32 : The NEORV32 RISC-V Processor. 🖥️ A tiny, customizable and highly extensible MCU-class 32-bit RISC-V soft-core CPU and microcontroller-like SoC written in platform-independent VHDL.
-
ZipCPU/zipcpu : The Zip CPU is a small, light-weight, RISC CPU.
-
olofk/serv : SERV - The SErial RISC-V CPU.
-
riscv-mcu/e203_hbirdv2 : The Ultra-Low Power RISC-V Core. doc.nucleisys.com/hbirdv2
-
ultraembedded/riscv : RISC-V CPU Core (RV32IM).
-
ultraembedded/biriscv : 32-bit Superscalar RISC-V CPU.
-
WangXuan95/USTC-RVSoC : An FPGA-based RISC-V CPU+SoC with a simple and extensible peripheral bus.
-
FPGAwars/FLIX-V : FLIX-V: FPGA, Linux and RISC-V.
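Most of the cores above are RISC-V designs, and each one starts by decoding fixed 32-bit instruction fields. To make that encoding concrete, the C++ sketch below decodes the I-type layout (imm[11:0] | rs1 | funct3 | rd | opcode). It then executes ADDI on a register file where x0 stays zero.

This is an illustrative sketch, not code from any listed core. `IType`, `decode_itype`, and `exec_addi` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a RISC-V I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode.
struct IType {
    std::uint32_t opcode, rd, funct3, rs1;
    std::int32_t imm;  // 12-bit immediate, sign-extended
};

IType decode_itype(std::uint32_t insn) {
    IType d{};
    d.opcode = insn & 0x7Fu;          // bits [6:0]
    d.rd     = (insn >> 7) & 0x1Fu;   // bits [11:7]
    d.funct3 = (insn >> 12) & 0x7u;   // bits [14:12]
    d.rs1    = (insn >> 15) & 0x1Fu;  // bits [19:15]
    std::int32_t imm = static_cast<std::int32_t>(insn >> 20);  // bits [31:20]
    if (imm & 0x800) imm -= 0x1000;   // sign-extend from bit 11
    d.imm = imm;
    return d;
}

// Executes ADDI (opcode 0x13, funct3 0) on a 32-entry register file; writes to x0 are dropped.
void exec_addi(std::uint32_t insn, std::uint32_t regs[32]) {
    const IType d = decode_itype(insn);
    assert(d.opcode == 0x13u && d.funct3 == 0u);
    if (d.rd != 0) regs[d.rd] = regs[d.rs1] + static_cast<std::uint32_t>(d.imm);  // wraps mod 2^32
}
```

For example, 0x00500093 encodes `addi x1, x0, 5` and 0xFFF08113 encodes `addi x2, x1, -1`. In hardware, the same field slicing is plain wiring: rd, rs1 and funct3 sit at fixed bit positions across formats, so register-file reads can start before the opcode is fully decoded.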
-
-
-
-
Ventus(承影) : Ventus(承影) GPGPU. A GPGPU processor supporting the RISC-V V (vector) extension, developed with Chisel HDL.
-
jbush001/NyuziProcessor : Nyuzi is an experimental GPGPU processor focused on compute intensive tasks. It includes a synthesizable hardware design written in System Verilog, an instruction set emulator, an LLVM based C/C++ compiler, software libraries, and tests.
-
-
-
-
- lnis-uofu/OpenFPGA : The award-winning OpenFPGA framework is the first open-source FPGA IP generator with silicon proofs supporting highly-customizable FPGA architectures. OpenFPGA provides complete EDA support for customized FPGAs, including Verilog-to-bitstream generation and self-testing verification. OpenFPGA opens the door to democratizing FPGA technology and EDA techniques with agile prototyping approaches and constantly evolving EDA tools for chip designers and researchers. openfpga.readthedocs.io/en/master/. "OpenFPGA: An Open-Source Framework for Agile Prototyping Customizable FPGAs". (IEEE Micro, 2020)
-
-
WangXuan95/Xilinx-FPGA-PCIe-XDMA-Tutorial : Xilinx FPGA PCIe 保姆级教程 ——基于 PCIe XDMA IP核。
-
Reconfigurable-Computing/Xilinx-FPGA-PCIe-XDMA-Tutorial : A detailed, step-by-step Xilinx FPGA PCIe tutorial based on the PCIe XDMA IP core.
-
enjoy-digital/litepcie : LitePCIe provides a small footprint and configurable PCIe core.
-
alexforencich/verilog-pcie : Verilog PCI Express components.
-
-
-
ultraembedded/core_ddr3_controller : A DDR3 memory controller in Verilog for various FPGAs.
-
WangXuan95/FPGA-DDR-SDRAM : An AXI4-based DDR1 controller that provides cheap, large-capacity memory for low-end FPGA embedded systems.
-
adibis/DDR2_Controller : DDR2 memory controller written in Verilog.
-
BrianHGinc/BrianHG-DDR3-Controller : DDR3 Controller v1.60, 16 read/write ports, configurable widths, priority, auto-burst size & cache on each port. VGA/HDMI multiwindow video controller with alpha-blended layers. Docs & TBs included.
-
someone755/ddr3-controller : A DDR3(L) PHY and controller, written in Verilog, for Xilinx 7-Series FPGAs.
-
-
alexforencich/verilog-ethernet : Verilog Ethernet components for FPGA implementation.
-
openwifi : open-source IEEE 802.11 WiFi baseband FPGA (chip) design: driver, software.
-
-
ZipCPU/wbuart32 : A simple, basic, formally verified UART controller.
-
WangXuan95/Verilog-UART : 3 independent modules for FPGA: UART receiver, UART transmitter, UART interactive debugger.
-
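Both UART cores above serialize each byte into a frame on a single wire. The following is a minimal sketch of the line levels a transmitter drives. It assumes the common 8N1 format: one low start bit, eight data bits sent least significant bit first, no parity, and one high stop bit. The function name is hypothetical and is not taken from either repository.

```cpp
#include <array>
#include <cstdint>

// Line levels a UART transmitter drives for one byte in 8N1 format.
// Each element is held for one bit period (1 / baud rate seconds).
constexpr std::array<uint8_t, 10> uart_8n1_frame(uint8_t byte) {
    std::array<uint8_t, 10> line{};
    line[0] = 0;  // start bit: pull the idle-high line low
    for (int i = 0; i < 8; ++i) {
        line[1 + i] = static_cast<uint8_t>((byte >> i) & 1u);  // data bits, LSB first
    }
    line[9] = 1;  // stop bit: return the line to its idle-high level
    return line;
}
```

At 115200 baud each bit lasts about 8.68 µs, so one 10-bit frame occupies roughly 86.8 µs. A receiver typically samples near the middle of each bit period after detecting the falling edge of the start bit.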
-
- WangXuan95/FPGA-USB-Device : An FPGA-based USB full-speed device core to implement USB-serial, USB-camera, USB-audio, USB-disk, USB-keyboard, etc. It requires only 3 ordinary FPGA I/Os and no additional interface chips.
-
- WangXuan95/FPGA-CAN : An FPGA-based lightweight CAN bus controller.
-
- pulp-platform/axi : AXI SystemVerilog synthesizable IP modules and verification infrastructure for high-performance on-chip communication.
-
- hdl-util/hdmi : Send video/audio over HDMI on an FPGA. purisa.me/blog/hdmi-released/
-
-
WangXuan95/FPGA-SDcard-Reader : An FPGA-based SD-card reader to read files from FAT16 or FAT32 formatted SD-cards.
-
WangXuan95/FPGA-SDcard-Reader-SPI : An FPGA-based SD-card reader via SPI bus, which can read files from FAT16 or FAT32 formatted SD-cards.
-
WangXuan95/FPGA-SDfake : Emulate an SD card using an FPGA.
-
-
WangXuan95/FPGA-NFC : Build an NFC (RFID) card reader using an FPGA and a simple discrete circuit instead of an RFID-specific chip.
-
- WangXuan95/FPGA-SATA-HBA : A SATA host (HBA) core that uses the GTH transceivers of Xilinx FPGAs, for easily reading and writing hard disks.
-
- WangXuan95/FPGA-DAC-R2R-PWM : FPGA-based 14-bit DAC using an R-2R resistor network and PWM.
-
-
- apertus-open-source-cinema/axiom-firmware : AXIOM Beta Software. Firmware required to boot & operate the apertus° AXIOM Beta Camera. Featured in WeChat Official Account「OpenFPGA」: "The World's Greatest Open-Source Work: Axiom Camera, an FPGA-based Open-Source Camera".
-
-
ChFrenkel/tinyODIN : tinyODIN Low-Cost Digital Spiking Neural Network (SNN) Processor.
-
ChFrenkel/ODIN : ODIN Spiking Neural Network (SNN) Processor.
-
ChFrenkel/ReckOn : ReckOn: A Spiking RNN Processor Enabling On-Chip Learning over Second-Long Timescales.
-
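Spiking processors like tinyODIN, ODIN and ReckOn time-multiplex many simple neuron models over shared update logic. The following is a minimal sketch of one such model: an integer leaky integrate-and-fire (LIF) neuron update. The leak, threshold, clamp-at-zero and reset-to-zero behavior are illustrative assumptions. They are not the exact neuron model of any of these chips, and the names are hypothetical.

```cpp
#include <cstdint>

// State and parameters of a hypothetical integer LIF neuron.
struct LifNeuron {
    int32_t v = 0;          // membrane potential
    int32_t threshold = 8;  // fire when v reaches this level
    int32_t leak = 1;       // amount subtracted from v each time step
};

// Advance one time step with the summed synaptic input.
// Returns true if the neuron spikes, after which v resets to 0.
constexpr bool lif_step(LifNeuron& n, int32_t input) {
    n.v += input;                              // integrate weighted input spikes
    n.v = (n.v > n.leak) ? n.v - n.leak : 0;   // leak toward 0; v never goes negative here
    if (n.v >= n.threshold) {
        n.v = 0;                               // reset after firing
        return true;
    }
    return false;
}
```

With the defaults, two consecutive inputs of 5 bring the potential to 4 and then to 8. The neuron therefore fires on the second step.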
-
-
Xilinx/Vitis-AI : Vitis AI offers a unified set of high-level C++/Python programming APIs to run AI applications across edge-to-cloud platforms, including DPU for Alveo, and DPU for Zynq UltraScale+ MPSoC and Zynq-7000. It makes it easy to port AI applications from cloud to edge and vice versa. 10 samples in VART Samples are available to help you get familiar with the unified programming APIs. Vitis-AI-Library provides an easy-to-use and unified interface by encapsulating many efficient and high-quality neural networks.
-
tensil-ai/tensil : Open source machine learning accelerators. www.tensil.ai
-
19801201/SpinalHDL_CNN_Accelerator : CNN accelerator implemented with Spinal HDL.
-
ZFTurbo/MobileNet-in-FPGA : Generator of verilog description for FPGA MobileNet implementation.
-
MasLiang/CNN-On-FPGA : Code for a CNN on FPGA. For now it is intended only as a reference, because some files were written roughly using ISE.
-
PipeCNN : PipeCNN is an OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks (CNNs).
-
-
-
dhm2013724/yolov2_xilinx_fpga : YOLOv2 Accelerator in Xilinx's Zynq-7000 SoC (PYNQ-Z2, Zedboard and ZCU102). (Master's thesis 2019; Application of Electronic Technique 2019; Journal of Frontiers of Computer Science and Technology 2019)
-
Yu-Zhewen/Tiny_YOLO_v3_ZYNQ : Implement Tiny YOLO v3 on ZYNQ. "A Parameterisable FPGA-Tailored Architecture for YOLOv3-Tiny". (ARC 2020)
-
HSqure/ultralytics-pt-yolov3-vitis-ai-edge : This demo is intended only for inference testing with Vitis AI v1.4 and for DPU quantization and compilation. It is compatible with the training results of ultralytics/yolov3 v9.5.0 (models must be saved using the PyTorch 1.4 model-saving method).
-
mcedrdiego/Kria_yolov3_ppe : Kria KV260 Real-Time Personal Protective Equipment Detection. "Deep Learning for Site Safety: Real-Time Detection of Personal Protective Equipment". (Automation in Construction 2020)
-
xlsjdjdk/Ship-Detection-based-on-YOLOv3-and-KV260 : This is the entry project of the Xilinx Adaptive Computing Challenge 2021. It uses YOLOv3 for ship target detection in optical remote sensing images, and deploys DPU on the KV260 platform to achieve hardware acceleration.
-
Pomiculture/YOLOv4-Vitis-AI : Custom YOLOv4 for apple recognition (clean/damaged) on Alveo U280 accelerator card using Vitis AI framework.
-
mkshuvo2/ZCU104_YOLOv3_Post_Processing : Post-processing of tensor outputs from the Vitis AI Runner class for YOLOv3.
-
puffdrum/v4tiny_pt_quant : quantization for yolo with xilinx/vitis-ai-pytorch.
-
chanshann/LITE_YOLOV3_TINY_VITISAI : LITE_YOLOV3_TINY_VITISAI.
-
LukiBa/zybo_yolo : YOLO example implementation using Intuitus CNN accelerator on ZYBO ZYNQ-7000 FPGA board.
-
matsuda-slab/YOLO_ZYNQ_MASTER : Implementation of YOLOv3-tiny on FPGA.
-
AramisOposich/tiny_YOLO_Zedboard : tiny_YOLO_Zedboard.
-
FerberZhang/Yolov2-FPGA-CNN- : A demo for accelerating YOLOv2 on Xilinx's PYNQ FPGA.
-
Prithvi-Velicheti/FPGA-Accelerator-for-TinyYolov3 : An FPGA-Accelerator-for-TinyYolov3.
-
ChainZeeLi/FPGA_DPU : This project is to implement YOLO v3 on Xilinx FPGA with DPU.
-
xbdxwyh/yolov3_fpga_project : yolov3_fpga_project.
-
ZLkanyo009/Yolo-compression-and-deployment-in-FPGA : Face mask detection based on FPGA quantization.
-
xiying-boy/yolov3-AX7350 : Driver files based on HLS_YOLOV3.
-
himewel/yolowell : A set of hardware architectures to build a co-design of convolutional neural networks inference at FPGA devices.
-
embedeep/Free-TPU : Free TPU for FPGA with LeNet, MobileNet, SqueezeNet, ResNet, Inception V3, YOLO V3, and ICNet. Deep learning acceleration using Xilinx Zynq (Zedboard or ZC702) or Kintex-7 to solve image classification, detection, and segmentation problems.
-
yarakigit/design_contest_yolo_change_ps_to_pl : Converts pytorch yolo format weights to C header files for bare-metal (FPGA implementation).
-
adamgallas/fpga_accelerator_yolov3tiny : fpga_accelerator_yolov3tiny.
-
ylk678910/tiny-yolov3-fpga : Uses an all-programmable SoC board to implement locating and tracking tasks. The hardware algorithm, a row-stationary-like strategy, computes in parallel and reduces the storage buffer area on the FPGA.
-
zhen8838/K210_Yolo_framework : YOLO v3 framework based on TensorFlow that supports multiple models, multiple datasets, any number of output layers, any number of anchors, model pruning, and porting models to the K210!
-
SEASKY-Master/SEASKY_K210 : K210 PCB YOLO.
-
SEASKY-Master/Yolo-for-k210 : Yolo-for-k210.
-
TonyZ1Min/yolo-for-k210 : keras-yolo-for-k210.
-
vseasky/yolo-for-k210 : Yolo-for-k210.
-
shilicon/kr260_robotic_arm : A robotic arm controller design based on the AMD/Xilinx Kria KR260 FPGA dev kit, in which the arm grasps objects.
-
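Several of the Vitis AI and YOLO projects above quantize floating-point weights to 8-bit integers before deploying them on a DPU. The following is a minimal sketch of symmetric 8-bit quantization with a power-of-two scale, a fixed-point style that maps well onto shift-based FPGA arithmetic. It is a generic illustration under that assumption, not the Vitis AI quantizer's actual algorithm. The function names and the "largest fractional width without clipping" heuristic are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric int8 quantization with a power-of-two scale: q = round(x * 2^frac_bits).
inline int8_t quantize_pow2(float x, int frac_bits) {
    assert(std::isfinite(x));                         // lround of NaN/inf is unspecified
    long q = std::lround(std::ldexp(x, frac_bits));   // round half away from zero
    q = std::clamp(q, -128L, 127L);                   // saturate to the int8 range
    return static_cast<int8_t>(q);
}

// Dequantization is q * 2^-frac_bits, i.e. an arithmetic shift in hardware.
inline float dequantize_pow2(int8_t q, int frac_bits) {
    return std::ldexp(static_cast<float>(q), -frac_bits);
}

// Largest frac_bits such that max |w| * 2^frac_bits <= 127 (no clipping).
inline int choose_frac_bits(const std::vector<float>& weights) {
    float max_abs = 0.0f;
    for (float w : weights) max_abs = std::max(max_abs, std::fabs(w));
    if (max_abs == 0.0f) return 7;  // arbitrary choice for an all-zero tensor
    return static_cast<int>(std::floor(std::log2(127.0f / max_abs)));
}
```

For weights with a maximum magnitude of 0.5, `choose_frac_bits` picks 7. Each weight is then stored as an integer multiple of 1/128, with a rounding error of at most 1/256.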
-
- sdoira/U96-SLAM : Visual SLAM on Ultra96-V2.
-
-
WangXuan95/FPGA-JPEG-LS-encoder : An FPGA-based JPEG-LS encoder, which provides lossless and near-lossless image compression with high compression ratios.
-
WangXuan95/FPGA-MPEG2-encoder : FPGA-based high performance MPEG2 encoder for video compression.
-
WangXuan95/UH-JLS : FPGA-based Ultra-High Throughput JPEG-LS encoder, which provides lossless image compression.
-
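JPEG-LS (ISO/IEC 14495-1, based on LOCO-I) predicts each pixel from its causal neighbors using the median edge detector (MED) and then entropy-codes the residual. The following is a minimal sketch of that predictor. Here `a` is the left neighbor, `b` the pixel above and `c` the upper-left pixel. The context modeling, bias correction and Golomb coding stages of a full encoder such as the ones above are omitted, and the function names are hypothetical.

```cpp
#include <algorithm>

// JPEG-LS / LOCO-I median edge detector (MED) prediction.
// a = left neighbor, b = above neighbor, c = upper-left neighbor.
constexpr int med_predict(int a, int b, int c) {
    const int lo = std::min(a, b);
    const int hi = std::max(a, b);
    if (c >= hi) return lo;   // c at or above both neighbors: edge, predict the smaller one
    if (c <= lo) return hi;   // c at or below both neighbors: edge, predict the larger one
    return a + b - c;         // smooth region: planar prediction
}

// Prediction residual that the encoder goes on to model and Golomb-code.
constexpr int med_residual(int x, int a, int b, int c) {
    return x - med_predict(a, b, c);
}
```

The predictor needs only comparisons and one add/subtract, which makes it cheap to pipeline. Part of why JPEG-LS suits high-throughput FPGA encoders is that it avoids a transform stage.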
-
- WangXuan95/FPGA-FOC : FPGA-based Field Oriented Control (FOC) for driving BLDC/PMSM motors.
-
- WangXuan95/Verilog-FixedPoint : A Verilog fixed-point library with custom bit widths, arithmetic, and conversion to and from floating point, in both single-cycle and pipelined versions.
-
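Field oriented control, as in the FPGA-FOC project above, applies two coordinate transforms to the measured phase currents in every control cycle. The Clarke transform maps three phases to the stationary α/β frame. The Park transform maps α/β to the rotor-aligned d/q frame. The following is a minimal floating-point sketch of the amplitude-invariant forms. It assumes balanced phases (ia + ib + ic = 0), so only ia and ib are measured, and that sin/cos of the electrical rotor angle come from something like a lookup table or CORDIC. An FPGA implementation would likely use fixed-point arithmetic instead, and the names here are hypothetical.

```cpp
// Amplitude-invariant Clarke and Park transforms used in FOC.
struct AlphaBeta { double alpha, beta; };
struct DQ { double d, q; };

constexpr double kInvSqrt3 = 0.57735026918962576;  // 1 / sqrt(3)

// Clarke: with ia + ib + ic = 0, two measured phase currents suffice.
constexpr AlphaBeta clarke(double ia, double ib) {
    return {ia, (ia + 2.0 * ib) * kInvSqrt3};
}

// Park: rotate the stationary alpha/beta vector into the rotor frame.
// cos_t and sin_t are cos(theta) and sin(theta) of the electrical rotor angle.
constexpr DQ park(AlphaBeta ab, double cos_t, double sin_t) {
    return { ab.alpha * cos_t + ab.beta * sin_t,
            -ab.alpha * sin_t + ab.beta * cos_t};
}
```

A balanced unit current vector aligned with the rotor angle maps to d = 1, q = 0. The PI current loops then regulate these two DC-like quantities instead of sinusoids.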
-
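Fixed-point libraries such as Verilog-FixedPoint represent a real number as an integer with an implied binary point. The following is a minimal sketch in Q16.16 (16 integer bits, 16 fractional bits). This width is chosen purely for illustration; the library itself allows custom widths. It shows why multiplication needs a double-width intermediate and a shift back. Conversions truncate toward zero, multiplication rounds toward negative infinity via an arithmetic shift, and overflow is not saturated. These are all assumptions of this sketch, and the names are hypothetical.

```cpp
#include <cstdint>

// Q16.16 fixed point: value = raw / 2^16.
constexpr int kFracBits = 16;
constexpr int32_t kOne = int32_t{1} << kFracBits;

constexpr int32_t to_fixed(double x) {
    return static_cast<int32_t>(x * kOne);  // truncates toward zero
}

constexpr double to_double(int32_t q) {
    return static_cast<double>(q) / kOne;
}

// Addition needs no rescaling: both operands share the same binary point.
constexpr int32_t fx_add(int32_t a, int32_t b) { return a + b; }

// The full product has 32 fractional bits, so compute it in 64 bits and
// shift right by kFracBits to return to Q16.16 (arithmetic shift floors).
constexpr int32_t fx_mul(int32_t a, int32_t b) {
    return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> kFracBits);
}
```

For example, 1.5 × 2.25 is computed as 98304 × 147456 = 14495514624. Shifting that right by 16 gives 221184, which is exactly 3.375 in Q16.16.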
-
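bilibili「老石谈芯」| WeChat Official Account「老石谈芯」
- 2020-06-27, The unique advantages of FPGA chips in the AI era
- 2020-07-05, The three stages of FPGA chip development
- 2020-08-10, What is a data center?
- 2020-09-20, [Chip Explainer] The obvious weak spot of Chinese domestic chips: FPGAs
- 2020-11-04, Ten years in the industry: my FPGA learning roadmap. Master these four points and you can level up easily
- 2020-11-30, A day in the life of a chip engineer | How I work productively 12 hours a day
- 2020-12-27, Want to do FPGA chip development at a top-tier company? Here's what you should learn
- 2021-01-11, [Chip Frontier] How does this Intel AI chip outperform NVIDIA by 20x?
- 2021-01-17, Why I don't need a "perfect" desk | With a complete list of my desk gear
- 2021-03-07, This is the best productivity app! If it isn't, I'd like to try yours | Notion tips
- 2021-04-04, How Microsoft became the world's largest FPGA customer | An in-depth look at Microsoft's Catapult FPGA project
- 2021-04-26, [Vlog] A chip engineer's day off | Five ways to relax effectively
- 2021-06-15, I spent two years writing a chip book with no code
- 2021-07-04, Inside "XiangShan", a high-performance open-source RISC-V processor | A conversation with researcher Bao Yungang of ICT, Chinese Academy of Sciences
- 2021-07-28, [Chip Deep Dive] How do you design a high-performance CPU?
- 2021-12-03, [Chip Deep Dive] Learning about analog-to-digital converter (ADC) chips? Here's what you need to know
- 2022-02-12, How to stay disciplined all year with Notion? Try this principle
- 2022-03-20, The next big wave? One video that explains every electronics and information major and industry!
- 2022-03-26, Did AMD pay a sky-high price for Xilinx just for this chip?
- 2022-11-25, Seeing a lithography machine for the first time: it looks like this?!
- 2022-12-11, Developing FPGAs with software: a step-by-step robotic arm design tutorial + source code
- 2023-04-21, About my paper: how to speed up chip verification 40,000x? Use FPGAs!
- 2019-01-28, What is the core competitive strength of an FPGA engineer?
- 2020-02-28, The 25 most influential FPGA research results: system architecture
- 2020-03-02, The 25 most influential FPGA research results of 20 years: microarchitecture
- 2020-11-09, After 10 years in the industry, here's my FPGA learning roadmap
- 2021-01-18, Stratix 10 NX: the "strongest" FPGA of the AI era, surpassing GPUs?
- 2021-07-20, Chip development languages: Verilog on the left, Chisel on the right
- 2021-10-30, I "built" an AI vision accelerator in a quarantine hotel
- 2021-12-16, The next ten years will be a golden decade for China's chip industry
- 2022-02-14, Can you teach us how a student from a second-tier university can intern at the Chinese Academy of Sciences?
- 2022-02-17, $49 billion! Whose lunch is AMD's acquisition of Xilinx eating?
- 2022-04-07, ACAP: not an FPGA, but better than one
- 2022-05-18, Quitting without a job lined up, moving back to China, and giving up a seven-figure salary: am I crazy?
- 2022-08-01, How do you design a RISC-V processor?
- 2022-12-14, Developing FPGAs with software: a step-by-step robotic arm design tutorial
- 2023-01-10, My 2022 year in review
- 2023-04-09, ChatGPT goes viral: why is NVIDIA winning big again?
- 2023-04-25, Chip professionals: your best days are still ahead
- 2023-05-22, The most in-depth analysis online: OPPO's 50-billion-yuan chip dream is shattered. Was ZEKU a mistake?
- 2023-05-23, [10,000-word essay] On the fall of OPPO's ZEKU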
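- WeChat Official Account「OpenFPGA」
- 2022-01-14, A brief history of Verilog and SystemVerilog: do FPGA designers need to learn SystemVerilog?
- 2022-05-31, Excellent open-source Verilog/FPGA projects (24): spiking neural networks (SNN)
- 2023-01-06, Excellent open-source Verilog/FPGA projects (36): RISC-V (addendum 1)
- 2023-01-30, Deep learning, starting from FPGAs (1)
- 2023-02-08, Deep learning, starting from FPGAs (2)
- 2023-02-15, Deep learning, starting from FPGAs (3)
- 2023-03-02, Deep learning, starting from FPGAs (4)
- 2023-03-10, Deep learning, starting from FPGAs (5)
- 2023-04-12, Deep learning, starting from FPGAs (6): task parallelism
- 2023-04-17, Deep learning, starting from FPGAs (7): loop parallelization
- 2023-04-28, Deep learning, starting from FPGAs (8): data parallelism
- 2023-05-06, Deep learning, starting from FPGAs (9): optimization, final chapter
- 2023-05-10, Deep learning, starting from FPGAs (10)
- 2023-03-13, How can ChatGPT be applied in FPGA design?
- 2023-03-17, Wow, this is the best Verilog practice-problem website!
- 2023-03-17, Still worried about not having a project? These amazing open-source sites are full of FPGA/IC projects
- 2023-03-20, [Domestic FPGA] Building an image processing platform on a Chinese domestic FPGA
- 2023-03-22, [Open-Source Hardware] An introduction to open-source FPGA PCIe accelerator card hardware and examples (RIFFA\XDMA\HDMI\SDI)
- 2023-03-23, Want to accelerate neural networks with FPGAs? You need to know these two open-source projects
- 2023-03-27, Are the open-source projects ChatGPT recommends actually reliable?
- 2023-03-31, Nowcoder has released a brand-new digital logic problem bank! Will it make this year's FPGA/IC job market even more competitive?!!
- 2023-04-03, What are some high-quality open-source FPGA IP websites with source code?
- 2023-04-06, The world's greatest open-source work: Axiom Camera, an FPGA-based open-source camera
- 2023-04-19, A low-cost, low-latency FPGA-based imaging system
- 2023-04-21, MIPI camera project = 7-series FPGA + OV5640 (MIPI) + 15 minutes + Vitis
- 2023-04-24, Quickly building a PID algorithm on an FPGA
- 2023-04-24, The "seven deadly sins" of Verilog
- 2023-05-08, Visual SLAM on FPGAs
- 2023-05-22, Digital hardware modeling with SystemVerilog: summary (final part)
- 2023-05-22, OpenFPGA article series summary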
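- WeChat Official Account「FPGA之旅」
- 2022-08-29, Lighting an LED with an FPGA
- 2022-08-29, Implementing a push-button module on an FPGA
- 2022-08-29, Implementing UART serial communication on an FPGA
- 2022-08-16, Implementing a multi-bit serial transmit/receive module on an FPGA
- 2022-08-20, Implementing the IIC (I2C) protocol on an FPGA
- 2022-08-29, Driving a seven-segment display with an FPGA
- 2022-02-26, FPGA digital clock
- 2022-08-30, Reading temperature from a DS18B20 with an FPGA
- 2022-08-31, Driving an OLED screen with an FPGA
- 2022-09-04, Emulating an OLED screen in a serial-port host application
- 2022-09-06, Displaying characters on an OLED with an FPGA
- 2022-09-07, Reading temperature and humidity from a DHT11 with an FPGA
- 2022-09-08, Displaying DHT11 data on an OLED with an FPGA
- 2022-09-14, Decoding infrared remote-control signals with an FPGA
- 2022-09-24, Ultrasonic distance measurement with an FPGA
- 2022-10-02, Driving a servo with an FPGA
- 2022-10-06, Driving a VGA display with an FPGA
- 2022-10-09, Introduction to the OV5640 camera and SCCB timing
- 2022-10-14, Power-up and initialization of an OV5640 driven by an FPGA
- 2022-10-16, Implementing an SDRAM controller on an FPGA
- 2022-10-22, Displaying images over UART and VGA using SDRAM_FIFO
- 2022-11-06, Edge detection with the Sobel algorithm on an FPGA
- 2022-12-03, How FPGAs work, all in one article!
- 2023-05-20, MPU6050 attitude estimation on an FPGA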
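- WeChat Official Account「FPGA技术江湖」
- 2020-07-24, Design of an FPGA-based monocular endoscope localization system (Part 1)
- 2020-07-25, Design of an FPGA-based monocular endoscope localization system (Part 2)
- 2020-07-26, Design of an FPGA-based monocular endoscope localization system (Part 3)
- 2023-02-12, From the archive: design of an FPGA-based electronic calculator system (with code)
- 2023-02-14, Chinese domestic chip ecosystem map (2022 edition)
- 2023-04-21, The universal chip: FPGA
- 2023-04-28, Why do we need FPGA prototype verification?
- 2023-05-11, Design of an FPGA-based real-time image edge detection system (with code)
- 2023-05-16, Design of an FPGA-based monocular endoscope localization system (with code)
- 2023-05-18, Design of an FPGA-based CAN bus controller
- 2023-05-22, Design of an FPGA-based Ethernet controller (MAC)
- 2023-05-23, How to do math in an FPGA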
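- WeChat Official Account「疯狂的FPGA」
- WeChat Official Account「FPGA探索者」
- WeChat Official Account「FPGA之家」
- WeChat Official Account「深蓝AI」
- WeChat Official Account「AIIC Xidian」
- WeChat Official Account「FPGA设计论坛」
- WeChat Official Account「数字IC打工人」
- WeChat Official Account「中国计算机学会」
- WeChat Official Account「硬件起源」
- WeChat Official Account「芯东西」
- WeChat Official Account「IT服务圈儿」
- WeChat Official Account「电子工程专辑」
- WeChat Official Account「建约车评」
- WeChat Official Account「ittbank」
- WeChat Official Account「半导体芯闻」
- WeChat Official Account「AI智胜未来」
- WeChat Official Account「机器之心」
- WeChat Official Account「半导体行业观察」
- WeChat Official Account「嵌入式Linux」
- WeChat Official Account「硅农亚历山大」
- WeChat Official Account「传感器专家网」
- WeChat Official Account「泰晓科技」
- WeChat Official Account「佐思汽车研究」
- WeChat Official Account「FPGA研究院」
- WeChat Official Account「FPGA开发圈」
- WeChat Official Account「OpenIC」
- WeChat Official Account「腾讯科技」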
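Several of the FPGA tutorials above, such as the Sobel edge-detection article from 「FPGA之旅」, convolve each 3×3 pixel window with horizontal and vertical gradient kernels. The following is a minimal sketch of that per-window computation. It uses |Gx| + |Gy| as the gradient magnitude, a common hardware-friendly approximation of sqrt(Gx² + Gy²) that avoids a square root, and takes a hypothetical threshold parameter. The line buffers an FPGA design would use to assemble the window are not shown, and the names are illustrative.

```cpp
#include <array>

using Window3x3 = std::array<std::array<int, 3>, 3>;  // w[row][col] pixel values

// Sobel gradient magnitude of the center pixel, approximated as |Gx| + |Gy|.
constexpr int sobel_magnitude(const Window3x3& w) {
    // Gx kernel [-1 0 1; -2 0 2; -1 0 1]: responds to vertical edges.
    const int gx = (w[0][2] + 2 * w[1][2] + w[2][2]) -
                   (w[0][0] + 2 * w[1][0] + w[2][0]);
    // Gy kernel [-1 -2 -1; 0 0 0; 1 2 1]: responds to horizontal edges.
    const int gy = (w[2][0] + 2 * w[2][1] + w[2][2]) -
                   (w[0][0] + 2 * w[0][1] + w[0][2]);
    return (gx < 0 ? -gx : gx) + (gy < 0 ? -gy : gy);
}

// Binary edge-map value for one pixel.
constexpr bool is_edge(const Window3x3& w, int threshold) {
    return sobel_magnitude(w) > threshold;
}
```

A flat window yields 0. A window whose columns step from 0 to 255 yields |Gx| = 4 × 255 = 1020 and Gy = 0.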
-
bilibili「老石谈芯」| WeChat Official Account「老石谈芯」
-
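- WeChat Official Account「NVIDIA英伟达」
- WeChat Official Account「NVIDIA英伟达企业解决方案」
- WeChat Official Account「AI不止算法」
- WeChat Official Account「澎峰科技PerfXLab」
- 2022-10-18, GPU Optimization Explained series: reduce optimization
- 2022-10-31, GPU Optimization Explained series: spmv optimization
- 2023-05-24, GPU Optimization Explained series: gemv optimization
- 2023-05-24, GPU Optimization Explained series: GEMM optimization (1)
- 2023-06-02, GPU Optimization Explained series: GEMM optimization (2)
- 2023-06-16, GPU Optimization Explained series: GEMM optimization (3)
- 2023-06-26, GPU Optimization Explained series: elementwise optimization and an introduction to the CUDA toolchain
- 2023-06-27, Notes on high-performance computing and performance optimization: memory access
- 2024-07-04, PerfXLab releases the technical white paper for PerfIPP, its high-performance computing primitives library (download included)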
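- WeChat Official Account「大猿搬砖简记」
- WeChat Official Account「oldpan博客」
- 2024-03-19, An end-to-end walkthrough of deploying large language models on NVIDIA
- 2024-03-20, A first look at TensorRT-LLM (2): a brief analysis of its structure, to use it with more understanding
- 2024-03-21, Design and implementation of a high-performance LLM inference framework
- 2024-04-15, [In-depth CUTLASS series] 0x01 CUTLASS source code analysis (0): software architecture (with ncu profiling methods)
- 2024-04-21, Understanding NVIDIA GPU performance metrics, an easily confused pair of concepts: Utilization vs Saturation
- 2024-04-22, Boosting performance fast: how to make better use of GPUs (Part 1)
- 2024-05-14, Boosting performance fast: how to make better use of GPUs (Part 2)
- 2024-05-22, Large-model precision (FP16, FP32, BF16): explanation and practice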
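- WeChat Official Account「DeepPrompting」
- WeChat Official Account「机器学习研究组订阅」
- WeChat Official Account「自动驾驶之心」
- WeChat Official Account「Meet DSA」
- WeChat Official Account「AI寒武纪」
- WeChat Official Account「关于NLP那些你不知道的事」
- WeChat Official Account「InfoQ」
- WeChat Official Account「机器之心」
- WeChat Official Account「新智元」
- WeChat Official Account「GitHubStore」
- WeChat Official Account「云云众生s」
- WeChat Official Account「手写AI」
- WeChat Official Account「美团技术团队」
- WeChat Official Account「GiantPandaCV」
- WeChat Official Account「GitHubFun网站」
- WeChat Official Account「大模型生态圈」
- WeChat Official Account「苏哲管理咨询」
- WeChat Official Account「后来遇见AI」
- 2022-08-08, [Machine Learning] How the K-means clustering algorithm works
- 2022-08-11, [CUDA Programming] A simple CUDA implementation of the K-means algorithm
- 2024-01-23, [CUDA Programming] An advanced CUDA implementation of the K-means algorithm (1)
- 2024-01-24, [CUDA Programming] An advanced CUDA implementation of the K-means algorithm (2)
- 2024-04-08, [CUDA Programming] CUDA Unified Memory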
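- WeChat Official Account「江大白」
- WeChat Official Account「Tim在路上」
- WeChat Official Account「潮观世界」
- WeChat Official Account「DeepDriving」
- WeChat Official Account「GPUS开发者」
- WeChat Official Account「人工智能大讲堂」
- WeChat Official Account「未来科技潮」
- WeChat Official Account「AI道上」
- WeChat Official Account「科技译览」
- WeChat Official Account「小白学视觉」
- WeChat Official Account「卡巴斯」
- WeChat Official Account「码砖杂役」
- WeChat Official Account「星想法」
- WeChat Official Account「太极图形」
- WeChat Official Account「硅星人Pro」
- WeChat Official Account「3D视觉之心」
- WeChat Official Account「中国企业家杂志」
- WeChat Official Account「CSharp与边缘模型部署」
- WeChat Official Account「NeuralTalk」
- WeChat Official Account「小吴持续学习AI」
- WeChat Official Account「大模型新视界」
- WeChat Official Account「量子位」
- WeChat Official Account「HPC智能流体大本营」
- WeChat Official Account「人工智能前沿讲习」
- bilibili「权双」
- WeChat Official Account「高通内推王」
- WeChat Official Account「美团技术团队」
- WeChat Official Account「大模型生态圈」
- WeChat Official Account「神仙外企」
- WeChat Official Account「Cver」
- Zhihu「Tim在路上」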
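The reduce-optimization article from 澎峰科技PerfXLab above covers parallel reduction. In that pattern, a CUDA thread block sums its elements in shared memory over log2(n) steps, halving the number of active threads at each step. The following is a minimal sketch: a sequential C++ model of that tree pattern in its sequential-addressing variant, where thread `tid` adds element `tid + stride`. It illustrates only the indexing, is not CUDA code, and assumes the block size is a power of two. The function name is hypothetical.

```cpp
#include <array>
#include <cstddef>

// Sequential model of a CUDA shared-memory tree reduction (sequential addressing).
// Each outer-loop pass corresponds to one __syncthreads()-separated step; the inner
// loop stands in for the threads tid < stride that would run in parallel. Reads at
// tid + stride never overlap writes at tid < stride, so sequential order matches.
template <std::size_t N>
constexpr int block_reduce_sum(std::array<int, N> smem) {
    static_assert(N > 0 && (N & (N - 1)) == 0, "block size must be a power of two");
    for (std::size_t stride = N / 2; stride > 0; stride /= 2) {
        for (std::size_t tid = 0; tid < stride; ++tid) {
            smem[tid] += smem[tid + stride];  // each active "thread" adds its partner
        }
    }
    return smem[0];  // "thread 0" holds the block's sum
}
```

Sequential addressing keeps the active threads contiguous, which avoids warp divergence in early steps and, on the GPU, the shared-memory bank conflicts of interleaved addressing.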