Plug-play-modules
The most complete collection of plug-and-play modules for AI and deep learning in 2025: convolution variants, the latest attention mechanisms, feature fusion modules, and up-/down-sampling modules. Continuously updated.
Stars: 264
Plug-play-modules is a comprehensive collection of plug-and-play modules for AI, deep learning, and computer vision applications. It includes various convolution variants, the latest attention mechanisms, feature fusion modules, and up-/down-sampling modules, suitable for tasks such as image classification, object detection, instance segmentation, semantic segmentation, single object tracking (SOT), multi-object tracking (MOT), infrared object tracking (RGBT), image de-raining, de-fogging, de-blurring, super-resolution, and more. The modules are designed to enhance model performance and feature extraction capabilities across these tasks.
README:
The most complete collection of plug-and-play modules on the web in 2025, covering convolution variants, the latest attention mechanisms, feature fusion modules, and up-/down-sampling modules. Applicable to AI, deep learning, and computer vision (CV), for tasks such as image classification, object detection, instance segmentation, semantic segmentation, single object tracking (SOT), multi-object tracking (MOT), infrared object tracking (RGBT), image de-raining, de-fogging, de-blurring, super-resolution, and more. Continuously updated.
Title: FFT-based Dynamic Token Mixer for Vision
Link: https://arxiv.org/pdf/2303.03932
Summary: [AAAI 2024] An FFT-based dynamic filter module with lower computational complexity; plug-and-play.
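For intuition, here is a minimal PyTorch sketch of FFT-based token mixing. It uses a learned *static* global filter (GFNet-style) rather than the dynamically generated filters proposed in the paper; all names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class GlobalFilterMixer(nn.Module):
    """Simplified FFT token mixer: go to the frequency domain, modulate with a
    learned complex filter, and come back. The paper generates this filter
    dynamically per input; a static filter is used here for brevity."""
    def __init__(self, dim, h, w):
        super().__init__()
        # one complex weight per channel and frequency bin (rfft2 halves the last dim)
        self.filter = nn.Parameter(torch.randn(dim, h, w // 2 + 1, 2) * 0.02)

    def forward(self, x):  # x: (B, C, H, W) with C == dim, H == h, W == w
        freq = torch.fft.rfft2(x, norm="ortho")
        freq = freq * torch.view_as_complex(self.filter)
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")
```

Mixing in the frequency domain costs O(N log N) in the number of tokens versus the O(N²) of self-attention, which is the complexity advantage the summary refers to.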
Title: Wavelet Convolutions for Large Receptive Fields
Link: https://arxiv.org/pdf/2407.05848v2
Summary: [ECCV 2024] Wavelet convolution for large receptive fields; plug-and-play, significantly enlarges the effective receptive field of CNNs.
Title: Frequency-aware Feature Fusion for Dense Image Prediction
Link: https://arxiv.org/pdf/2408.12879
Summary: [TPAMI 2024] The plug-and-play FreqFusion feature fusion module; improves semantic segmentation, object detection, instance segmentation, and panoptic segmentation across the board.
Title: Attention Is All You Need
Link: https://arxiv.org/pdf/1706.03762
Summary: [130k+ citations] Scaled Dot-Product Attention, the most-cited attention mechanism.
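The mechanism is compact enough to restate. A minimal PyTorch version of the paper's equation, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V (recent PyTorch also ships a fused `torch.nn.functional.scaled_dot_product_attention`):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q: (..., Nq, d_k), k: (..., Nk, d_k), v: (..., Nk, d_v)."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (..., Nq, Nk)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v                  # (..., Nq, d_v)
```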
Title: Agent Attention: On the Integration of Softmax and Linear Attention
Link: https://arxiv.org/pdf/2312.08874
Summary: [ECCV 2024] A new attention paradigm, Agent Attention, which integrates softmax and linear attention.
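A rough single-head sketch of the idea: a small set of agent tokens first aggregates from K/V with softmax attention, then the queries read from the agents, giving linear cost in sequence length. Pooling the agents from the queries and the head-free shapes below are simplifying assumptions, not the official implementation.

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, num_agents=49):
    """q, k, v: (B, N, D). Two softmax attentions routed through A << N agent tokens."""
    b, n, d = q.shape
    scale = d ** -0.5
    # agent tokens: queries pooled down to num_agents positions
    agents = F.adaptive_avg_pool1d(q.transpose(1, 2), num_agents).transpose(1, 2)
    # agents aggregate from keys/values: (B, A, D)
    agent_v = torch.softmax(agents @ k.transpose(1, 2) * scale, dim=-1) @ v
    # queries read from agents: (B, N, D)
    return torch.softmax(q @ agents.transpose(1, 2) * scale, dim=-1) @ agent_v
```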
Title: Deformable ConvNets v2: More Deformable, Better Results
Link: https://arxiv.org/pdf/1811.11168
Summary: Deformable convolution (DCNv2); plug-and-play, a direct replacement for standard convolution that strengthens feature extraction and improves accuracy.
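torchvision already ships modulated deformable convolution, so a drop-in trial can look like the sketch below: plain convolutions predict the sampling offsets and the DCNv2 modulation mask, and `torchvision.ops.DeformConv2d` does the rest. Channel counts follow its documentation; the block name is ours.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DCNv2Block(nn.Module):
    """A 3x3 conv replaced by modulated deformable convolution (DCNv2-style)."""
    def __init__(self, in_ch, out_ch, k=3, padding=1):
        super().__init__()
        self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=padding)  # (dx, dy) per tap
        self.mask = nn.Conv2d(in_ch, k * k, k, padding=padding)        # one scalar per tap
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=padding)

    def forward(self, x):
        offset = self.offset(x)
        mask = torch.sigmoid(self.mask(x))  # modulation in [0, 1]
        return self.deform(x, offset, mask)
```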
Title: Efficient Multi-Scale Attention Module with Cross-Spatial Learning
Link: https://arxiv.org/pdf/2305.13563v2
Summary: [ICASSP 2023] The plug-and-play efficient multi-scale attention module (EMA); outperforms SE, CBAM, SA, and CA attention.
Title: SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks
Link: http://proceedings.mlr.press/v139/yang21o/yang21o.pdf
Summary: [ICML 2021] SimAM, a parameter-free attention module; plug-and-play, under 10 lines of code, delivers consistent gains.
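The "under 10 lines" claim holds up. A sketch following the paper's closed-form energy function, where each activation is weighted by the inverse energy of its neuron and gated with a sigmoid (λ is the regularizer, `e_lambda` below):

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention (Yang et al., ICML 2021)."""
    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):                              # x: (B, C, H, W)
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3), keepdim=True) / n        # per-channel variance estimate
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5    # inverse of the minimal energy
        return x * torch.sigmoid(e_inv)
```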
Title: Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
Link: https://arxiv.org/pdf/2303.03667
Summary: [CVPR 2023] Partial convolution (PConv); simple, effective, plug-and-play, quick gains.
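PConv's trick is easy to restate: convolve only a fraction of the channels and pass the rest through untouched, cutting FLOPs and, crucially, memory access. A minimal split-and-concat sketch with the paper's default of 1/4 of channels convolved:

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """FasterNet-style partial convolution (split_cat variant)."""
    def __init__(self, dim, n_div=4):
        super().__init__()
        self.dim_conv = dim // n_div               # channels that get convolved
        self.dim_id = dim - self.dim_conv          # channels passed through as-is
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, 1, 1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_id], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)
```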
Title: LSKNet: A Foundation Lightweight Backbone for Remote Sensing
Link: https://arxiv.org/pdf/2403.11735
Affiliations: Nankai University, Tianjin; Hunan Advanced Technology R&D Institute, Changsha; NKAIRI, Futian, Shenzhen
Keywords: remote sensing, CNN backbone, large kernel, attention, object detection, semantic segmentation
Summary: [IJCV 2024] The Large Selective Kernel (LSK) module; a plug-and-play replacement for convolution blocks that greatly enlarges the receptive field.
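A condensed sketch of the LSK block, paraphrased from the official LSKNet code (norm and activation layers omitted): a depthwise 5x5 followed by a dilated depthwise 7x7 decomposes a large kernel into two nested receptive fields, and average/max channel pooling produces spatial weights that select between the two branches.

```python
import torch
import torch.nn as nn

class LSKBlock(nn.Module):
    """Large Selective Kernel: two-branch large-kernel decomposition with
    spatial selection between the branches."""
    def __init__(self, dim):
        super().__init__()
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.conv_spatial = nn.Conv2d(dim, dim, 7, padding=9, groups=dim, dilation=3)
        self.conv1 = nn.Conv2d(dim, dim // 2, 1)
        self.conv2 = nn.Conv2d(dim, dim // 2, 1)
        self.conv_squeeze = nn.Conv2d(2, 2, 7, padding=3)
        self.conv_out = nn.Conv2d(dim // 2, dim, 1)

    def forward(self, x):
        attn1 = self.conv0(x)                      # ~5x5 receptive field
        attn2 = self.conv_spatial(attn1)           # ~19x19 effective receptive field
        attn1, attn2 = self.conv1(attn1), self.conv2(attn2)
        attn = torch.cat([attn1, attn2], dim=1)
        avg = attn.mean(dim=1, keepdim=True)
        mx, _ = attn.max(dim=1, keepdim=True)
        sig = self.conv_squeeze(torch.cat([avg, mx], dim=1)).sigmoid()
        attn = attn1 * sig[:, 0:1] + attn2 * sig[:, 1:2]  # spatial selection
        return x * self.conv_out(attn)
```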
Title: VOLO: Vision Outlooker for Visual Recognition
Link: https://arxiv.org/pdf/2106.13112
Affiliations: Sea AI Lab; National University of Singapore
Summary: [TPAMI 2022] Outlook Attention; plug-and-play, captures fine detail and local features to boost accuracy.
Title: Haar Wavelet Downsampling: A Simple but Effective Downsampling Module for Semantic Segmentation
Link: https://www.sciencedirect.com/science/article/pii/S0031320323005174
Summary: [PR 2023] Haar wavelet downsampling; plug-and-play, a few lines of code, a simple and effective way to improve semantic segmentation accuracy.
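The transform itself is parameter-free and lossless: each 2x2 block becomes one low-frequency (LL) and three high-frequency (LH, HL, HH) coefficients stacked along channels. A sketch via strided slicing; following the DWT with a 1x1 conv to mix sub-bands matches the paper's recipe in spirit, though the exact layer layout here is an assumption.

```python
import torch
import torch.nn as nn

def haar_dwt2(x):
    """One-level 2D Haar transform: (B, C, H, W) -> (B, 4C, H/2, W/2).
    H and W must be even. Sub-band order: LL, LH, HL, HH."""
    a = x[..., 0::2, 0::2]  # top-left of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return torch.cat((ll, lh, hl, hh), dim=1)

class HaarDownsample(nn.Module):
    """Haar DWT followed by a 1x1 conv to the target channel count."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.proj = nn.Conv2d(4 * in_ch, out_ch, 1)

    def forward(self, x):
        return self.proj(haar_dwt2(x))
```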
Title: SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention
Link: https://arxiv.org/pdf/2407.05128
Affiliations: School of Computer Science and Technology, Zhejiang Normal University; Hangzhou Institute of Artificial Intelligence; Beijing Geek Intelligent Technology Co., Ltd.
Keywords: multi-semantic information, semantic disparity, spatial attention, channel attention, synergistic effects
Summary: [arXiv 2024] Spatial and channel synergistic attention (SCSA); plug-and-play, gains on classification, detection, and segmentation.
Title: Poly Kernel Inception Network for Remote Sensing Detection
Affiliations: Nanjing University of Science and Technology; Communication University of China; Zhejiang University
Keywords: remote sensing imagery, object detection, multi-scale convolution kernels, context anchor attention, PKINet
Summary: [CVPR 2024] Context Anchor Attention (CAA); plug-and-play, boosts object detection accuracy.
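A sketch of the CAA pattern as we read it: local average pooling, then a pair of long strip depthwise convolutions (1xk and kx1) that cheaply approximate a large-kernel context window, and a sigmoid gate over the input. The official module wraps each conv with norm/activation, and the strip length 11 here is an assumption.

```python
import torch
import torch.nn as nn

class CAA(nn.Module):
    """Context Anchor Attention sketch: strip convs approximate a large kernel."""
    def __init__(self, ch, strip=11):
        super().__init__()
        self.pool = nn.AvgPool2d(7, stride=1, padding=3)
        self.conv1 = nn.Conv2d(ch, ch, 1)
        self.h_conv = nn.Conv2d(ch, ch, (1, strip), padding=(0, strip // 2), groups=ch)
        self.v_conv = nn.Conv2d(ch, ch, (strip, 1), padding=(strip // 2, 0), groups=ch)
        self.conv2 = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        attn = self.conv2(self.v_conv(self.h_conv(self.conv1(self.pool(x)))))
        return x * torch.sigmoid(attn)
```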
Title: LDConv: Linear Deformable Convolution for Improving Convolutional Neural Networks
Link: https://doi.org/10.1016/j.imavis.2024.105190
Affiliations: Chongqing Normal University; Southwest University
Keywords: novel convolution operation, arbitrary sampling shapes, arbitrary number of parameters, object detection
Summary: [IVC 2024] Linear deformable convolution (LDConv); plug-and-play, efficient feature extraction, a strong accuracy booster for YOLO networks.
Title: Omni-Dimensional Dynamic Convolution
Link: https://openreview.net/pdf?id=DmpCfq6Mg39
Official GitHub: https://github.com/OSVAI/ODConv
Affiliations: Intel Labs China; CUHK-SenseTime Joint Laboratory
Keywords: dynamic convolution, attention mechanism, convolution kernel space, deep learning, computer vision
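ODConv attends over four dimensions of the kernel space (spatial positions, input channels, output channels, and kernel number). The sketch below keeps only the kernel-number dimension, i.e., CondConv/DyConv-style mixing of K candidate kernels, to show the routing mechanics; it is a deliberate simplification, not the official module (see the GitHub link above for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    """Input-dependent mixture of K candidate kernels (kernel-number attention only)."""
    def __init__(self, in_ch, out_ch, k=3, num_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_kernels, out_ch, in_ch, k, k) * 0.02)
        self.route = nn.Linear(in_ch, num_kernels)  # attention over the K kernels
        self.padding = k // 2

    def forward(self, x):                            # x: (B, C, H, W)
        b = x.size(0)
        attn = torch.softmax(self.route(x.mean(dim=(2, 3))), dim=1)   # (B, K)
        w = torch.einsum("bk,koihw->boihw", attn, self.weight)        # per-sample kernels
        w = w.reshape(-1, *self.weight.shape[2:])    # (B*out_ch, in_ch, k, k)
        x = x.reshape(1, -1, *x.shape[2:])           # grouped-conv trick: one group per sample
        y = F.conv2d(x, w, padding=self.padding, groups=b)
        return y.reshape(b, -1, *y.shape[2:])
```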
Alternative AI tools for Plug-play-modules
Similar Open Source Tools
AstrBot
AstrBot is a powerful and versatile tool that leverages the capabilities of large language models (LLMs) like GPT-3, GPT-3.5, and GPT-4 to enhance communication and automate tasks. It seamlessly integrates with popular messaging platforms such as QQ, QQ Channel, and Telegram, enabling users to harness the power of AI within their daily conversations and workflows.
Bert-VITS2
Bert-VITS2 is a repository that provides a backbone with multilingual BERT for text-to-speech (TTS) applications. It offers an alternative to BV2/GSV projects and is inspired by the MassTTS project. Users can refer to the code to learn how to train models for TTS. The project is not maintained actively in the short term. It is not to be used for any purposes that violate the laws of the People's Republic of China, and strictly prohibits any political-related use.
Steel-LLM
Steel-LLM is a project to pre-train a large Chinese language model from scratch using over 1T of data to achieve a parameter size of around 1B, similar to TinyLlama. The project aims to share the entire process including data collection, data processing, pre-training framework selection, model design, and open-source all the code. The goal is to enable reproducibility of the work even with limited resources. The name 'Steel' is inspired by a band '万能青年旅店' and signifies the desire to create a strong model despite limited conditions. The project involves continuous data collection of various cultural elements, trivia, lyrics, niche literature, and personal secrets to train the LLM. The ultimate aim is to fill the model with diverse data and leave room for individual input, fostering collaboration among users.
chatgpt-auto-continue
ChatGPT Auto-Continue is a userscript that automatically continues generating ChatGPT responses when chats cut off. It relies on the powerful chatgpt.js library and is easy to install and use. Simply install Tampermonkey and ChatGPT Auto-Continue, and visit chat.openai.com as normal. Multi-reply conversations will automatically continue generating when cut-off!
aidea
AIdea is an app that integrates mainstream large language models and drawing models, developed using Flutter. The code is completely open-source and supports various functions such as GPT-3.5, GPT-4 from OpenAI, Claude instant, Claude 2.1 from Anthropic, Gemini Pro and visual language models from Google, as well as various Chinese and open-source models. It also supports features like text-to-image, super-resolution, coloring black and white images, artistic fonts, artistic QR codes, and more.
ruoyi-ai
ruoyi-ai is a platform built on top of ruoyi-plus to implement AI chat and drawing functionalities on the backend. The project is completely open source and free. The backend management interface uses elementUI, while the server side is built using Java 17 and SpringBoot 3.X. It supports various AI models such as ChatGPT4, Dall-E-3, ChatGPT-4-All, voice cloning based on GPT-SoVITS, GPTS, and MidJourney. Additionally, it supports WeChat mini programs, personal QR code real-time payments, monitoring and AI auto-reply in live streaming rooms like Douyu and Bilibili, and personal WeChat integration with ChatGPT. The platform also includes features like private knowledge base management and provides various demo interfaces for different platforms such as mobile, web, and PC.
VideoRefer
VideoRefer Suite is a tool designed to enhance the fine-grained spatial-temporal understanding capabilities of Video Large Language Models (Video LLMs). It consists of three primary components: Model (VideoRefer) for perceiving, reasoning, and retrieval for user-defined regions at any specified timestamps, Dataset (VideoRefer-700K) for high-quality object-level video instruction data, and Benchmark (VideoRefer-Bench) to evaluate object-level video understanding capabilities. The tool can understand any object within a video.
gpt-rss
GPT RSS is a tool that keeps users up to date on the latest AIGC/GPT/LLM articles by periodically crawling cutting-edge AIGC/GPT/LLM publications. It features a user-friendly interface that supports PC and mobile devices, as well as search and filter functions. GPT RSS is built with Vue3 and the Vant UI component library, and uses Node.js scheduled tasks to update articles daily.
auto-round
AutoRound is an advanced weight-only quantization algorithm for low-bits LLM inference. It competes impressively against recent methods without introducing any additional inference overhead. The method adopts sign gradient descent to fine-tune rounding values and minmax values of weights in just 200 steps, often significantly outperforming SignRound with the cost of more tuning time for quantization. AutoRound is tailored for a wide range of models and consistently delivers noticeable improvements.
AIBotPublic
AIBotPublic is an open-source version of AIBotPro, a comprehensive AI tool that provides various features such as knowledge base construction, AI drawing, API hosting, and more. It supports custom plugins and parallel processing of multiple files. The tool is built using bootstrap4 for the frontend, .NET6.0 for the backend, and utilizes technologies like SqlServer, Redis, and Milvus for database and vector database functionalities. It integrates third-party dependencies like Baidu AI OCR, Milvus C# SDK, Google Search, and more to enhance its capabilities.
intel-extension-for-transformers
Intel® Extension for Transformers is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms, including Intel Gaudi2, Intel CPU, and Intel GPU. The toolkit provides the following key features and examples:
* Seamless user experience of model compressions on Transformer-based models by extending [Hugging Face transformers](https://github.com/huggingface/transformers) APIs and leveraging [Intel® Neural Compressor](https://github.com/intel/neural-compressor)
* Advanced software optimizations and a unique compression-aware runtime (released with NeurIPS 2022's papers [Fast Distilbert on CPUs](https://arxiv.org/abs/2211.07715) and [QuaLA-MiniLM: a Quantized Length Adaptive MiniLM](https://arxiv.org/abs/2210.17114), and NeurIPS 2021's paper [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754))
* Optimized Transformer-based model packages such as [Stable Diffusion](examples/huggingface/pytorch/text-to-image/deployment/stable_diffusion), [GPT-J-6B](examples/huggingface/pytorch/text-generation/deployment), [GPT-NEOX](examples/huggingface/pytorch/language-modeling/quantization#2-validated-model-list), [BLOOM-176B](examples/huggingface/pytorch/language-modeling/inference#BLOOM-176B), [T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), and [Flan-T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), and end-to-end workflows such as [SetFit-based text classification](docs/tutorials/pytorch/text-classification/SetFit_model_compression_AGNews.ipynb) and [document level sentiment analysis (DLSA)](workflows/dlsa)
* [NeuralChat](intel_extension_for_transformers/neural_chat), a customizable chatbot framework to create your own chatbot within minutes by leveraging a rich set of [plugins](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat/docs/advanced_features.md) such as [Knowledge Retrieval](./intel_extension_for_transformers/neural_chat/pipeline/plugins/retrieval/README.md), [Speech Interaction](./intel_extension_for_transformers/neural_chat/pipeline/plugins/audio/README.md), [Query Caching](./intel_extension_for_transformers/neural_chat/pipeline/plugins/caching/README.md), and [Security Guardrail](./intel_extension_for_transformers/neural_chat/pipeline/plugins/security/README.md). This framework supports Intel Gaudi2/CPU/GPU.
* [Inference](https://github.com/intel/neural-speed/tree/main) of Large Language Models (LLMs) in pure C/C++ with weight-only quantization kernels for Intel CPU and Intel GPU (TBD), supporting [GPT-NEOX](https://github.com/intel/neural-speed/tree/main/neural_speed/models/gptneox), [LLAMA](https://github.com/intel/neural-speed/tree/main/neural_speed/models/llama), [MPT](https://github.com/intel/neural-speed/tree/main/neural_speed/models/mpt), [FALCON](https://github.com/intel/neural-speed/tree/main/neural_speed/models/falcon), [BLOOM-7B](https://github.com/intel/neural-speed/tree/main/neural_speed/models/bloom), [OPT](https://github.com/intel/neural-speed/tree/main/neural_speed/models/opt), [ChatGLM2-6B](https://github.com/intel/neural-speed/tree/main/neural_speed/models/chatglm), [GPT-J-6B](https://github.com/intel/neural-speed/tree/main/neural_speed/models/gptj), and [Dolly-v2-3B](https://github.com/intel/neural-speed/tree/main/neural_speed/models/gptneox). Supports the AMX, VNNI, AVX512F, and AVX2 instruction sets.
We've boosted the performance of Intel CPUs, with a particular focus on the 4th generation Intel Xeon Scalable processor, codenamed [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html).
VideoLLaMA2
VideoLLaMA 2 is a project focused on advancing spatial-temporal modeling and audio understanding in video-LLMs. It provides tools for multi-choice video QA, open-ended video QA, and video captioning. The project offers model zoo with different configurations for visual encoder and language decoder. It includes training and evaluation guides, as well as inference capabilities for video and image processing. The project also features a demo setup for running a video-based Large Language Model web demonstration.
LLaVA-pp
This repository, LLaVA++, extends the visual capabilities of the LLaVA 1.5 model by incorporating the latest LLMs, Phi-3 Mini Instruct 3.8B, and LLaMA-3 Instruct 8B. It provides various models for instruction-following LMMS and academic-task-oriented datasets, along with training scripts for Phi-3-V and LLaMA-3-V. The repository also includes installation instructions and acknowledgments to related open-source contributions.
chatgpt-widescreen
ChatGPT Widescreen Mode is a browser extension that adds widescreen and fullscreen modes to ChatGPT, enhancing chat sessions by reducing scrolling and creating a more immersive viewing experience. Users can experience clearer programming code display, view multi-step instructions or long recipes on a single page, enjoy original content in a visually pleasing format, customize features like a larger chatbox and hidden header/footer, and use the tool with chat.openai.com and poe.com. The extension is compatible with various browsers and relies on code from the chatgpt.js library under the MIT license.
TEN-Agent
TEN Agent is an open-source multimodal agent powered by the world’s first real-time multimodal framework, TEN Framework. It offers high-performance real-time multimodal interactions, multi-language and multi-platform support, edge-cloud integration, flexibility beyond model limitations, and real-time agent state management. Users can easily build complex AI applications through drag-and-drop programming, integrating audio-visual tools, databases, RAG, and more.
For similar tasks
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.
openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. It provides a common API to deliver inference solutions on various platforms, including CPU, GPU, NPU, and heterogeneous devices. OpenVINO™ supports pre-trained models from Open Model Zoo and popular frameworks like TensorFlow, PyTorch, and ONNX. Key components of OpenVINO™ include the OpenVINO™ Runtime, plugins for different hardware devices, frontends for reading models from native framework formats, and the OpenVINO Model Converter (OVC) for adjusting models for optimal execution on target devices.
djl-demo
The Deep Java Library (DJL) is a framework-agnostic Java API for deep learning. It provides a unified interface to popular deep learning frameworks such as TensorFlow, PyTorch, and MXNet. DJL makes it easy to develop deep learning applications in Java, and it can be used for a variety of tasks, including image classification, object detection, natural language processing, and speech recognition.
kaapana
Kaapana is an open-source toolkit for state-of-the-art platform provisioning in the field of medical data analysis. The applications comprise AI-based workflows and federated learning scenarios with a focus on radiological and radiotherapeutic imaging. Obtaining large amounts of medical data necessary for developing and training modern machine learning methods is an extremely challenging effort that often fails in a multi-center setting, e.g. due to technical, organizational and legal hurdles. A federated approach where the data remains under the authority of the individual institutions and is only processed on-site is, in contrast, a promising approach ideally suited to overcome these difficulties. Following this federated concept, the goal of Kaapana is to provide a framework and a set of tools for sharing data processing algorithms, for standardized workflow design and execution as well as for performing distributed method development. This will facilitate data analysis in a compliant way enabling researchers and clinicians to perform large-scale multi-center studies. By adhering to established standards and by adopting widely used open technologies for private cloud development and containerized data processing, Kaapana integrates seamlessly with the existing clinical IT infrastructure, such as the Picture Archiving and Communication System (PACS), and ensures modularity and easy extensibility.
MONAI
MONAI is a PyTorch-based, open-source framework for deep learning in healthcare imaging. It provides a comprehensive set of tools for medical image analysis, including data preprocessing, model training, and evaluation. MONAI is designed to be flexible and easy to use, making it a valuable resource for researchers and developers in the field of medical imaging.
nnstreamer
NNStreamer is a set of Gstreamer plugins that allow Gstreamer developers to adopt neural network models easily and efficiently and neural network developers to manage neural network pipelines and their filters easily and efficiently.
cortex
Nitro is a high-efficiency C++ inference engine for edge computing, powering Jan. It is lightweight and embeddable, ideal for product integration. The zipped nitro binary is only ~3 MB with no or minimal dependencies (CUDA is required only for GPU use), making it well suited for any edge/server deployment.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
* Self-contained, with no need for a DBMS or cloud service.
* OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
* Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.