Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
This repo contains a comprehensive list of **Model Quantization** papers for efficient deep learning, collected from AI conferences, journals, and arXiv. As a highlight, we categorize the papers in terms of model structures and application scenarios, and label the quantization methods with keywords.
This repo is being actively updated, and contributions in any form to make this list more comprehensive are welcome. Special thanks to collaborator Zhikai Li, and all researchers who have contributed to this repo!
If you find this repo useful, please consider ★STARing and feel free to share it with others!
[Update: Apr, 2024] Add new papers from AAAI-24.
[Update: Nov, 2023] Add new papers from NeurIPS-23.
[Update: Oct, 2023] Add new papers from ICCV-23.
[Update: Jul, 2023] Add new papers from AAAI-23 and ICML-23.
[Update: Jun, 2023] Add new arXiv papers uploaded in May 2023, especially the hot LLM quantization field.
[Update: Jun, 2023] Rebuilt this repo with a new style and a better experience!
Keywords: [PTQ]: post-training quantization | [Non-uniform]: non-uniform quantization | [MP]: mixed-precision quantization | [Extreme]: binary or ternary quantization
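
To make the keyword tags concrete, here is a minimal illustrative sketch (not taken from any listed paper; all function names are hypothetical): [PTQ] corresponds to quantizing an already-trained weight tensor with a calibrated scale and no retraining, [Extreme] to keeping only signs plus a scale, [MP] to assigning different bit-widths per layer (e.g., the `n_bits` argument below), and [Non-uniform] to replacing the evenly spaced integer grid below with learned or power-of-two levels.

```python
# Minimal sketch of the tagged quantization families (illustrative only,
# not the method of any specific paper in this list).
import numpy as np

def uniform_ptq(w: np.ndarray, n_bits: int = 8):
    """Symmetric uniform post-training quantization with one per-tensor scale."""
    qmax = 2 ** (n_bits - 1) - 1           # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.abs(w).max() / qmax         # "calibration" from the tensor itself
    w_q = np.clip(np.round(w / scale), -qmax, qmax)
    return w_q.astype(np.int32), scale     # dequantize later as w_q * scale

def binarize(w: np.ndarray):
    """'Extreme' 1-bit quantization: keep only the signs plus a per-tensor scale."""
    return np.sign(w), float(np.abs(w).mean())

if __name__ == "__main__":
    w = np.random.randn(64, 64).astype(np.float32)
    w4, s4 = uniform_ptq(w, n_bits=4)      # mixed-precision methods choose bit-widths like this per layer
    w1, s1 = binarize(w)
    print("4-bit reconstruction error:", np.abs(w - w4 * s4).max())
    print("1-bit reconstruction error:", np.abs(w - w1 * s1).max())
```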
- "A Survey of Quantization Methods for Efficient Neural Network Inference", Book Chapter: Low-Power Computer Vision, 2021. [paper]
- "Full Stack Optimization of Transformer Inference: a Survey", arXiv, 2023. [paper]
- "A White Paper on Neural Network Quantization", arXiv, 2021. [paper]
- "Binary Neural Networks: A Survey", PR, 2020. [Paper] [
Extreme
]
- "Bi-ViT: Pushing the Limit of Vision Transformer Quantization", AAAI, 2024. [paper] [
Extreme
] - "AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries", AAAI, 2024. [paper]
- "LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation", arXiv, 2023. [paper] [
PTQ
] [MP
] - "MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer", arXiv, 2023. [paper] [
PTQ
] [MP
] - "I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference", ICCV, 2023. [paper] [code]
- "RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers", ICCV, 2023. [paper] [code] [
PTQ
] - "QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection", ICCV, 2023. [paper]
- "BiViT: Extremely Compressed Binary Vision Transformers", ICCV, 2023. [paper] [
Extreme
] - "Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers", ICCV, 2023. [paper]
- "PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile", NeurIPS, 2023. [paper]
- "Oscillation-free Quantization for Low-bit Vision Transformers", ICML, 2023. [paper] [code]
- "PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers", TNNLS, 2023. [paper]
- "Variation-aware Vision Transformer Quantization", arXiv, 2023. [paper]
- "NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers", CVPR, 2023. [paper] [
PTQ
] - "Boost Vision Transformer with GPU-Friendly Sparsity and Quantization", CVPR, 2023. [paper]
- "Q-DETR: An Efficient Low-Bit Quantized Detection Transformer", CVPR, 2023. [paper]
- "Output Sensitivity-Aware DETR Quantization", 2023. [paper]
- "Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction", arXiv, 2023. [paper] [
PTQ
] - "Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer", NeurIPS, 2022. [paper] [code]
- "Patch Similarity Aware Data-Free Quantization for Vision Transformers", ECCV, 2022. [paper] [code] [
PTQ
] - "PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization", ECCV, 2022. [paper] [code] [
PTQ
] - "FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer", IJCAI, 2022. [paper] [code] [
PTQ
] - "Q-ViT: Fully Differentiable Quantization for Vision Transformer", arXiv, 2022. [paper]
- "Post-Training Quantization for Vision Transformer", NeurIPS, 2021. [paper] [
PTQ
]
- "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models", AAAI, 2024. [paper]
- "Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models", AAAI, 2024. [paper]
- "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", AAAI, 2024. [paper]
- "Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation", AAAI, 2024. [paper] [
PTQ
] - "What Makes Quantization for Large Language Model Hard? An Empirical Study from the Lens of Perturbation", AAAI, 2024. [paper]
- "EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs", arXiv, 2024. [paper]
- "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact", arXiv, 2024. [paper]
- "FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization", arXiv, 2024. [paper]
- "A Comprehensive Evaluation of Quantization Strategies for Large Language Models", arXiv, 2024. [paper]
- "GPTVQ: The Blessing of Dimensionality for LLM Quantization", arXiv, 2024. [paper]
- "APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models", arXiv, 2024. [paper]
- "EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge", arXiv, 2024. [paper]
- "RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization", arXiv, 2024. [paper]
- "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention", arXiv, 2024. [paper]
- "BiLLM: Pushing the Limit of Post-Training Quantization for LLMs", arXiv, 2024. [paper]
- "KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization", arXiv, 2023. [paper]
- "Extreme Compression of Large Language Models via Additive Quantization", arXiv, 2023. [paper]
- "ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks", arXiv, 2023. [paper] [
PTQ
] - "CBQ: Cross-Block Quantization for Large Language Models", arXiv, 2023. [paper] [
PTQ
] - "FP8-BERT: Post-Training Quantization for Transformer", arXiv, 2023. [paper] [
PTQ
] - "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", arXiv, 2023. [paper]
- "SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM", arXiv, 2023. [paper] [
PTQ
] - "A Speed Odyssey for Deployable Quantization of LLMs", arXiv, 2023. [paper]
- "AFPQ: Asymmetric Floating Point Quantization for LLMs", arXiv, 2023. [paper]
- "Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization", arXiv, 2023. [paper]
- "QLoRA: Efficient Finetuning of Quantized LLMs", NeurIPS, 2023. [paper] [code]
- "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", NeurIPS, 2023. [paper] [code] [
PTQ
] - "Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization", NeurIPS, 2023. [paper]
- "SqueezeLLM: Dense-and-Sparse Quantization", arXiv, 2023. [paper] [
PTQ
] [Non-uniform
] - "QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources", arXiv, 2023. [paper]
- "PB-LLM: Partially Binarized Large Language Models", arXiv, 2023. [paper] [
Extreme
] - "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models", arXiv, 2023. [paper]
- "Atom: Low-bit Quantization for Efficient and Accurate LLM Serving", arXiv, 2023. [paper]
- "ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers", arXiv, 2023. [paper]
- "LLM-FP4: 4-Bit Floating-Point Quantized Transformers", arXiv, 2023. [paper]
- "TEQ: Trainable Equivalent Transformation for Quantization of LLMs", arXiv, 2023. [paper]
- "LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models", arXiv, 2023. [paper]
- "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models", arXiv, 2023. [paper] [
PTQ
] - "Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models", arXiv, 2023. [paper]
- "QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models", arXiv, 2023. [paper]
- "Efficient Post-training Quantization with FP8 Formats", arXiv, 2023. [paper]
- "Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization", arXiv, 2023. [paper]
- "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs", arXiv, 2023. [paper]
- "Norm Tweaking: High-performance Low-bit Quantization of Large Language Models", arXiv, 2023. [paper]
- "Understanding the Impact of Post-Training Quantization on Large Language Models", arXiv, 2023. [paper]
- "QuantEase: Optimization-based Quantization for Language Models -- An Efficient and Intuitive Algorithm", arXiv, 2023. [paper]
- "FPTQ: Fine-grained Post-Training Quantization for Large Language Models", arXiv, 2023. [paper]
- "OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models", arXiv, 2023. [paper]
- "FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs", arXiv, 2023. [paper] [
PTQ
] - "Gradient-Based Post-Training Quantization: Challenging the Status Quo", arXiv, 2023. [paper] [
PTQ
] - "NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search", arXiv, 2023. [paper] [
Non-uniform
] - "ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats", arXiv, 2023. [paper]
- "Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models", arXiv, 2023. [paper]
- "Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study", arXiv, 2023. [paper]
- "INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation", arXiv, 2023. [paper]
- "QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models", arXiv, 2023. [paper] [code]
- "OWQ: Lessons learned from activation outliers for weight quantization in large language models", arXiv, 2023. [paper] [
PTQ
] - "PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models", arXiv, 2023. [paper]
- "SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression", arXiv, 2023. [paper] [
PTQ
] - "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration", arXiv, 2023. [paper] [
PTQ
] - "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models", arXiv, 2023. [paper]
- "Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling", arXiv, 2023. [paper] [
PTQ
] - "RPTQ: Reorder-based Post-training Quantization for Large Language Models", arXiv, 2023. [paper] [code] [
PTQ
] - "The case for 4-bit precision: k-bit Inference Scaling Laws", ICML, 2023. [paper]
- "Quantized Distributed Training of Large Models with Convergence Guarantees", ICML, 2023. [paper]
- "Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases", ICML, 2023. [paper]
- "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models", ICML, 2023. [paper] [code] [
PTQ
] - "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers", ICLR, 2023. [papar] [code] [
PTQ
] - "LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models", arXiv, 2022. [paper]
- "BiBERT: Accurate Fully Binarized BERT", ICLR, 2022. [paper] [code] [
Extreme
] - "BiT: Robustly Binarized Multi-distilled Transformer", NeurIPS, 2022. [paper] [code] [
Extreme
] - "Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models", NeurIPS, 2022. [paper] [code] [
PTQ
] - "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale", NeurIPS, 2022. [paper] [code]
- "Towards Efficient Post-training Quantization of Pre-trained Language Models", NeurIPS, 2022. [paper] [
PTQ
] - "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers", NeurIPS, 2022. [paper] [code] [
PTQ
] - "Compression of Generative Pre-trained Language Models via Quantization", ACL, 2022. [paper]
- "I-BERT: Integer-only BERT Quantization", ICML, 2021. [paper] [code]
- "BinaryBERT: Pushing the Limit of BERT Quantization", ACL, 2021. [paper] [code] [
Extreme
] - "On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers", ACL, 2021. [paper]
- "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP, 2021. [paper] [code]
- "KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization", arXiv, 2021. [paper]
- "TernaryBERT: Distillation-aware Ultra-low Bit BERT", EMNLP, 2020. [paper] [code] [
Extreme
] - "Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation", EMNLP, 2020. [paper]
- "GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference", MICRO, 2020. [paper]
- "Towards Fully 8-bit Integer Inference for the Transformer Model", IJCAI, 2020. [paper]
- "Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT", AAAI, 2020. [paper]
- "Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model", ICML, 2019. [paper]
- "Q8BERT: Quantized 8Bit BERT", EMC2 Workshop, 2019. [paper]
- "QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning", arXiv, 2024. [paper]
- "Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models", arXiv, 2023. [paper]
- "Efficient Quantization Strategies for Latent Diffusion Models", arXiv, 2023. [paper] [
PTQ
] - "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models", arXiv, 2023. [paper]
- "Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models", arXiv, 2023. [paper]
- "Effective Quantization for Diffusion Models on CPUs", arXiv, 2023. [paper]
- "PTQD: Accurate Post-Training Quantization for Diffusion Models", NeurIPS, 2023. [paper] [
PTQ
] - "Q-DM: An Efficient Low-bit Quantized Diffusion Model", NeurIPS, 2023. [paper]
- "Temporal Dynamic Quantization for Diffusion Models", NeurIPS, 2023. [paper]
- "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models", arXiv, 2023. [paper]
- "Q-diffusion: Quantizing Diffusion Models", ICCV, 2023. [paper] [code] [
PTQ
] - "Towards Accurate Data-free Quantization for Diffusion Models", arXiv, 2023. [paper] [
PTQ
] - "Post-training Quantization on Diffusion Models", CVPR, 2023. [paper] [code] [
PTQ
]
- "Make RepVGG Greater Again: A Quantization-Aware Approach", AAAI, 2024. [paper]
- "MetaMix: Meta-State Precision Searcher for Mixed-Precision Activation Quantization", AAAI, 2024. [paper] [
MP
] - "Robustness-Guided Image Synthesis for Data-Free Quantization", AAAI, 2024. [paper]
- "PTMQ: Post-training Multi-Bit Quantization of Neural Networks", AAAI, 2024. [paper] [
PTQ
] - "Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs", arXiv, 2023. [paper]
- "StableQ: Enhancing Data-Scarce Quantization with Text-to-Image Data", arXiv, 2023. [paper]
- "Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers", NeurIPS, 2023. [paper] [
Extreme
] - "TexQ: Zero-shot Network Quantization with Texture Feature Distribution Calibration", NeurIPS, 2023. [paper]
- "Overcoming Forgetting Catastrophe in Quantization-Aware Training", ICCV, 2023. [paper]
- "Causal-DFQ: Causality Guided Data-Free Network Quantization", ICCV, 2023. [paper] [code]
- "DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization", ICCV, 2023. [paper]
- "EQ-Net: Elastic Quantization Neural Networks", ICCV, 2023. [paper] [code]
- "A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance", ICCV, 2023. [paper]
- "EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization", ICCV, 2023. [paper] [
MP
] - "Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning", ICCV, 2023. [paper] [
PTQ
] - "Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction", ICML, 2023. [paper] [code]
- "FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization", ICML, 2023. [paper] [
PTQ
] - "Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning", PR, 2023. [paper]
- "OMPQ: Orthogonal Mixed Precision Quantization", AAAI, 2023. [paper] [
MP
] - "Rethinking Data-Free Quantization as a Zero-Sum Game", AAAI, 2023. [paper]
- "Quantized Feature Distillation for Network Quantization", AAAI, 2023. [paper]
- "Resilient Binary Neural Network", AAAI, 2023. [paper] [
Extreme
] - "Fast and Accurate Binary Neural Networks Based on Depth-Width Reshaping", AAAI, 2023. [paper] [
Extreme
] - "Efficient Quantization-aware Training with Adaptive Coreset Selection", arXiv, 2023. [paper]
- "One-Shot Model for Mixed-Precision Quantization", CVPR, 2023. [paper] [
MP
] - "Adaptive Data-Free Quantization", CVPR, 2023. [paper]
- "Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization", CVPR, 2023. [paper] [
PTQ
] - "Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective", CVPR, 2023. [paper] [code] [
PTQ
] - "GENIE: Show Me the Data for Quantization", CVPR, 2023. [paper] [code] [
PTQ
] - "Bayesian asymmetric quantized neural networks", PR, 2023. [paper]
- "Distribution-sensitive Information Retention for Accurate Binary Neural Network", IJCV, 2023. [paper] [
Extreme
] - "SDQ: Stochastic Differentiable Quantization with Mixed Precision", ICML, 2022. [paper] [
MP
] - "Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks", ICML, 2022. [paper] [code]
- "GACT: Activation Compressed Training for Generic Network Architectures", ICML, 2022. [paper] [code]
- "Overcoming Oscillations in Quantization-Aware Training", ICML, 2022. [paper] [code]
- "Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation", CVPR, 2022. [paper] [code] [
Non-uniform
] - "Learnable Lookup Table for Neural Network Quantization", CVPR, 2022. [paper] [code] [
Non-uniform
] - "Mr.BiQ: Post-Training Non-Uniform Quantization based on Minimizing the Reconstruction Error", CVPR, 2022. [paper] [
PTQ
] [Non-uniform
] - "Data-Free Network Compression via Parametric Non-uniform Mixed Precision Quantization", CVPR, 2022. [paper] [
Non-uniform
] [MP
] - "IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization", CVPR, 2022. [paper] [code]
- "Instance-Aware Dynamic Neural Network Quantization", CVPR, 2022. [paper]
- "Leveraging Inter-Layer Dependency for Post-Training Quantization", NeurIPS, 2022. [paper] [
PTQ
] - "Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques", NeurIPS, 2022. [paper]
- "Entropy-Driven Mixed-Precision Quantization for Deep Network Design", NeurIPS, 2022. [paper] [
MP
] - "Redistribution of Weights and Activations for AdderNet Quantization", NeurIPS, 2022. [paper]
- "FP8 Quantization: The Power of the Exponent", NeurIPS, 2022. [paper] [code]
- "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning", NeurIPS, 2022. [paper] [code] [
PTQ
] - "ClimbQ: Class Imbalanced Quantization Enabling Robustness on Efficient Inferences", NeurIPS, 2022. [paper]
- "Non-Uniform Step Size Quantization for Accurate Post-Training Quantization", ECCV, 2022. [paper] [
PTQ
] [Non-uniform
] - "Towards Accurate Network Quantization with Equivalent Smooth Regularizer", ECCV, 2022. [paper]
- "BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks", ECCV, 2022. [paper] [code]
- "RDO-Q: Extremely Fine-Grained Channel-Wise Quantization via Rate-Distortion Optimization", ECCV, 2022. [paper]
- "Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance", ECCV, 2022. [paper] [Code] [code] [
MP
] - "Symmetry Regularization and Saturating Nonlinearity for Robust Quantization", ECCV, 2022. [paper]
- "RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization", IJCAI, 2022. [paper] [code] [
PTQ
] - "MultiQuant: Training Once for Multi-bit Quantization of Neural Networks", IJCAI, 2022. [paper]
- "F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization", ICLR, 2022. [paper]
- "8-bit Optimizers via Block-wise Quantization", ICLR, 2022. [paper] [code]
- "Information Bottleneck: Exact Analysis of (Quantized) Neural Networks", ICLR, 2022. [paper] [code]
- "QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization", ICLR, 2022. [paper] [code] [
PTQ
] - "SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation", ICLR, 2022. [paper] [code] [
PTQ
] - "FILM-QNN: Efficient FPGA Acceleration of Deep Neural Networks with Intra-Layer, Mixed-Precision Quantization", FPGA, 2022. [paper] [
MP
] - "Accurate Post Training Quantization with Small Calibration Sets", ICML, 2021. [paper] [code] [
PTQ
] - "How Do Adam and Training Strategies Help BNNs Optimization?", ICML, 2021. [paper] [code]
- "ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training", ICML, 2021. [paper] [code]
- "HAWQ-V3: Dyadic Neural Network Quantization", ICML, 2021. [paper] [code] [
MP
] - "Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution", ICML, 2021. [paper] [
MP
] - "Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators", ICML, 2021. [paper] [code]
- "Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples", NeurIPS, 2021. [paper] [code]
- "Post-Training Sparsity-Aware Quantization", NeurIPS, 2021. [paper] [code] [
PTQ
] - "Diversifying Sample Generation for Accurate Data-Free Quantization", CVPR, 2021. [paper] [
PTQ
] - "Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks.", CVPR, 2021. [paper] [code]
- "Learnable Companding Quantization for Accurate Low-bit Neural Networks", CVPR, 2021. [paper]
- "Zero-shot Adversarial Quantization", CVPR, 2021. [paper] [code]
- "Network Quantization with Element-wise Gradient Scaling", CVPR, 2021. [paper] [code]
- "High-Capacity Expert Binary Networks", ICLR, 2021. [paper] [code] [
Extreme
] - "Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network", ICLR, 2021. [paper] [code] [
Extreme
] - "BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction", ICLR, 2021. [paper] [code] [
PTQ
] - "Neural gradients are near-lognormal: improved quantized and sparse training", ICLR, 2021. [paper]
- "Training with Quantization Noise for Extreme Model Compression", ICLR, 2021. [paper]
- "BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization", ICLR, 2021. [paper] [code] [
MP
] - "Simple Augmentation Goes a Long Way: ADRL for DNN Quantization", ICLR, 2021. [paper]
- "Distribution Adaptive INT8 Quantization for Training CNNs", AAAI, 2021. [paper]
- "Stochastic Precision Ensemble: Self‐Knowledge Distillation for Quantized Deep Neural Networks", AAAI, 2021. [paper]
- "Optimizing Information Theory Based Bitwise Bottlenecks for Efficient Mixed-Precision Activation Quantization", AAAI, 2021. [paper] [
MP
] - "OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization", AAAI, 2021. [paper]
- "Scalable Verification of Quantized Neural Networks", AAAI, 2021. [paper] [code]
- "Uncertainty Quantification in CNN through the Bootstrap of Convex Neural Networks", AAAI, 2021. [paper]
- "FracBits: Mixed Precision Quantization via Fractional Bit-Widths", AAAI, 2021. [paper] [
MP
] - "Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision", AAAI, 2021. [paper] [
PTQ
] [MP
] - "ZeroQ: A Novel Zero Shot Quantization Framework", CVPR, 2020. [paper] [code] [
PTQ
] - "LSQ+: Improving Low-bit Quantization Through Learnable Offsets and Better Initialization", CVPR, 2020. [paper]
- "HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks", NeurIPS, 2020. [paper] [
MP
] - "Learned step size quantization", ICLR, 2020. [paper]
- "HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision", ICCV, 2019. [paper] [
MP
] - "Data-Free Quantization Through Weight Equalization and Bias Correction", ICCV, 2019. [paper] [
PTQ
] - "HAQ: Hardware-Aware Automated Quantization with Mixed Precision", CVPR, 2019. [paper] [code] [
MP
] - "PACT: Parameterized Clipping Activation for Quantized Neural Networks", arXiv, 2018. [paper]
- "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", CVPR, 2018. [paper]
- "Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric", arXiv, 2023. [paper] [
PTQ
] - "AQD: Towards Accurate Quantized Object Detection", CVPR, 2021. [paper]
- "BiDet: An Efficient Binarized Object Detector", CVPR, 2020. [paper] [code] [
Extreme
] - "Fully Quantized Network for Object Detection", CVPR, 2019. [paper]
- "QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution", NeurIPS, 2023. [paper]
- "Toward Accurate Post-Training Quantization for Image Super Resolution", CVPR, 2023. [paper] [code] [
PTQ
] - "EBSR: Enhanced Binary Neural Network for Image Super-Resolution", arXiv, 2023. [paper] [
Extreme
] - "CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution ", ECCV, 2022. [paper] [code]
- "Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks", ECCV, 2022. [paper] [code]
- "DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks", WACV, 2022. [paper] [code]
- "Fully Quantized Image Super-Resolution Networks", ACM MM, 2021. [paper] [code]
- "PAMS: Quantized Super-Resolution via Parameterized Max Scale", ECCV, 2020. [paper] [code]
- "Training Binary Neural Network without Batch Normalization for Image Super-Resolution", AAAI, 2021. [paper] [
Extreme
]
- "LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection", arXiv, 2023. [paper] [
PTQ
] - "Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis", arXiv, 2023. [paper] [
Extreme
] - "BiPointNet: Binary Neural Network for Point Clouds", ICLR, 2021. [paper] [code] [
Extreme
]