Awesome-LLM-Constrained-Decoding

A curated list of papers on constrained decoding for LLMs, along with their code and related resources.

Stars: 180

Awesome-LLM-Constrained-Decoding is a curated list of papers, code, and resources related to constrained decoding of Large Language Models (LLMs). The repository aims to facilitate reliable, controllable, and efficient generation with LLMs by providing a comprehensive collection of materials in this domain.

Towards reliable, controllable and more efficient generation with Large Language Models (LLMs)

Libraries

Library	Features
guidance-ai/guidance	CFG, Regex, JSON Schema, Token Forcing, compatible with Transformers and llama.cpp
outlines-dev/outlines	CFG, Unicode support, Hugging Face ecosystem, vLLM support
sgl-project/sglang	Regex support, emphasis on LLM inference efficiency, compressed FSM
eth-sri/lmql	Regex support, various constraints, more powerful control flow
jxnl/instructor	Try-Reject-Repeat approach to ensure constraints are met
microsoft/aici	A general framework for LLM controllers with native support for CFG, Regex, JSON Schema
noamgat/lm-format-enforcer	Regex, JSON Schema, Beam Search, etc.
mlc-ai/xgrammar	CFG, careful system optimizations
epfl-dlab/transformers-CFG	CFG (EBNF interface), compatible with Transformers, easy to extend for research
uiuc-focal-lab/syncode	CFG generation with built-in grammars for JSON, Python, Go, and more
Dan-wanna-M/formatron	Regex, JSON Schema, CFG, etc.

Disclaimer:

The libraries listed above are not exhaustive and are subject to change.
The features listed are by no means exhaustive; I strongly recommend checking the respective repositories for details.
The libraries are ordered by GitHub stars.
If you are the author of a library and would like to add or update the information, please open an issue or submit a pull request.
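
As a rough illustration of how the feature column differs, the toy sketch below contrasts mask-based constrained decoding (the idea behind the Regex and FSM entries) with the Try-Reject-Repeat approach. It is not the API of any library listed above. The vocabulary, the hand-written automaton for the regex `-?[0-9]+`, and the random `fake_logits` function standing in for a model are all assumptions made for the example.

```python
import math
import random
import re

# Toy vocabulary (assumption): tokens may span several characters.
VOCAB = ["0", "1", "42", "-", "a", "<eos>"]
PATTERN = r"-?[0-9]+"

# Hand-written DFA for PATTERN.
# States: 0 = start, 1 = after '-', 2 = at least one digit seen (accepting).
def step(state, ch):
    if ch == "-" and state == 0:
        return 1
    if ch.isdigit():
        return 2
    return None  # dead state: no valid continuation

def advance(state, token):
    """Run the DFA over each character of a token; None if it is rejected."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

def allowed_tokens(state):
    """Ids of tokens that keep the output inside the language."""
    ids = [i for i, tok in enumerate(VOCAB)
           if tok != "<eos>" and advance(state, tok) is not None]
    if state == 2:  # accepting state: stopping is allowed
        ids.append(VOCAB.index("<eos>"))
    return ids

def fake_logits(rng):
    """Stand-in for a model forward pass: one random score per token."""
    return [rng.uniform(-1.0, 1.0) for _ in VOCAB]

def constrained_decode(rng, max_tokens=8):
    """Mask disallowed tokens to -inf, then pick the best remaining token."""
    state, out = 0, []
    for _ in range(max_tokens):
        logits = fake_logits(rng)
        allowed = set(allowed_tokens(state))
        masked = [score if i in allowed else -math.inf
                  for i, score in enumerate(logits)]
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        if VOCAB[best] == "<eos>":
            break
        out.append(VOCAB[best])
        state = advance(state, VOCAB[best])
    return "".join(out), state == 2

def retry_decode(rng, max_attempts=1000, max_tokens=4):
    """Try-Reject-Repeat: sample freely, validate afterwards, resample on failure."""
    for attempt in range(1, max_attempts + 1):
        out = []
        for _ in range(max_tokens):
            tok = rng.choice(VOCAB)
            if tok == "<eos>":
                break
            out.append(tok)
        text = "".join(out)
        if re.fullmatch(PATTERN, text):
            return text, attempt
    return None, max_attempts

if __name__ == "__main__":
    print(constrained_decode(random.Random(0)))
    print(retry_decode(random.Random(0)))
```

In this sketch the masked decoder can never emit an invalid token, so every step it takes stays valid, while the retry loop may discard several complete samples before one passes validation. Real libraries work over the model's actual tokenizer and typically precompute or cache the allowed-token sets per automaton state.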

Papers

Papers flagged as new are newly added to this list (not necessarily newly published).

Date	Paper	Publication
2025-02	Lost in Space: Optimizing Tokens for Grammar-Constrained Decoding	Preprint
2025-02	Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence	Preprint
2025-02	Flexible and Efficient Grammar-Constrained Decoding	Preprint
2025-01	Generating Structured Outputs from Language Models: Benchmark and Studies	Preprint
2024-11	XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models	Preprint
2024-10	IterGen: Iterative Structured LLM Generation	Preprint
2024-08	Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models	Preprint
2024-08	FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking	Preprint
2024-07	Automata-based constraints for language model decoding	COLM
2024-06	Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access	ACL
2024-05	Grammar-Aligned Decoding	Preprint
2024-03	SynCode: LLM Generation with Grammar Augmentation	Preprint
2024-03	Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation	ICML
2024-02	Constrained Decoding for Code Language Models via Efficient Left and Right Quotienting of Context-Sensitive Grammars	Preprint
2024-02	Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents	Preprint
2023-12	SGLang: Efficient Execution of Structured Language Model Programs	Preprint
2023-12	Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context	NeurIPS
2023-11	Prompt Sketching for Large Language Models	Preprint
2023-11	Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs	PADL
2023-10	Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding	Preprint
2023-10	Amortizing intractable inference in large language models	ICLR
2023-10	KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection	EMNLP
2023-07	Efficient Guided Generation for Large Language Models	Preprint
2023-06	Grammar Prompting for Domain-Specific Language Generation with Large Language Models	NeurIPS
2023-06	Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning	EMNLP
2023-06	Prompting Is Programming: A Query Language for Large Language Models	PLDI
2023-05	Measuring and Mitigating Constraint Violations of In-Context Learning for Utterance-to-API Semantic Parsing	EMNLP Findings
2023-04	Tractable Control for Autoregressive Language Generation	ICML
2022-11	Validating Large Language Models with ReLM	MLSys
2022-11	CodePAD: Sequence-based Code Generation with Pushdown Automaton	ISSTA
2022-05	Gradient-Based Constrained Sampling from Language Models	EMNLP
2022-01	Synchromesh: Reliable code generation from pre-trained language models	ICLR
2021-12	PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models	EMNLP
2021-12	Constrained Language Models Yield Few-Shot Semantic Parsers	EMNLP
2021-12	Controlled Text Generation as Continuous Optimization with Multiple Constraints	NeurIPS
2021-06	NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints	NAACL
2019-05	A General-Purpose Algorithm for Constrained Sequential Inference	CoNLL
2019-05	Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting	NAACL
2018-09	CGMH: Constrained Sentence Generation by Metropolis-Hastings Sampling	AAAI
2018-05	Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation	NAACL
2018-04	Incorporating Discriminator in Sentence Generation: a Gibbs Sampling Method	AAAI
2017-12	Guided Open Vocabulary Image Captioning with Constrained Beam Search	EMNLP
2017-06	Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search	ACL

Benchmark & Datasets & Evaluation

Date	Paper	Publication
2024-05	COLLIE: Systematic Construction of Constrained Text Generation Tasks	ICLR
2024-02	JSON-mode Eval dataset	HF hub
2023-12	BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing	NeurIPS Track on Datasets and Benchmarks
2023-10	Evaluating Large Language Models on Controlled Generation Tasks	Preprint
2023-09	Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?	Preprint
2021-10	NLV corpus	CHI
2020-12	CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning	EMNLP Findings
2018-09	Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task	EMNLP

Survey

Date	Paper	Publication
2024-04	"We Need Structured Output": Towards User-centered Constraints on Large Language Model Output	Preprint

Blog Posts

Many of these blog posts were written by the Outlines team; many thanks to them for their great work! ❤️

Related Awesome Lists

awesome-llm-json

Disclaimer

This list is not exhaustive and will be updated regularly. If you have any suggestions or want to add a paper, please feel free to open an issue or submit a pull request. We hope to include all relevant papers in this list.

Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue. Please make sure to read the Contributing Guidelines before contributing.
