minja
A minimalistic C++ Jinja templating engine for LLM chat templates
Stars: 102
Minja is a minimalistic C++ Jinja templating engine designed specifically for integration with C++ LLM projects, such as llama.cpp or gemma.cpp. It is not a general-purpose tool but focuses on providing a limited set of filters, tests, and language features tailored for chat templates. The library is header-only, requires C++17, and depends only on nlohmann::json. Minja aims to keep the codebase small, easy to understand, and offers decent performance compared to Python. Users should be cautious when using Minja due to potential security risks, and it is not intended for producing HTML or JavaScript output.
README:
This is not an official Google product
Minja is a minimalistic reimplementation of the Jinja templating engine to integrate in/with C++ LLM projects (such as llama.cpp or gemma.cpp).
It is not general purpose: it includes just what’s needed for actual chat templates (very limited set of filters, tests and language features). Users with different needs should look at third-party alternatives such as Jinja2Cpp, Jinja2CppLight, or inja (none of which we endorse).
[!WARNING]
TL;DR: use of Minja is at your own risk, and the risks are plenty! See Security & Privacy section below.
Goals:
- Support each and every major LLM found on HuggingFace
  - See MODEL_IDS in tests/CMakeLists.txt for the list of models currently supported
- Easy to integrate to/with projects such as llama.cpp or gemma.cpp:
  - Header-only
  - C++17
  - Only depend on nlohmann::json (no Boost)
- Keep codebase small (currently 2.5k LoC) and easy to understand
- Decent performance compared to Python
- Address glaring prompt injection risks in current Jinja chat templating practices. See Security & Privacy below.

Non-goals:
- Additional features from Jinja that aren't used by the template(s) of any major LLM (no feature creep!)
  - Please don't submit PRs with such features; they will unfortunately be rejected.
- Full Jinja compliance (neither syntax-wise, nor filters / tests / globals)
This library is header-only: just copy the header(s) you need, make sure to use a compiler that handles C++17, and you're done. Oh, and get nlohmann::json's json.hpp in your include path.
See API in minja/minja.hpp and minja/chat-template.hpp (experimental).
For raw Jinja templating (see examples/raw.cpp):
```cpp
#include <minja.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    auto tmpl = minja::Parser::parse("Hello, {{ location }}!", /* options= */ {});
    auto context = minja::Context::make(minja::Value(json {
        {"location", "World"},
    }));
    auto result = tmpl->render(context);
    std::cout << result << std::endl;
}
```

To apply a template to a JSON array of messages and tools in the HuggingFace standard (see examples/chat-template.cpp):
```cpp
#include <chat-template.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    minja::chat_template tmpl(
        "{% for message in messages %}"
        "{{ '<|' + message['role'] + '|>\\n' + message['content'] + '<|end|>' + '\\n' }}"
        "{% endfor %}",
        /* bos_token= */ "<|start|>",
        /* eos_token= */ "<|end|>");
    std::cout << tmpl.apply(
        json::parse(R"([
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hi there"}
        ])"),
        json::parse(R"([
            {"type": "function", "function": {"name": "google_search", "arguments": {"query": "2+2"}}}
        ])"),
        /* add_generation_prompt= */ true,
        /* extra_context= */ {}) << std::endl;
}
```

(Note that some template quirks are worked around by minja/chat-template.hpp so that all templates can be used the same way.)
Models have increasingly complex templates (see some examples), so a fair number of Jinja's language constructs are required to execute their templates properly.
Minja supports the following subset of the Jinja2/3 template syntax (several of these constructs are combined in the sketch after this list):
- Full expression syntax
- Statement blocks {% … %}, variable sections {{ … }}, and comments {# … #}, with pre/post space elision ({%- … -%} / {{- … -}} / {#- … -#})
- if / elif / else / endif
- for (recursive) (if) / else / endfor, w/ loop.* (including loop.cycle) and destructuring
- set w/ namespaces & destructuring
- macro / endmacro
- filter / endfilter
- Extensible filters collection: count, dictsort, equalto, e / escape, items, join, joiner, namespace, raise_exception, range, reject, tojson, trim
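As a minimal sketch combining several of these constructs (the variable names and expected output are illustrative, assuming standard Jinja semantics for set, for, and loop.last):

```cpp
#include <minja.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    // Combines a set statement, a for loop w/ loop.*, and an if statement.
    auto tmpl = minja::Parser::parse(
        "{% set greeting = 'Hi' %}"
        "{% for name in names %}"
        "{{ greeting }}, {{ name }}{% if not loop.last %}; {% endif %}"
        "{% endfor %}",
        /* options= */ {});
    auto context = minja::Context::make(minja::Value(json {
        {"names", {"Alice", "Bob"}},
    }));
    // Expected output, per standard Jinja semantics: Hi, Alice; Hi, Bob
    std::cout << tmpl->render(context) << std::endl;
}
```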
Main limitations (non-exhaustive list):
- Most filters are not supported: only the ones actually used in templates of major (or trendy) models are/will be implemented.
- No difference between none and undefined
- Single namespace with all filters / tests / functions / macros / variables
- No tuples (templates seem to rely on lists only)
- No if expressions w/o else (but if statements are fine; see the sketch after this list)
- No {% raw %}, {% block … %}, {% include … %}, {% extends … %}
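For instance, an inline if expression needs its else branch spelled out, while the statement form does not; a minimal sketch (the is_admin variable is illustrative):

```cpp
#include <minja.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    // Not supported (inline if without else): {{ 'admin' if is_admin }}
    // Two equivalent forms that are listed as supported above:
    auto with_else = minja::Parser::parse("{{ 'admin' if is_admin else '' }}", /* options= */ {});
    auto statement = minja::Parser::parse("{% if is_admin %}admin{% endif %}", /* options= */ {});

    auto context = minja::Context::make(minja::Value(json {
        {"is_admin", true},
    }));
    // Both should print "admin".
    std::cout << with_else->render(context) << "\n"
              << statement->render(context) << std::endl;
}
```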
TODO:
- [x] Fix known issues w/ CRLF on Windows
- [ ] Integrate to llama.cpp: https://github.com/ggerganov/llama.cpp/pull/11016 + https://github.com/ggerganov/llama.cpp/pull/9639
- [ ] Improve fuzzing coverage:
  - use third-party Jinja grammar to guide exploration of inputs (or implement prettification of internal ASTs and use them to generate arbitrary values)
  - fuzz each filter / test
- [ ] Measure / track test coverage
- [ ] Set up performance tests
- [ ] Simplify two-pass parsing
- [ ] Pass tokens to IfNode and such
- [ ] Macro nested set scope = global?
- [ ] Get listed in https://jbmoelker.github.io/jinja-compat-tests/ and https://en.cppreference.com/w/cpp/links/libs
Design notes:
- minja::Parser does two-phased parsing:
  - its tokenize() method creates coarse template "tokens" (plain text sections, expression blocks, or opening / closing blocks). Tokens may have nested expression ASTs, parsed with parseExpression()
  - its parseTemplate() method iterates on tokens to build the final TemplateNode AST
- minja::Value represents a Python-like value
  - It relies on nlohmann/json for primitive values, but does its own JSON dump to be exactly compatible w/ the Jinja / Python implementation of dict string representation (see the sketch after this list)
- minja::chat_template wraps a template and provides an interface similar to HuggingFace's chat template formatting. It also normalizes the message history to accommodate different expectations from some templates (e.g. message.tool_calls.function.arguments is typically expected to be a JSON string representation of the tool call arguments, but some templates expect the arguments object instead)
- Testing involves a myriad of simple syntax tests and full e2e chat template rendering tests. For each model in MODEL_IDS (see tests/CMakeLists.txt), we fetch the chat_template field of the repo's tokenizer_config.json, use the official jinja2 Python library to render it on each of the (relevant) test contexts (in tests/contexts) into a golden file, and run a C++ test that renders w/ Minja and checks we get exactly the same output.
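A small sketch of the dict string representation point above (the exact output is an assumption derived from the stated compatibility goal with Python/Jinja2; the config value is illustrative):

```cpp
#include <minja.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    // A template that prints a dict value directly. Minja does its own dump
    // (rather than using nlohmann::json's dump()) so the result should match
    // how Python/Jinja2 stringifies a dict, e.g. with single-quoted keys:
    //   {'model': 'llama', 'n_ctx': 4096}
    auto tmpl = minja::Parser::parse("{{ config }}", /* options= */ {});
    auto context = minja::Context::make(minja::Value(json {
        {"config", {{"model", "llama"}, {"n_ctx", 4096}}},
    }));
    std::cout << tmpl->render(context) << std::endl;
}
```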
- Install prerequisites:
  - cmake
  - GCC / clang
  - python 3.8+ (for tests)
  - flake8
  - editorconfig-checker
- Optional: test additional templates:
  - Add their HuggingFace model identifier to MODEL_IDS in tests/CMakeLists.txt (e.g. meta-llama/Llama-3.2-3B-Instruct)
  - For gated models you have access to, first authenticate w/ HuggingFace:

    ```bash
    pip install huggingface_hub
    huggingface-cli login
    ```

- Build & run tests (shorthand: ./scripts/run_tests.sh):

  ```bash
  rm -fR build && \
    cmake -B build && \
    cmake --build build -j && \
    ctest --test-dir build -j --output-on-failure
  ```
- Fuzzing tests:
  - Note: fuzztest doesn't work natively on Windows or MacOS. To run it inside a Docker container instead (beware of Docker Desktop's licensing: you might want to check out alternatives such as colima; we'll still use the docker client in the example below):

    ```bash
    docker run --rm -it -v $PWD:/src:rw $(
      echo "
        FROM python:3.12-slim-bookworm
        COPY requirements.txt /tmp
        RUN apt update && \
            apt install -y cmake clang ccache git python3 python-is-python3 python3-pip && \
            apt-get clean && \
            rm -rf /var/lib/apt/lists/*
        RUN pip install setuptools pip --upgrade --force-reinstall
        RUN pip install -r /tmp/requirements.txt
        CMD /usr/bin/bash
        WORKDIR /src
      " | docker build . -f - -q
    )
    ```

  - Build in fuzzing mode & run all fuzzing tests (optionally, set a higher TIMEOUT as env var):

    ```bash
    ./scripts/run_fuzzing_mode.sh
    ```
- If your model's template doesn't run fine, please consider the following before opening a bug:
  - Is the template using any unsupported filter / test / method / global function, and which one(s)?
  - Is the template publicly available? Non-gated models are more likely to become supported.
  - Which version of GCC / clang did you compile the tests with? On which OS version?
  - If you intend to contribute a fix:
    - Please read CONTRIBUTING first. You'd have to sign a CLA, which your employer may need to accept.
    - Please test as many gated models as possible (use cmake -B build -DMINJA_TEST_GATED_MODELS=1 ... and edit MODEL_IDS appropriately)
    - For bonus points, check the style of your edits with:

      ```bash
      flake8
      editorconfig-checker
      ```
Security & Privacy:
This library doesn't store any data by itself and doesn't access files or the web; it only transforms a template (string) and context (JSON w/ fields "messages", "tools"...) into a formatted string.
You should still be careful about untrusted third-party chat templates, as these could try and trigger bugs in Minja to exfiltrate user chat data (we only have limited fuzzing tests in place).
Risks are even higher with any user-defined functions.
HTML processing with this library is UNSAFE: no escaping is performed (and the safe filter is a passthrough), leaving users vulnerable to XSS. Minja is not intended to produce HTML.
Prompt injection is NOT protected against by this library.
There are many types of prompt injection, some quite exotic (cf. data exfiltration exploits leveraging markdown image previews).
For the simpler cases, it is perfectly possible for a user to craft a message that will look like a system prompt, like an assistant response, or like the results of tool calls. While some models might be fine-tuned to ignore system prompts that don't appear at the very start of the prompt, or out-of-order messages / tool call results, it is expected that most models will be very confused & successfully manipulated by such prompt injections.
Note that injection of tool calls should typically not result in their execution as LLM inference engines should not try to parse the template output (just generated tokens), but this is something to watch out for when auditing such inference engines.
As there isn't any standard mechanism to escape special tokens to prevent those attacks, it is advised that users of this library take their own message sanitization measures before applying chat templates. We do not recommend any specific such measure, as each model reacts differently (some even understand l33t speak as instructions).
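One purely illustrative possibility is to strip a template's special tokens from untrusted message content before applying the template. The token list below is an assumption matching the example template earlier in this README; real templates use different tokens, and this is a sketch of one possible measure, not a recommendation:

```cpp
#include <string>
#include <vector>

// Illustrative sanitization sketch: remove special tokens from untrusted user
// content before it is passed to chat_template::apply(). The token list is an
// assumption for the example template shown earlier; adapt it per model.
std::string sanitize_user_content(std::string content) {
    static const std::vector<std::string> special_tokens = {
        "<|start|>", "<|end|>", "<|user|>", "<|assistant|>", "<|system|>",
    };
    for (const auto & token : special_tokens) {
        size_t pos = 0;
        while ((pos = content.find(token, pos)) != std::string::npos) {
            content.erase(pos, token.size());
        }
    }
    return content;
}
```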
Alternative AI tools for minja
Similar Open Source Tools
simpleAI
SimpleAI is a self-hosted alternative to the not-so-open AI API, focused on replicating main endpoints for LLM such as text completion, chat, edits, and embeddings. It allows quick experimentation with different models, creating benchmarks, and handling specific use cases without relying on external services. Users can integrate and declare models through gRPC, query endpoints using Swagger UI or API, and resolve common issues like CORS with FastAPI middleware. The project is open for contributions and welcomes PRs, issues, documentation, and more.
LightRAG
LightRAG is a PyTorch library designed for building and optimizing Retriever-Agent-Generator (RAG) pipelines. It follows principles of simplicity, quality, and optimization, offering developers maximum customizability with minimal abstraction. The library includes components for model interaction, output parsing, and structured data generation. LightRAG facilitates tasks like providing explanations and examples for concepts through a question-answering pipeline.
aire
Aire is a modern Laravel form builder with a focus on expressive and beautiful code. It allows easy configuration of form components using fluent method calls or Blade components. Aire supports customization through config files and custom views, data binding with Eloquent models or arrays, method spoofing, CSRF token injection, server-side and client-side validation, and translations. It is designed to run on Laravel 5.8.28 and higher, with support for PHP 7.1 and higher. Aire is actively maintained and under consideration for additional features like read-only plain text, cross-browser support for custom checkboxes and radio buttons, support for Choices.js or similar libraries, improved file input handling, and better support for content prepending or appending to inputs.
consult-llm-mcp
Consult LLM MCP is an MCP server that enables users to consult powerful AI models like GPT-5.2, Gemini 3.0 Pro, and DeepSeek Reasoner for complex problem-solving. It supports multi-turn conversations, direct queries with optional file context, git changes inclusion for code review, comprehensive logging with cost estimation, and various CLI modes for Gemini and Codex. The tool is designed to simplify the process of querying AI models for assistance in resolving coding issues and improving code quality.
langserve
LangServe helps developers deploy `LangChain` runnables and chains as a REST API. This library is integrated with FastAPI and uses pydantic for data validation. In addition, it provides a client that can be used to call into runnables deployed on a server. A JavaScript client is available in LangChain.js.
neocodeium
NeoCodeium is a free AI completion plugin powered by Codeium, designed for Neovim users. It aims to provide a smoother experience by eliminating flickering suggestions and allowing for repeatable completions using the `.` key. The plugin offers performance improvements through cache techniques, displays suggestion count labels, and supports Lua scripting. Users can customize keymaps, manage suggestions, and interact with the AI chat feature. NeoCodeium enhances code completion in Neovim, making it a valuable tool for developers seeking efficient coding assistance.
python-tgpt
Python-tgpt is a Python package that enables seamless interaction with over 45 free LLM providers without requiring an API key. It also provides image generation capabilities. The name _python-tgpt_ draws inspiration from its parent project tgpt, which operates on Golang. Through this Python adaptation, users can effortlessly engage with a number of free LLMs available, fostering a smoother AI interaction experience.
llm-vscode
llm-vscode is an extension designed for all things LLM, utilizing llm-ls as its backend. It offers features such as code completion with 'ghost-text' suggestions, the ability to choose models for code generation via HTTP requests, ensuring prompt size fits within the context window, and code attribution checks. Users can configure the backend, suggestion behavior, keybindings, llm-ls settings, and tokenization options. Additionally, the extension supports testing models like Code Llama 13B, Phind/Phind-CodeLlama-34B-v2, and WizardLM/WizardCoder-Python-34B-V1.0. Development involves cloning llm-ls, building it, and setting up the llm-vscode extension for use.
bilingual_book_maker
The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist users in creating multi-language versions of epub/txt/srt files and books. It supports various models like gpt-4, gpt-3.5-turbo, claude-2, palm, llama-2, azure-openai, command-nightly, and gemini. Users need ChatGPT or OpenAI token, epub/txt books, internet access, and Python 3.8+. The tool provides options to specify OpenAI API key, model selection, target language, proxy server, context addition, translation style, and more. It generates bilingual books in epub format after translation. Users can test translations, set batch size, tweak prompts, and use different models like DeepL, Google Gemini, Tencent TranSmart, and more. The tool also supports retranslation, translating specific tags, and e-reader type specification. Docker usage is available for easy setup.
AirspeedVelocity.jl
AirspeedVelocity.jl is a tool designed to simplify benchmarking of Julia packages over their lifetime. It provides a CLI to generate benchmarks, compare commits/tags/branches, plot benchmarks, and run benchmark comparisons for every submitted PR as a GitHub action. The tool freezes the benchmark script at a specific revision to prevent old history from affecting benchmarks. Users can configure options using CLI flags and visualize benchmark results. AirspeedVelocity.jl can be used to benchmark any Julia package and offers features like generating tables and plots of benchmark results. It also supports custom benchmarks and can be integrated into GitHub actions for automated benchmarking of PRs.
auto-playwright
Auto Playwright is a tool that allows users to run Playwright tests using AI. It eliminates the need for selectors by determining actions at runtime based on plain-text instructions. Users can automate complex scenarios, write tests concurrently with or before functionality development, and benefit from rapid test creation. The tool supports various Playwright actions and offers additional options for debugging and customization. It uses HTML sanitization to reduce costs and improve text quality when interacting with the OpenAI API.
banks
Banks is a linguist professor tool that helps generate meaningful LLM prompts using a template language. It provides a user-friendly way to create prompts for various tasks such as blog writing, summarizing documents, lemmatizing text, and generating text using a LLM. The tool supports async operations and comes with predefined filters for data processing. Banks leverages Jinja's macro system to create prompts and interact with OpenAI API for text generation. It also offers a cache mechanism to avoid regenerating text for the same template and context.
suno-api
Suno AI API is an open-source project that allows developers to integrate the music generation capabilities of Suno.ai into their own applications. The API provides a simple and convenient way to generate music, lyrics, and other audio content using Suno.ai's powerful AI models. With Suno AI API, developers can easily add music generation functionality to their apps, websites, and other projects.
AgentKit
AgentKit is a framework for constructing complex human thought processes from simple natural language prompts. It offers a unified way to represent and execute these processes as graphs, making it easy to design and tune agents without any programming experience. AgentKit can be used for a variety of tasks, including generating text, answering questions, and making decisions.
instructor
Instructor is a popular Python library for managing structured outputs from large language models (LLMs). It offers a user-friendly API for validation, retries, and streaming responses. With support for various LLM providers and multiple languages, Instructor simplifies working with LLM outputs. The library includes features like response models, retry management, validation, streaming support, and flexible backends. It also provides hooks for logging and monitoring LLM interactions, and supports integration with Anthropic, Cohere, Gemini, Litellm, and Google AI models. Instructor facilitates tasks such as extracting user data from natural language, creating fine-tuned models, managing uploaded files, and monitoring usage of OpenAI models.
For similar tasks
aio-hub
AIO Hub is a cross-platform AI hub built on Tauri + Vue 3 + TypeScript, aiming to provide developers and creators with precise LLM control experience and efficient toolchain. It features a chat function designed for complex tasks and deep exploration, a unified context pipeline for controlling every token sent to the model, interactive AI buttons, dual-view management for non-linear conversation mapping, open ecosystem compatibility with various AI models, and a rich text renderer for LLM output. The tool also includes features for media workstation, developer productivity, system and asset management, regex applier, collaboration enhancement between developers and AI, and more.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.