minja
A minimalistic C++ Jinja templating engine for LLM chat templates
Minja is a minimalistic C++ Jinja templating engine designed specifically for integration with C++ LLM projects, such as llama.cpp or gemma.cpp. It is not a general-purpose tool but focuses on providing a limited set of filters, tests, and language features tailored for chat templates. The library is header-only, requires C++17, and depends only on nlohmann::json. Minja aims to keep the codebase small, easy to understand, and offers decent performance compared to Python. Users should be cautious when using Minja due to potential security risks, and it is not intended for producing HTML or JavaScript output.
README:
This is not an official Google product
Minja is a minimalistic reimplementation of the Jinja templating engine to integrate in/with C++ LLM projects (such as llama.cpp or gemma.cpp).
It is not general purpose: it includes just what’s needed for actual chat templates (very limited set of filters, tests and language features). Users with different needs should look at third-party alternatives such as Jinja2Cpp, Jinja2CppLight, or inja (none of which we endorse).
> [!WARNING]
> TL;DR: use of Minja is at your own risk, and the risks are plenty! See the Security & Privacy section below.
Goals:
- Support each and every major LLM found on HuggingFace
  - See `MODEL_IDS` in tests/CMakeLists.txt for the list of models currently supported
- Easy to integrate to/with projects such as llama.cpp or gemma.cpp:
  - Header-only
  - C++17
  - Only depend on nlohmann::json (no Boost)
- Keep codebase small (currently 2.5k LoC) and easy to understand
- Decent performance compared to Python.
- Address glaring prompt injection risks in current Jinja chat templating practices. See Security & Privacy below.

Non-goals:
- Additional features from Jinja that aren't used by the template(s) of any major LLM (no feature creep!)
  - Please don't submit PRs with such features, they will unfortunately be rejected.
- Full Jinja compliance (neither syntax-wise, nor filters / tests / globals)
This library is header-only: just copy the header(s) you need, make sure to use a compiler that handles C++17, and you're done. Oh, and get nlohmann::json's json.hpp in your include path.
See the API in minja/minja.hpp and minja/chat-template.hpp (experimental).
For raw Jinja templating (see examples/raw.cpp):
```cpp
#include <minja.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    auto tmpl = minja::Parser::parse("Hello, {{ location }}!", /* options= */ {});
    auto context = minja::Context::make(minja::Value(json {
        {"location", "World"},
    }));
    auto result = tmpl->render(context);
    std::cout << result << std::endl;
}
```
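This prints `Hello, World!`.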
To apply a template to a JSON array of messages and tools in the HuggingFace standard (see examples/chat-template.cpp):
```cpp
#include <chat-template.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    minja::chat_template tmpl(
        "{% for message in messages %}"
        "{{ '<|' + message['role'] + '|>\\n' + message['content'] + '<|end|>' + '\\n' }}"
        "{% endfor %}",
        /* bos_token= */ "<|start|>",
        /* eos_token= */ "<|end|>");
    std::cout << tmpl.apply(
        json::parse(R"([
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hi there"}
        ])"),
        json::parse(R"([
            {"type": "function", "function": {"name": "google_search", "arguments": {"query": "2+2"}}}
        ])"),
        /* add_generation_prompt= */ true,
        /* extra_context= */ {}) << std::endl;
}
```
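Assuming the toy template above (which references neither `bos_token`/`eos_token` nor the tools array), the rendered output should look roughly like `<|user|>\nHello<|end|>\n<|assistant|>\nHi there<|end|>\n` (with literal newlines); the exact output may differ slightly once chat-template normalization quirks kick in.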
(Note that some template quirks are worked around by minja/chat-template.hpp so that all templates can be used the same way)
Models have increasingly complex templates (see some examples), so a fair number of Jinja's language constructs are required to execute them properly.
Minja supports the following subset of the Jinja2/3 template syntax:
- Full expression syntax
- Statements `{% … %}`, variable sections `{{ … }}`, and comments `{# … #}` with pre/post space elision `{%- … -%}` / `{{- … -}}` / `{#- … -#}`
- `if` / `elif` / `else` / `endif`
- `for` (`recursive`) (`if`) / `else` / `endfor` w/ `loop.*` (including `loop.cycle`) and destructuring
- `set` w/ namespaces & destructuring
- `macro` / `endmacro`
- `filter` / `endfilter`
- Extensible filters collection: `count`, `dictsort`, `equalto`, `e` / `escape`, `items`, `join`, `joiner`, `namespace`, `raise_exception`, `range`, `reject`, `tojson`, `trim`
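As a quick sketch of several of these constructs used together (the template string and values here are invented for illustration; only the `Parser`/`Context` API from the example above is assumed):

```cpp
#include <minja.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    // Exercises `set`, `for` w/ loop.*, an `if` statement, space elision,
    // and the `join` filter -- all listed as supported above.
    auto tmpl = minja::Parser::parse(
        "{%- set greeting = 'Hi' -%}"
        "{% for name in names %}{{ greeting }} {{ name }}"
        "{% if not loop.last %}; {% endif %}{% endfor %}"
        " ({{ names | join(', ') }})",
        /* options= */ {});
    auto context = minja::Context::make(minja::Value(json {
        {"names", json::array({"Ada", "Linus"})},
    }));
    std::cout << tmpl->render(context) << std::endl;
    // Expected output: Hi Ada; Hi Linus (Ada, Linus)
}
```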
Main limitations (non-exhaustive list):
- Not supporting most filters. Only the ones actually used in templates of major (or trendy) models are/will be implemented.
- No difference between `none` and `undefined`
- Single namespace with all filters / tests / functions / macros / variables
- No tuples (templates seem to rely on lists only)
- No `if` expressions w/o `else` (but `if` statements are fine): `{{ 'a' if cond else 'b' }}` works, `{{ 'a' if cond }}` doesn't
- No `{% raw %}`, `{% block … %}`, `{% include … %}`, `{% extends … %}`
- [x] Fix known issues w/ CRLF on Windows
- [ ] Integrate to llama.cpp: https://github.com/ggerganov/llama.cpp/pull/11016 + https://github.com/ggerganov/llama.cpp/pull/9639
- [ ] Improve fuzzing coverage:
  - use third-party Jinja grammar to guide exploration of inputs (or implement prettification of internal ASTs and use them to generate arbitrary values)
  - fuzz each filter / test
- [ ] Measure / track test coverage
- [ ] Set up performance tests
- [ ] Simplify two-pass parsing
- [ ] Pass tokens to IfNode and such
- [ ] Macro nested set scope = global?
- [ ] Get listed in https://jbmoelker.github.io/jinja-compat-tests/ and https://en.cppreference.com/w/cpp/links/libs
- `minja::Parser` does two-phased parsing:
  - its `tokenize()` method creates coarse template "tokens" (plain text sections, expression blocks, or opening / closing blocks). Tokens may have nested expression ASTs, parsed with `parseExpression()`
  - its `parseTemplate()` method iterates on tokens to build the final `TemplateNode` AST.
- `minja::Value` represents a Python-like value
  - It relies on `nlohmann/json` for primitive values, but does its own JSON dump to be exactly compatible w/ the Jinja / Python implementation of `dict` string representation
- `minja::chat_template` wraps a template and provides an interface similar to HuggingFace's chat template formatting. It also normalizes the message history to accommodate different expectations from some templates (e.g. `message.tool_calls.function.arguments` is typically expected to be a JSON string representation of the tool call arguments, but some templates expect the arguments object instead; see the sketch after this list)
- Testing involves a myriad of simple syntax tests and full e2e chat template rendering tests. For each model in `MODEL_IDS` (see tests/CMakeLists.txt), we fetch the `chat_template` field of the repo's `tokenizer_config.json`, use the official jinja2 Python library to render it on each of the (relevant) test contexts (in tests/contexts) into a golden file, and run a C++ test that renders w/ Minja and checks we get exactly the same output.
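For illustration, here are the two tool-call argument shapes mentioned above (invented values; only nlohmann::json is used here, not Minja itself):

```cpp
#include <iostream>
#include <nlohmann/json.hpp>

using json = nlohmann::ordered_json;

int main() {
    // Shape 1: arguments serialized as a JSON *string* (the common expectation):
    json args_as_string = {
        {"name", "google_search"},
        {"arguments", "{\"query\": \"2+2\"}"}
    };
    // Shape 2: arguments as a JSON *object* (what some templates expect instead);
    // minja::chat_template normalizes between the two shapes.
    json args_as_object = {
        {"name", "google_search"},
        {"arguments", {{"query", "2+2"}}}
    };
    std::cout << args_as_string.dump() << "\n" << args_as_object.dump() << std::endl;
}
```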
- Install prerequisites:
  - cmake
  - GCC / clang
  - python 3.8+ (for tests)
  - flake8
  - editorconfig-checker
- Optional: test additional templates:
  - Add their HuggingFace model identifier to `MODEL_IDS` in tests/CMakeLists.txt (e.g. `meta-llama/Llama-3.2-3B-Instruct`)
  - For gated models you have access to, first authenticate w/ HuggingFace:

    ```bash
    pip install huggingface_hub
    huggingface-cli login
    ```

- Build & run tests (shorthand: `./scripts/run_tests.sh`):

  ```bash
  rm -fR build && \
    cmake -B build && \
    cmake --build build -j && \
    ctest --test-dir build -j --output-on-failure
  ```
- Fuzzing tests
  - Note: `fuzztest` doesn't work natively on Windows or macOS. To run it inside a Docker container instead:

    Beware of Docker Desktop's licensing: you might want to check out alternatives such as colima (we'll still use the docker client in the example below).

    ```bash
    docker run --rm -it -v $PWD:/src:rw $( echo "
        FROM python:3.12-slim-bookworm
        COPY requirements.txt /tmp
        RUN apt update && \
            apt install -y cmake clang ccache git python3 python-is-python3 python3-pip && \
            apt-get clean && \
            rm -rf /var/lib/apt/lists/*
        RUN pip install setuptools pip --upgrade --force-reinstall
        RUN pip install -r /tmp/requirements.txt
        CMD /usr/bin/bash
        WORKDIR /src
    " | docker build . -f - -q )
    ```

  - Build in fuzzing mode & run all fuzzing tests (optionally, set a higher `TIMEOUT` as an env var): `./scripts/run_fuzzing_mode.sh`
If your model's template doesn't run fine, please consider the following before opening a bug:

- Is the template using any unsupported filter / test / method / global function, and which one(s)?
- Is the template publicly available? Non-gated models are more likely to become supported.
- Which version of GCC / clang did you compile the tests with? On which OS version?
- If you intend to contribute a fix:
  - Please read CONTRIBUTING first. You'd have to sign a CLA, which your employer may need to accept.
  - Please test as many gated models as possible (use `cmake -B build -DMINJA_TEST_GATED_MODELS=1 ...` and edit `MODEL_IDS` appropriately)
  - For bonus points, check the style of your edits with:

    ```bash
    flake8
    editorconfig-checker
    ```
This library doesn't store any data by itself, and it doesn't access files or the web: it only transforms a template (string) and a context (JSON w/ fields `"messages"`, `"tools"`...) into a formatted string.
You should still be careful about untrusted third-party chat templates, as these could try and trigger bugs in Minja to exfiltrate user chat data (we only have limited fuzzing tests in place).
Risks are even higher with any user-defined functions.
HTML processing with this library is UNSAFE: no escaping is performed (and the `safe` filter is a passthrough), leaving users vulnerable to XSS. Minja is not intended to produce HTML.
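A sketch of that point (invented input; only the `Parser` API shown earlier is assumed):

```cpp
#include <minja.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    // No HTML escaping happens: the <script> payload survives verbatim.
    auto tmpl = minja::Parser::parse("<p>{{ user_input }}</p>", /* options= */ {});
    auto ctx = minja::Context::make(minja::Value(json {
        {"user_input", "<script>alert(1)</script>"},
    }));
    std::cout << tmpl->render(ctx) << std::endl;
    // Prints: <p><script>alert(1)</script></p>
}
```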
Prompt injection is NOT protected against by this library.
There are many types of prompt injection, some quite exotic (cf. data exfiltration exploits leveraging markdown image previews).
For the simpler cases, it is perfectly possible for a user to craft a message that will look like a system prompt, an assistant response, or the results of tool calls. While some models might be fine-tuned to ignore system messages that aren't at the very start of the prompt, or out-of-order messages / tool call results, it is expected that most models will be very confused & successfully manipulated by such prompt injections.
Note that injection of tool calls should typically not result in their execution as LLM inference engines should not try to parse the template output (just generated tokens), but this is something to watch out for when auditing such inference engines.
As there isn't any standard mechanism to escape special tokens to prevent those attacks, users of this library are advised to take their own message sanitization measures before applying chat templates. We do not recommend any specific measure as each model reacts differently (some even understand l33tcode as instructions).
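One possible mitigation along those lines, sketched below, is entirely hypothetical and not part of Minja: strip the model's special tokens from untrusted message content before rendering. The token list here matches the toy template from the usage example above; real deployments must use the actual model's special tokens.

```cpp
#include <string>
#include <vector>

// Hypothetical sanitizer: removes special-token substrings from untrusted
// message content before it is rendered through a chat template.
std::string strip_special_tokens(std::string text,
                                 const std::vector<std::string>& tokens) {
    for (const auto& tok : tokens) {
        size_t pos = 0;
        while ((pos = text.find(tok, pos)) != std::string::npos) {
            text.erase(pos, tok.size());
        }
    }
    return text;
}

// Usage (token list from the toy template earlier in this README):
//   auto safe = strip_special_tokens(user_content,
//       {"<|start|>", "<|end|>", "<|user|>", "<|assistant|>"});
```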