minja
A minimalistic C++ Jinja templating engine for LLM chat templates
Minja is a minimalistic C++ Jinja templating engine designed specifically for integration with C++ LLM projects, such as llama.cpp or gemma.cpp. It is not a general-purpose tool but focuses on providing a limited set of filters, tests, and language features tailored for chat templates. The library is header-only, requires C++17, and depends only on nlohmann::json. Minja aims to keep the codebase small, easy to understand, and offers decent performance compared to Python. Users should be cautious when using Minja due to potential security risks, and it is not intended for producing HTML or JavaScript output.
README:
This is not an official Google product
Minja is a minimalistic reimplementation of the Jinja templating engine to integrate in/with C++ LLM projects (such as llama.cpp or gemma.cpp).
It is not general purpose: it includes just what’s needed for actual chat templates (very limited set of filters, tests and language features). Users with different needs should look at third-party alternatives such as Jinja2Cpp, Jinja2CppLight, or inja (none of which we endorse).
> [!WARNING]
> TL;DR: use of Minja is at your own risk, and the risks are plenty! See the Security & Privacy section below.
Goals:
- Support each and every major LLM found on HuggingFace
  - See `MODEL_IDS` in tests/CMakeLists.txt for the list of models currently supported
- Easy to integrate to/with projects such as llama.cpp or gemma.cpp:
  - Header-only
  - C++17
  - Only depend on nlohmann::json (no Boost)
- Keep codebase small (currently 2.5k LoC) and easy to understand
- Decent performance compared to Python.
- Address glaring prompt injection risks in current Jinja chat templating practices. See Security & Privacy below.

Non-goals:
- Additional features from Jinja that aren't used by the template(s) of any major LLM (no feature creep!)
  - Please don't submit PRs with such features, they will unfortunately be rejected.
- Full Jinja compliance (neither syntax-wise, nor filters / tests / globals)
This library is header-only: just copy the header(s) you need, make sure to use a compiler that handles C++17, and you're done. Oh, and get nlohmann::json's json.hpp in your include path.
See the API in minja/minja.hpp and minja/chat-template.hpp (experimental).
For raw Jinja templating (see examples/raw.cpp):
```cpp
#include <minja.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    auto tmpl = minja::Parser::parse("Hello, {{ location }}!", /* options= */ {});
    auto context = minja::Context::make(minja::Value(json {
        {"location", "World"},
    }));
    auto result = tmpl->render(context);
    std::cout << result << std::endl;
}
```
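This prints `Hello, World!`.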
To apply a template to a JSON array of messages and tools in the HuggingFace standard (see examples/chat-template.cpp):
```cpp
#include <chat-template.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    minja::chat_template tmpl(
        "{% for message in messages %}"
        "{{ '<|' + message['role'] + '|>\\n' + message['content'] + '<|end|>' + '\\n' }}"
        "{% endfor %}",
        /* bos_token= */ "<|start|>",
        /* eos_token= */ "<|end|>");
    std::cout << tmpl.apply(
        json::parse(R"([
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hi there"}
        ])"),
        json::parse(R"([
            {"type": "function", "function": {"name": "google_search", "arguments": {"query": "2+2"}}}
        ])"),
        /* add_generation_prompt= */ true,
        /* extra_context= */ {}) << std::endl;
}
```
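Assuming the toy template above (which references neither `bos_token`/`eos_token` nor the tools array), the rendered output should look roughly like `<|user|>\nHello<|end|>\n<|assistant|>\nHi there<|end|>\n` (with literal newlines); the exact output may differ slightly once chat-template normalization quirks kick in.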
(Note that some template quirks are worked around by minja/chat-template.hpp so that all templates can be used the same way)
Models have increasingly complex templates (see some examples), so a fair number of Jinja's language constructs are required to execute them properly.
Minja supports the following subset of the Jinja2/3 template syntax:
- Full expression syntax
- Statements `{% … %}`, variable sections `{{ … }}`, and comments `{# … #}` with pre/post space elision `{%- … -%}` / `{{- … -}}` / `{#- … -#}`
- `if` / `elif` / `else` / `endif`
- `for` (`recursive`) (`if`) / `else` / `endfor` w/ `loop.*` (including `loop.cycle`) and destructuring
- `set` w/ namespaces & destructuring
- `macro` / `endmacro`
- `filter` / `endfilter`
- Extensible filters collection: `count`, `dictsort`, `equalto`, `e` / `escape`, `items`, `join`, `joiner`, `namespace`, `raise_exception`, `range`, `reject`, `tojson`, `trim`
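As a quick sketch of several of these constructs used together (the template string and values here are invented for illustration; only the `Parser`/`Context` API from the example above is assumed):

```cpp
#include <minja.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    // Exercises `set`, `for` w/ loop.*, an `if` statement, space elision,
    // and the `join` filter -- all listed as supported above.
    auto tmpl = minja::Parser::parse(
        "{%- set greeting = 'Hi' -%}"
        "{% for name in names %}{{ greeting }} {{ name }}"
        "{% if not loop.last %}; {% endif %}{% endfor %}"
        " ({{ names | join(', ') }})",
        /* options= */ {});
    auto context = minja::Context::make(minja::Value(json {
        {"names", json::array({"Ada", "Linus"})},
    }));
    std::cout << tmpl->render(context) << std::endl;
    // Expected output: Hi Ada; Hi Linus (Ada, Linus)
}
```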
Main limitations (non-exhaustive list):
- Not supporting most filters. Only the ones actually used in templates of major (or trendy) models are/will be implemented.
- No difference between `none` and `undefined`
- Single namespace with all filters / tests / functions / macros / variables
- No tuples (templates seem to rely on lists only)
- No `if` expressions w/o `else` (but `if` statements are fine): `{{ 'a' if cond else 'b' }}` works, `{{ 'a' if cond }}` doesn't
- No `{% raw %}`, `{% block … %}`, `{% include … %}`, `{% extends … %}`
- [x] Fix known issues w/ CRLF on Windows
- [ ] Integrate to llama.cpp: https://github.com/ggerganov/llama.cpp/pull/11016 + https://github.com/ggerganov/llama.cpp/pull/9639
- [ ] Improve fuzzing coverage:
  - use third-party Jinja grammar to guide exploration of inputs (or implement prettification of internal ASTs and use them to generate arbitrary values)
  - fuzz each filter / test
- [ ] Measure / track test coverage
- [ ] Set up performance tests
- [ ] Simplify two-pass parsing
- [ ] Pass tokens to IfNode and such
- [ ] Macro nested set scope = global?
- [ ] Get listed in https://jbmoelker.github.io/jinja-compat-tests/ and https://en.cppreference.com/w/cpp/links/libs
- `minja::Parser` does two-phased parsing:
  - its `tokenize()` method creates coarse template "tokens" (plain text sections, expression blocks, or opening / closing blocks). Tokens may have nested expression ASTs, parsed with `parseExpression()`
  - its `parseTemplate()` method iterates on tokens to build the final `TemplateNode` AST.
- `minja::Value` represents a Python-like value
  - It relies on `nlohmann/json` for primitive values, but does its own JSON dump to be exactly compatible w/ the Jinja / Python implementation of `dict` string representation
- `minja::chat_template` wraps a template and provides an interface similar to HuggingFace's chat template formatting. It also normalizes the message history to accommodate different expectations from some templates (e.g. `message.tool_calls.function.arguments` is typically expected to be a JSON string representation of the tool call arguments, but some templates expect the arguments object instead; see the sketch after this list)
- Testing involves a myriad of simple syntax tests and full e2e chat template rendering tests. For each model in `MODEL_IDS` (see tests/CMakeLists.txt), we fetch the `chat_template` field of the repo's `tokenizer_config.json`, use the official jinja2 Python library to render it on each of the (relevant) test contexts (in tests/contexts) into a golden file, and run a C++ test that renders w/ Minja and checks we get exactly the same output.
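For illustration, here are the two tool-call argument shapes mentioned above (invented values; only nlohmann::json is used here, not Minja itself):

```cpp
#include <iostream>
#include <nlohmann/json.hpp>

using json = nlohmann::ordered_json;

int main() {
    // Shape 1: arguments serialized as a JSON *string* (the common expectation):
    json args_as_string = {
        {"name", "google_search"},
        {"arguments", "{\"query\": \"2+2\"}"}
    };
    // Shape 2: arguments as a JSON *object* (what some templates expect instead);
    // minja::chat_template normalizes between the two shapes.
    json args_as_object = {
        {"name", "google_search"},
        {"arguments", {{"query", "2+2"}}}
    };
    std::cout << args_as_string.dump() << "\n" << args_as_object.dump() << std::endl;
}
```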
- Install prerequisites:
  - cmake
  - GCC / clang
  - python 3.8+ (for tests)
  - flake8
  - editorconfig-checker
- Optional: test additional templates:
  - Add their HuggingFace model identifier to `MODEL_IDS` in tests/CMakeLists.txt (e.g. `meta-llama/Llama-3.2-3B-Instruct`)
  - For gated models you have access to, first authenticate w/ HuggingFace:

    ```bash
    pip install huggingface_hub
    huggingface-cli login
    ```

- Build & run tests (shorthand: `./scripts/run_tests.sh`):

  ```bash
  rm -fR build && \
    cmake -B build && \
    cmake --build build -j && \
    ctest --test-dir build -j --output-on-failure
  ```
- Fuzzing tests
  - Note: `fuzztest` doesn't work natively on Windows or macOS. To run it inside a Docker container instead:

    Beware of Docker Desktop's licensing: you might want to check out alternatives such as colima (we'll still use the docker client in the example below).

    ```bash
    docker run --rm -it -v $PWD:/src:rw $( echo "
        FROM python:3.12-slim-bookworm
        COPY requirements.txt /tmp
        RUN apt update && \
            apt install -y cmake clang ccache git python3 python-is-python3 python3-pip && \
            apt-get clean && \
            rm -rf /var/lib/apt/lists/*
        RUN pip install setuptools pip --upgrade --force-reinstall
        RUN pip install -r /tmp/requirements.txt
        CMD /usr/bin/bash
        WORKDIR /src
    " | docker build . -f - -q )
    ```

  - Build in fuzzing mode & run all fuzzing tests (optionally, set a higher `TIMEOUT` as an env var): `./scripts/run_fuzzing_mode.sh`
If your model's template doesn't run fine, please consider the following before opening a bug:

- Is the template using any unsupported filter / test / method / global function, and which one(s)?
- Is the template publicly available? Non-gated models are more likely to become supported.
- Which version of GCC / clang did you compile the tests with? On which OS version?
- If you intend to contribute a fix:
  - Please read CONTRIBUTING first. You'd have to sign a CLA, which your employer may need to accept.
  - Please test as many gated models as possible (use `cmake -B build -DMINJA_TEST_GATED_MODELS=1 ...` and edit `MODEL_IDS` appropriately)
  - For bonus points, check the style of your edits with:

    ```bash
    flake8
    editorconfig-checker
    ```
This library doesn't store any data by itself, and it doesn't access files or the web: it only transforms a template (string) and a context (JSON w/ fields `"messages"`, `"tools"`...) into a formatted string.
You should still be careful about untrusted third-party chat templates, as these could try and trigger bugs in Minja to exfiltrate user chat data (we only have limited fuzzing tests in place).
Risks are even higher with any user-defined functions.
HTML processing with this library is UNSAFE: no escaping is performed (and the `safe` filter is a passthrough), leaving users vulnerable to XSS. Minja is not intended to produce HTML.
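A sketch of that point (invented input; only the `Parser` API shown earlier is assumed):

```cpp
#include <minja.hpp>
#include <iostream>

using json = nlohmann::ordered_json;

int main() {
    // No HTML escaping happens: the <script> payload survives verbatim.
    auto tmpl = minja::Parser::parse("<p>{{ user_input }}</p>", /* options= */ {});
    auto ctx = minja::Context::make(minja::Value(json {
        {"user_input", "<script>alert(1)</script>"},
    }));
    std::cout << tmpl->render(ctx) << std::endl;
    // Prints: <p><script>alert(1)</script></p>
}
```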
Prompt injection is NOT protected against by this library.
There are many types of prompt injection, some quite exotic (cf. data exfiltration exploits leveraging markdown image previews).
For the simpler cases, it is perfectly possible for a user to craft a message that will look like a system prompt, an assistant response, or the results of tool calls. While some models might be fine-tuned to ignore system messages that aren't at the very start of the prompt, or out-of-order messages / tool call results, it is expected that most models will be very confused & successfully manipulated by such prompt injections.
Note that injection of tool calls should typically not result in their execution as LLM inference engines should not try to parse the template output (just generated tokens), but this is something to watch out for when auditing such inference engines.
As there isn't any standard mechanism to escape special tokens to prevent those attacks, users of this library are advised to take their own message sanitization measures before applying chat templates. We do not recommend any specific measure as each model reacts differently (some even understand l33tcode as instructions).
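One possible mitigation along those lines, sketched below, is entirely hypothetical and not part of Minja: strip the model's special tokens from untrusted message content before rendering. The token list here matches the toy template from the usage example above; real deployments must use the actual model's special tokens.

```cpp
#include <string>
#include <vector>

// Hypothetical sanitizer: removes special-token substrings from untrusted
// message content before it is rendered through a chat template.
std::string strip_special_tokens(std::string text,
                                 const std::vector<std::string>& tokens) {
    for (const auto& tok : tokens) {
        size_t pos = 0;
        while ((pos = text.find(tok, pos)) != std::string::npos) {
            text.erase(pos, tok.size());
        }
    }
    return text;
}

// Usage (token list from the toy template earlier in this README):
//   auto safe = strip_special_tokens(user_content,
//       {"<|start|>", "<|end|>", "<|user|>", "<|assistant|>"});
```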