godot-llm
LLM in Godot
Godot LLM is a plugin that enables the utilization of large language models (LLM) for generating content in games. It provides functionality for text generation, text embedding, multimodal text generation, and vector database management within the Godot game engine. The plugin supports features like Retrieval Augmented Generation (RAG) and integrates llama.cpp-based functionalities for text generation, embedding, and multimodal capabilities. It offers support for various platforms and allows users to experiment with LLM models in their game development projects.
README:
Isn't it cool to utilize a large language model (LLM) to generate content for your game? LLMs have great potential in NPC models, game mechanics, and design assistance. Thanks to technology like llama.cpp, "small" LLMs, such as llama-3-8B, run reasonably well locally on lower-end machines without a good GPU. I wanted to experiment with LLMs in Godot but I couldn't find any good library, so I decided to create one here.
⚠ While LLMs are less controversial than image generation models, there can still be legal issues when LLM-generated content is integrated into games. I have created another page to document some relevant information.
- Get Godot LLM directly from the asset library, or download the vulkan or cpu zip file from the release page, and unzip it into the `addons` folder of your godot project
- Now you should be able to see `GDLlama`, `GDEmbedding`, `GDLlava`, and `LlmDB` nodes in your godot editor. You can add them to a scene in the Godot editor, or initialize them directly by `.new()`.
- Download a supported LLM model in GGUF format (recommendation: Meta-Llama-3-8B-Instruct-Q5_K_M.gguf), move the file to somewhere in your godot project
- Set up your model with GDScript, pointing `model_path` to your GGUF file. The default `n_predict = -1` generates an infinite sequence; we want it to be shorter here:
```gdscript
func _ready():
    var gdllama = GDLlama.new()
    gdllama.model_path = "./models/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf" ## Your model path
    gdllama.n_predict = 20
```
- Generate text starting from "Hello":
```gdscript
var generated_text = gdllama.generate_text_simple("Hello")
print(generated_text)
```
- Text generation is slow. You may want to call `gdllama.run_generate_text("Hello", "", "")` to run the generation in the background, then handle the `generate_text_updated` or `generate_text_finished` signals:
```gdscript
gdllama.generate_text_updated.connect(_on_gdllama_updated)
gdllama.run_generate_text("Hello", "", "")

func _on_gdllama_updated(new_text: String):
    print(new_text)
```
- Download a supported embedding model in GGUF format (recommendation: mxbai-embed-large-v1.Q5_K_M.gguf), move the file to somewhere in your godot project
- Set up your model with GDScript, pointing `model_path` to your GGUF file:
```gdscript
func _ready():
    var gdembedding = GDEmbedding.new()
    gdembedding.model_path = "./models/mxbai-embed-large-v1.Q5_K_M.gguf"
```
- Compute the embedded vector of "Hello world" as a `PackedFloat32Array`:
```gdscript
var array: PackedFloat32Array = gdembedding.compute_embedding("Hello world")
print(array)
```
- Compute the similarity between "Hello" and "World":
```gdscript
var similarity: float = gdembedding.similarity_cos_string("Hello", "World")
print(similarity)
```
- Embedding computation can be slow. You may want to call `gdembedding.run_compute_embedding("Hello world")` or `gdembedding.run_similarity_cos_string("Hello", "World")` to run the computation in the background, then handle the `compute_embedding_finished` and `similarity_cos_string_finished` signals:
```gdscript
gdembedding.compute_embedding_finished.connect(_on_embedding_finished)
gdembedding.run_compute_embedding("Hello world")

func _on_embedding_finished(embedding: PackedFloat32Array):
    print(embedding)
```
```gdscript
gdembedding.similarity_cos_string_finished.connect(_on_similarity_finished)
gdembedding.run_similarity_cos_string("Hello", "World")

func _on_similarity_finished(similarity: float):
    print(similarity)
```
Note that the current implementation only allows one running thread per node, so avoid calling two `run_*` methods consecutively:
```gdscript
## Don't do this, this will hang your UI
gdembedding.run_compute_embedding("Hello world")
gdembedding.run_similarity_cos_string("Hello", "World")
```
Instead, always wait for the finished signal or check `gdembedding.is_running()` before calling a `run_*` function, as shown in the sketch below.
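A minimal sketch of the safe pattern, assuming only the `GDEmbedding` functions and signals documented in this README (the second computation is started from the finished handler of the first):
```gdscript
var gdembedding: GDEmbedding

func _ready():
    gdembedding = GDEmbedding.new()
    gdembedding.model_path = "./models/mxbai-embed-large-v1.Q5_K_M.gguf"
    gdembedding.compute_embedding_finished.connect(_on_embedding_finished)
    gdembedding.similarity_cos_string_finished.connect(_on_similarity_finished)
    ## Start only the first background computation here
    gdembedding.run_compute_embedding("Hello world")

func _on_embedding_finished(embedding: PackedFloat32Array):
    print(embedding)
    ## The first thread is done, so it is now safe to start the second one
    if not gdembedding.is_running():
        gdembedding.run_similarity_cos_string("Hello", "World")

func _on_similarity_finished(similarity: float):
    print(similarity)
```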
- Download a supported multimodal model in GGUF format (recommendation: llava-phi-3-mini-int4.gguf). Be aware that two files are needed: a `gguf` language model and a mmproj model (typical name `*mmproj*.gguf`). Move the files to somewhere in your godot project
- Set up your model with GDScript, pointing `model_path` and `mmproj_path` to your corresponding GGUF files:
```gdscript
func _ready():
    var gdllava = GDLlava.new()
    gdllava.model_path = "./models/llava-phi-3-mini-int4.gguf"
    gdllava.mmproj_path = "./models/llava-phi-3-mini-mmproj-f16.gguf"
```
- Load an image (`svg`, `png`, or `jpg`; other formats may also work as long as they are supported by Godot), or use your game screen (viewport) as an image:
```gdscript
var image = Image.new()
image.load("icon.svg")

## Or load the game screen instead
#var image = get_viewport().get_texture().get_image()
```
- Generate text from the prompt "Provide a full description" for the image:
```gdscript
var generated_text = gdllava.generate_text_image("Provide a full description", image)
print(generated_text)
```
- Text generation is slow. You may want to call `gdllava.run_generate_text_image("Provide a full description", image)` to run the generation in the background, then handle the `generate_text_updated` or `generate_text_finished` signals:
```gdscript
gdllava.generate_text_updated.connect(_on_gdllava_updated)
gdllava.run_generate_text_image("Provide a full description", image)

func _on_gdllava_updated(new_text: String):
    print(new_text)
```
- The `LlmDB` node extends the `GDEmbedding` node; follow the previous section to download a model and set up the `model_path`:
```gdscript
func _ready():
    var db = LlmDB.new()
    db.model_path = "./models/mxbai-embed-large-v1.Q5_K_M.gguf"
```
- Open a database, which by default creates a `llm.db` file and connects to it:
```gdscript
db.open_db()
```
- Set up the structure of the metadata of your textual data. The first metadata should always be an `id` field with `String` as the data type. Here we use the `LlmDBMetaData.create_text`, `LlmDBMetaData.create_int`, and `LlmDBMetaData.create_real` functions to define the structure of the metadata with the corresponding data types:
```gdscript
db.meta = [
    LlmDBMetaData.create_text("id"),
    LlmDBMetaData.create_int("year"),
    LlmDBMetaData.create_real("attack")
]
```
- Different models create embedding vectors of different sizes, so calibrate the `embedding_size` property before creating tables:
```gdscript
db.calibrate_embedding_size()
```
- Create tables based on the metadata. By default, these tables are created:
  - `llm_table_meta`: stores the metadata for a particular id
  - `llm_table`: stores texts with metadata and embeddings
  - some tables with names containing `llm_table_virtual`: tables for embedding similarity computation

  Note that your `.meta` property should always match the metadata columns in the database before any storing or retrieving operation; consider setting your `.meta` property within the `_ready()` function or within the inspector.
```gdscript
db.create_llm_tables()
```
- Store a piece of text with a metadata dictionary specifying the `year`. Note that you can leave out some of the metadata if it is not relevant to the text. If the input text is longer than `chunk_size`, the function will automatically break it down into smaller pieces to fit the `chunk_size`:
```gdscript
var text = "Godot is financially supported by the Godot Foundation, a non-profit organization formed on August 23rd, 2022 via the KVK (number 87351919) in the Netherlands. The Godot Foundation is responsible for managing donations made to Godot and ensuring that such donations are used to enhance Godot. The Godot Foundation is a legally independent organization and does not own Godot. In the past, the Godot existed as a member project of the Software Freedom Conservancy."
db.store_text_by_meta({"year": 2024}, text)
```
- Retrieve 3 of the most similar text chunks to "godot" where the year is 2024:
```gdscript
print(db.retrieve_similar_texts("godot", "year=2024", 3))
```
- Depending on the embedding model, storing and retrieving can be slow. Consider using the `run_store_text_by_meta` function, the `run_retrieve_similar_texts` function, and the `retrieve_similar_texts_finished` signal to store and retrieve texts in the background (see the sketch below). Also, call `close_db()` when the database is no longer in use.
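A minimal sketch of this background workflow, assuming the `LlmDB` functions and signals documented later in this README (`db` and `text` come from the steps above):
```gdscript
db.store_text_finished.connect(_on_store_finished)
db.retrieve_similar_texts_finished.connect(_on_retrieve_finished)
db.run_store_text_by_meta({"year": 2024}, text)

func _on_store_finished():
    ## Storing is done, so it is now safe to start a background retrieval
    db.run_retrieve_similar_texts("godot", "year=2024", 3)

func _on_retrieve_finished(texts: PackedStringArray):
    print(texts)
    db.close_db() ## Release the database once it is no longer needed
```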
The godot-llm-template provides a rather complete demonstration of the different functionalities of this plugin.
This plugin now has all the essential components for simple Retrieval-Augmented Generation (RAG). You can store information about your game world or your characters in the vector database, retrieve relevant texts to enrich your prompt, then generate text for your game; the generated text can be stored back into the vector database to enrich future prompts. RAG complements a shortcoming of LLMs: the limited context size forces the model to forget earlier information. With RAG, information can be stored in a database to become long-term memory, and only relevant information is retrieved to enrich the prompt, keeping the prompt within the context size.
To get started, you may try the following format for your prompt input:
```
Document:
{retrieved text}

Question:
{your prompt}
```
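Putting the pieces together, here is a minimal sketch of such a RAG loop, assuming the `LlmDB` and `GDLlama` APIs described in this README (the model paths and the `ask` helper are illustrative, and `db` is assumed to already contain stored texts):
```gdscript
var db: LlmDB
var gdllama: GDLlama

func _ready():
    db = LlmDB.new()
    db.model_path = "./models/mxbai-embed-large-v1.Q5_K_M.gguf"
    db.open_db()
    gdllama = GDLlama.new()
    gdllama.model_path = "./models/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf"
    gdllama.n_predict = 256

func ask(question: String) -> String:
    ## Retrieve the 3 most relevant stored chunks and splice them into the prompt
    var chunks: PackedStringArray = db.retrieve_similar_texts(question, "", 3)
    var prompt = "Document:\n%s\n\nQuestion:\n%s" % ["\n".join(chunks), question]
    return gdllama.generate_text_simple(prompt)
```
The answer can then be stored back with `store_text_by_meta` to serve as long-term memory for future prompts.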
- Platform (backend): Windows (cpu, vulkan), macOS (cpu, metal), Linux (cpu, vulkan), Android (cpu)
  - macOS support is on a best-effort basis since I don't have a mac myself
- llama.cpp-based features
  - Text generation
  - Embedding
  - Multimodal
- Vector database integration based on sqlite and sqlite-vec
  - Split text into chunks
  - Store text embeddings
  - Associate metadata with text
  - Retrieve text by embedding similarity and sql constraints on metadata
- iOS: build should be trivial, but an apple developer ID is needed to run the build
- Add in-editor documentation, waiting for proper support in Godot 4.3
- Add utility functions to generate useful prompts, such as llama guard 2
- Download models directly from huggingface
- Automatically generate json schema from data classes in GDScript
- More llama.cpp features
- mlc-llm integration
- Any suggestion?
There are 3 base nodes added by this plugin: `GDLlama`, `GDEmbedding`, and `GDLlava`.
Each type of node owns a set of properties which affect the computational performance and the generated output. Some of the properties belong to more than one node, and they generally have similar meanings for all types of nodes.
- `Model Path`: location of your GGUF model
- `Mmproj Path`: location of your `mmproj` GGUF file, for `GDLlava` only
- `Instruct`: question and answer interactive mode
- `Interactive`: custom interactive mode; you should set your `reverse_prompt`, `input_prefix`, and `input_suffix` to set up a smooth interaction
- `Reverse Prompt`: the AI stops to wait for user input after seeing this prompt being generated, a good example is "User:"
- `Input Prefix`: appended before every user input
- `Input Suffix`: appended after every user input
- `Should Output Prompt`: whether the input prompt should be included in the output
- `Should Output Special`: whether special tokens (e.g., beginning-of-sequence and end-of-sequence tokens) should be included in the output
- `Context Size`: number of tokens the model can process at a time
- `N Predict`: number of new tokens to generate; generates an infinite sequence if -1
- `N Keep`: when the model runs out of `context size`, it starts to forget earlier context; set this variable to force the model to keep a number of the earliest tokens so the conversation stays relevant
- `Temperature`: the higher the temperature, the more random the generated text
- `Penalty Repeat`: penalize repeated sequences, disabled if -1
- `Penalty Last N`: the number of latest tokens to consider when penalizing repeated sequences, disabled if 0, `Context Size` if -1
- `Penalize Nl`: penalize the newline token
- `Top K`: only sample from this number of tokens with the highest probabilities, disabled if 0
- `Top P`: only sample from tokens within this cumulative probability, disabled if 1.0
- `Min P`: only sample from tokens with at least this probability, disabled if 0.0
- `N Thread`: number of cpu threads to use
- `N GPU Layer`: number of layers offloaded to the GPU
- `Main GPU`: the main GPU for computation
- `Split Mode`: how the computation will be distributed if there are multiple GPUs in your system (0: None, 1: Layer, 2: Row)
- `Escape`: process escape characters in the input prompt
- `N Batch`: maximum number of tokens per iteration during continuous batching
- `N Ubatch`: maximum batch size for computation
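The inspector names above map to snake_case properties in GDScript (as `model_path` and `n_predict` did earlier). A hedged sketch of tuning a node by script, assuming that naming convention also holds for the sampling and GPU properties:
```gdscript
var gdllama = GDLlama.new()
gdllama.model_path = "./models/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf"
gdllama.context_size = 4096   ## tokens the model can process at a time
gdllama.temperature = 0.8     ## higher means more random output
gdllama.top_k = 40            ## sample only from the 40 most likely tokens
gdllama.top_p = 0.95          ## cumulative probability cut-off
gdllama.n_gpu_layer = 33      ## layers offloaded to the GPU
```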
- `generate_text_simple(prompt: String) -> String`: generate text from a prompt
- `generate_text_json(prompt: String, json: String) -> String`: generate text in a format enforced by a json schema, see the following section
- `generate_text_grammar(prompt: String, grammar: String) -> String`: generate text in a format enforced by a GBNF grammar
- `generate_text(prompt: String, grammar: String, json: String) -> String`: a wrapper function, runs `generate_text_grammar` if `grammar` is non-empty, runs `generate_text_json` if `json` is non-empty, runs `generate_text_simple` otherwise
- `run_generate_text(prompt: String, grammar: String, json: String) -> Error`: runs `generate_text` in the background, rely on signals to receive the generated text. Note that only one background thread is allowed for a GDLlama node; calling this function while the background thread is still running will freeze the logic until the background thread is done
- `input_text(input: String)`: input text to interactively generate text (with either `Instruct` or `Interactive` enabled) with the model, only works if the model is waiting for input; inputting an empty string means the model should continue to generate what it has been generating
- `stop_generate_text()`: stop text generation, clean up the model and the background thread
- `is_running() -> bool`: whether the background thread is running
- `is_waiting_input() -> bool`: whether the model is waiting for input text (with either `Instruct` or `Interactive` enabled)
- `generate_text_finished(text: String)`: emitted with the full generated text when a text generation is completed. When either `Instruct` or `Interactive` is enabled, this signal is emitted after the whole interaction is finished
- `generate_text_updated(new_text: String)`: instead of waiting for the full generated text, this signal is emitted whenever a new token (part of the text sequence) is generated, forming a stream of strings
- `input_wait_started()`: the model is now starting to wait for user input, happens when either `Instruct` or `Interactive` is enabled and the model stops generating text in the middle of the conversation to wait for further input from the user
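A minimal sketch of a custom interactive session, assuming snake_case script names for the inspector properties (only `reverse_prompt`, `input_prefix`, and `input_suffix` are confirmed above; `interactive` is an assumed name for `Interactive`):
```gdscript
var gdllama: GDLlama

func _ready():
    gdllama = GDLlama.new()
    gdllama.model_path = "./models/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf"
    gdllama.interactive = true        ## assumed script name for `Interactive`
    gdllama.reverse_prompt = "User:"  ## pause whenever the model emits "User:"
    gdllama.input_prefix = "User: "
    gdllama.input_suffix = "Assistant: "
    gdllama.generate_text_updated.connect(func(t): print(t))
    gdllama.input_wait_started.connect(_on_wait_input)
    gdllama.run_generate_text("You are a merchant NPC in a fantasy town.", "", "")

func _on_wait_input():
    ## The model paused at the reverse prompt, feed it the player's next line
    gdllama.input_text("Do you sell potions?")
```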
- `compute_embedding(prompt: String) -> PackedFloat32Array`: compute the embedding vector of a prompt
- `similarity_cos_array(array1: PackedFloat32Array, array2: PackedFloat32Array) -> float`: compute the cosine similarity between two embedding vectors; this is a fast function, no model is loaded
- `similarity_cos_string(s1: String, s2: String) -> float`: compute the cosine similarity between two strings
- `run_compute_embedding(prompt: String) -> Error`: runs `compute_embedding(prompt: String)` in the background, rely on the `compute_embedding_finished` signal to receive the embedding vector. Note that only one background thread is allowed for a GDEmbedding node; calling this function while the background thread is still running will freeze the logic until the background thread is done
- `run_similarity_cos_string(s1: String, s2: String) -> Error`: runs `similarity_cos_string` in the background, rely on the `similarity_cos_string_finished` signal to receive the cosine similarity. Note that only one background thread is allowed for a GDEmbedding node; calling this function while the background thread is still running will freeze the logic until the background thread is done
- `is_running() -> bool`: whether the background thread is running
- `compute_embedding_finished(embedding: PackedFloat32Array)`: emitted when `run_compute_embedding` is completed
- `similarity_cos_string_finished(similarity: float)`: emitted when `run_similarity_cos_string` is completed
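Since `similarity_cos_array` loads no model, you can compute embeddings once and compare the cached vectors cheaply later; a small sketch using only the functions above:
```gdscript
## Compute (and cache) embeddings once, these calls load the model
var e1: PackedFloat32Array = gdembedding.compute_embedding("sword")
var e2: PackedFloat32Array = gdembedding.compute_embedding("blade")

## Comparing cached vectors is fast, no model is loaded here
print(gdembedding.similarity_cos_array(e1, e2))
```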
- `generate_text_base64(prompt: String, image_base64: String) -> String`: generate text based on a prompt and a base64 string which encodes a `jpg` or `png` image
- `generate_text_image(prompt: String, image: Image) -> String`: generate text based on a prompt and an `Image` object in Godot
- `run_generate_text_base64(prompt: String, image_base64: String) -> Error`: runs `generate_text_base64` in the background, rely on signals to receive the generated text. Note that only one background thread is allowed for a GDLlava node; calling this function while the background thread is still running will freeze the logic until the background thread is done
- `run_generate_text_image(prompt: String, image: Image) -> Error`: runs `generate_text_image` in the background, rely on signals to receive the generated text. Note that only one background thread is allowed for a GDLlava node; calling this function while the background thread is still running will freeze the logic until the background thread is done
- `stop_generate_text()`: stop text generation, clean up the model and the background thread
- `is_running() -> bool`: whether the background thread is running
- `generate_text_finished(text: String)`: emitted with the full generated text when a text generation is completed
- `generate_text_updated(new_text: String)`: instead of waiting for the full generated text, this signal is emitted whenever a new token (part of the text sequence) is generated, forming a stream of strings
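A sketch of the base64 variant, assuming Godot's built-in `Image.save_png_to_buffer()` and `Marshalls.raw_to_base64()` for the encoding step:
```gdscript
var image = Image.new()
image.load("icon.png")
## Encode the image bytes as a base64 png string
var image_base64: String = Marshalls.raw_to_base64(image.save_png_to_buffer())

gdllava.generate_text_updated.connect(_on_gdllava_updated)
gdllava.run_generate_text_base64("Provide a full description", image_base64)
```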
Suppose you want to generate a character with:
- `name`: a string from 3 to 20 characters
- `birthday`: a string with a specific date format
- `weapon`: either "sword", "bow", or "wand"
- `description`: a text with a minimum of 10 characters
You should first create a GDLlama node and turn `Should Output Prompt` and `Should Output Special` off, either in the inspector or by script:
```gdscript
should_output_prompt = false
should_output_special = false
```
Construct the following `_person_schema` dictionary in GDScript:
```gdscript
var _person_schema = {
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "minLength": 3,
            "maxLength": 20,
        },
        "birthday": {
            "type": "string",
            "format": "date"
        },
        "weapon": {
            "enum": ["sword", "bow", "wand"],
        },
        "description": {
            "type": "string",
            "minLength": 10,
        },
    },
    "required": ["name", "birthday", "weapon", "description"]
}
```
Then convert it to a json string:
```gdscript
var person_schema: String = JSON.stringify(_person_schema)
```
Suppose you are interested in a "Main character in a magic world"; you can generate the character using the `generate_text_json(prompt, person_schema)` function of the `GDLlama` node:
```gdscript
var json_string: String = generate_text_json(prompt, person_schema)
```
Note that text generation is slow; you may want to use `run_generate_text(prompt, "", person_schema)` to run the generation in the background, then handle `generate_text_finished` to receive the generated text.
`json_string` should look like this:
```
{"birthday": "2000-05-12", "description": "A young wizard with a pure heart and a mischievous grin. He has a wild imagination and a love for adventure. He is always up for a challenge and is not afraid to take risks.", "name": "Eryndor Thorne", "weapon": "wand"}
```
Now the generated data is ready; you can parse it back into a dictionary or another object to use the data:
```gdscript
var dict: Dictionary = {}
var json = JSON.new()
var error = json.parse(json_string)
if (error == OK):
    dict = json.data
    print(dict["name"]) ## Eryndor Thorne
```
LlmDB extends GDEmbedding and shares all its properties; check the section above for the relevant information. Additionally, LlmDB has:
- `Meta`: an array of `LlmDBMetaData` resources which defines the structure of the metadata. `LlmDBMetaData` contains `Data Name`, which defines the name of a metadata field, and `Data Type` (0=integer, 1=real, 2=text, 3=blob), which defines the data type of the metadata. `Meta` should be non-empty, and the first element of `Meta` should always be an `id` with text as the `Data Type`.
- `dB Dir`: the directory of the database file, default is the root directory of the project
- `dB File`: the file name of the database file, default is `llm.db`
- `Table Name`: defines the name of the tables created by the `create_llm_tables` function
- `Embedding Size`: the vector size of the embedding computed by the model, used in the `create_llm_tables` function
- `Absolute Separators`: an array of `String`. When storing a piece of text, the text will first be separated by the `String`s defined here; the separation process stops if the separated text is shorter than `Chunk Size` or all the separators here have been processed. The defaults are `\n` and `\n\n`, which are displayed as empty space in the inspector.
- `Chunk Separators`: an array of `String`. After the `Absolute Separators` are processed, one of the separators here (the first one that works) will be chosen to further separate the pieces of text, then the pieces are grouped into chunks to fulfill the requirements of `Chunk Size` and `Chunk Overlap`
- `Chunk Size`: no text chunk should exceed this size, unless the separation function fails to fulfill the requirement after iterating through the separators
- `Chunk Overlap`: the maximum overlap between neighbouring text chunks; the algorithm will try to create the biggest overlap possible fulfilling this constraint
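To see how a given configuration chunks your text before storing anything, you can call `split_text` directly (documented below); a small sketch, assuming snake_case script names for `Chunk Size` and `Chunk Overlap`:
```gdscript
db.chunk_size = 100   ## no chunk should exceed 100 characters
db.chunk_overlap = 20 ## neighbouring chunks may share up to 20 characters

var chunks: PackedStringArray = db.split_text(text)
for chunk in chunks:
    print(chunk)
```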
Besides the functions and signals from GDEmbedding, LlmDB has a few more functions and signals:
- `calibrate_embedding_size()`: calibrate `Embedding Size` to the correct number based on the model in `model_path`
- `open_db()`: create a `dB File` at `dB Dir` if the file doesn't exist, then connect to the database
- `close_db()`: terminate the connection to the database
- `execute(statement: String)`: execute an sql statement; turn on `Verbose stdout` in `Project Settings` to see the log generated by this statement
- `create_llm_tables()`: create a table with name `Table Name` if the table doesn't exist, a `Table Name` + `_meta` table to store pre-defined metadata by `id`, and some `_virtual` tables for embedding similarity computation
- `drop_table(p_table_name: String)`: drop a table with a specific name
- `drop_llm_tables(p_table_name: String)`: drop all tables (except the `sqlite_sequence` table, which is created automatically for autoincrement) created by `create_llm_tables()`, i.e., `p_table_name`, `p_table_name` + `_meta`, and every table with a name containing `p_table_name` + `_virtual`
- `has_table(p_table_name: String) -> bool`: whether a table with this name exists
- `is_table_valid(p_table_name: String) -> bool`: whether the table contains valid metadata, i.e., all elements in the `.meta` property exist in the table and the data types are correct
- `store_meta(meta_dict: Dictionary)`: store a set of metadata in the `Table Name` + `_meta` table with `id` as the primary key, such that you can call `store_text_by_id` by id instead of inputting the full metadata dictionary through `store_text_by_meta` (see the sketch after the signal list below)
- `has_id(id: String, p_table_name: String) -> bool`: whether the table has a specific id stored
- `split_text(text: String) -> PackedStringArray`: split a piece of text first by all `Absolute Separators`, then by one of the appropriate `Chunk Separators`, such that any text chunk is shorter than `Chunk Size` (measured in characters) and the overlap is close to but not greater than `Chunk Overlap`. If the algorithm fails to satisfy the constraints, an error message is printed and the returned chunk will be greater than the `Chunk Size`
- `store_text_by_id(id: String, text: String)`: split the text and store the chunks in the database; be aware that `store_meta` should have been called previously such that the `id` with the corresponding metadata is already in the database
- `run_store_text_by_id(id: String, text: String) -> Error`: run `store_text_by_id` in the background, emits the `store_text_finished` signal when finished
- `store_text_by_meta(meta_dict: Dictionary, text: String)`: split the text and store the chunks in the database with the metadata defined in `meta_dict`; be aware that the metadata should be valid: every key should be a name stored in the `.meta` property and the corresponding type should be correct
- `run_store_text_by_meta(meta_dict: Dictionary, text: String) -> Error`: run `store_text_by_meta` in the background, emits the `store_text_finished` signal when finished
- `retrieve_similar_texts(text: String, where: String, n_results: int) -> PackedStringArray`: retrieve the `n_results` most similar text chunks to `text`; `where` should be empty or an sql WHERE clause to filter the chunks by metadata
- `run_retrieve_similar_texts(text: String, where: String, n_results: int) -> Error`: run `retrieve_similar_texts` in the background, emits a `retrieve_similar_texts_finished` signal once it is done
- `store_text_finished`: emitted when `run_store_text_by_id` or `run_store_text_by_meta` is finished
- `retrieve_similar_texts_finished(array: PackedStringArray)`: contains an array of `String`, emitted when `run_retrieve_similar_texts` is finished
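A minimal sketch of the id-based workflow mentioned under `store_meta` above, assuming `db.meta` is configured as in the earlier walkthrough (the `table_name` script property name is assumed from `Table Name`):
```gdscript
## Register the metadata once, keyed by id
db.store_meta({"id": "goblin", "year": 2024, "attack": 7.5})

## Later, store text for that id without repeating the metadata
if db.has_id("goblin", db.table_name):
    db.store_text_finished.connect(_on_store_finished)
    db.run_store_text_by_id("goblin", "Goblins are small, green, and greedy.")

func _on_store_finished():
    print(db.retrieve_similar_texts("small creatures", "id='goblin'", 1))
```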
This is a simple resource class that forms the `meta` array property in LlmDB. It has two properties:
- `data_name`: a `String` that defines the name of this metadata
- `data_type`: an `int` that defines the data type of this metadata (0=integer, 1=real, 2=text, 3=blob). Note that inputting a raw integer here is not recommended since it can be confusing; use the inspector properties, the `LlmDBMetaDataType` enum, or the functions below instead
- `LlmDBMetaDataType` enum:
```gdscript
LlmDBMetaData.INTEGER = 0
LlmDBMetaData.REAL = 1
LlmDBMetaData.TEXT = 2
LlmDBMetaData.BLOB = 3
```
There are 4 static functions to create `LlmDBMetaData`:
- `create_int(data_name: String) -> LlmDBMetaData`: create an LlmDBMetaData with type int (0)
- `create_real(data_name: String) -> LlmDBMetaData`: create an LlmDBMetaData with type real (1)
- `create_text(data_name: String) -> LlmDBMetaData`: create an LlmDBMetaData with type text (2)
- `create_blob(data_name: String) -> LlmDBMetaData`: create an LlmDBMetaData with type blob (3), note that blob data type support is still a work in progress

Alternatively, you can use this static function to create `LlmDBMetaData`:
- `create(data_name: String, data_type: int) -> LlmDBMetaData`: create a corresponding LlmDBMetaData by `data_name` and `data_type`; it is recommended to use the enum instead of a raw `int` for `data_type`
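For example, the generic `create` function with the enum is equivalent to the typed helpers above (a small sketch):
```gdscript
db.meta = [
    LlmDBMetaData.create_text("id"),
    ## Same as LlmDBMetaData.create_int("year")
    LlmDBMetaData.create("year", LlmDBMetaData.INTEGER),
]
```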
- How to get more debug messages?
Turn on `Verbose stdout` in `Project Settings`, and consider running Godot from a terminal to get additional logging messages.
- Does it support languages other than English?
Yes, the plugin uses utf8 encoding, so it naturally supports multiple languages. However, a language model may be trained on English data only and won't be able to generate text in other languages; choose a language model based on your needs.
- Strange tokens in the generated text, such as `<eot_id>`, even when `Should Output Special` is off.
You are always welcome to open an issue. However, be aware that the GGUF format standard can change to support new features and models, so a bug can come from the model side instead of from within this plugin. For example, some older llama 3 GGUF models may not be compatible with the latest format; you may try to search for a newer model with fixes such as this.
- You are running Arch Linux (or a derivative such as Manjaro) and your Godot editor crashes.
The Arch build of Godot is bugged when working with GDExtension; download Godot from the official website instead.
- You have a discrete GPU and you see an `unable to load model` error even though the model parameters are correctly set.
There is currently a bug in the vulkan backend when multiple drivers are installed for the same GPU. Try setting `Split Mode` to `NONE` (0) and setting your `Main GPU` manually (starting from 0) to see if it works.
Install build tools and the Vulkan SDK for your operating system, then clone this repository:
```
git clone https://github.com/Adriankhl/godot-llm.git
cd godot-llm
git submodule update --init --recursive
mkdir build
cd build
```
Run `cmake`.

On Windows:
```
cmake .. -GNinja -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DLLAMA_NATIVE=OFF -DLLAMA_VULKAN=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DCMAKE_BUILD_TYPE=Release
```

On Linux:
```
cmake .. -GNinja -DLLAMA_NATIVE=OFF -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DLLAMA_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
```

The vulkan build works for Windows and Linux; if you want a cpu build, set `-DLLAMA_VULKAN=OFF` instead.
For Android, set `$NDK_PATH` to your android ndk directory, then:
```
cmake .. -GNinja -DCMAKE_TOOLCHAIN_FILE=$NDK_PATH\cmake\android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23 -DCMAKE_C_FLAGS="-mcpu=generic" -DCMAKE_CXX_FLAGS="-mcpu=generic" -DCMAKE_BUILD_TYPE=Release
```
You may want to adjust the compile flags for Android to suit different types of CPU.
Then compile and install with `ninja`:
```
ninja -j4
ninja install
```
The folder `../install/gpu/addons/godot_llm` (`cpu` instead of `gpu` for a cpu build) can be copied directly into the `addons` folder of your godot project.
- PRs are welcome