
illume
scriptable command line program for LLM interfacing
Stars: 78

Illume is a scriptable command line program for interfacing with an OpenAI-compatible LLM API. It acts as a unix filter, sending standard input to the LLM and streaming its response to standard output, so it can be driven from text editors like Vim or Emacs.
README:
A unix filter for talking to an OpenAI-compatible LLM API. Sends standard input to the LLM and streams its response to standard output. In a text editor like Vim, send your buffer into the program via standard input and append its output to your buffer.
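For instance, stock Vim can do this without any plugin (a minimal sketch: write the buffer to its file, then read the response into the end of the buffer):
:w | $read !illume < %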
With Go 1.10 or later:
$ go build illume.go
Then place illume on your $PATH.
A couple of examples running outside of a text editor:
$ illume <request.md >response.md
$ illume <chat.md | tee -a chat.md
illume.vim has a Vim configuration for interacting with live output:
- Illume(): complete at the end of the buffer (chat, !completion)
- IllumeInfill(): generate code at the cursor
- IllumeStop(): stop generation in this buffer
illume.el is similar for Emacs: M-x illume and M-x illume-stop.
Use !context to select files to upload as context. These are uploaded in
full, so mind the token limit and narrow the context as needed by
pointing to subdirectories or temporarily deleting files. Put !user on
its own line, then your question:
!context /path/to/repository .py .sql
!user
Do you suggest any additional indexes?
Sending this to illume retrieves a reply:
!context /path/to/repository .py .sql
!user
Do you suggest any additional indexes?
!assistant
Yes, your XYZ table...
Add your response with another !user:
!context /path/to/repository .py .sql
!user
Do you suggest any additional indexes?
!assistant
Yes, your XYZ table...
!user
But what about ...?
Rinse and repeat. The text file is the entire state of the conversation.
Alternatively, the LLM can continue from the text of your input using the
!completion directive:
!completion
The meaning of life is
Do not use !user nor !assistant in this mode, but the other options
still work.
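For instance (an illustrative sketch; the value is arbitrary), a !:KEY
directive such as a sampling temperature still applies in completion mode:
!:temperature 0.8
!completion
Once upon a time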
If the input contains !infill by itself, Illume operates in infill mode.
Output is to be inserted in place of !infill, i.e. code generation. By
default it will use the llama.cpp /infill endpoint, which requires a
FIM-trained model with metadata declaring its FIM tokens. This excludes
most models, including most "coder" models, due to missing metadata.
There are currently no standards and few conventions around FIM, and
every model implements it differently.
Given an argument, it is memorized as the template, replacing {prefix}
and {suffix} with the surrounding input. For example, including a
leading space in the template:
!infill <PRE> {prefix} <SUF>{suffix} <MID>
Write this template according to the model's FIM documentation. Illume
includes built-in fim:MODEL templates for several popular models. This
form of !infill only configures, and does not activate infill mode on
its own. Put it in a profile.
For example, to generate FIM completions on a remote DeepSeek model running on llama.cpp, your Illume profile file might be something like:
!profile llama.cpp
!profile fim:deepseek
!api http://myllama:8080/
With illume.vim, do not type a no-argument !infill directive yourself.
The configuration automatically inserts it into Illume's input at the
cursor position.
Recommendation: DeepSeek produces the best FIM output, followed by Qwen
and Granite. All three work out-of-the-box with llama.cpp /infill, but
work best with an Illume FIM profile.
$ILLUME_PROFILE sets the default profile. The default profile is like an
implicit !profile when none is specified. A profile sets the URL, extra
keys, HTTP headers, or even a system prompt. Illume supplies many
built-in profiles: see Profiles in the source. If the profile name
contains a slash, the profile is read from that file. Otherwise it's
matched against a built-in profile, or a file with a .profile suffix
next to the Illume executable.
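For example, a custom profile can set the URL, a request key, and a
system prompt. Here a hypothetical mylocal.profile is placed next to the
executable (the endpoint and contents are illustrative):
!api http://localhost:8080/
!:temperature 0.3
You are a concise assistant.
Select it per run through the environment:
$ ILLUME_PROFILE=mylocal illume <chat.md | tee -a chat.md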
An !error "directive" appears in error output, but it's not processed on
input. Everything before !user and !assistant is in the "system" role,
which is where you can write a system prompt.
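For instance (an illustrative input file), a system prompt is simply the
leading text before the first role directive:
You are a terse expert. Answer in one short paragraph.
!user
What is a unix filter?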
Load a profile. JSON !:KEY directives in the profile do not override
user-set keys. If no !profile is given, Illume loads $ILLUME_PROFILE if
set, otherwise it loads the default profile.
Sets the API base URL. When not llama.cpp, it typically ends with /v1 or
/v2. Illume interpolates {…} in the URL from !:KEY directives. It's done
just before making the request, and so may reference keys set after the
!api directive. Examples:
!api https://api-inference.huggingface.co/models/{model}/v1
!:model mistralai/Mistral-Nemo-Instruct-2407
If the URL is wrapped in quotes, it will be used literally as provided without modification.
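For example (an illustrative endpoint), to pass a URL through exactly as
written:
!api "https://example.com/exact/endpoint"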
Insert a file at this position in the conversation.
Include all files under DIR with matching file name suffixes. Only
relative names are sent, but the last element of DIR is included in this
relative path if it does not end with a slash. Files can be included in
any role, not just the system prompt.
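For example (illustrative paths), suppose a repository contains
/home/me/project/db/queries.sql. The first form below sends the relative
name project/db/queries.sql, while the second, ending in a slash, sends
db/queries.sql:
!context /home/me/project .sql
!context /home/me/project/ .sql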
Marks the following lines as belonging to a user message. You can modify these to trick the LLM into thinking you said something different in the past.
Marks the following lines as belonging to an assistant message. You can modify these to trick the LLM into thinking it said something different.
These lines are not sent to the LLM. Used to annotate conversations.
Discard all messages before this line. Used to "comment out" headers in the input, e.g. when composing email. Directives before this line are still effective.
Stop processing directives and ignore the rest of the input.
Insert an arbitrary JSON value into the query object. Examples:
!:temperature 0.3
!:model mistralai/Mistral-Nemo-Instruct-2407
!:stop ["<|im_end|>"]
If VALUE is missing, the key is deleted instead. If it cannot be parsed
as JSON, it's passed through as a string. If it looks like JSON but
should be sent as string data, wrap it in quotes to turn it into a JSON
string.
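For example (illustrative keys and values), the first line deletes a
previously set key, and the second quotes a numeric-looking value so it
is sent as a JSON string rather than a number:
!:temperature
!:seed "42"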
Insert an arbitrary HTTP header into the request. Examples:
!>x-use-cache false
!>user-agent My LLM Client 1.0
!>authorization
If VALUE is missing, the header is deleted. This is useful, for
instance, for disabling the API token, as shown in the example. If the
value contains $VAR then Illume will expand it from the environment.
Use completion mode instead of conversational. The LLM will continue
writing from the end of the document. Cannot be used with !user or
!assistant, which are for the (default) chat mode.
With no template, activate infill mode, and generate code to be inserted at this position. Given a template, use that template to generate the prompt when infill mode is active.
Like !context but embed a reddit post from its JSON representation
(append .json to the URL and then download it). Includes all comments
with threading.
!reddit some-reddit-post.json
Please summarize this reddit post and its comments.
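To produce that JSON file in the first place, download the post with
.json appended to its URL (an illustrative post URL; reddit may require
a distinctive User-Agent):
$ curl -A illume -o some-reddit-post.json https://www.reddit.com/r/example/comments/abc123/title.json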
Like !reddit but just the post with no comments.
Like !reddit but insert a GitHub issue for inspection, and optionally
the issue comments. You can download these from the GitHub API:
https://api.github.com/repos/USER/REPO/issues/ID
https://api.github.com/repos/USER/REPO/issues/ID/comments
Combine it with !context on GitHub's "patch" output to embed the entire
context of a pull request:
https://github.com/USER/REPO/pull/ID.patch
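For example, downloading all three with curl (substitute USER, REPO, and
ID):
$ curl -o issue.json https://api.github.com/repos/USER/REPO/issues/ID
$ curl -o comments.json https://api.github.com/repos/USER/REPO/issues/ID/comments
$ curl -L -o pr.patch https://github.com/USER/REPO/pull/ID.patch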
On response completion, inserts a !note with timing statistics.
Dry run: "reply" with the raw HTTP request instead of querying the API. For inspecting the exact query parameters.