illume

scriptable command line program for LLM interfacing

Stars: 78

Illume is a scriptable command line program for interfacing with an OpenAI-compatible LLM API. It acts as a unix filter, sending standard input to the LLM and streaming its response to standard output. From a text editor such as Vim or Emacs, you can send the buffer to Illume and append the reply, so the conversation lives in an ordinary text file.

README:

Illume: scriptable command line program for LLM interfacing

A unix filter for talking to an OpenAI-compatible LLM API. Sends standard input to the LLM and streams its response to standard output. In a text editor like Vim, send your buffer into the program via standard input and append its output to your buffer.

How to build

With Go 1.10 or later:

$ go build illume.go

Then place illume on your $PATH.

How to use

A couple of examples running outside of a text editor:

$ illume <request.md >response.md
$ illume <chat.md | tee -a chat.md
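Because Illume is a plain filter, it can also be driven from a script. The sketch below mirrors the second shell example in Python. The helper names run_filter and continue_chat are made up for illustration, and it assumes illume is on your $PATH. Unlike tee, it waits for the whole reply instead of streaming it.

```python
import subprocess


def run_filter(text, cmd=("illume",)):
    """Pipe text through a filter command and return what it prints."""
    result = subprocess.run(
        list(cmd), input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


def continue_chat(path, cmd=("illume",)):
    """Rough equivalent of: illume <chat.md | tee -a chat.md"""
    with open(path, encoding="utf-8") as f:
        conversation = f.read()
    reply = run_filter(conversation, cmd)
    # Append the reply so the file remains the full conversation state.
    with open(path, "a", encoding="utf-8") as f:
        f.write(reply)
    return reply
```

Calling continue_chat("chat.md") repeatedly, editing the file in between, gives the same loop as the shell one-liner.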

illume.vim has a Vim configuration for interacting with live output:

Illume(): complete at the end of the buffer (chat, !completion)
IllumeInfill(): generate code at the cursor
IllumeStop(): stop generation in this buffer

illume.el is similar for Emacs: M-x illume and M-x illume-stop.

Example usage

Use !context to select files to upload as context. These are uploaded in full, so mind the token limit and narrow the context as needed by pointing to subdirectories or temporarily deleting files. Put !user on its own line, then your question:

!context /path/to/repository .py .sql

!user

Do you suggest any additional indexes?

Sending this to illume retrieves a reply:

!context /path/to/repository .py .sql

!user

Do you suggest any additional indexes?

!assistant

Yes, your XYZ table...

Add your response with another !user:

!context /path/to/repository .py .sql

!user

Do you suggest any additional indexes?

!assistant

Yes, your XYZ table...

!user

But what about ...?

Rinse and repeat. The text file is the entire state of the conversation.
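To make "the text file is the entire state" concrete, here is a minimal sketch of how such a file could be split into chat messages. It is an illustration, not Illume's actual parser: parse_chat is a hypothetical name, it only recognizes !user and !assistant on lines of their own, and it treats everything before the first of them as the system prompt. Other directives such as !context are not handled.

```python
def parse_chat(text):
    """Split a chat file into a list of {"role", "content"} messages."""
    messages = []
    role, lines = "system", []

    def flush():
        # Emit the message collected so far, skipping empty ones.
        content = "\n".join(lines).strip()
        if content:
            messages.append({"role": role, "content": content})

    for line in text.splitlines():
        marker = line.strip()
        if marker in ("!user", "!assistant"):
            flush()
            role, lines = marker[1:], []
        else:
            lines.append(line)
    flush()
    return messages
```

Editing an earlier !user or !assistant section in the file changes what this function returns, which is why the file alone is enough to resume a conversation.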

Completion mode

Alternatively, the LLM can continue from the text of your input using the !completion directive.

!completion
The meaning of life is

Do not use !user or !assistant in this mode, but the other options still work.

Infill mode

If the input contains !infill by itself, Illume operates in infill mode. Output is to be inserted in place of !infill, i.e. code generation. By default it will use the llama.cpp /infill endpoint, which requires a FIM-trained model with metadata declaring its FIM tokens. This excludes most models, including most "coder" models due to missing metadata. There are currently no standards and few conventions around FIM, and every model implements it differently.

Given an argument, !infill stores it as the template, and {prefix} and {suffix} in the template are replaced with the input surrounding the insertion point. For example, including a leading space in the template:

!infill  <PRE> {prefix} <SUF>{suffix} <MID>

Write this template according to the model's FIM documentation. Illume includes built-in fim:MODEL templates for several popular models. This form of !infill only configures, and does not activate infill mode on its own. Put it in a profile.
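As a rough illustration of the template substitution described above, the following sketch fills {prefix} and {suffix} in a single pass. The function name render_fim is hypothetical, and how Illume itself splits the input around !infill is not shown here.

```python
import re


def render_fim(template, prefix, suffix):
    """Fill {prefix} and {suffix} in a FIM template in one pass."""
    parts = {"prefix": prefix, "suffix": suffix}
    # A single regex pass avoids re-substituting braces that happen
    # to appear inside the prefix or suffix text itself.
    return re.sub(
        r"\{(prefix|suffix)\}", lambda m: parts[m.group(1)], template
    )
```

With the template from the example, render_fim(" <PRE> {prefix} <SUF>{suffix} <MID>", "x=", ";\n") produces " <PRE> x= <SUF>;\n <MID>", which is the prompt a FIM-trained model would then complete.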

For example, to generate FIM completions on a remote DeepSeek model running on llama.cpp, your Illume profile file might be something like:

!profile llama.cpp
!profile fim:deepseek
!api http://myllama:8080/

With illume.vim, do not type a no-argument !infill directive yourself. The configuration automatically inserts it into Illume's input at the cursor position.

Recommendation: DeepSeek produces the best FIM output, followed by Qwen and Granite. All three work out-of-the-box with llama.cpp /infill, but work best with an Illume FIM profile.

Profiles

$ILLUME_PROFILE sets the default profile. The default profile is like an implicit !profile when none is specified. A profile sets the URL, extra keys, HTTP headers, or even a system prompt. Illume supplies many built-in profiles: see Profiles in the source. If the profile name contains a slash, the profile is read from that file. Otherwise it's matched against a built-in profile, or a file with a .profile suffix next to the Illume executable.
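The lookup rule can be sketched as follows. This is only one reading of the paragraph above: resolve_profile is a hypothetical name, and trying built-in profiles before a .profile file next to the executable is an assumption about precedence that the text does not settle.

```python
import os


def resolve_profile(name, builtin, exe_dir):
    """Return ("file", path) or ("builtin", name) for a profile name."""
    if "/" in name:
        # A slash means the name is itself a path to a profile file.
        return ("file", name)
    if name in builtin:
        # Assumed precedence: built-ins win over files.
        return ("builtin", name)
    return ("file", os.path.join(exe_dir, name + ".profile"))
```

For example, resolve_profile("./work.profile", set(), "/opt/bin") points at the given file, while a bare name such as "work" falls back to work.profile in the executable's directory.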

Directives

An !error "directive" appears in error output, but it's not processed on input. Everything before the first !user or !assistant is in the "system" role, which is where you can write a system prompt.

`!profile NAME`

Load a profile. JSON !:KEY directives in the profile do not override user-set keys. If no !profile is given, Illume loads $ILLUME_PROFILE if set, otherwise it loads the default profile.

`!api URL`

Sets the API base URL. For servers other than llama.cpp, it typically ends with /v1 or /v2. Illume interpolates {…} in the URL from !:KEY directives. Interpolation happens just before the request is made, so the URL may reference keys set after the !api directive. Example:

!api https://api-inference.huggingface.co/models/{model}/v1
!:model mistralai/Mistral-Nemo-Instruct-2407

If the URL is wrapped in quotes, it will be used literally as provided without modification.
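A minimal sketch of the key interpolation, assuming placeholders are simple {name} tokens: interpolate_url is a hypothetical name, leaving unknown placeholders untouched is an assumption, and quoted URLs are not handled.

```python
import re


def interpolate_url(url, keys):
    """Replace {key} placeholders in url with values from keys."""
    def lookup(match):
        name = match.group(1)
        # Assumption: a placeholder with no matching key stays as-is.
        return str(keys.get(name, match.group(0)))

    return re.sub(r"\{(\w+)\}", lookup, url)
```

Because the substitution runs over the final set of keys, the order of !api and !:model in the input does not matter, which matches the late interpolation described above.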

`!context FILE`

Insert a file at this position in the conversation.

`!context DIR [SUFFIX...]`

Include all files under DIR with matching file name suffixes. Only relative names are sent, but the last element of DIR is included in this relative path if it does not end with a slash. Files can be included in any role, not just the system prompt.
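The trailing-slash rule is easy to get wrong, so here is a sketch of how the relative names could be computed. It is not Illume's code: context_files is a hypothetical name, the traversal order is arbitrary but deterministic, and including every file when no suffix is given is an assumption.

```python
import os


def context_files(directory, suffixes=()):
    """List files under directory as the relative names described above."""
    root = directory.rstrip("/") or "/"
    # Without a trailing slash, names keep the last element of DIR.
    if directory.endswith("/"):
        base = root
    else:
        base = os.path.dirname(root) or "."
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the walk order deterministic
        for name in sorted(filenames):
            if not suffixes or name.endswith(tuple(suffixes)):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, base))
    return found
```

So context_files("/path/to/repository", [".py", ".sql"]) yields names like repository/app.py, while "/path/to/repository/" yields app.py.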

`!user`

Marks the following lines as belonging to a user message. You can modify these to trick the LLM into thinking you said something different in the past.

`!assistant`

Marks the following lines as belonging to an assistant message. You can modify these to trick the LLM into thinking it said something different.

`!note ...`

These lines are not sent to the LLM. Used to annotate conversations.

`!begin`

Discard all messages before this line. Used to "comment out" headers in the input, e.g. when composing email. Directives before this line are still effective.

`!end`

Stop processing directives and ignore the rest of the input.

`!:KEY VALUE`

Insert an arbitrary JSON value into the query object. Examples:

!:temperature 0.3
!:model mistralai/Mistral-Nemo-Instruct-2407
!:stop ["<|im_end|>"]

If VALUE is missing, the key is deleted instead. If it cannot be parsed as JSON, it's passed through as a string. If it looks like JSON but should be sent as string data, wrap it in quotes to turn it into a JSON string.
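The three cases (JSON value, plain string, deletion) can be sketched like this. apply_key is a hypothetical helper that mutates a query dictionary; it illustrates the rule and is not taken from Illume's source.

```python
import json


def apply_key(query, line):
    """Apply one '!:KEY VALUE' line to the query dict in place."""
    key, _, value = line[2:].partition(" ")
    value = value.strip()
    if not value:
        # No value: delete the key if present.
        query.pop(key, None)
        return
    try:
        query[key] = json.loads(value)
    except json.JSONDecodeError:
        # Not valid JSON: pass it through as a plain string.
        query[key] = value
```

Under this sketch, "!:seed 42" stores the number 42, while "!:seed "42"" (with the quotes) stores the string "42", which is the quoting trick mentioned above.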

`!>HEADER VALUE`

Insert an arbitrary HTTP header into the request. Examples:

!>x-use-cache false
!>user-agent My LLM Client 1.0
!>authorization

If VALUE is missing, the header is deleted. This is, for instance, a way to disable the API token, as shown in the example. If the value contains $VAR then Illume will expand it from the environment.
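A sketch of the header rule, with the environment passed in explicitly so it is easy to test. apply_header and the variable name API_TOKEN are made up for illustration. Lowercasing header names and expanding unset variables to an empty string are assumptions, not documented behavior.

```python
import re


def apply_header(headers, line, env):
    """Apply one '!>HEADER VALUE' line to the headers dict in place."""
    name, _, value = line[2:].partition(" ")
    name, value = name.lower(), value.strip()
    if not value:
        # No value: delete the header, e.g. to drop an API token.
        headers.pop(name, None)
        return
    # Expand $VAR from env; unset variables become "" (assumption).
    headers[name] = re.sub(
        r"\$(\w+)", lambda m: env.get(m.group(1), ""), value
    )
```

In a real script you would pass os.environ as env, so a line like "!>authorization Bearer $API_TOKEN" picks up the token without storing it in the file.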

`!completion`

Use completion mode instead of conversational. The LLM will continue writing from the end of the document. Cannot be used with !user or !assistant, which are for the (default) chat mode.

`!infill [TEMPLATE]`

With no template, activate infill mode, and generate code to be inserted at this position. Given a template, use that template to generate the prompt when infill mode is active.

`!reddit FILE`

Like !context but embed a reddit post from its JSON representation (append .json to the URL and then download it). Includes all comments with threading.

!reddit some-reddit-post.json
Please summarize this reddit post and its comments.

`!reddit! FILE`

Like !reddit but just the post with no comments.

`!github issue.json [comments.json]`

Like !reddit but insert a GitHub issue for inspection, and optionally the issue comments. You can download these from the GitHub API.

https://api.github.com/repos/USER/REPO/issues/ID
https://api.github.com/repos/USER/REPO/issues/ID/comments

Combine it with !context on GitHub's "patch" output to embed the entire context of a pull request.

https://github.com/USER/REPO/pull/ID.patch
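If you fetch these often, the URLs above can be built and downloaded from a small script. This is a convenience sketch with hypothetical helper names. It uses only the URL patterns shown above and does not handle authentication or rate limits.

```python
import urllib.request


def github_issue_urls(user, repo, issue_id):
    """Return the (issue, comments) API URLs for one issue."""
    base = f"https://api.github.com/repos/{user}/{repo}/issues/{issue_id}"
    return base, base + "/comments"


def github_patch_url(user, repo, pull_id):
    """Return the URL of a pull request's patch output."""
    return f"https://github.com/{user}/{repo}/pull/{pull_id}.patch"


def download(url, dest):
    """Save url to the local file dest."""
    urllib.request.urlretrieve(url, dest)
```

After saving the results as, say, issue.json, comments.json, and pr.patch, the input would contain "!github issue.json comments.json" followed by "!context pr.patch".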

`!stats`

On response completion, inserts a !note with timing statistics.

`!debug`

Dry run: "reply" with the raw HTTP request instead of querying the API. For inspecting the exact query parameters.

