
llm_processes
Stars: 55

README:
This repository contains the code to reproduce the experiments carried out in LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language.
The code has been authored by: John Bronskill, James Requeima, and Dami Choi.
This code requires the following:
- Python 3.9 or greater
- PyTorch 2.3.0 or greater
- transformers 4.41.0 or greater
- accelerate 0.30.1 or greater
- jsonargparse 4.28.0 or greater
- matplotlib 3.9.0 or greater
- optuna 3.6.1 or greater (only needed if you intend to run the black-box optimization experiments)
- gpytorch 1.14 or greater (only if you intend to run the Gaussian Process code)
We support a variety of LLMs through the Hugging Face transformers API. The code currently supports the following LLMs:
| LLM Type | URL | GPU Memory Required (GB) |
|---|---|---|
| phi-3-mini-128k-instruct | https://huggingface.co/microsoft/Phi-3-mini-128k-instruct | 8 |
| llama-2-7B | https://huggingface.co/meta-llama/Llama-2-7b | 24 |
| llama-2-70B | https://huggingface.co/meta-llama/Llama-2-70b | 160 |
| llama-3-8B | https://huggingface.co/meta-llama/Meta-Llama-3-8B | 24 |
| llama-3-70B | https://huggingface.co/meta-llama/Meta-Llama-3-70B | 160 |
| mixtral-8x7B | https://huggingface.co/mistralai/Mixtral-8x7B-v0.1 | 160 |
| mixtral-8x7B-instruct | https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1 | 160 |
Adding a new LLM that supports the Hugging Face API is not difficult; just modify `hf_api.py`.
- Clone or download this repository.
- Run `pip install .` to install the `llm_processes` package and all dependencies.

Installing the `llm_processes` package will automatically install the `llm_process` command. You can view its arguments by running `llm_process --help`.
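For example, a typical installation might look like the following sketch (the repository URL shown is an assumption; substitute the URL you actually clone from):

```bash
# Clone the repository (URL assumed for illustration) and install the package.
git clone https://github.com/requeima/llm_processes.git
cd llm_processes
pip install .

# Confirm the command-line entry point is available and list its arguments.
llm_process --help
```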
Use the command as:

`llm_process --llm_type <LLM Type> [additional options]`
Common options:

- `--experiment_name <value>` specifies a name that will be used to name any output or plot files; the default is `test`.
- `--output_dir <directory where output files are written>`; the default is `./output`.
- `--plot_dir <directory where output plot files are written>`; the default is `./plots`.
- `--num_samples <number of samples to take at each target location>`; the default is `50`.
- `--autoregressive <True/False>`; if `True`, run A-LLMP; if `False`, run I-LLMP. The default is `False`.
- `--batch_size <value>` controls how many samples for each target point are processed at once. A higher value results in faster execution but consumes more GPU memory; lower this number if you get out-of-memory errors. The default is `5`.
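For instance, a small I-LLMP run that keeps the default output and plot directories might look like the command below; the data file name follows the pattern described in the next section and is assumed to exist in your checkout:

```bash
# Hypothetical example: 20 samples per target point with a small batch size
# to limit GPU memory use. The data file is assumed to be present.
llm_process --llm_type phi-3-mini-128k-instruct \
    --experiment_name sigmoid_illmp_test \
    --data_path ./data/functions/sigmoid_10_seed_0.pkl \
    --num_samples 20 \
    --batch_size 2
```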
The additional options are:

- Data: `--data_path <choose a file from the data/functions directory>`. In the experiments we used `sigmoid_10_seed_*.pkl`, `square_20_seed_*.pkl`, and `linear_cos_75_seed_*.pkl`, where you substitute a seed number for the `*`.
- Prompt Format: `--x_prefix <value>`, `--y_prefix <value>`, and `--break_str <value>`
- Prompt Order: `--prompt_ordering <sequential/random/distance>`
- Prompt y-Scaling: `--y_min <value>` and `--y_max <value>`
- Top-p and Temperature: `--top_p <value>` and `--temperature <value>`
- Autoregressive: `--autoregressive True`
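A hypothetical A-LLMP run that exercises several of these options might look like the following; the prefix strings, y-range, and sampling values here are illustrative placeholders rather than the settings used in the paper:

```bash
# Hypothetical option values shown only to illustrate the flags; tune them for your data.
llm_process --llm_type llama-3-8B \
    --experiment_name square_prompt_test \
    --data_path ./data/functions/square_20_seed_1.pkl \
    --x_prefix "x = " --y_prefix ", y = " \
    --prompt_ordering distance \
    --y_min 0 --y_max 100 \
    --top_p 0.9 --temperature 0.7 \
    --autoregressive True
```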
Function experiments: from the root directory of the repo, run:

`python ./experiments/run_functions_exp.py --llm_type <LLM Type> --function <beat/exp/gaussian_wave/linear/linear_cos/log/sigmoid/sinc/sine/square/x_times_sine/xsin>`
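For example, to run the sigmoid function with llama-3-8B (any supported LLM type from the table above will work):

```bash
python ./experiments/run_functions_exp.py --llm_type llama-3-8B --function sigmoid
```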
Comparison experiments: from the root directory of the repo, run:

`python ./experiments/run_compare_exp.py --llm_type <LLM Type>`
Fashion-MNIST experiments: from the root directory of the repo, run:

`python ./experiments/run_fashion_mnist_exp.py --llm_type <LLM Type>`
Black-box optimization experiments: from the root directory of the repo, run:

`python ./experiments/run_black_box_opt_exp.py --llm_type <LLM Type> --experiment_name_prefix <see table> --function <see table> --max_generated_length <see table> --num_cold_start_points <see table>`
| function | experiment_name_prefix | max_generated_length | num_cold_start_points |
|---|---|---|---|
| Sinusoidal | Sinusoidal | 7 | 7 |
| Gramacy | Gramacy | 8 | 12 |
| Branin | Branin | 7 | 12 |
| Bohachevsky | Bohachevsky | 11 | 12 |
| Goldstein | Goldstein | 12 | 12 |
| Hartmann3 | Hartmann3 | 7 | 15 |
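For example, using the Branin row of the table above (with llama-3-8B as the LLM; substitute any supported type):

```bash
python ./experiments/run_black_box_opt_exp.py --llm_type llama-3-8B \
    --experiment_name_prefix Branin --function Branin \
    --max_generated_length 7 --num_cold_start_points 12
```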
Weather experiments: from the root directory of the repo, run:

`python run_llm_process.py --llm_type <LLM Type> --experiment_name weather_3 --data_path ./data/weather/weather_3.pkl --autoregressive True --num_decimal_places_y 1 --max_generated_length 20`
In-context learning experiments: from the root directory of the repo, run:

`python ./experiments/run_in_context.py --llm_type <LLM Type>`
Scenario experiments: from the root directory of the repo, run:

`llm_process --llm_type <LLM Type> --data_path ./data/scenario/scenario_data_2_points.pkl --prefix <prompt to try> --autoregressive True --plot_trajectories 5 --forecast True`
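As a sketch with a made-up text prefix (the prompt string below is purely illustrative, not one used in the paper):

```bash
# The prefix text is a hypothetical example; replace it with the scenario you
# want the predictions to be conditioned on.
llm_process --llm_type llama-3-8B \
    --data_path ./data/scenario/scenario_data_2_points.pkl \
    --prefix "The following are daily temperature readings for a city in winter." \
    --autoregressive True --plot_trajectories 5 --forecast True
```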
Housing experiments: from the root directory of the repo, run:

`python ./experiments/run_housing_exp.py --llm_type <LLM Type>`
In the black-box optimization experiments, we use code from the benchfunk repository (Copyright (c) 2014, the benchfunk authors).
The datasets in the `data/functions` directory are derived from the synthetic datasets in the LLMTime repository (Copyright (c) 2023 Nate Gruver, Marc Finzi, Shikai Qiu).
To ask questions or report issues, please open an issue on the issues tracker.
If you use this code, please cite our paper:
@inproceedings{requeima2024llm,
author = {Requeima, James and Bronskill, John and Choi, Dami and Turner, Richard E and Duvenaud, David},
booktitle = {Advances in Neural Information Processing Systems},
editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
pages = {109609--109671},
publisher = {Curran Associates, Inc.},
title = {LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language},
url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/c5ec22711f3a4a2f4a0a8ffd92167190-Paper-Conference.pdf},
volume = {37},
year = {2024}
}
We have recently extended LLM Processes to tabular data in our paper JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs.
Similar Open Source Tools

TPI-LLM
TPI-LLM (Tensor Parallelism Inference for Large Language Models) is a system designed to bring LLM functions to low-resource edge devices, addressing privacy concerns by enabling LLM inference on edge devices with limited resources. It leverages multiple edge devices for inference through tensor parallelism and a sliding window memory scheduler to minimize memory usage. TPI-LLM demonstrates significant improvements in TTFT and token latency compared to other models, and plans to support infinitely large models with low token latency in the future.

ovos-installer
The ovos-installer is a simple and multilingual tool designed to install Open Voice OS and HiveMind using Bash, Whiptail, and Ansible. It supports various Linux distributions and provides an automated installation process. Users can easily start and stop services, update their Open Voice OS instance, and uninstall the tool if needed. The installer also allows for non-interactive installation through scenario files. It offers a user-friendly way to set up Open Voice OS on different systems.

graphrag-visualizer
GraphRAG Visualizer is an application designed to visualize Microsoft GraphRAG artifacts by uploading parquet files generated from the GraphRAG indexing pipeline. Users can view and analyze data in 2D or 3D graphs, display data tables, search for specific nodes or relationships, and process artifacts locally for data security and privacy.

rwkv.cpp
rwkv.cpp is a port of BlinkDL/RWKV-LM to ggerganov/ggml, supporting FP32, FP16, and quantized INT4, INT5, and INT8 inference. It focuses on CPU but also supports cuBLAS. The project provides a C library rwkv.h and a Python wrapper. RWKV is a large language model architecture with models like RWKV v5 and v6. It requires only state from the previous step for calculations, making it CPU-friendly on large context lengths. Users are advised to test all available formats for perplexity and latency on a representative dataset before serious use.

gollama
Gollama is a delightful tool that brings Ollama, your offline conversational AI companion, directly into your terminal. It provides a fun and interactive way to generate responses from various models without needing internet connectivity. Whether you're brainstorming ideas, exploring creative writing, or just looking for inspiration, Gollama is here to assist you. The tool offers an interactive interface, customizable prompts, multiple models selection, and visual feedback to enhance user experience. It can be installed via different methods like downloading the latest release, using Go, running with Docker, or building from source. Users can interact with Gollama through various options like specifying a custom base URL, prompt, model, and enabling raw output mode. The tool supports different modes like interactive, piped, CLI with image, and TUI with image. Gollama relies on third-party packages like bubbletea, glamour, huh, and lipgloss. The roadmap includes implementing piped mode, support for extracting codeblocks, copying responses/codeblocks to clipboard, GitHub Actions for automated releases, and downloading models directly from Ollama using the rest API. Contributions are welcome, and the project is licensed under the MIT License.

local-deep-research
Local Deep Research is a powerful AI-powered research assistant that performs deep, iterative analysis using multiple LLMs and web searches. It can be run locally for privacy or configured to use cloud-based LLMs for enhanced capabilities. The tool offers advanced research capabilities, flexible LLM support, rich output options, privacy-focused operation, enhanced search integration, and academic & scientific integration. It also provides a web interface, command line interface, and supports multiple LLM providers and search engines. Users can configure AI models, search engines, and research parameters for customized research experiences.

StableToolBench
StableToolBench is a new benchmark developed to address the instability of Tool Learning benchmarks. It aims to balance stability and reality by introducing features such as a Virtual API System with caching and API simulators, a new set of solvable queries determined by LLMs, and a Stable Evaluation System using GPT-4. The Virtual API Server can be set up either by building from source or using a prebuilt Docker image. Users can test the server using provided scripts and evaluate models with Solvable Pass Rate and Solvable Win Rate metrics. The tool also includes model experiments results comparing different models' performance.

llm-structured-output-benchmarks
Benchmark various LLM Structured Output frameworks like Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, LMFormatEnforcer, etc on tasks like multi-label classification, named entity recognition, synthetic data generation. The tool provides benchmark results, methodology, instructions to run the benchmark, add new data, and add a new framework. It also includes a roadmap for framework-related tasks, contribution guidelines, citation information, and feedback request.

dvc
DVC, or Data Version Control, is a command-line tool and VS Code extension that helps you develop reproducible machine learning projects. With DVC, you can version your data and models, iterate fast with lightweight pipelines, track experiments in your local Git repo, compare any data, code, parameters, model, or performance plots, and share experiments and automatically reproduce anyone's experiment.

optillm
optillm is an OpenAI API compatible optimizing inference proxy implementing state-of-the-art techniques to enhance accuracy and performance of LLMs, focusing on reasoning over coding, logical, and mathematical queries. By leveraging additional compute at inference time, it surpasses frontier models across diverse tasks.

mistral.rs
Mistral.rs is a fast LLM inference platform written in Rust. It supports inference on a variety of devices, quantization, and easy-to-use applications with an OpenAI-compatible HTTP server and Python bindings.

StableToolBench
StableToolBench is a new benchmark developed to address the instability of Tool Learning benchmarks. It aims to balance stability and reality by introducing features like Virtual API System, Solvable Queries, and Stable Evaluation System. The benchmark ensures consistency through a caching system and API simulators, filters queries based on solvability using LLMs, and evaluates model performance using GPT-4 with metrics like Solvable Pass Rate and Solvable Win Rate.

aiosmb
aiosmb is a fully asynchronous SMB library written in pure Python, supporting Python 3.7 and above. It offers various authentication methods such as Kerberos, NTLM, SSPI, and NEGOEX. The library supports connections over TCP and QUIC protocols, with proxy support for SOCKS4 and SOCKS5. Users can specify an SMB connection using a URL format, making it easier to authenticate and connect to SMB hosts. The project aims to implement DCERPC features, VSS mountpoint operations, and other enhancements in the future. It is inspired by Impacket and AzureADJoinedMachinePTC projects.

distributed-llama
Distributed Llama is a tool that allows you to run large language models (LLMs) on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage. It uses TCP sockets to synchronize the state of the neural network, and you can easily configure your AI cluster by using a home router. Distributed Llama supports models such as Llama 2 (7B, 13B, 70B) chat and non-chat versions, Llama 3, and Grok-1 (314B).

factorio-learning-environment
Factorio Learning Environment is an open source framework designed for developing and evaluating LLM agents in the game of Factorio. It provides two settings: Lab-play with structured tasks and Open-play for building large factories. Results show limitations in spatial reasoning and automation strategies. Agents interact with the environment through code synthesis, observation, action, and feedback. Tools are provided for game actions and state representation. Agents operate in episodes with observation, planning, and action execution. Tasks specify agent goals and are implemented in JSON files. The project structure includes directories for agents, environment, cluster, data, docs, eval, and more. A database is used for checkpointing agent steps. Benchmarks show performance metrics for different configurations.