ControlLLM


ControlLLM: Augment Language Models with Tools by Searching on Graphs


ControlLLM is a framework that enables large language models to use multi-modal tools to solve complex real-world tasks. It addresses ambiguous user prompts, inaccurate tool selection, and inefficient tool scheduling through three components: a task decomposer, a Thoughts-on-Graph paradigm, and an execution engine with a rich toolbox. On tasks involving image, audio, and video processing, the authors report higher accuracy, efficiency, and versatility than existing methods.


ControlLLM: Augmenting Large Language Models with Tools by Searching on Graphs

[Paper] [Project Page] [Demo] [🤗 Space]

We present ControlLLM, a novel framework that enables large language models (LLMs) to use multi-modal tools for solving complex real-world tasks. Despite their remarkable performance, LLMs still struggle with tool invocation because of ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches for the optimal solution path on a pre-built tool graph, which specifies the parameter and dependency relations among different tools; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the tools efficiently on different computational devices. We evaluate our framework on diverse tasks involving image, audio, and video processing, demonstrating its superior accuracy, efficiency, and versatility compared to existing methods.
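To make the idea of "searching a solution path on a tool graph" concrete, here is a minimal sketch. It assumes a hypothetical toy graph in which each tool consumes some resource types and produces one new type, and it uses a plain breadth-first search to find the shortest tool chain from the available inputs to a goal type. The tool names, the `TOOLS` table, and `search_solution_path` are illustrative only; this is not the paper's actual ToG search procedure or ControlLLM's toolbox.

```python
from collections import deque

# Hypothetical toy tool graph: tool name -> (required input types, produced output type).
# These names are illustrative and are NOT ControlLLM's real toolbox.
TOOLS = {
    "text_to_image": (("text",), "image"),
    "image_captioning": (("image",), "text"),
    "text_to_speech": (("text",), "audio"),
    "image_to_video": (("image",), "video"),
}


def search_solution_path(available, goal, tools=TOOLS, max_depth=4):
    """Breadth-first search for the shortest chain of tools that produces `goal`.

    A search state is the set of resource types obtained so far; a tool can be
    applied when all of its inputs are in that set. Returns a list of tool
    names, or None if the goal type is unreachable within `max_depth` steps.
    """
    start = frozenset(available)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        have, path = queue.popleft()
        if goal in have:
            return path
        if len(path) >= max_depth:
            continue
        for name, (inputs, output) in tools.items():
            # Skip tools whose output we already have or whose inputs are missing.
            if output in have or not set(inputs) <= have:
                continue
            nxt = have | {output}
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None
```

For example, `search_solution_path({"text"}, "video")` returns `["text_to_image", "image_to_video"]`, while a goal that no chain of tools can reach yields `None`. Breadth-first order is chosen here only because it guarantees the fewest tool calls in this toy setting.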

🤖 Video Demo

https://github.com/OpenGVLab/ControlLLM/assets/13723743/cf72861e-0e7b-4c15-89ee-7fa1d838d00f

🏠 System Overview

(Figure: ControlLLM system architecture)

🎁 Major Features

  • Image Perception
  • Image Editing
  • Image Generation
  • Video Perception
  • Video Editing
  • Video Generation
  • Audio Perception
  • Audio Generation
  • Multi-Solution
  • Pointing Inputs
  • Resource Type Awareness

🗓️ Schedule

  • ✅ (🔥 New) Release online demo and 🤗 Hugging Face Space.
  • ✅ (🔥 New) Support PixArt-alpha, a state-of-the-art method for Text-to-Image synthesis.

🛠️ Installation

Basic requirements

  • Linux
  • Python 3.10+
  • PyTorch 2.0+
  • CUDA 11.8+
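Before installing, it can help to confirm that a machine meets the basic requirements above. The following is a minimal sketch of such a check. `check_environment`, `parse_version`, and `at_least` are hypothetical helpers, not part of ControlLLM, and the CUDA check assumes that the CUDA version PyTorch was built against (`torch.version.cuda`) is the relevant one.

```python
import sys


def parse_version(text):
    """Convert a version string such as '2.0.1+cu118' into a tuple of ints, e.g. (2, 0, 1)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(version, minimum):
    """True if the version string is >= the minimum tuple, e.g. at_least('2.0.1', (2, 0))."""
    return parse_version(version) >= minimum


def check_environment():
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    if not sys.platform.startswith("linux"):
        problems.append(f"Linux is the documented platform, found {sys.platform}")
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    try:
        import torch  # imported lazily so the checker itself runs without PyTorch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if not at_least(torch.__version__, (2, 0)):
        problems.append(f"PyTorch 2.0+ is required, found {torch.__version__}")
    # torch.version.cuda is the CUDA version PyTorch was built against (None for CPU-only builds).
    cuda = torch.version.cuda
    if cuda is None or not at_least(cuda, (11, 8)):
        problems.append(f"a CUDA 11.8+ build of PyTorch is required, found {cuda}")
    return problems
```

Running `print(check_environment())` inside the activated environment lists any unmet requirement. The version parser drops local suffixes such as `+cu118` so that PyTorch's build tags do not break the comparison.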

Clone project

Run the following commands to clone the repository and enter its root directory:

git clone https://github.com/OpenGVLab/ControlLLM.git
cd ControlLLM

Install dependencies

Set up the environment:

conda create -n cllm python=3.10

conda activate cllm

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia

Install LLaVA:

pip install git+https://github.com/haotian-liu/LLaVA.git

Then install other dependencies:

pip install -r requirements.txt
pip install -e .

👨‍🏫 Get Started

Step 1: Launch tool services

Put your personal OpenAI API key and weather API key into the corresponding environment variables.

😬 Launch all tool services in a single endpoint:

# openai key
export OPENAI_API_KEY="..."
# openai base
export OPENAI_BASE_URL="..."
# weather api key
export WEATHER_API_KEY="..."
# resource dir
export SERVER_ROOT="./server_resources"

python -m cllm.services.launch --port 10056 --host 0.0.0.0
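A common launch failure is a variable that was never exported or still holds the `...` placeholder. The sketch below is a hypothetical pre-flight check, not part of ControlLLM; its `REQUIRED_VARS` list simply mirrors the variables exported above, and treating `...` as unset is an assumption made for this example.

```python
import os

# Mirrors the variables exported in the step above; this list comes from the README, not from cllm's code.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL", "WEATHER_API_KEY", "SERVER_ROOT")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names that are unset, empty, or still set to the README's '...' placeholder."""
    env = os.environ if env is None else env
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        if value in ("", "..."):
            missing.append(name)
    return missing
```

Calling `missing_env_vars()` in the same shell session before `python -m cllm.services.launch` returns an empty list when everything is set; otherwise it names the variables that still need values.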

Tools as Services

Taking image generation as an example, we first launch its service:

python -m cllm.services.image_generation.launch --port 10011 --host 0.0.0.0

Then we can call the service through its Python API:

from cllm.services.image_generation.api import *
setup(port=10011)
text2image('A horse')
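Service processes can take a while to load their models, so a call made right after launching may fail. A minimal sketch of waiting for the port first is shown below. `wait_for_port` is a hypothetical helper built only on Python's standard `socket` module; it checks that something accepts TCP connections on the port, not that the model inside the service is ready.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds; return False once `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a server is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port(10011)` and only then run `setup(port=10011)` and `text2image('A horse')`. The same check applies to the all-in-one endpoint on port 10056 and the ToG service on port 10052.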

Step 2: Launch ToG service

export OPENAI_BASE_URL="..."
export OPENAI_API_KEY="..."
python -m cllm.services.tog.launch --port 10052 --host 0.0.0.0

Step 3: Launch the Gradio demo

Use OpenSSL to generate a self-signed certificate:

mkdir certificate

openssl req -x509 -newkey rsa:4096 -keyout certificate/key.pem -out certificate/cert.pem -sha256 -days 365 -nodes
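The `--https` flag in the next step relies on the certificate pair generated here. The sketch below checks that the two files exist and that Python's standard `ssl` module can load them as a server certificate chain. It assumes the files sit at `certificate/cert.pem` and `certificate/key.pem` relative to the project root, as in the `openssl` command above; how the demo itself locates them is not documented here, and both helpers are hypothetical.

```python
import os
import ssl

# Assumed locations, matching the -out and -keyout paths of the openssl command above.
CERT_FILE = os.path.join("certificate", "cert.pem")
KEY_FILE = os.path.join("certificate", "key.pem")


def missing_certificate_files(root="."):
    """Return the certificate/key paths (relative to `root`) that do not exist yet."""
    return [p for p in (CERT_FILE, KEY_FILE) if not os.path.isfile(os.path.join(root, p))]


def load_server_context(root="."):
    """Build a TLS server context from the pair; raises ssl.SSLError if the files are invalid."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(
        certfile=os.path.join(root, CERT_FILE),
        keyfile=os.path.join(root, KEY_FILE),
    )
    return ctx
```

Because the certificate is self-signed, browsers will show a warning when the demo is opened over HTTPS; loading it with `load_server_context()` only confirms that the key and certificate are well-formed and match.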

Finally, launch the Gradio demo on your server:

export TOG_PORT=10052
export CLLM_SERVICES_PORT=10056
export CLIENT_ROOT="./client_resources"

export GRADIO_TEMP_DIR="$HOME/.tmp"
export OPENAI_BASE_URL="..."
export OPENAI_API_KEY="..."

python -m cllm.app.gradio --controller "cllm.agents.tog.Controller" --server-port 10003 --https
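The values above suggest that `TOG_PORT` must match the ToG service from Step 2 (10052) and `CLLM_SERVICES_PORT` the all-in-one endpoint from Step 1 (10056). The sketch below reads and validates these variables. `demo_settings` is a hypothetical helper, and using the README's values as fallback defaults is an assumption of this example, not the demo's documented behavior.

```python
import os


def demo_settings(env=None):
    """Read the demo's environment variables, falling back to the README's example values."""
    env = os.environ if env is None else env
    settings = {
        "tog_port": int(env.get("TOG_PORT", "10052")),
        "services_port": int(env.get("CLLM_SERVICES_PORT", "10056")),
        "client_root": env.get("CLIENT_ROOT", "./client_resources"),
    }
    for key in ("tog_port", "services_port"):
        if not 0 < settings[key] < 65536:
            raise ValueError(f"{key} is not a valid TCP port: {settings[key]}")
    # The two backends run as separate processes, so they cannot share a port.
    if settings["tog_port"] == settings["services_port"]:
        raise ValueError("TOG_PORT and CLLM_SERVICES_PORT must differ")
    return settings
```

A mismatch between these values and the ports passed to `--port` in Steps 1 and 2 would leave the demo pointing at the wrong (or no) backend, so comparing them before launch is a cheap sanity check.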

Alternatively, you can set the above variables in run.sh and launch all services by running:

bash ./run.sh

🎫 License

This project is released under the Apache 2.0 license.

🖊️ Citation

If you find this project useful in your research, please cite our paper:

@article{2023controlllm,
  title={ControlLLM: Augment Language Models with Tools by Searching on Graphs},
  author={Liu, Zhaoyang and Lai, Zeqiang and Gao, Zhangwei and Cui, Erfei and Li, Zhiheng and Zhu, Xizhou and Lu, Lewei and Chen, Qifeng and Qiao, Yu and Dai, Jifeng and Wang, Wenhai},
  journal={arXiv preprint arXiv:2310.17796},
  year={2023}
}

🤝 Acknowledgement


If you want to join our WeChat group, please scan the following QR code to add our assistant as a WeChat friend:

(QR code image)
