LiveBench
LiveBench: A Challenging, Contamination-Free LLM Benchmark
Stars: 406
LiveBench is a benchmark for large language models (LLMs) designed to limit contamination by releasing new questions monthly and basing them on recent datasets, arXiv papers, news articles, and IMDb movie synopses. Every question has a verifiable, objective ground-truth answer, so hard questions can be scored accurately and automatically without an LLM judge. LiveBench currently offers 18 diverse tasks across 6 categories, with harder tasks to be released over time. The repo is built on FastChat's llm_judge module and incorporates code from LiveCodeBench and IFEval.
README:
🏆 Leaderboard • 💻 Data • 📝 Paper
Top models as of 30th September 2024 (see the full leaderboard here):
Please see the changelog for details about each LiveBench release.
- Introduction
- Installation Quickstart
- Usage
- Data
- Adding New Questions
- Adding New Models
- Documentation
- Citation
Introducing LiveBench: a benchmark for LLMs designed with test set contamination and objective evaluation in mind.
LiveBench has the following properties:
- LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses.
- Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored accurately and automatically, without the use of an LLM judge.
- LiveBench currently contains a set of 18 diverse tasks across 6 categories, and we will release new, harder tasks over time.
We will evaluate your model! Open an issue or email us at [email protected]!
Tested on Python 3.10.
We recommend using a virtual environment to install LiveBench.
python -m venv .venv
source .venv/bin/activate
To generate answers with API models (i.e. with gen_api_answer.py), conduct judgments, and show results:
cd LiveBench
pip install -e .
To do all of the above and also generate answers with local GPU inference on open source models (i.e. with gen_model_answer.py):
cd LiveBench
pip install -e .[flash_attn]
Note about fschat: The fschat package version on pip (i.e., lmsys/fastchat) is currently out of date, so we strongly recommend pip uninstall fschat before running the above, since it will then automatically install a more recent commit of fastchat.
Our repo is adapted from FastChat's excellent llm_judge module, and it also contains code from LiveCodeBench and IFEval.
cd livebench
The simplest way to run LiveBench inference and scoring is by using our provided Bash scripts. These scripts automate the process of generating and scoring model responses, and can automatically parallelize runs of different tasks or categories to speed up execution for models with higher rate limits.
To evaluate a single subset of LiveBench for a single model, do:
./scripts/run_livebench <bench-name> <model> <question-source>
e.g. ./scripts/run_livebench live_bench/coding gpt-4o-mini will evaluate gpt-4o-mini on all the coding tasks. <question-source> is optional and defaults to huggingface.
If you'd like to run multiple LiveBench subsets in sequence, use
./scripts/run_livebench_sequential <model> <venv-path> <question-source>
where <venv-path> is a relative path to your venv/bin/activate script. The list of benchmarks to be evaluated can be viewed inside the script.
For a local-weight model, use
./scripts/run_livebench_sequential_local_model <model-path> <model-id> <venv-path> <question-source>
For API-based models with high rate limits, evaluation of LiveBench can be sped up by evaluating different tasks in parallel. To do this automatically, run
./scripts/run_livebench_parallel <model> <venv-path> <question-source>
The set of categories or tasks to be evaluated is editable in ./scripts/run_livebench_parallel. This script spawns a tmux session with each LiveBench process in a separate pane, so progress on all runs can be viewed at once. The session also persists on a remote server (e.g. over SSH), so connection interruptions will not cancel the processes.
If you'd like to start evaluation of multiple models at once, run
./scripts/run_livebench_parallel_models <venv-path> <question-source>
You can edit the list of models to be evaluated in the script file. This script runs run_livebench_parallel once for each model.
Note: After the evaluation has completed, you will need to run show_livebench_result.py manually to view the leaderboard.
If you'd like, you can manually execute the Python scripts used to evaluate LiveBench.
In all scripts, the --bench-name argument is used to specify the subset of questions to use:
- Setting --bench-name to live_bench will use all questions.
- Setting --bench-name to live_bench/category will use all questions in that category.
- Setting --bench-name to live_bench/category/task will use all questions in that task.
The --question-source argument is used to specify the source of questions; by default, it is set to huggingface, which uses the questions available on Hugging Face. See below for instructions on how to use your own questions.
The --livebench-release-option argument is used to specify the version of livebench to use. By default, it is set to the latest version. Available options are 2024-06-24, 2024-07-26, 2024-08-31, and 2024-11-25.
Make sure you have the appropriate API keys set as environment variables (e.g. export OPENAI_API_KEY=<your_key>). If you are using a virtual environment, you can add the export statements to the .venv/bin/activate file.
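As an optional sanity check before kicking off a long run, you can verify that the expected keys are actually present in your environment. This is a minimal sketch (not part of the LiveBench scripts); the key list is an assumption to adjust for the providers you plan to evaluate.

```python
import os
import sys

# Optional pre-flight check: confirm provider API keys are exported before a long run.
# REQUIRED_KEYS is an example -- adjust it to the providers you actually evaluate.
REQUIRED_KEYS = ["OPENAI_API_KEY"]

missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
if missing:
    sys.exit(f"Missing API keys: {', '.join(missing)} (export them or add them to .venv/bin/activate)")
print("All required API keys are set.")
```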
The gen_api_answer.py script is used to generate answers for API-based models. It can be run using the following command:
python gen_api_answer.py --bench-name <bench-name> --model <model-name> --question-source <question-source> --livebench-release-option <livebench-release-option>
Only the --model argument is required. For example, to run coding tasks for gpt-4o-mini, run:
python gen_api_answer.py --bench-name live_bench/coding --model gpt-4o-mini
If your model is served through an OpenAI-compatible API endpoint, you can specify the endpoint using the --api-base argument. For example, to evaluate gpt-4o-mini using a vLLM endpoint, run:
python gen_api_answer.py --model gpt-4o-mini --api-base http://localhost:8000/v1
In this case, if an API key is needed, you should set the LIVEBENCH_API_KEY environment variable.
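Before pointing gen_api_answer.py at a custom endpoint, it can help to confirm that the endpoint answers OpenAI-style chat requests at all. The sketch below uses the openai Python client; the base URL, model name, and the LIVEBENCH_API_KEY fallback are assumptions to adapt to your setup.

```python
import os
from openai import OpenAI  # pip install openai

# Smoke-test an OpenAI-compatible endpoint (e.g. a local vLLM server) before running LiveBench.
client = OpenAI(
    base_url="http://localhost:8000/v1",                  # same value you would pass to --api-base
    api_key=os.environ.get("LIVEBENCH_API_KEY", "EMPTY"),  # many local servers accept any non-empty key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # must match the model name the endpoint serves
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
    max_tokens=5,
)
print(response.choices[0].message.content)
```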
To generate answers with local GPU inference on open source models, use the gen_model_answer.py script:
python gen_model_answer.py --model-path <path-to-model> --model-id <model-id> --bench-name <bench-name>
<path-to-model> should be either a path to a local model weight folder or a HuggingFace repo ID. <model-id> will be the name of the model on the leaderboard and the identifier used for other scripts.
Other arguments are optional, but you may want to set --num-gpus-per-model and --num-gpus-total to match the number of GPUs you have available. You may also want to set --dtype to match the dtype of your model weights.
Run python gen_model_answer.py --help for more details.
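If you are unsure what values to pass for the GPU flags, a quick look at the devices visible to PyTorch can help. This is an optional sketch, not part of the LiveBench scripts:

```python
import torch

# Inspect the GPUs visible to this process to pick sensible values for
# --num-gpus-total and --num-gpus-per-model (e.g. splitting a large model across GPUs).
num_gpus = torch.cuda.device_count()
print(f"Visible GPUs: {num_gpus}")
for i in range(num_gpus):
    props = torch.cuda.get_device_properties(i)
    print(f"  GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```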
To score the outputs of your model, run the gen_ground_truth_judgment.py script:
python gen_ground_truth_judgment.py --bench-name <bench-name> --model-list <model-list>
<model-list> is a space-separated list of model IDs to score. For example, to score gpt-4o-mini and claude-3-5-sonnet, run:
python gen_ground_truth_judgment.py --bench-name live_bench --model-list gpt-4o-mini claude-3-5-sonnet
If no --model-list argument is provided, all models will be scored.
Setting --debug will print debug information for individual questions. This can be useful for debugging new tasks.
To show the results of your model, run the show_livebench_result.py script:
python show_livebench_result.py --bench-name <bench-name> --model-list <model-list>
<model-list> is a space-separated list of model IDs to show. For example, to show the results of gpt-4o-mini and claude-3-5-sonnet, run:
python show_livebench_result.py --bench-name live_bench --model-list gpt-4o-mini claude-3-5-sonnet
If no --model-list argument is provided, all models will be shown.
The leaderboard will be displayed in the terminal. You can also find the breakdown by category in all_groups.csv and by task in all_tasks.csv.
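If you prefer to inspect the breakdown programmatically, the CSVs can be loaded with pandas. A minimal sketch, assuming the files are in the current working directory:

```python
import pandas as pd

# Load the per-category and per-task breakdowns written by show_livebench_result.py.
groups = pd.read_csv("all_groups.csv")  # scores by category (exact columns may vary by release)
tasks = pd.read_csv("all_tasks.csv")    # scores by task

print(groups.head())
print(tasks.head())
```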
The scripts/error_check script will print out questions for which a model's output is $ERROR$, which indicates repeated API call failures.
You can use the scripts/rerun_failed_questions.py script to rerun the failed questions.
If, after multiple attempts, the model's output is still $ERROR$, the question is likely triggering a content filter from the model's provider (Gemini models are particularly prone to this). In this case, there is not much that can be done.
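If you want to sweep for failures yourself, a rough equivalent is to scan the model answer files for the $ERROR$ marker. The glob pattern below is an assumption about where your answer .jsonl files live; point it at your actual output directory.

```python
import glob

# Count answers containing "$ERROR$" in model answer files.
# The glob below is an assumption -- adjust it to wherever your model_answer
# .jsonl files are written.
for path in glob.glob("livebench/data/**/model_answer/*.jsonl", recursive=True):
    errors = 0
    with open(path) as f:
        for line in f:
            if "$ERROR$" in line:
                errors += 1
    if errors:
        print(f"{path}: {errors} failed answer(s)")
```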
The questions for each of the categories can be found below:
Also available are the model answers and the model judgments.
To download the question.jsonl files (for inspection) and answer/judgment files from the leaderboard, use:
python download_questions.py
python download_leaderboard.py
Questions will be downloaded to livebench/data/<category>/question.jsonl.
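Once downloaded, the files are plain JSON Lines and easy to inspect. A small sketch that counts questions per task in one category file (the reasoning path is just an example following the pattern above):

```python
import json
from collections import Counter

# Count questions per task in one downloaded category file.
path = "livebench/data/reasoning/question.jsonl"  # example path; substitute your category

with open(path) as f:
    questions = [json.loads(line) for line in f]

print(f"{len(questions)} questions")
print(Counter(q.get("task", "unknown") for q in questions))
```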
If you want to create your own set of questions, or try out different prompts, etc, follow these steps:
- Create a question.jsonl file with the following path (or run python download_questions.py and update the downloaded file): livebench/data/live_bench/<category>/<task>/question.jsonl. For example, livebench/data/reasoning/web_of_lies_new_prompt/question.jsonl. Here is an example of the format for question.jsonl (it's the first few questions from web_of_lies_v2):
{"question_id": "0daa7ca38beec4441b9d5c04d0b98912322926f0a3ac28a5097889d4ed83506f", "category": "reasoning", "ground_truth": "no, yes, yes", "turns": ["In this question, assume each person either always tells the truth or always lies. Tala is at the movie theater. The person at the restaurant says the person at the aquarium lies. Ayaan is at the aquarium. Ryan is at the botanical garden. The person at the park says the person at the art gallery lies. The person at the museum tells the truth. Zara is at the museum. Jake is at the art gallery. The person at the art gallery says the person at the theater lies. Beatriz is at the park. The person at the movie theater says the person at the train station lies. Nadia is at the campground. The person at the campground says the person at the art gallery tells the truth. The person at the theater lies. The person at the amusement park says the person at the aquarium tells the truth. Grace is at the restaurant. The person at the aquarium thinks their friend is lying. Nia is at the theater. Kehinde is at the train station. The person at the theater thinks their friend is lying. The person at the botanical garden says the person at the train station tells the truth. The person at the aquarium says the person at the campground tells the truth. The person at the aquarium saw a firetruck. The person at the train station says the person at the amusement park lies. Mateo is at the amusement park. Does the person at the train station tell the truth? Does the person at the amusement park tell the truth? Does the person at the aquarium tell the truth? Think step by step, and then put your answer in **bold** as a list of three words, yes or no (for example, **yes, no, yes**). If you don't know, guess."], "task": "web_of_lies_v2"}
- If adding a new task, create a new scoring method in the process_results folder. If it is similar to an existing task, you can copy that task's scoring function; for example, livebench/process_results/reasoning/web_of_lies_new_prompt/utils.py can be a copy of the web_of_lies_v2 scoring method. (An illustrative sketch of such a scoring function appears after this list.)
- Add the scoring function to gen_ground_truth_judgment.py.
- Run and score models using --question-source jsonl and specifying your task. For example:
python gen_api_answer.py --bench-name live_bench/reasoning/web_of_lies_new_prompt --model claude-3-5-sonnet-20240620 --question-source jsonl
python gen_ground_truth_judgment.py --bench-name live_bench/reasoning/web_of_lies_new_prompt --question-source jsonl
python show_livebench_result.py --bench-name live_bench/reasoning/web_of_lies_new_prompt
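As referenced in step 2 above, here is an illustrative sketch of what a scoring function for a web_of_lies-style task might look like: it extracts the final **bold** span from the model's answer and compares it word-by-word against the ground truth. This is not LiveBench's actual implementation; mirror the signature and registration of an existing task's utils.py when adding your own.

```python
import re

# Illustrative scoring function for a web_of_lies-style task: pull the last
# **bold** span from the model's answer and compare it with the ground truth
# (e.g. "no, yes, yes"). A sketch only -- check an existing task's utils.py
# for the exact signature expected by gen_ground_truth_judgment.py.
def web_of_lies_new_prompt_process_results(ground_truth: str, llm_answer: str) -> float:
    bold_spans = re.findall(r"\*\*(.*?)\*\*", llm_answer)
    if not bold_spans:
        return 0.0
    predicted = [w.strip().lower() for w in bold_spans[-1].split(",")]
    expected = [w.strip().lower() for w in ground_truth.split(",")]
    # All-or-nothing credit here; partial-credit policies vary by task.
    return 1.0 if predicted == expected else 0.0
```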
As discussed above, local models can be evaluated with gen_model_answer.py.
API-based models with an OpenAI-compatible API can be evaluated with gen_api_answer.py by setting the --api-base argument.
For other models, it will be necessary to update several files depending on the model.
Models for which there is already an API implementation in LiveBench (e.g. OpenAI, Anthropic, Mistral, Google, Amazon, etc.) can be added simply by adding a new entry in api_models.py, using the appropriate Model subclass (e.g. OpenAIModel, AnthropicModel, MistralModel, GoogleModel, AmazonModel, etc.).
For other models:
- Implement a new completion function in model/completions.py. This function should take a Model, a Conversation, temperature, max_tokens, and kwargs as arguments, and return a tuple of (response, tokens_consumed) after calling the model's API. (A hypothetical sketch follows this list.)
- If necessary, implement a new ModelAdapter in model/model_adapter.py. This class should implement the BaseModelAdapter interface. For many models, existing adapters (such as ChatGPTAdapter) will work.
- Add a new Model entry in model/api_models.py. This will have the form Model(api_name=<api_name>, display_name=<display_name>, aliases=[], adapter=<model_adapter>, api_function=<api_function>). Make sure to add the new model to the ALL_MODELS list.
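For illustration only, a hypothetical completion function in the shape described above might look like the following. The provider name, URL, payload fields, and environment variable are placeholders, and the exact Model/Conversation interfaces should be taken from an existing function in model/completions.py.

```python
import os

import requests

# Hypothetical completion function: it receives the Model, a Conversation,
# temperature, max_tokens, and kwargs, calls the provider's HTTP API, and
# returns (response_text, tokens_consumed). Everything provider-specific below
# is a placeholder -- mirror an existing function in model/completions.py for
# the real Model/Conversation types.
def chat_completion_myprovider(model, conv, temperature, max_tokens, **kwargs):
    payload = {
        "model": model.api_name,
        "messages": conv.to_openai_api_messages(),  # FastChat-style conversations provide this helper
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    resp = requests.post(
        "https://api.myprovider.example/v1/chat/completions",  # placeholder endpoint
        headers={"Authorization": f"Bearer {os.environ['MYPROVIDER_API_KEY']}"},
        json=payload,
        timeout=120,
    )
    resp.raise_for_status()
    data = resp.json()
    return data["choices"][0]["message"]["content"], data.get("usage", {}).get("total_tokens", 0)
```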
You should now be able to evaluate the model with gen_api_answer.py or other scripts as normal.
Here we provide our dataset documentation. This information is also available in our paper.
@article{livebench,
author = {White, Colin and Dooley, Samuel and Roberts, Manley and Pal, Arka and Feuer, Ben and Jain, Siddhartha and Shwartz-Ziv, Ravid and Jain, Neel and Saifullah, Khalid and Naidu, Siddartha and Hegde, Chinmay and LeCun, Yann and Goldstein, Tom and Neiswanger, Willie and Goldblum, Micah},
title = {LiveBench: A Challenging, Contamination-Free LLM Benchmark},
url = {arXiv preprint arXiv:2406.19314},
year = {2024},
}