promptfoo
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
Stars: 4447
Promptfoo is a tool for testing and evaluating LLM output quality. With promptfoo, you can build reliable prompts, models, and RAGs with benchmarks specific to your use-case, speed up evaluations with caching, concurrency, and live reloading, score outputs automatically by defining metrics, use as a CLI, library, or in CI/CD, and use OpenAI, Anthropic, Azure, Google, HuggingFace, open-source models like Llama, or integrate custom API providers for any LLM API.
README:
promptfoo
is a tool for testing, evaluating, and red-teaming LLM apps.
With promptfoo, you can:
- Build reliable prompts, models, and RAGs with benchmarks specific to your use-case
- Secure your apps with automated red teaming and pentesting
- Speed up evaluations with caching, concurrency, and live reloading
- Score outputs automatically by defining metrics
- Use as a CLI, library, or in CI/CD
- Use OpenAI, Anthropic, Azure, Google, HuggingFace, open-source models like Llama, or integrate custom API providers for any LLM API
The goal: test-driven LLM development instead of trial-and-error.
npx promptfoo@latest init
promptfoo produces matrix views that let you quickly evaluate outputs across many prompts and inputs.
It works on the command line too.
It also produces high-level vulnerability and risk reports.
There are many different ways to evaluate prompts. Here are some reasons to consider promptfoo:
- Developer friendly: promptfoo is fast, with quality-of-life features like live reloads and caching.
- Battle-tested: Originally built for LLM apps serving over 10 million users in production. Our tooling is flexible and can be adapted to many setups.
- Simple, declarative test cases: Define evals without writing code or working with heavy notebooks.
- Language agnostic: Use Python, JavaScript, or any other language.
- Share & collaborate: Built-in share functionality & web viewer for working with teammates.
- Open-source: LLM evals are a commodity and should be served by 100% open-source projects with no strings attached.
- Private: This software runs completely locally. The evals run on your machine and talk directly with the LLM.
Start by establishing a handful of test cases - core use cases and failure cases that you want to ensure your prompt can handle.
As you explore modifications to the prompt, use promptfoo eval
to rate all outputs. This ensures the prompt is actually improving overall.
As you collect more examples and establish a user feedback loop, continue to build the pool of test cases.
To get started, run this command:
npx promptfoo@latest init
This will create a promptfooconfig.yaml
placeholder in your current directory.
After editing the prompts and variables to your liking, run the eval command to kick off an evaluation:
npx promptfoo@latest eval
To set up red teaming, run this command:
npx promptfoo@latest redteam init
This will ask you questions about what types of vulnerabilities you want to find and walk you through running your first scan.
The YAML configuration format runs each prompt through a series of example inputs (aka "test cases") and checks whether they meet requirements (aka "asserts").
See the Configuration docs for a detailed guide.
prompts:
  - file://prompt1.txt
  - file://prompt2.txt
providers:
  - openai:gpt-4o-mini
  - ollama:llama3.1:70b
tests:
  - description: 'Test translation to French'
    vars:
      language: French
      input: Hello world
    assert:
      - type: contains-json
      - type: javascript
        value: output.length < 100
  - description: 'Test translation to German'
    vars:
      language: German
      input: How's it going?
    assert:
      - type: llm-rubric
        value: does not describe self as an AI, model, or chatbot
      - type: similar
        value: was geht
        threshold: 0.6 # cosine similarity
See Test assertions for full details.
Deterministic eval metrics
Assertion Type | Returns true if... |
---|---|
equals | output matches exactly |
contains | output contains substring |
icontains | output contains substring, case insensitive |
regex | output matches regex |
starts-with | output starts with string |
contains-any | output contains any of the listed substrings |
contains-all | output contains all of the listed substrings |
icontains-any | output contains any of the listed substrings, case insensitive |
icontains-all | output contains all of the listed substrings, case insensitive |
is-json | output is valid JSON (optional JSON schema validation) |
contains-json | output contains valid JSON (optional JSON schema validation) |
is-sql | output is valid SQL |
contains-sql | output contains valid SQL |
is-xml | output is valid XML |
contains-xml | output contains valid XML |
javascript | provided JavaScript function validates the output |
python | provided Python function validates the output |
webhook | provided webhook returns {pass: true} |
rouge-n | Rouge-N score is above a given threshold |
levenshtein | Levenshtein distance is below a threshold |
latency | Latency is below a threshold (milliseconds) |
perplexity | Perplexity is below a threshold |
cost | Cost is below a threshold (for models with cost info such as GPT) |
is-valid-openai-function-call | Ensure that the function call matches the function's JSON schema |
is-valid-openai-tools-call | Ensure that all tool calls match the tools JSON schema |
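For instance, a single test case can stack several of these deterministic checks. The following sketch (values are illustrative, for a prompt expected to return product JSON within a latency budget) only uses assertion types from the table above:

assert:
  - type: is-json
  - type: contains-all
    value:
      - name
      - price
  - type: latency
    threshold: 5000 # milliseconds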
Model-assisted eval metrics
Assertion Type | Method |
---|---|
similar | Embeddings and cosine similarity are above a threshold |
classifier | Run LLM output through a classifier |
llm-rubric | LLM output matches a given rubric, using a Language Model to grade output |
answer-relevance | Ensure that LLM output is related to original query |
context-faithfulness | Ensure that LLM output uses the context |
context-recall | Ensure that ground truth appears in context |
context-relevance | Ensure that context is relevant to original query |
factuality | LLM output adheres to the given facts, using Factuality method from OpenAI eval |
model-graded-closedqa | LLM output adheres to given criteria, using Closed QA method from OpenAI eval |
moderation | Make sure outputs are safe |
select-best | Compare multiple outputs for a test case and pick the best one |
Every test type can be negated by prepending not-. For example, not-equals or not-regex.
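For instance (a sketch with illustrative values):

assert:
  - type: not-contains
    value: as an AI language model
  - type: not-regex
    value: ^Sorry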
Some people prefer to configure their LLM tests in a CSV. In that case, the config is pretty simple:
prompts:
  - file://prompts.txt
providers:
  - openai:gpt-4o-mini
tests: file://tests.csv
See example CSV.
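As a rough illustration (made-up values), each CSV column typically corresponds to a variable referenced in your prompts; see the example CSV and the Configuration docs for the exact column conventions, including how to add assertions:

language,input
French,Hello world
German,How's it going?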
If you're looking to customize your usage, you have a wide set of parameters at your disposal.
Option | Description |
---|---|
-p, --prompts <paths...> | Paths to prompt files, directory, or glob |
-r, --providers <name or path...> | One of: openai:chat, openai:completion, openai:model-name, localai:chat:model-name, localai:completion:model-name. See API providers |
-o, --output <path> | Path to output file (csv, json, yaml, html) |
--tests <path> | Path to external test file |
-c, --config <paths> | Path to one or more configuration files. promptfooconfig.js/json/yaml is automatically loaded if present |
-j, --max-concurrency <number> | Maximum number of concurrent API calls |
--table-cell-max-length <number> | Truncate console table cells to this length |
--prompt-prefix <path> | This prefix is prepended to every prompt |
--prompt-suffix <path> | This suffix is appended to every prompt |
--grader | Provider that will conduct the evaluation, if you are using an LLM to grade your output |
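For example, combining a few of these options (file names are placeholders):

npx promptfoo@latest eval -p prompts.txt -r openai:gpt-4o-mini --tests tests.csv -o results.html -j 4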
After running an eval, you may optionally use the view
command to open the web viewer:
npx promptfoo view
In this example, we evaluate whether adding adjectives to the personality of an assistant bot affects the responses:
npx promptfoo eval -p prompts.txt -r openai:gpt-4o-mini -t tests.csv
This command will evaluate the prompts in prompts.txt, substituting the variable values from tests.csv, and output results in your terminal.
You can also output a nice spreadsheet, JSON, YAML, or an HTML file:
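For example, a sketch using the -o flag from the options above (the file name is a placeholder):

npx promptfoo@latest eval -o results.csv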
In the next example, we evaluate the difference between GPT-4o and GPT-4o-mini outputs for a given prompt:
npx promptfoo eval -p prompts.txt -r openai:gpt-4o openai:gpt-4o-mini -o output.html
Produces this HTML table:
You can also use promptfoo
as a library in your project by importing the evaluate
function. The function takes the following parameters:
- testSuite: the JavaScript equivalent of the promptfooconfig.yaml

interface EvaluateTestSuite {
  providers: string[]; // Valid provider name (e.g. openai:gpt-4o-mini)
  prompts: string[]; // List of prompts
  tests: string | TestCase[]; // Path to a CSV file, or list of test cases
  defaultTest?: Omit<TestCase, 'description'>; // Optional: add default vars and assertions on test case
  outputPath?: string | string[]; // Optional: write results to file
}

interface TestCase {
  // Optional description of what you're testing
  description?: string;
  // Key-value pairs to substitute in the prompt
  vars?: Record<string, string | string[] | object>;
  // Optional list of automatic checks to run on the LLM output
  assert?: Assertion[];
  // Additional configuration settings for the prompt
  options?: PromptConfig & OutputConfig & GradingConfig;
  // The required score for this test case. If not provided, the test case is graded pass/fail.
  threshold?: number;
  // Override the provider for this test
  provider?: string | ProviderOptions | ApiProvider;
}

interface Assertion {
  type: string;
  value?: string;
  threshold?: number; // Required score for pass
  weight?: number; // The weight of this assertion compared to other assertions in the test case. Defaults to 1.
  provider?: ApiProvider; // For assertions that require an LLM provider
}
- options: misc options related to how the tests are run

interface EvaluateOptions {
  maxConcurrency?: number;
  showProgressBar?: boolean;
  generateSuggestions?: boolean;
}
promptfoo
exports an evaluate
function that you can use to run prompt evaluations.
import promptfoo from 'promptfoo';
const results = await promptfoo.evaluate({
prompts: ['Rephrase this in French: {{body}}', 'Rephrase this like a pirate: {{body}}'],
providers: ['openai:gpt-4o-mini'],
tests: [
{
vars: {
body: 'Hello world',
},
},
{
vars: {
body: "I'm hungry",
},
},
],
});
This code imports the promptfoo
library, defines the evaluation options, and then calls the evaluate
function with these options.
See the full example here, which includes an example results object.
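The same assertions shown in the YAML config can also be attached programmatically. The following sketch only uses fields from the interfaces above; the rubric text, threshold, and prompts are illustrative:

import promptfoo from 'promptfoo';

const results = await promptfoo.evaluate(
  {
    prompts: ['Rephrase this in French: {{body}}'],
    providers: ['openai:gpt-4o-mini'],
    // Applied to every test case
    defaultTest: {
      assert: [{ type: 'llm-rubric', value: 'is written in French' }],
    },
    tests: [
      {
        vars: { body: 'Hello world' },
        assert: [{ type: 'similar', value: 'Bonjour le monde', threshold: 0.6 }],
      },
    ],
  },
  { maxConcurrency: 2 },
);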
- Main guide: Learn about how to configure your YAML file, set up prompt files, etc.
- Configuring test cases: Learn more about how to configure expected outputs and test assertions.
Requires Node.js 18 or newer.
You can install promptfoo using npm, npx, Homebrew, or by cloning the repository.
Install promptfoo
globally:
npm install -g promptfoo
Or install it locally in your project:
npm install promptfoo
Run promptfoo without installing it:
npx promptfoo@latest init
This will create a promptfooconfig.yaml
placeholder in your current directory.
If you prefer using Homebrew, you can install promptfoo with:
brew install promptfoo
For the latest development version:
git clone https://github.com/promptfoo/promptfoo.git
cd promptfoo
npm install
npm run build
npm link
To verify that promptfoo is installed correctly, run:
promptfoo --version
This should display the version number of promptfoo.
For more detailed installation instructions, including system requirements and troubleshooting, please visit our installation guide.
We support OpenAI's API as well as a number of open-source models. It's also possible to set up your own custom API provider. See Provider documentation for more details.
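As a rough sketch of what a custom provider could look like (assuming a minimal provider contract of an id() method and a callApi() method that resolves to an object with an output field — check the Provider documentation for the exact interface), with a hypothetical EchoProvider:

// echo-provider.js - hypothetical example; see the Provider documentation for the real interface
class EchoProvider {
  id() {
    return 'echo';
  }

  async callApi(prompt) {
    // A real provider would call your LLM API here and return its response text.
    return { output: `You said: ${prompt}` };
  }
}

module.exports = EchoProvider;

You would then point a providers entry at this file; see the Provider documentation for how custom providers are referenced from the config.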
Here's how to build and run locally:
git clone https://github.com/promptfoo/promptfoo.git
cd promptfoo
# Optionally use the Node.js version specified in the .nvmrc file - make sure you are on node >= 18
nvm use
npm i
cd path/to/experiment-with-promptfoo # contains your promptfooconfig.yaml
npx path/to/promptfoo-source eval
The web UI is located in src/app
. To run it in dev mode, run npm run local:app
. This will host the web UI at http://localhost:3000. The web UI expects promptfoo view
to be running separately.
Then run:
npm run build
The build has some side effects, such as copying HTML templates, migrations, etc.
Contributions are welcome! Please feel free to submit a pull request or open an issue.
promptfoo
includes several npm scripts to make development easier and more efficient. To use these scripts, run npm run <script_name>
in the project directory.
Here are some of the available scripts:
- build: Transpile TypeScript files to JavaScript
- build:watch: Continuously watch and transpile TypeScript files on changes
- test: Run test suite
- test:watch: Continuously run test suite on changes
- db:generate: Generate new db migrations (and create the db if it doesn't already exist). Note that after generating a new migration, you'll have to npm i to copy the migrations into dist/.
- db:migrate: Run existing db migrations (and create the db if it doesn't already exist)
To run the CLI during development you can run a command like: npm run local -- eval --config $(readlink -f ./examples/cloudflare-ai/chat_config.yaml)
, where any parts of the command after --
are passed through to our CLI entrypoint. Since the Next dev server isn't supported in this mode, see the instructions above for running the web server.
- Create an implementation in src/providers/SOME_PROVIDER_FILE
- Update loadApiProvider in src/providers.ts to load your provider via string
- Add test cases in test/providers.test.ts
  - Test the actual provider implementation
  - Test loading the provider via a loadApiProvider test
Alternative AI tools for promptfoo
Similar Open Source Tools
promptfoo
Promptfoo is a tool for testing and evaluating LLM output quality. With promptfoo, you can build reliable prompts, models, and RAGs with benchmarks specific to your use-case, speed up evaluations with caching, concurrency, and live reloading, score outputs automatically by defining metrics, use as a CLI, library, or in CI/CD, and use OpenAI, Anthropic, Azure, Google, HuggingFace, open-source models like Llama, or integrate custom API providers for any LLM API.
llm2sh
llm2sh is a command-line utility that leverages Large Language Models (LLMs) to translate plain-language requests into shell commands. It provides a convenient way to interact with your system using natural language. The tool supports multiple LLMs for command generation, offers a customizable configuration file, YOLO mode for running commands without confirmation, and is easily extensible with new LLMs and system prompts. Users can set up API keys for OpenAI, Claude, Groq, and Cerebras to use the tool effectively. llm2sh does not store user data or command history, and it does not record or send telemetry by itself, but the LLM APIs may collect and store requests and responses for their purposes.
BodhiApp
Bodhi App runs Open Source Large Language Models locally, exposing LLM inference capabilities as OpenAI API compatible REST APIs. It leverages llama.cpp for GGUF format models and huggingface.co ecosystem for model downloads. Users can run fine-tuned models for chat completions, create custom aliases, and convert Huggingface models to GGUF format. The CLI offers commands for environment configuration, model management, pulling files, serving API, and more.
thepipe
The Pipe is a multimodal-first tool for feeding files and web pages into vision-language models such as GPT-4V. It is best for LLM and RAG applications that require a deep understanding of tricky data sources. The Pipe is available as a hosted API at thepi.pe, or it can be set up locally.
stable-diffusion-webui
Stable Diffusion WebUI Docker Image allows users to run Automatic1111 WebUI in a docker container locally or in the cloud. The images do not bundle models or third-party configurations, requiring users to use a provisioning script for container configuration. It supports NVIDIA CUDA, AMD ROCm, and CPU platforms, with additional environment variables for customization and pre-configured templates for Vast.ai and Runpod.io. The service is password protected by default, with options for version pinning, startup flags, and service management using supervisorctl.
chatgpt-cli
ChatGPT CLI provides a powerful command-line interface for seamless interaction with ChatGPT models via OpenAI and Azure. It features streaming capabilities, extensive configuration options, and supports various modes like streaming, query, and interactive mode. Users can manage thread-based context, sliding window history, and provide custom context from any source. The CLI also offers model and thread listing, advanced configuration options, and supports GPT-4, GPT-3.5-turbo, and Perplexity's models. Installation is available via Homebrew or direct download, and users can configure settings through default values, a config.yaml file, or environment variables.
comfyui
ComfyUI is a highly-configurable, cloud-first AI-Dock container that allows users to run ComfyUI without bundled models or third-party configurations. Users can configure the container using provisioning scripts. The Docker image supports NVIDIA CUDA, AMD ROCm, and CPU platforms, with version tags for different configurations. Additional environment variables and Python environments are provided for customization. ComfyUI service runs on port 8188 and can be managed using supervisorctl. The tool also includes an API wrapper service and pre-configured templates for Vast.ai. The author may receive compensation for services linked in the documentation.
Whisper-WebUI
Whisper-WebUI is a Gradio-based browser interface for Whisper, serving as an Easy Subtitle Generator. It supports generating subtitles from various sources such as files, YouTube, and microphone. The tool also offers speech-to-text and text-to-text translation features, utilizing Facebook NLLB models and DeepL API. Users can translate subtitle files from other languages to English and vice versa. The project integrates faster-whisper for improved VRAM usage and transcription speed, providing efficiency metrics for optimized whisper models. Additionally, users can choose from different Whisper models based on size and language requirements.
trustgraph
TrustGraph is a tool that deploys private GraphRAG pipelines to build a RDF style knowledge graph from data, enabling accurate and secure `RAG` requests compatible with cloud LLMs and open-source SLMs. It showcases the reliability and efficiencies of GraphRAG algorithms, capturing contextual language flags missed in conventional RAG approaches. The tool offers features like PDF decoding, text chunking, inference of various LMs, RDF-aligned Knowledge Graph extraction, and more. TrustGraph is designed to be modular, supporting multiple Language Models and environments, with a plug'n'play architecture for easy customization.
Construction-Hazard-Detection
Construction-Hazard-Detection is an AI-driven tool focused on improving safety at construction sites by utilizing the YOLOv8 model for object detection. The system identifies potential hazards like overhead heavy loads and steel pipes, providing real-time analysis and warnings. Users can configure the system via a YAML file and run it using Docker. The primary dataset used for training is the Construction Site Safety Image Dataset enriched with additional annotations. The system logs are accessible within the Docker container for debugging, and notifications are sent through the LINE messaging API when hazards are detected.
runpod-worker-comfy
runpod-worker-comfy is a serverless API tool that allows users to run any ComfyUI workflow to generate an image. Users can provide input images as base64-encoded strings, and the generated image can be returned as a base64-encoded string or uploaded to AWS S3. The tool is built on Ubuntu + NVIDIA CUDA and provides features like built-in checkpoints and VAE models. Users can configure environment variables to upload images to AWS S3 and interact with the RunPod API to generate images. The tool also supports local testing and deployment to Docker hub using Github Actions.
LEADS
LEADS is a lightweight embedded assisted driving system designed to simplify the development of instrumentation, control, and analysis systems for racing cars. It is written in Python and C/C++ with impressive performance. The system is customizable and provides abstract layers for component rearrangement. It supports hardware components like Raspberry Pi and Arduino, and can adapt to various hardware types. LEADS offers a modular structure with a focus on flexibility and lightweight design. It includes robust safety features, modern GUI design with dark mode support, high performance on different platforms, and powerful ESC systems for traction control and braking. The system also supports real-time data sharing, live video streaming, and AI-enhanced data analysis for driver training. LEADS VeC Remote Analyst enables transparency between the driver and pit crew, allowing real-time data sharing and analysis. The system is designed to be user-friendly, adaptable, and efficient for racing car development.
optillm
optillm is an OpenAI API compatible optimizing inference proxy implementing state-of-the-art techniques to enhance accuracy and performance of LLMs, focusing on reasoning over coding, logical, and mathematical queries. By leveraging additional compute at inference time, it surpasses frontier models across diverse tasks.
code2prompt
Code2Prompt is a powerful command-line tool that generates comprehensive prompts from codebases, designed to streamline interactions between developers and Large Language Models (LLMs) for code analysis, documentation, and improvement tasks. It bridges the gap between codebases and LLMs by converting projects into AI-friendly prompts, enabling users to leverage AI for various software development tasks. The tool offers features like holistic codebase representation, intelligent source tree generation, customizable prompt templates, smart token management, Gitignore integration, flexible file handling, clipboard-ready output, multiple output options, and enhanced code readability.
xFasterTransformer
xFasterTransformer is an optimized solution for Large Language Models (LLMs) on the X86 platform, providing high performance and scalability for inference on mainstream LLM models. It offers C++ and Python APIs for easy integration, along with example codes and benchmark scripts. Users can prepare models in a different format, convert them, and use the APIs for tasks like encoding input prompts, generating token ids, and serving inference requests. The tool supports various data types and models, and can run in single or multi-rank modes using MPI. A web demo based on Gradio is available for popular LLM models like ChatGLM and Llama2. Benchmark scripts help evaluate model inference performance quickly, and MLServer enables serving with REST and gRPC interfaces.
ai-dial-core
AI DIAL Core is an HTTP Proxy that provides a unified API to different chat completion and embedding models, assistants, and applications. It is written in Java 17 and built on Eclipse Vert.x. The core functionality includes handling static and dynamic settings, deployment on Kubernetes using Helm charts, and storing user data in Blob Storage and Redis. It supports various identity providers, storage providers like AWS S3, Google Cloud Storage, and Azure Blob Store, and features like AI DIAL Addons, Interceptors, Assistants, Applications, and Models with customizable parameters and configurations.
For similar tasks
promptfoo
Promptfoo is a tool for testing and evaluating LLM output quality. With promptfoo, you can build reliable prompts, models, and RAGs with benchmarks specific to your use-case, speed up evaluations with caching, concurrency, and live reloading, score outputs automatically by defining metrics, use as a CLI, library, or in CI/CD, and use OpenAI, Anthropic, Azure, Google, HuggingFace, open-source models like Llama, or integrate custom API providers for any LLM API.
llm-client
LLMClient is a JavaScript/TypeScript library that simplifies working with large language models (LLMs) by providing an easy-to-use interface for building and composing efficient prompts using prompt signatures. These signatures enable the automatic generation of typed prompts, allowing developers to leverage advanced capabilities like reasoning, function calling, RAG, ReAcT, and Chain of Thought. The library supports various LLMs and vector databases, making it a versatile tool for a wide range of applications.
SimplerLLM
SimplerLLM is an open-source Python library that simplifies interactions with Large Language Models (LLMs) for researchers and beginners. It provides a unified interface for different LLM providers, tools for enhancing language model capabilities, and easy development of AI-powered tools and apps. The library offers features like unified LLM interface, generic text loader, RapidAPI connector, SERP integration, prompt template builder, and more. Users can easily set up environment variables, create LLM instances, use tools like SERP, generic text loader, calling RapidAPI APIs, and prompt template builder. Additionally, the library includes chunking functions to split texts into manageable chunks based on different criteria. Future updates will bring more tools, interactions with local LLMs, prompt optimization, response evaluation, GPT Trainer, document chunker, advanced document loader, integration with more providers, Simple RAG with SimplerVectors, integration with vector databases, agent builder, and LLM server.
labelbox-python
Labelbox is a data-centric AI platform for enterprises to develop, optimize, and use AI to solve problems and power new products and services. Enterprises use Labelbox to curate data, generate high-quality human feedback data for computer vision and LLMs, evaluate model performance, and automate tasks by combining AI and human-centric workflows. The academic & research community uses Labelbox for cutting-edge AI research.
vespa
Vespa is a platform that performs operations such as selecting a subset of data in a large corpus, evaluating machine-learned models over the selected data, organizing and aggregating it, and returning it, typically in less than 100 milliseconds, all while the data corpus is continuously changing. It has been in development for many years and is used on a number of large internet services and apps which serve hundreds of thousands of queries from Vespa per second.
python-aiplatform
The Vertex AI SDK for Python is a library that provides a convenient way to use the Vertex AI API. It offers a high-level interface for creating and managing Vertex AI resources, such as datasets, models, and endpoints. The SDK also provides support for training and deploying custom models, as well as using AutoML models. With the Vertex AI SDK for Python, you can quickly and easily build and deploy machine learning models on Vertex AI.
ScandEval
ScandEval is a framework for evaluating pretrained language models on mono- or multilingual language tasks. It provides a unified interface for benchmarking models on a variety of tasks, including sentiment analysis, question answering, and machine translation. ScandEval is designed to be easy to use and extensible, making it a valuable tool for researchers and practitioners alike.
opencompass
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark for large model evaluation. Its main features include: * Comprehensive support for models and datasets: Pre-support for 20+ HuggingFace and API models, a model evaluation scheme of 70+ datasets with about 400,000 questions, comprehensively evaluating the capabilities of the models in five dimensions. * Efficient distributed evaluation: One line command to implement task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours. * Diversified evaluation paradigms: Support for zero-shot, few-shot, and chain-of-thought evaluations, combined with standard or dialogue-type prompt templates, to easily stimulate the maximum performance of various models. * Modular design with high extensibility: Want to add new models or datasets, customize an advanced task division strategy, or even support a new cluster management system? Everything about OpenCompass can be easily expanded! * Experiment management and reporting mechanism: Use config files to fully record each experiment, and support real-time reporting of results.
For similar jobs
promptfoo
Promptfoo is a tool for testing and evaluating LLM output quality. With promptfoo, you can build reliable prompts, models, and RAGs with benchmarks specific to your use-case, speed up evaluations with caching, concurrency, and live reloading, score outputs automatically by defining metrics, use as a CLI, library, or in CI/CD, and use OpenAI, Anthropic, Azure, Google, HuggingFace, open-source models like Llama, or integrate custom API providers for any LLM API.
Sanmill
Sanmill is a free, powerful UCI-like N men's morris program with CUI, Flutter GUI and Qt GUI. Nine men's morris is a strategy board game for two players dating at least to the Roman Empire. The game is also known as nine-man morris, mill, mills, the mill game, merels, merrills, merelles, marelles, morelles, and ninepenny marl in English.
ComfyUI-IF_AI_tools
ComfyUI-IF_AI_tools is a set of custom nodes for ComfyUI that allows you to generate prompts using a local Large Language Model (LLM) via Ollama. This tool enables you to enhance your image generation workflow by leveraging the power of language models.
log10
Log10 is a one-line Python integration to manage your LLM data. It helps you log both closed and open-source LLM calls, compare and identify the best models and prompts, store feedback for fine-tuning, collect performance metrics such as latency and usage, and perform analytics and monitor compliance for LLM powered applications. Log10 offers various integration methods, including a python LLM library wrapper, the Log10 LLM abstraction, and callbacks, to facilitate its use in both existing production environments and new projects. Pick the one that works best for you. Log10 also provides a copilot that can help you with suggestions on how to optimize your prompt, and a feedback feature that allows you to add feedback to your completions. Additionally, Log10 provides prompt provenance, session tracking and call stack functionality to help debug prompt chains. With Log10, you can use your data and feedback from users to fine-tune custom models with RLHF, and build and deploy more reliable, accurate and efficient self-hosted models. Log10 also supports collaboration, allowing you to create flexible groups to share and collaborate over all of the above features.
Efficient-LLMs-Survey
This repository provides a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from **model-centric**, **data-centric**, and **framework-centric** perspectives, respectively. We hope our survey and this GitHub repository can serve as valuable resources to help researchers and practitioners gain a systematic understanding of the research developments in efficient LLMs and inspire them to contribute to this important and exciting field.
awesome-gpt-prompt-engineering
Awesome GPT Prompt Engineering is a curated list of resources, tools, and shiny things for GPT prompt engineering. It includes roadmaps, guides, techniques, prompt collections, papers, books, communities, prompt generators, Auto-GPT related tools, prompt injection information, ChatGPT plug-ins, prompt engineering job offers, and AI links directories. The repository aims to provide a comprehensive guide for prompt engineering enthusiasts, covering various aspects of working with GPT models and improving communication with AI tools.