pyllms
Minimal Python library to connect to LLMs (OpenAI, Anthropic, AI21, Cohere, Aleph Alpha, HuggingfaceHub, Google PaLM2), with a built-in model performance benchmark.
Stars: 693
PyLLMs is a minimal Python library designed to connect to various Large Language Models (LLMs) such as OpenAI, Anthropic, Google, AI21, Cohere, Aleph Alpha, and HuggingfaceHub. It provides a built-in model performance benchmark for fast prototyping and evaluating different models. Users can easily connect to top LLMs, get completions from multiple models simultaneously, and evaluate models on quality, speed, and cost. The library supports asynchronous completion, streaming from compatible models, and multi-model initialization for testing and comparison. Additionally, it offers features like passing chat history and system messages, counting tokens, and benchmarking models on quality, speed, and cost.
README:
PyLLMs is a minimal Python library to connect to LLMs (OpenAI, Anthropic, Google, AI21, Cohere, Aleph Alpha, HuggingfaceHub) with a built-in model performance benchmark.
It is ideal for fast prototyping and evaluating different models thanks to:
- Connect to top LLMs in a few lines of code
- Response meta includes tokens processed, cost, and latency, standardized across models
- Multi-model support: get completions from different models at the same time
- LLM benchmark: evaluate models on quality, speed, and cost
Feel free to reuse and expand. Pull requests are welcome.
Install the package using pip:
pip install pyllms
import llms
model = llms.init('gpt-4')
result = model.complete("what is 5+5")
print(result.text)
The library will attempt to read the API keys and the default model from environment variables, which you can set like this (for the providers you are using):
export OPENAI_API_KEY="your_api_key_here"
export ANTHROPIC_API_KEY="your_api_key_here"
export AI21_API_KEY="your_api_key_here"
export COHERE_API_KEY="your_api_key_here"
export ALEPHALPHA_API_KEY="your_api_key_here"
export HUGGINFACEHUB_API_KEY="your_api_key_here"
export GOOGLE_API_KEY="your_api_key_here"
export MISTRAL_API_KEY="your_api_key_here"
export LLMS_DEFAULT_MODEL="gpt-3.5-turbo"
Alternatively, you can pass initialization values to the init() method:
model=llms.init(openai_api_key='your_api_key_here', model='gpt-4')
For using Google LLMs through Vertex AI API, see "Using Google AI models" below.
You can also pass optional parameters to the complete method. 'temperature' and 'max_tokens' are standardized across all APIs and get converted to the corresponding API parameters. Any other parameters accepted by the underlying model API are passed through in their verbatim form.
result = model.complete(
    "what is the capital of country where mozzart was born",
    temperature=0.1,
    max_tokens=200
)
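As an illustration of the verbatim pass-through, a provider-specific parameter such as OpenAI's top_p can be supplied alongside the standardized ones. This is a hypothetical sketch, not part of the original example:

```python
# Hypothetical sketch: 'temperature' and 'max_tokens' are standardized by PyLLMs,
# while 'top_p' (an OpenAI-specific parameter) is forwarded verbatim to the underlying API.
result = model.complete(
    "what is the capital of country where mozzart was born",
    temperature=0.1,
    max_tokens=200,
    top_p=0.9,
)
print(result.text)
```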
Note: By default, temperature for all models is set to 0, and max_tokens to 300.
The result meta will contain helpful information like tokens used, cost (which is automatically calculated using current pricing), and response latency:
>>> print(result.meta)
{
    'model': 'gpt-3.5-turbo',
    'tokens': 34,
    'tokens_prompt': 20,
    'tokens_completion': 14,
    'cost': '0.00007',
    'latency': 1.4
}
You can also initialize multiple models at once! This is very useful for testing and comparing output of different models in parallel.
>>> models=llms.init(model=['gpt-3.5-turbo','claude-instant-v1'])
>>> result=models.complete('what is the capital of country where mozzart was born')
>>> print(result.text)
[
'The capital of the country where Mozart was born is Vienna, Austria.',
'Wolfgang Amadeus Mozart was born in Salzburg, Austria.\n\nSo the capital of the country where Mozart was born is Vienna, Austria.'
]
>>> print(result.meta)
[
{'model': 'gpt-3.5-turbo', 'tokens': 34, 'tokens_prompt': 20, 'tokens_completion': 14, 'cost': 6.8e-05, 'latency': 0.7097790241241455},
{'model': 'claude-instant-v1', 'tokens': 54, 'tokens_prompt': 20, 'tokens_completion': 34, 'cost': 5.79e-05, 'latency': 0.7291600704193115}
]
Async completion is supported for compatible models. It is not supported in multi-model mode yet.
result = await model.acomplete("what is the capital of country where mozzart was born")
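For example, a minimal sketch of calling acomplete from an asyncio event loop (the model name here is just an example):

```python
# Minimal sketch: run an asynchronous completion with asyncio.
# Assumes the initialized provider supports acomplete(), as described above.
import asyncio
import llms

async def main():
    model = llms.init('gpt-3.5-turbo')
    result = await model.acomplete("what is the capital of country where mozzart was born")
    print(result.text)

asyncio.run(main())
```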
PyLLMs supports streaming from compatible models. The 'complete_stream' method returns a generator object; all you have to do is iterate through it:
model = llms.init('claude-v1')
result = model.complete_stream("write an essay on civil war")
for chunk in result.stream:
    if chunk is not None:
        print(chunk, end='')
Current limitations:
- When streaming, 'meta' is not available
- Multi-model mode is not supported for streaming
Tip: if you are testing this in the python3 CLI, run it with the -u parameter to disable buffering:
python3 -u
You can pass the history of conversation in a list using the following format:
history=[]
history.append({"role": "user", "content": user_input})
history.append({"role": "assistant", "content": result.text})
model.complete(prompt=prompt, history=history)
In addition, OpenAI chat models support system message:
model.complete(prompt=prompt, system_message=system, history=history)
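Putting the two together, here is a minimal multi-turn sketch, assuming an OpenAI chat model; the system message and prompts below are purely illustrative:

```python
# Minimal multi-turn sketch using the history and system_message parameters shown above.
import llms

model = llms.init('gpt-3.5-turbo')
system = "You are a concise assistant."  # illustrative system message
history = []

for user_input in ["Who composed The Magic Flute?", "Where was he born?"]:
    result = model.complete(prompt=user_input, system_message=system, history=history)
    print(result.text)
    # Record the exchange so the next turn sees the full conversation
    history.append({"role": "user", "content": user_input})
    history.append({"role": "assistant", "content": result.text})
```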
You can count tokens using the model's tokenizer:
count=model.count_tokens('the quick brown fox jumped over the lazy dog')
PyLLMs supports optional parameters to specify an API base path for models using the OpenAI input/output format, for example OpenAI models running on Azure.
import llms
AZURE_API_BASE = "{insert here}"
AZURE_API_KEY = "{insert here}"
model = llms.init('gpt-4')
azure_args = {
    "engine": "gpt-4",  # Azure deployment_id
    "api_base": AZURE_API_BASE,
    "api_type": "azure",
    "api_version": "2023-05-15",
    "api_key": AZURE_API_KEY,
}
openai_result = model.complete("what is 5+5")
azure_result = model.complete("what is 5+5", **azure_args)
model.benchmark(**azure_args)
Models are appearing like mushrooms after rain and everyone is interested in three things:
- Quality
- Speed
- Cost
PyLLMs includes an automated benchmark system. The quality of models is evaluated using a powerful model (for example gpt-4) on a range of predefined questions, or you can supply your own.
models = llms.init(model=['claude-3-haiku-20240307', 'gpt-4o-mini', 'claude-3-5-sonnet-20240620', 'gpt-4o', 'mistral-large-latest', 'open-mistral-nemo', 'gpt-4', 'gpt-3.5-turbo', 'deepseek-coder', 'deepseek-chat', 'llama-3.1-8b-instant', 'llama-3.1-70b-versatile'])
gpt4 = llms.init('gpt-4o')
models.benchmark(evaluator=gpt4)
| Model | Tokens | Total Cost ($) | Median Latency (s) | Speed (tokens/sec) | Accuracy |
|---|---|---|---|---|---|
| OpenAIProvider('gpt-3.5-turbo') | 1569 | 0.01552 | 0.51 | 45.03 | 22.00% |
| MistralProvider('open-mistral-nemo') | 4135 | 0.00323 | 0.65 | 82.65 | 22.00% |
| OpenAIProvider('gpt-4') | 2477 | 0.33408 | 1.32 | 16.68 | 26.00% |
| DeepSeekProvider('deepseek-coder') | 8079 | 0.00327 | 4.13 | 16.72 | 28.00% |
| GroqProvider('llama-3.1-8b-instant') | 6628 | 0.00085 | 2.26 | 82.02 | 28.00% |
| AnthropicProvider('claude-3-haiku-20240307') | 5642 | 0.00881 | 1.33 | 55.46 | 28.00% |
| DeepSeekProvider('deepseek-chat') | 7310 | 0.00304 | 4.81 | 17.20 | 32.00% |
| OpenAIProvider('gpt-4o-mini') | 6029 | 0.00451 | 1.64 | 36.92 | 34.00% |
| GroqProvider('llama-3.1-70b-versatile') | 5190 | 0.00781 | 0.71 | 81.62 | 40.00% |
| MistralProvider('mistral-large-latest') | 5097 | 0.06787 | 3.08 | 18.03 | 44.00% |
| AnthropicProvider('claude-3-5-sonnet-20240620') | 6595 | 0.12018 | 2.54 | 48.90 | 46.00% |
| OpenAIProvider('gpt-4o') | 7482 | 0.14310 | 1.60 | 48.00 | 52.00% |
Overall Model Quality (composite score: 0.5 accuracy, 0.3 speed, 0.2 cost), as of July 26, 2024.
To evaluate models on your own prompts, simply pass a list of (question, answer) tuples. The evaluator will automatically evaluate the responses:
models.benchmark(prompts=[("what is the capital of finland", "helsinki")], evaluator=gpt4)
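For instance, a small custom benchmark over a few illustrative question/answer pairs might look like this (a sketch reusing the models and gpt4 evaluator initialized above; the prompts are hypothetical):

```python
# Illustrative (question, expected answer) pairs for a custom benchmark.
custom_prompts = [
    ("what is the capital of finland", "helsinki"),
    ("what is 5+5", "10"),
]
models.benchmark(prompts=custom_prompts, evaluator=gpt4)
```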
To get a list of supported models, call list(). Models will be shown in order from least expensive to most expensive.
>>> model = llms.init()
>>> model.list()
>>> model.list("gpt")  # lists only models with 'gpt' in name/provider name
Here is a pretty table of supported models (in alphabetical order).
| Provider | Name | Prompt Cost ($/1M tokens) | Completion Cost ($/1M tokens) | Token Limit |
|---------------------|------------------------|---------------------------|-------------------------------|-------------|
| AI21Provider | j2-grande-instruct | 10.0 | 10.0 | 8192 |
| AI21Provider | j2-jumbo-instruct | 15.0 | 15.0 | 8192 |
| AlephAlphaProvider | luminous-base | 6.6 | 7.6 | 2048 |
| AlephAlphaProvider | luminous-extended | 9.9 | 10.9 | 2048 |
| AlephAlphaProvider | luminous-supreme | 38.5 | 42.5 | 2048 |
| AlephAlphaProvider | luminous-supreme-control | 48.5 | 53.6 | 2048 |
| AnthropicProvider | claude-instant-v1 | 1.63 | 5.51 | 9000 |
| AnthropicProvider | claude-v1 | 11.02 | 32.68 | 9000 |
| CohereProvider | command | 25.0 | 25.0 | 8192 |
| CohereProvider | command-nightly | 25.0 | 25.0 | 8192 |
| GoogleProvider | chat-bison | 0.5 | 0.5 | 2048 |
| HuggingfaceHub | hf_pythia | 0.0 | 0.0 | 2048 |
| OpenAIProvider | gpt-3.5-turbo | 2.0 | 2.0 | 4000 |
| OpenAIProvider | gpt-4 | 30.0 | 60.0 | 8000 |
Useful links:
- OpenAI documentation
- Anthropic documentation
- AI21 documentation
- Cohere documentation
- Aleph Alpha documentation
- Google Generative AI documentation
- Google Vertex AI documentation
Using Google AI models through the Vertex AI API:
- Set up a GCP account and create a project
- Enable Vertex AI APIs in your GCP project - https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com
- Install the gcloud CLI tool - https://cloud.google.com/sdk/docs/install
- Set up Application Default Credentials - https://cloud.google.com/docs/authentication/provide-credentials-adc#how-to
- Now you should be able to initialize a Google LLM:
model = llms.init('chat-bison')
result = model.complete("hello!")
PyLLMs supports locally installed Ollama models.
To use your Ollama models:
- Ensure Ollama is running (and reachable at localhost:11434) and that you have pulled the model you would like to use.
- Get the name of the LLM you would like to use. Run ollama list to see the installed models:
NAME ID SIZE MODIFIED
tinyllama:latest ... 637 MB ...
- Initialize PyLLMs as you would any other model:
model = llms.init("tinyllama:latest")
result = model.complete("hello!")
where tinyllama:latest is the model name of an installed model.
This project is licensed under the MIT License.