llm-strategy
Directly Connecting Python to LLMs via Strongly-Typed Functions, Dataclasses, Interfaces & Generic Types
Stars: 393
The 'llm-strategy' repository implements the Strategy Pattern using Large Language Models (LLMs) like OpenAI’s GPT-3. It provides a decorator 'llm_strategy' that connects to an LLM to implement abstract methods in interface classes. The package uses doc strings, type annotations, and method/function names as prompts for the LLM and can convert the responses back to Python data. It aims to automate the parsing of structured data by using LLMs, potentially reducing the need for manual Python code in the future.
README:
Implementing the Strategy Pattern using LLMs.
Also, please see https://blog.blackhc.net/2022/12/llm_software_engineering/ for a wider perspective on why this could be important in the future.
This package adds a decorator llm_strategy that connects to an LLM (such as OpenAI’s GPT-3) and uses the LLM to "implement" abstract methods in interface classes. It does this by forwarding requests to the LLM and converting the responses back to Python data using Python's @dataclasses.
It uses the doc strings, type annotations, and method/function names as prompts for the LLM, and can automatically convert the results back into Python types (currently only supporting @dataclasses). It can also extract a data schema to send to the LLM for interpretation. While the llm-strategy package still relies on some Python code, it has the potential to reduce the need for this code in the future by using additional, cheaper LLMs to automate the parsing of structured data.
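To make the idea concrete, here is a minimal sketch of how a prompt might be assembled from a function's name, doc string, and type annotations, and how a JSON reply might be turned back into a dataclass. This is not the package's implementation: the helpers build_prompt, dataclass_schema, and parse_response are hypothetical names introduced for illustration, and a canned string stands in for the LLM reply.

```python
import inspect
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import get_type_hints


@dataclass
class Customer:
    first_name: str
    last_name: str
    birthdate: str


def find_customer(query: str) -> Customer:
    """Return the customer that best matches a natural language query."""
    raise NotImplementedError()


def dataclass_schema(cls) -> dict:
    """Describe a dataclass as a flat {field name: type name} mapping (hypothetical helper)."""
    hints = get_type_hints(cls)
    return {f.name: hints[f.name].__name__ for f in fields(cls)}


def build_prompt(func, **kwargs) -> str:
    """Assemble a prompt from the function name, doc string, annotations, and inputs."""
    hints = get_type_hints(func)
    return_type = hints.pop("return")
    lines = [
        f"Function: {func.__name__}",
        f"Description: {inspect.getdoc(func)}",
        "Inputs:",
    ]
    for name, value in kwargs.items():
        lines.append(f"  {name} ({hints[name].__name__}): {value!r}")
    if is_dataclass(return_type):
        schema = json.dumps(dataclass_schema(return_type))
        lines.append(f"Return a JSON object with fields: {schema}")
    return "\n".join(lines)


def parse_response(text: str, return_type):
    """Convert a JSON reply back into the annotated dataclass."""
    return return_type(**json.loads(text))


prompt = build_prompt(find_customer, query="born in 1990")
# A canned reply stands in for the LLM call in this sketch.
canned_reply = '{"first_name": "Ada", "last_name": "Lovelace", "birthdate": "1990-12-10"}'
customer = parse_response(canned_reply, Customer)
```

The sketch only handles flat dataclasses with simple field types; nested or generic types would need a richer schema description.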
- Github repository: https://github.com/blackhc/llm-strategy/
- Documentation https://blackhc.github.io/llm-strategy/
The latest version also includes a package for hyperparameter tracking and collecting traces from LLMs.
This allows, for example, meta-optimization. See examples/research for a simple implementation using Generics.
You can find an example WandB trace at: https://wandb.ai/blackhc/blackboard-pagi/reports/Meta-Optimization-Example-Trace--Vmlldzo3MDMxODEz?accessToken=p9hubfskmq1z5yj1uz7wx1idh304diiernp7pjlrjrybpaozlwv3dnitjt7vni1j
The prompts showing off the pattern using Generics are straightforward:
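from typing import Generic, TypeVar

from pydantic import BaseModel, Field
from pydantic.generics import GenericModel  # pydantic v1 API

# llm_explicit_function comes from the llm-strategy package; its import is not shown in this excerpt.

T_TaskParameters = TypeVar("T_TaskParameters")
T_TaskResults = TypeVar("T_TaskResults")
T_Hyperparameters = TypeVar("T_Hyperparameters")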
class TaskRun(GenericModel, Generic[T_TaskParameters, T_TaskResults, T_Hyperparameters]):
    """
    The task run. This is the 'data' we use to optimize the hyperparameters.
    """

    task_parameters: T_TaskParameters = Field(..., description="The task parameters.")
    hyperparameters: T_Hyperparameters = Field(
        ...,
        description="The hyperparameters used for the task. We optimize these.",
    )
    all_chat_chains: dict = Field(..., description="The chat chains from the task execution.")
    return_value: T_TaskResults | None = Field(
        ..., description="The results of the task. (None for exceptions/failure.)"
    )
    exception: list[str] | str | None = Field(..., description="Exception that occurred during the task execution.")
class TaskReflection(BaseModel):
    """
    The reflections on the task.

    This contains the lessons we learn from each task run to come up with better
    hyperparameters to try.
    """

    feedback: str = Field(
        ...,
        description=(
            "Only look at the final results field. Does its content satisfy the "
            "task description and task parameters? Does it contain all the relevant "
            "information from the all_chains and all_prompts fields? What could be improved "
            "in the results?"
        ),
    )
    evaluation: str = Field(
        ...,
        description=(
            "The evaluation of the outputs given the task. Is the output satisfying? What is wrong? What is missing?"
        ),
    )
    hyperparameter_suggestion: str = Field(
        ...,
        description="How we want to change the hyperparameters to improve the results. What could we try to change?",
    )
    hyperparameter_missing: str = Field(
        ...,
        description=(
            "What hyperparameters are missing to improve the results? What could "
            "be changed that is not exposed via hyperparameters?"
        ),
    )
class TaskInfo(GenericModel, Generic[T_TaskParameters, T_TaskResults, T_Hyperparameters]):
    """
    The task run and the reflection on the experiment.
    """

    task_parameters: T_TaskParameters = Field(..., description="The task parameters.")
    hyperparameters: T_Hyperparameters = Field(
        ...,
        description="The hyperparameters used for the task. We optimize these.",
    )
    reflection: TaskReflection = Field(..., description="The reflection on the task.")
class OptimizationInfo(GenericModel, Generic[T_TaskParameters, T_TaskResults, T_Hyperparameters]):
    """
    The optimization information. This is the data we use to optimize the
    hyperparameters.
    """

    older_task_summary: str | None = Field(
        None,
        description=(
            "A summary of previous experiments and the proposed changes with "
            "the goal of avoiding trying the same changes repeatedly."
        ),
    )
    task_infos: list[TaskInfo[T_TaskParameters, T_TaskResults, T_Hyperparameters]] = Field(
        ..., description="The most recent tasks we have run and our reflections on them."
    )
    best_hyperparameters: T_Hyperparameters = Field(..., description="The best hyperparameters we have found so far.")
class OptimizationStep(GenericModel, Generic[T_TaskParameters, T_TaskResults, T_Hyperparameters]):
    """
    The next optimization steps. New hyperparameters we want to try experiments and new
    task parameters we want to evaluate on given the previous experiments.
    """

    best_hyperparameters: T_Hyperparameters = Field(
        ...,
        description="The best hyperparameters we have found so far given task_infos and history.",
    )
    suggestion: str = Field(
        ...,
        description=(
            "The suggestions for the next experiments. What could we try to "
            "change? We will try several tasks next and several sets of hyperparameters. "
            "Let's think step by step."
        ),
    )
    task_parameters_suggestions: list[T_TaskParameters] = Field(
        ...,
        description="The task parameters we want to try next.",
        hint_min_items=1,
        hint_max_items=4,
    )
    hyperparameter_suggestions: list[T_Hyperparameters] = Field(
        ...,
        description="The hyperparameters we want to try next.",
        hint_min_items=1,
        hint_max_items=2,
    )
class ImprovementProbability(BaseModel):
    considerations: list[str] = Field(..., description="The considerations for potential improvements.")
    probability: float = Field(..., description="The probability of improvement.")
class LLMOptimizer:
    @llm_explicit_function
    @staticmethod
    def reflect_on_task_run(
        language_model,
        task_run: TaskRun[T_TaskParameters, T_TaskResults, T_Hyperparameters],
    ) -> TaskReflection:
        """
        Reflect on the results given the task parameters and hyperparameters.

        This contains the lessons we learn from each task run to come up with better
        hyperparameters to try.
        """
        raise NotImplementedError()

    @llm_explicit_function
    @staticmethod
    def summarize_optimization_info(
        language_model,
        optimization_info: OptimizationInfo[T_TaskParameters, T_TaskResults, T_Hyperparameters],
    ) -> str:
        """
        Summarize the optimization info. We want to preserve all relevant knowledge for
        improving the hyperparameters in the future. All information from previous
        experiments will be forgotten except for what this summary.
        """
        raise NotImplementedError()

    @llm_explicit_function
    @staticmethod
    def suggest_next_optimization_step(
        language_model,
        optimization_info: OptimizationInfo[T_TaskParameters, T_TaskResults, T_Hyperparameters],
    ) -> OptimizationStep[T_TaskParameters, T_TaskResults, T_Hyperparameters]:
        """
        Suggest the next optimization step.
        """
        raise NotImplementedError()

    @llm_explicit_function
    @staticmethod
    def probability_for_improvement(
        language_model,
        optimization_info: OptimizationInfo[T_TaskParameters, T_TaskResults, T_Hyperparameters],
    ) -> ImprovementProbability:
        """
        Return the probability for improvement (between 0 and 1).

        This is your confidence that your next optimization steps will improve the
        hyperparameters given the information provided. If you think that the
        information available is unlikely to lead to better hyperparameters, return 0.
        If you think that the information available is very likely to lead to better
        hyperparameters, return 1. Be concise.
        """
        raise NotImplementedError()
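The schemas and LLMOptimizer methods above suggest a loop of running a task, reflecting on it, asking for the next step, and stopping once further improvement looks unlikely. The following is a rough sketch of such a loop under stated assumptions, not the code in examples/research: the names optimize, Reflection, History, and Step are hypothetical, plain dataclasses replace the pydantic models, and deterministic stand-in functions replace the LLM-backed methods.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Reflection:
    hyperparameters: dict
    result: str
    feedback: str


@dataclass
class Step:
    best_hyperparameters: dict
    next_hyperparameters: dict


@dataclass
class History:
    best_hyperparameters: dict
    reflections: list[Reflection] = field(default_factory=list)


def optimize(
    run_task: Callable[[dict], str],
    reflect: Callable[[dict, str], Reflection],
    suggest: Callable[[History], Step],
    improvement_probability: Callable[[History], float],
    initial_hyperparameters: dict,
    max_steps: int = 5,
    threshold: float = 0.5,
) -> History:
    """Run, reflect, and suggest until improvement looks unlikely or max_steps is reached."""
    history = History(best_hyperparameters=initial_hyperparameters)
    hyperparameters = initial_hyperparameters
    for _ in range(max_steps):
        result = run_task(hyperparameters)
        history.reflections.append(reflect(hyperparameters, result))
        step = suggest(history)
        history.best_hyperparameters = step.best_hyperparameters
        if improvement_probability(history) < threshold:
            break
        hyperparameters = step.next_hyperparameters
    return history


# Deterministic stand-ins for the LLM-backed calls (illustration only).
def run_task(hp: dict) -> str:
    return "ab" * hp["repeats"]


def reflect(hp: dict, result: str) -> Reflection:
    feedback = "long enough" if len(result) >= 6 else "too short"
    return Reflection(hp, result, feedback)


def suggest(history: History) -> Step:
    last = history.reflections[-1].hyperparameters
    return Step(best_hyperparameters=last, next_hyperparameters={"repeats": last["repeats"] + 1})


def improvement_probability(history: History) -> float:
    return 0.0 if history.reflections[-1].feedback == "long enough" else 1.0


history = optimize(run_task, reflect, suggest, improvement_probability, {"repeats": 1})
```

In a real setup, each stand-in would be replaced by the corresponding LLM-backed function, and older reflections would be condensed into a summary to keep the prompt short.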
from dataclasses import dataclass
from llm_strategy import llm_strategy
from langchain.llms import OpenAI


@llm_strategy(OpenAI(max_tokens=256))
@dataclass
class Customer:
    key: str
    first_name: str
    last_name: str
    birthdate: str
    address: str

    @property
    def age(self) -> int:
        """Return the current age of the customer.

        This is a computed property based on `birthdate` and the current year (2022).
        """
        raise NotImplementedError()


@dataclass
class CustomerDatabase:
    customers: list[Customer]

    def find_customer_key(self, query: str) -> list[str]:
        """Find the keys of the customers that match a natural language query best (sorted by closeness to the match).

        We support semantic queries instead of SQL, so we can search for things like
        "the customer that was born in 1990".

        Args:
            query: Natural language query

        Returns:
            The index of the best matching customer in the database.
        """
        raise NotImplementedError()

    def load(self):
        """Load the customer database from a file."""
        raise NotImplementedError()

    def store(self):
        """Store the customer database to a file."""
        raise NotImplementedError()


@llm_strategy(OpenAI(max_tokens=1024))
@dataclass
class MockCustomerDatabase(CustomerDatabase):
    def load(self):
        self.customers = self.create_mock_customers(10)

    def store(self):
        pass

    @staticmethod
    def create_mock_customers(num_customers: int = 1) -> list[Customer]:
        """
        Create mock customers with believable data (our customers are world citizens).
        """
        raise NotImplementedError()
See examples/mock_app/customer_database_search.py for a full example.
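As a rough illustration of the Strategy Pattern idea, the sketch below shows a class decorator that leaves implemented methods alone and routes calls to unimplemented ones to a pluggable backend. It is an assumption-laden toy, not how llm_strategy works internally: the decorator name strategy and the backend signature are hypothetical, the backend is a canned function rather than an LLM, and unimplemented methods are detected simply by catching NotImplementedError at call time.

```python
import functools
import inspect
from dataclasses import dataclass
from typing import Callable


def strategy(backend: Callable[[str, str, tuple], object]):
    """Class decorator: route calls to unimplemented methods to `backend` (hypothetical)."""

    def decorate(cls):
        for name, attr in list(vars(cls).items()):
            # Only wrap plain public methods; properties and static methods are skipped here.
            if name.startswith("_") or not inspect.isfunction(attr):
                continue
            setattr(cls, name, _with_fallback(attr, backend))
        return cls

    return decorate


def _with_fallback(method, backend):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            return method(self, *args, **kwargs)
        except NotImplementedError:
            # A real backend would build a prompt from the name, doc string, and arguments.
            return backend(method.__name__, method.__doc__ or "", args)

    return wrapper


calls = []


def canned_backend(name: str, doc: str, args: tuple):
    """Stand-in for an LLM call: record the request and return a fixed answer."""
    calls.append((name, doc, args))
    return "Goodbye, Ada."


@strategy(canned_backend)
@dataclass
class Greeter:
    greeting: str

    def greet(self, who: str) -> str:
        """Greet a person."""
        return f"{self.greeting}, {who}!"

    def farewell(self, who: str) -> str:
        """Say goodbye to a person politely."""
        raise NotImplementedError()


greeter = Greeter("Hello")
hello = greeter.greet("Ada")       # implemented in Python, backend not used
goodbye = greeter.farewell("Ada")  # unimplemented, answered by the backend
```

Swapping canned_backend for a function that calls a model changes the strategy without touching the interface class, which is the point of the pattern.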
Clone the repository first. Then, install the environment and the pre-commit hooks with
make install
The CI/CD pipeline will be triggered when you open a pull request, merge to main, or when you create a new release.
To finalize the set-up for publishing to PyPI or Artifactory, see here. For activating the automatic documentation with MkDocs, see here. To enable the code coverage reports, see here.
- Create an API Token on PyPI.
- Add the API Token to your project's secrets with the name PYPI_TOKEN by visiting this page.
- Create a new release on GitHub. Create a new tag in the form *.*.*.
For more details, see here.
Repository initiated with fpgmaas/cookiecutter-poetry.
Alternative AI tools for llm-strategy
Similar Open Source Tools
kor
Kor is a prototype tool designed to help users extract structured data from text using Language Models (LLMs). It generates prompts, sends them to specified LLMs, and parses the output. The tool works with the parsing approach and is integrated with the LangChain framework. Kor is compatible with pydantic v2 and v1, and the schema is type-checked using pydantic. It is primarily used for extracting information from text based on provided reference examples and schema documentation. Kor is designed to work with all good-enough LLMs regardless of their support for function/tool calling or JSON modes.
xFinder
xFinder is a model specifically designed for key answer extraction from large language models (LLMs). It addresses the challenges of unreliable evaluation methods by optimizing the key answer extraction module. The model achieves high accuracy and robustness compared to existing frameworks, enhancing the reliability of LLM evaluation. It includes a specialized dataset, the Key Answer Finder (KAF) dataset, for effective training and evaluation. xFinder is suitable for researchers and developers working with LLMs to improve answer extraction accuracy.
instructor-js
Instructor is a TypeScript library for structured extraction, powered by LLMs and designed for simplicity, transparency, and control. Whether you're a seasoned developer or just starting out, you'll find Instructor's approach intuitive and steerable.
empower-functions
Empower Functions is a family of large language models (LLMs) that provide GPT-4 level capabilities for real-world 'tool using' use cases. These models offer compatibility support to be used as drop-in replacements, enabling interactions with external APIs by recognizing when a function needs to be called and generating JSON containing necessary arguments based on user inputs. This capability is crucial for building conversational agents and applications that convert natural language into API calls, facilitating tasks such as weather inquiries, data extraction, and interactions with knowledge bases. The models can handle multi-turn conversations, choose between tools or standard dialogue, ask for clarification on missing parameters, integrate responses with tool outputs in a streaming fashion, and efficiently execute multiple functions either in parallel or sequentially with dependencies.
azure-functions-openai-extension
Azure Functions OpenAI Extension is a project that adds support for OpenAI LLM (GPT-3.5-turbo, GPT-4) bindings in Azure Functions. It provides NuGet packages for various functionalities like text completions, chat completions, assistants, embeddings generators, and semantic search. The project requires .NET 6 SDK or greater, Azure Functions Core Tools v4.x, and specific settings in Azure Function or local settings for development. It offers features like text completions, chat completion, assistants with custom skills, embeddings generators for text relatedness, and semantic search using vector databases. The project also includes examples in C# and Python for different functionalities.
local-talking-llm
The 'local-talking-llm' repository provides a tutorial on building a voice assistant similar to Jarvis or Friday from Iron Man movies, capable of offline operation on a computer. The tutorial covers setting up a Python environment, installing necessary libraries like rich, openai-whisper, suno-bark, langchain, sounddevice, pyaudio, and speechrecognition. It utilizes Ollama for Large Language Model (LLM) serving and includes components for speech recognition, conversational chain, and speech synthesis. The implementation involves creating a TextToSpeechService class for Bark, defining functions for audio recording, transcription, LLM response generation, and audio playback. The main application loop guides users through interactive voice-based conversations with the assistant.
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
AI
AI is an open-source Swift framework for interfacing with generative AI. It provides functionalities for text completions, image-to-text vision, function calling, DALLE-3 image generation, audio transcription and generation, and text embeddings. The framework supports multiple AI models from providers like OpenAI, Anthropic, Mistral, Groq, and ElevenLabs. Users can easily integrate AI capabilities into their Swift projects using the AI framework.
Tools4AI
Tools4AI is a Java-based Agentic Framework for building AI agents to integrate with enterprise Java applications. It enables the conversion of natural language prompts into actionable behaviors, streamlining user interactions with complex systems. By leveraging AI capabilities, it enhances productivity and innovation across diverse applications. The framework allows for seamless integration of AI with various systems, such as customer service applications, to interpret user requests, trigger actions, and streamline workflows. Prompt prediction anticipates user actions based on input prompts, enhancing user experience by proactively suggesting relevant actions or services based on context.
LongBench
LongBench v2 is a benchmark designed to assess the ability of large language models (LLMs) to handle long-context problems requiring deep understanding and reasoning across various real-world multitasks. It consists of 503 challenging multiple-choice questions with contexts ranging from 8k to 2M words, covering six major task categories. The dataset is collected from nearly 100 highly educated individuals with diverse professional backgrounds and is designed to be challenging even for human experts. The evaluation results highlight the importance of enhanced reasoning ability and scaling inference-time compute to tackle the long-context challenges in LongBench v2.
ragtacts
Ragtacts is a Clojure library that allows users to easily interact with Large Language Models (LLMs) such as OpenAI's GPT-4. Users can ask questions to LLMs, create question templates, call Clojure functions in natural language, and utilize vector databases for more accurate answers. Ragtacts also supports RAG (Retrieval-Augmented Generation) method for enhancing LLM output by incorporating external data. Users can use Ragtacts as a CLI tool, API server, or through a RAG Playground for interactive querying.
bosquet
Bosquet is a tool designed for LLMOps in large language model-based applications. It simplifies building AI applications by managing LLM and tool services, integrating with Selmer templating library for prompt templating, enabling prompt chaining and composition with Pathom graph processing, defining agents and tools for external API interactions, handling LLM memory, and providing features like call response caching. The tool aims to streamline the development process for AI applications that require complex prompt templates, memory management, and interaction with external systems.
superpipe
Superpipe is a lightweight framework designed for building, evaluating, and optimizing data transformation and data extraction pipelines using LLMs. It allows users to easily combine their favorite LLM libraries with Superpipe's building blocks to create pipelines tailored to their unique data and use cases. The tool facilitates rapid prototyping, evaluation, and optimization of end-to-end pipelines for tasks such as classification and evaluation of job departments based on work history. Superpipe also provides functionalities for evaluating pipeline performance, optimizing parameters for cost, accuracy, and speed, and conducting grid searches to experiment with different models and prompts.
summary-of-a-haystack
This repository contains data and code for the experiments in the SummHay paper. It includes publicly released Haystacks in conversational and news domains, along with scripts for running the pipeline, visualizing results, and benchmarking automatic evaluation. The data structure includes topics, subtopics, insights, queries, retrievers, summaries, evaluation summaries, and documents. The pipeline involves scripts for retriever scores, summaries, and evaluation scores using GPT-4o. Visualization scripts are provided for compiling and visualizing results. The repository also includes annotated samples for benchmarking and citation information for the SummHay paper.
For similar tasks
deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.
clearml
ClearML is a suite of tools designed to streamline the machine learning workflow. It includes an experiment manager, MLOps/LLMOps, data management, and model serving capabilities. ClearML is open-source and offers a free tier hosting option. It supports various ML/DL frameworks and integrates with Jupyter Notebook and PyCharm. ClearML provides extensive logging capabilities, including source control info, execution environment, hyper-parameters, and experiment outputs. It also offers automation features, such as remote job execution and pipeline creation. ClearML is designed to be easy to integrate, requiring only two lines of code to add to existing scripts. It aims to improve collaboration, visibility, and data transparency within ML teams.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: it is self-contained, with no need for a DBMS or cloud service; it offers an OpenAPI interface that is easy to integrate with existing infrastructure (e.g. a Cloud IDE); and it supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.