
promptic
90% of what you need for LLM app development. Nothing you don't.
Stars: 223

Promptic is a tool for LLM app development that provides a productive and pythonic way to build LLM applications. It leverages LiteLLM, giving you the flexibility to switch LLM providers easily. Promptic lets you focus on building features by providing type-safe structured outputs, easy-to-build agents, streaming support, automatic prompt caching, and built-in conversation memory.
README:
Promptic aims to be the "requests" of LLM development -- the most productive and pythonic way to build LLM applications. It leverages LiteLLM, so you're never locked in to an LLM provider and can switch to the latest and greatest with a single line of code. Promptic gets out of your way so you can focus entirely on building features.
“Perfection is attained, not when there is nothing more to add, but when there is nothing more to take away.” — Antoine de Saint-Exupéry
- 🎯 Type-safe structured outputs with Pydantic
- 🤖 Easy-to-build agents with function calling
- 🔄 Streaming support for real-time responses
- 📚 Automatic prompt caching for supported models
- 💾 Built-in conversation memory
pip install promptic
Functions decorated with @llm use their docstrings as prompt templates. When the function is called, promptic combines the docstring with the function's arguments to generate the prompt and returns the LLM's response.
# examples/basic.py
from promptic import llm

@llm
def translate(text, language="Chinese"):
    """Translate '{text}' to {language}"""

print(translate("Hello world!"))
# 您好,世界!

print(translate("Hello world!", language="Spanish"))
# ¡Hola, mundo!
@llm(
    model="claude-3-haiku-20240307",
    system="You are a customer service analyst. Provide clear sentiment analysis with key points.",
)
def analyze_sentiment(text):
    """Analyze the sentiment of this customer feedback: {text}"""

print(analyze_sentiment("The product was okay but shipping took forever"))
# Sentiment: Mixed/Negative
# Key points:
# - Neutral product satisfaction
# - Significant dissatisfaction with shipping time
You can use Pydantic models to ensure the LLM returns data in exactly the structure you expect. Simply define a Pydantic model and use it as the return type annotation on your decorated function. The LLM's response will be automatically validated against your model schema and returned as a Pydantic object.
# examples/structured.py
from pydantic import BaseModel
from promptic import llm

class Forecast(BaseModel):
    location: str
    temperature: float
    units: str

@llm
def get_weather(location, units: str = "fahrenheit") -> Forecast:
    """What's the weather for {location} in {units}?"""

print(get_weather("San Francisco", units="celsius"))
# location='San Francisco' temperature=16.0 units='Celsius'
Alternatively, you can use JSON Schema dictionaries for lower-level validation:
# examples/json_schema.py
from promptic import llm

schema = {
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "pattern": "^[A-Z][a-z]+$",
            "minLength": 2,
            "maxLength": 20,
        },
        "age": {"type": "integer", "minimum": 0, "maximum": 120},
        "email": {"type": "string", "format": "email"},
    },
    "required": ["name", "age"],
    "additionalProperties": False,
}

@llm(json_schema=schema, system="You generate test data.")
def get_user_info(name: str) -> dict:
    """Get information about {name}"""

print(get_user_info("Alice"))
# {'name': 'Alice', 'age': 25, 'email': '[email protected]'}
Functions decorated with @llm.tool become tools that the LLM can invoke to perform actions or retrieve information. The LLM will automatically execute the appropriate tool calls, creating a seamless agent interaction.
# examples/book_meeting.py
from datetime import datetime
from promptic import llm

@llm(model="gpt-4o")
def scheduler(command):
    """{command}"""

@scheduler.tool
def get_current_time():
    """Get the current time"""
    print("getting current time")
    return datetime.now().strftime("%I:%M %p")

@scheduler.tool
def add_reminder(task: str, time: str):
    """Add a reminder for a specific task and time"""
    print(f"adding reminder: {task} at {time}")
    return f"Reminder set: {task} at {time}"

@scheduler.tool
def check_calendar(date: str):
    """Check calendar for a specific date"""
    print(f"checking calendar for {date}")
    return f"Calendar checked for {date}: No conflicts found"

cmd = """
What time is it?
Also, can you check my calendar for tomorrow
and set a reminder for a team meeting at 2pm?
"""

print(scheduler(cmd))
# getting current time
# checking calendar for 2023-10-05
# adding reminder: Team meeting at 2023-10-05T14:00:00
# The current time is 3:48 PM. I checked your calendar for tomorrow, and there are no conflicts. I've also set a reminder for your team meeting at 2 PM tomorrow.
The streaming feature allows real-time response generation, useful for long-form content or interactive applications:
# examples/streaming.py
from promptic import llm

@llm(stream=True)
def write_poem(topic):
    """Write a haiku about {topic}."""

print("".join(write_poem("artificial intelligence")))
# Binary thoughts hum,
# Electron minds awake, learn,
# Future thinking now.
Dry runs allow you to see which tools will be called and their arguments without invoking the decorated tool functions. You can also enable debug mode for more detailed logging.
# examples/error_handing.py
from promptic import llm

@llm(
    system="you are a posh smart home assistant named Jarvis",
    dry_run=True,
    debug=True,
)
def jarvis(command):
    """{command}"""

@jarvis.tool
def turn_light_on():
    """turn light on"""
    return True

@jarvis.tool
def get_current_weather(location: str, unit: str = "fahrenheit"):
    """Get the current weather in a given location"""
    return f"The weather in {location} is 45 degrees {unit}"

print(jarvis("Please turn the light on and check the weather in San Francisco"))
# ...
# [DRY RUN]: function_name = 'turn_light_on' function_args = {}
# [DRY RUN]: function_name = 'get_current_weather' function_args = {'location': 'San Francisco'}
# ...
promptic pairs perfectly with tenacity for handling rate limits, temporary API failures, and more.
# examples/resiliency.py
from tenacity import retry, wait_exponential, retry_if_exception_type
from promptic import llm
from litellm.exceptions import RateLimitError

@retry(
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type(RateLimitError),
)
@llm
def generate_summary(text):
    """Summarize this text in 2-3 sentences: {text}"""

generate_summary("Long article text here...")
By default, each function call is independent and stateless. Setting memory=True enables built-in conversation memory, allowing the LLM to maintain context across multiple interactions. Here's a practical example using Gradio to create a web-based chatbot interface:
# examples/memory.py
import gradio as gr
from promptic import llm

@llm(memory=True, stream=True)
def assistant(message):
    """{message}"""

def predict(message, history):
    partial_message = ""
    for chunk in assistant(message):
        partial_message += str(chunk)
        yield partial_message

with gr.ChatInterface(title="Promptic Chatbot Demo", fn=predict) as demo:
    # ensure clearing the chat window clears the chat history
    demo.chatbot.clear(assistant.clear)

# demo.launch()
Note that calling a decorated function always executes the prompt template. For more direct control over conversations, you can use the .message() method to send follow-up messages without re-executing the prompt template:
# examples/direct_messaging.py
from promptic import llm

@llm(
    system="You are a knowledgeable history teacher.",
    model="gpt-4o-mini",
    memory=True,
    stream=True,
)
def history_chat(era: str, region: str):
    """Tell me a fun fact about the history of {region} during the {era} period."""

response = history_chat("medieval", "Japan")
for chunk in response:
    print(chunk, end="")

for chunk in history_chat.message(
    "In one sentence, who was the most popular person there at the time?"
):
    print(chunk, end="")

for chunk in history_chat.message("In one sentence, who was their main rival?"):
    print(chunk, end="")
The .message() method is particularly useful when:
- You have a decorated function with parameters but want to ask follow-up questions
- You want to maintain conversation context without re-executing the prompt template
- You need more direct control over the conversation flow while keeping memory intact
For custom storage solutions, you can extend the State class to implement persistence in any database or storage system:
# examples/state.py
import json

import redis  # assumes the redis-py client is installed
from promptic import State, llm

class RedisState(State):
    def __init__(self, redis_client):
        super().__init__()
        self.redis = redis_client
        self.key = "chat_history"

    def add_message(self, message):
        self.redis.rpush(self.key, json.dumps(message))

    def get_messages(self, limit=None):
        messages = [json.loads(m) for m in self.redis.lrange(self.key, 0, -1)]
        return messages[-limit:] if limit else messages

    def clear(self):
        self.redis.delete(self.key)

redis_client = redis.Redis()  # assumes a local Redis server

@llm(state=RedisState(redis_client))
def persistent_chat(message):
    """Chat: {message}"""
For Anthropic models (Claude), promptic provides intelligent caching control to optimize context window usage and improve performance. By default, caching is enabled but can be disabled if needed. OpenAI models cache by default. Anthropic charges for cache writes, but tokens that are read from the cache are less expensive.
# examples/caching.py
from promptic import llm

# imagine these are long legal documents
legal_document, another_legal_document = (
    "a legal document about Sam",
    "a legal document about Jane",
)

system_prompts = [
    "You are a helpful legal assistant",
    "You provide detailed responses based on the provided context",
    f"legal document 1: '{legal_document}'",
    f"legal document 2: '{another_legal_document}'",
]

@llm(
    system=system_prompts,
    cache=True,  # this is the default
)
def legal_chat(message):
    """{message}"""

print(legal_chat("which legal document is about Sam?"))
# The legal document about Sam is "legal document 1."
When caching is enabled:
- Long messages (>1KB) are automatically marked as ephemeral to optimize context window usage
- A maximum of 4 message blocks can be cached at once
- System prompts can include explicit cache control
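For instance, explicit cache control on a system prompt might look like the following sketch. It assumes cache_control can be attached to a system-role message using the message-dictionary format documented in the API reference below; the document text is a placeholder:

from promptic import llm

@llm(
    model="claude-3-haiku-20240307",
    system=[
        {"role": "system", "content": "You are a helpful legal assistant"},
        {
            "role": "system",
            # placeholder for large, reusable context worth caching
            "content": "a very long legal document...",
            # assumption: mark this block as cacheable, per the API reference below
            "cache_control": {"type": "ephemeral"},
        },
    ],
)
def cached_legal_chat(message):
    """{message}"""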
Further reading:
- https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#cache-limitations
- https://docs.litellm.ai/docs/completion/prompt_caching
Authentication can be handled in three ways:
- Directly via the api_key parameter:

from promptic import llm

@llm(model="gpt-4o-mini", api_key="your-api-key-here")
def my_function(text):
    """Process this text: {text}"""
- Through environment variables (recommended):
# OpenAI
export OPENAI_API_KEY=sk-...
# Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
# Google
export GEMINI_API_KEY=...
# Azure OpenAI
export AZURE_API_KEY=...
export AZURE_API_BASE=...
export AZURE_API_VERSION=...
- By setting the API key programmatically via litellm:

import litellm

litellm.api_key = "your-api-key-here"
The supported environment variables correspond to the model provider:

Provider | Environment Variable | Model Examples
---|---|---
OpenAI | OPENAI_API_KEY | gpt-4o, gpt-3.5-turbo
Anthropic | ANTHROPIC_API_KEY | claude-3-haiku-20240307, claude-3-opus-20240229, claude-3-sonnet-20240229
Google | GEMINI_API_KEY | gemini/gemini-1.5-pro-latest
llm is the main decorator for creating LLM-powered functions. It can be used as @llm or @llm() with parameters.
Parameters:
- model (str, optional): The LLM model to use. Defaults to "gpt-4o-mini".
- system (str | list[str] | list[dict], optional): System prompt(s) to set context for the LLM. Can be:
  - A single string: system="You are a helpful assistant"
  - A list of strings: system=["You are a helpful assistant", "You speak formally"]
  - A list of message dictionaries: system=[{"role": "system", "content": "You are a helpful assistant"}, {"role": "user", "content": "Please be concise", "cache_control": {"type": "ephemeral"}}, {"role": "assistant", "content": "I will be concise"}]
- dry_run (bool, optional): If True, simulates tool calls without executing them. Defaults to False.
- debug (bool, optional): If True, enables detailed logging. Defaults to False.
- memory (bool, optional): If True, enables conversation memory using the default State implementation. Defaults to False.
- state (State, optional): Custom State implementation for memory management. Overrides the memory parameter.
- json_schema (dict, optional): JSON Schema dictionary for validating LLM outputs. Alternative to using Pydantic models.
- cache (bool, optional): If True, enables prompt caching. Defaults to True.
- **litellm_kwargs: Additional arguments passed directly to litellm.completion.
Methods:
- tool(fn): Decorator method to register a function as a tool that can be called by the LLM.
- clear(): Clear all stored messages from memory. Raises ValueError if memory/state is not enabled.
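For instance, a minimal sketch of the memory/clear() pairing (chat is a hypothetical function name):

from promptic import llm

@llm(memory=True)
def chat(message):
    """{message}"""

chat("My name is Ada.")
chat("What's my name?")  # answered from the stored conversation history

# wipe the stored conversation; raises ValueError if memory/state is not enabled
chat.clear()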
State is the base class for managing conversation memory and state. It can be extended to implement custom storage solutions.
Methods:
- add_message(message: dict): Add a message to the conversation history.
- get_messages(prompt: str = None, limit: int = None) -> List[dict]: Retrieve conversation history, optionally limited to the most recent messages and filtered by a prompt.
- clear(): Clear all stored messages.
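To illustrate the interface beyond Redis, here is a minimal sketch of a file-backed implementation; FileState is a hypothetical name, and the sketch assumes the same State interface used by the RedisState example above:

import json
from pathlib import Path

from promptic import State, llm

class FileState(State):
    """Hypothetical State subclass persisting history to a local JSON file."""

    def __init__(self, path="chat_history.json"):
        super().__init__()
        self.path = Path(path)

    def _load(self):
        # read the full history from disk, or start empty
        return json.loads(self.path.read_text()) if self.path.exists() else []

    def add_message(self, message):
        messages = self._load()
        messages.append(message)
        self.path.write_text(json.dumps(messages))

    def get_messages(self, prompt=None, limit=None):
        messages = self._load()
        return messages[-limit:] if limit else messages

    def clear(self):
        self.path.unlink(missing_ok=True)

@llm(state=FileState())
def file_chat(message):
    """Chat: {message}"""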
# examples/api_ref.py
from pydantic import BaseModel
from promptic import llm

class Story(BaseModel):
    title: str
    content: str
    style: str
    word_count: int

@llm(
    model="gpt-4o-mini",
    system="You are a creative writing assistant",
    memory=True,
    temperature=0.7,
    max_tokens=800,
    cache=False,
)
def story_assistant(command: str) -> Story:
    """Process this writing request: {command}"""

@story_assistant.tool
def get_writing_style():
    """Get the current writing style preference"""
    return "whimsical and light-hearted"

@story_assistant.tool
def count_words(text: str) -> int:
    """Count words in the provided text"""
    return len(text.split())

story = story_assistant("Write a short story about a magical library")
print(f"Title: {story.title}")
print(f"Style: {story.style}")
print(f"Words: {story.word_count}")
print(story.content)

print(
    story_assistant("Write another story with the same style but about a time traveler")
)
promptic is a lightweight abstraction layer over litellm and its various LLM providers. As such, there are some provider-specific limitations that are beyond the scope of what the library addresses:
- Streaming:
  - Gemini models do not support streaming when using tools/function calls
These limitations reflect the underlying differences between LLM providers and their implementations. For provider-specific features or workarounds, you may need to interact with litellm or the provider's SDK directly.
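For example, a direct call through litellm (the completion API promptic builds on, per the **litellm_kwargs parameter above) looks like this minimal sketch; the model and prompt are illustrative:

import litellm

# call the provider directly through litellm for provider-specific control
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)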
See CONTRIBUTING.md
Alternative AI tools for promptic
Similar Open Source Tools

promptic
Promptic is a tool designed for LLM app development, providing a productive and pythonic way to build LLM applications. It leverages LiteLLM, allowing flexibility to switch LLM providers easily. Promptic focuses on building features by providing type-safe structured outputs, easy-to-build agents, streaming support, automatic prompt caching, and built-in conversation memory.

aiavatarkit
AIAvatarKit is a tool for building AI-based conversational avatars quickly. It supports various platforms like VRChat and cluster, along with real-world devices. The tool is extensible, allowing unlimited capabilities based on user needs. It requires VOICEVOX API, Google or Azure Speech Services API keys, and Python 3.10. Users can start conversations out of the box and enjoy seamless interactions with the avatars.

deep-searcher
DeepSearcher is a tool that combines reasoning LLMs and Vector Databases to perform search, evaluation, and reasoning based on private data. It is suitable for enterprise knowledge management, intelligent Q&A systems, and information retrieval scenarios. The tool maximizes the utilization of enterprise internal data while ensuring data security, supports multiple embedding models, and provides support for multiple LLMs for intelligent Q&A and content generation. It also includes features like private data search, vector database management, and document loading with web crawling capabilities under development.

ruby-openai
Use the OpenAI API with Ruby! 🤖🩵 Stream text with GPT-4, transcribe and translate audio with Whisper, or create images with DALL·E. The gem covers client configuration (custom timeouts and base URIs, extra headers, logging, Faraday middleware, Azure, and Ollama), token counting, and the API surface: chat and streaming chat, vision, JSON mode, functions, edits, embeddings, batches, files, fine-tunes, assistants, threads and messages, runs (including function tools), image generation with DALL·E 2 and DALL·E 3, image edits and variations, moderations, and Whisper translation, transcription, and speech.

pipecat-flows
Pipecat Flows is a framework designed for building structured conversations in AI applications. It allows users to create both predefined conversation paths and dynamically generated flows, handling state management and LLM interactions. The framework includes a Python module for building conversation flows and a visual editor for designing and exporting flow configurations. Pipecat Flows is suitable for scenarios such as customer service scripts, intake forms, personalized experiences, and complex decision trees.

redis-vl-python
The Python Redis Vector Library (RedisVL) is a tailor-made client for AI applications leveraging Redis. It enhances applications with Redis' speed, flexibility, and reliability, incorporating capabilities like vector-based semantic search, full-text search, and geo-spatial search. The library bridges the gap between the emerging AI-native developer ecosystem and the capabilities of Redis by providing a lightweight, elegant, and intuitive interface. It abstracts the features of Redis into a grammar that is more aligned to the needs of today's AI/ML Engineers or Data Scientists.

llm-rag-workshop
The LLM RAG Workshop repository provides a workshop on using Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to generate and understand text in a human-like manner. It includes instructions on setting up the environment, indexing Zoomcamp FAQ documents, creating a Q&A system, and using OpenAI for generation based on retrieved information. The repository focuses on enhancing language model responses with retrieved information from external sources, such as document databases or search engines, to improve factual accuracy and relevance of generated text.

langchainrb
Langchain.rb is a Ruby library that makes it easy to build LLM-powered applications. It provides a unified interface to a variety of LLMs, vector search databases, and other tools, making it easy to build and deploy RAG (Retrieval Augmented Generation) systems and assistants. Langchain.rb is open source and available under the MIT License.

WebRL
WebRL is a self-evolving online curriculum learning framework designed for training web agents in the WebArena environment. It provides model checkpoints, training instructions, and evaluation processes for training the actor and critic models. The tool enables users to generate new instructions and interact with WebArena to configure tasks for training and evaluation.

hf-waitress
HF-Waitress is a powerful server application for deploying and interacting with HuggingFace Transformer models. It simplifies running open-source Large Language Models (LLMs) locally on-device, providing on-the-fly quantization via BitsAndBytes, HQQ, and Quanto. It requires no manual model downloads, offers concurrency, streaming responses, and supports various hardware and platforms. The server uses a `config.json` file for easy configuration management and provides detailed error handling and logging.

llm-sandbox
LLM Sandbox is a lightweight and portable sandbox environment designed to securely execute large language model (LLM) generated code in a safe and isolated manner using Docker containers. It provides an easy-to-use interface for setting up, managing, and executing code in a controlled Docker environment, simplifying the process of running code generated by LLMs. The tool supports multiple programming languages, offers flexibility with predefined Docker images or custom Dockerfiles, and allows scalability with support for Kubernetes and remote Docker hosts.

npi
NPi is an open-source platform providing Tool-use APIs to empower AI agents with the ability to take action in the virtual world. It is currently under active development, and the APIs are subject to change in future releases. NPi offers a command line tool for installation and setup, along with a GitHub app for easy access to repositories. The platform also includes a Python SDK and examples like Calendar Negotiator and Twitter Crawler. Join the NPi community on Discord to contribute to the development and explore the roadmap for future enhancements.

bot-on-anything
The 'bot-on-anything' repository allows developers to integrate various AI models into messaging applications, enabling the creation of intelligent chatbots. By configuring the connections between models and applications, developers can easily switch between multiple channels within a project. The architecture is highly scalable, allowing the reuse of algorithmic capabilities for each new application and model integration. Supported models include ChatGPT, GPT-3.0, New Bing, and Google Bard, while supported applications range from terminals and web platforms to messaging apps like WeChat, Telegram, QQ, and more. The repository provides detailed instructions for setting up the environment, configuring the models and channels, and running the chatbot for various tasks across different messaging platforms.

VectorETL
VectorETL is a lightweight ETL framework designed to assist Data & AI engineers in processing data for AI applications quickly. It streamlines the conversion of diverse data sources into vector embeddings and storage in various vector databases. The framework supports multiple data sources, embedding models, and vector database targets, simplifying the creation and management of vector search systems for semantic search, recommendation systems, and other vector-based operations.

json-repair
JSON Repair is a toolkit designed to address JSON anomalies that can arise from Large Language Models (LLMs). It offers a comprehensive solution for repairing JSON strings, ensuring accuracy and reliability in your data processing. With its user-friendly interface and extensive capabilities, JSON Repair empowers developers to seamlessly integrate JSON repair into their workflows.
For similar tasks

nlp-llms-resources
The 'nlp-llms-resources' repository is a comprehensive resource list for Natural Language Processing (NLP) and Large Language Models (LLMs). It covers a wide range of topics including traditional NLP datasets, data acquisition, libraries for NLP, neural networks, sentiment analysis, optical character recognition, information extraction, semantics, topic modeling, multilingual NLP, domain-specific LLMs, vector databases, ethics, costing, books, courses, surveys, aggregators, newsletters, papers, conferences, and societies. The repository provides valuable information and resources for individuals interested in NLP and LLMs.

adata
AData is a free and open-source A-share database that focuses on transaction-related data. It provides comprehensive data on stocks, including basic information, market data, and sentiment analysis. AData is designed to be easy to use and integrate with other applications, making it a valuable tool for quantitative trading and AI training.

PIXIU
PIXIU is a project designed to support the development, fine-tuning, and evaluation of Large Language Models (LLMs) in the financial domain. It includes components like FinBen, a Financial Language Understanding and Prediction Evaluation Benchmark, FIT, a Financial Instruction Dataset, and FinMA, a Financial Large Language Model. The project provides open resources, multi-task and multi-modal financial data, and diverse financial tasks for training and evaluation. It aims to encourage open research and transparency in the financial NLP field.

hezar
Hezar is an all-in-one AI library designed specifically for the Persian community. It brings together various AI models and tools, making it easy to use AI with just a few lines of code. The library seamlessly integrates with Hugging Face Hub, offering a developer-friendly interface and task-based model interface. In addition to models, Hezar provides tools like word embeddings, tokenizers, feature extractors, and more. It also includes supplementary ML tools for deployment, benchmarking, and optimization.

text-embeddings-inference
Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for popular models like FlagEmbedding, Ember, GTE, and E5. It implements features such as no model graph compilation step, Metal support for local execution on Macs, small docker images with fast boot times, token-based dynamic batching, optimized transformers code for inference using Flash Attention, Candle, and cuBLASLt, Safetensors weight loading, and production-ready features like distributed tracing with Open Telemetry and Prometheus metrics.

CodeProject.AI-Server
CodeProject.AI Server is a standalone, self-hosted, fast, free, and open-source Artificial Intelligence microserver designed for any platform and language. It can be installed locally without the need for off-device or out-of-network data transfer, providing an easy-to-use solution for developers interested in AI programming. The server includes a HTTP REST API server, backend analysis services, and the source code, enabling users to perform various AI tasks locally without relying on external services or cloud computing. Current capabilities include object detection, face detection, scene recognition, sentiment analysis, and more, with ongoing feature expansions planned. The project aims to promote AI development, simplify AI implementation, focus on core use-cases, and leverage the expertise of the developer community.

spark-nlp
Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 36000+ pretrained pipelines and models in more than 200+ languages. It offers tasks such as Tokenization, Word Segmentation, Part-of-Speech Tagging, Named Entity Recognition, Dependency Parsing, Spell Checking, Text Classification, Sentiment Analysis, Token Classification, Machine Translation, Summarization, Question Answering, Table Question Answering, Text Generation, Image Classification, Image to Text (captioning), Automatic Speech Recognition, Zero-Shot Learning, and many more NLP tasks. Spark NLP is the only open-source NLP library in production that offers state-of-the-art transformers such as BERT, CamemBERT, ALBERT, ELECTRA, XLNet, DistilBERT, RoBERTa, DeBERTa, XLM-RoBERTa, Longformer, ELMO, Universal Sentence Encoder, Llama-2, M2M100, BART, Instructor, E5, Google T5, MarianMT, OpenAI GPT2, Vision Transformers (ViT), OpenAI Whisper, and many more not only to Python and R, but also to JVM ecosystem (Java, Scala, and Kotlin) at scale by extending Apache Spark natively.

scikit-llm
Scikit-LLM is a tool that seamlessly integrates powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks. It allows users to leverage large language models for various text analysis applications within the familiar scikit-learn framework. The tool simplifies the process of incorporating advanced language processing capabilities into machine learning pipelines, enabling users to benefit from the latest advancements in natural language processing.
For similar jobs

promptic
Promptic is a tool designed for LLM app development, providing a productive and pythonic way to build LLM applications. It leverages LiteLLM, allowing flexibility to switch LLM providers easily. Promptic focuses on building features by providing type-safe structured outputs, easy-to-build agents, streaming support, automatic prompt caching, and built-in conversation memory.

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

agentcloud
AgentCloud is an open-source platform that enables companies to build and deploy private LLM chat apps, empowering teams to securely interact with their data. It comprises three main components: Agent Backend, Webapp, and Vector Proxy. To run this project locally, clone the repository, install Docker, and start the services. The project is licensed under the GNU Affero General Public License, version 3 only. Contributions and feedback are welcome from the community.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (containing a demo web application, Power BI reports, Synapse resources, AML Notebooks, etc.) that can be deployed in a customer's subscription using the CAPE tool within a matter of a few hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.

executorch
ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices. Key value propositions of ExecuTorch are: * **Portability:** Compatibility with a wide variety of computing platforms, from high-end mobile phones to highly constrained embedded systems and microcontrollers. * **Productivity:** Enabling developers to use the same toolchains and SDK from PyTorch model authoring and conversion, to debugging and deployment to a wide variety of platforms. * **Performance:** Providing end users with a seamless and high-performance experience due to a lightweight runtime and utilizing full hardware capabilities such as CPUs, NPUs, and DSPs.

autogen
AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.