litserve
Deploy AI models at scale. High-throughput serving engine for AI/ML models that uses the latest state-of-the-art model deployment techniques.
Stars: 53
LitServe is a high-throughput serving engine for deploying AI models at scale. It generates an API endpoint for a model, handles batching, streaming, autoscaling across CPU/GPUs, and more. Built for enterprise scale, it supports every framework like PyTorch, JAX, Tensorflow, and more. LitServe is designed to let users focus on model performance, not the serving boilerplate. It is like PyTorch Lightning for model serving but with broader framework support and scalability.
README:
High-throughput serving engine for AI models
✅ Batching ✅ Streaming ✅ Auto-GPU, multi-GPU ✅ PyTorch/JAX/TF ✅ Full control ✅ Auth
LitServe is a high-throughput serving engine for deploying AI models at scale. LitServe generates an API endpoint for a model, handles batching, streaming, autoscaling across CPU/GPUs and more.
Why we wrote LitServe:
- Work with any model: LLMs, vision, time-series, etc...
- We wanted a zero abstraction, minimal, hackable code-base without bloat.
- Built for enterprise scale (not demos, etc...).
- Easy enough for researchers, scalable and hackable for engineers.
- Work on any hardware (GPU/TPU) automatically.
- Let you focus on model performance, not the serving boilerplate.
Think of LitServe as PyTorch Lightning for model serving (if you're familiar with Lightning) but supports every framework like PyTorch, JAX, Tensorflow and more.
Explore various examples that show different models deployed with LitServe:
Example | description | Run |
---|---|---|
Hello world | Hello world model | |
ANY Hugging face model | (Text) Deploy any Hugging Face model | |
Hugging face BERT model | (Text) Deploy model for tasks like text generation and more | |
Open AI CLIP | (Multimodal) Deploy Open AI CLIP for tasks like image understanding | |
Open AI Whisper | (Audio) Deploy Open AI Whisper for tasks like speech to text | |
Stable diffusion 2 | (Vision) Deploy Stable diffusion 2 for tasks like image generation |
Install LitServe via pip:
pip install litserve
Advanced install options
Install the main branch:
pip install git+https://github.com/Lightning-AI/litserve.git@main
Install from source:
git clone https://github.com/Lightning-AI/litserve
cd litserve
pip install -e '.[all]'
LitServe is an inference server for AI/ML models that is minimal and highly scalable.
It has 2 simple, minimal APIs - LitAPI and LitServer.
Here's a hello world example:
# server.py
import litserve as ls
# STEP 1: DEFINE YOUR MODEL API
class SimpleLitAPI(ls.LitAPI):
def setup(self, device):
# Setup the model so it can be called in `predict`.
self.model = lambda x: x**2
def decode_request(self, request):
# Convert the request payload to your model input.
return request["input"]
def predict(self, x):
# Run the model on the input and return the output.
return self.model(x)
def encode_response(self, output):
# Convert the model output to a response payload.
return {"output": output}
# STEP 2: START THE SERVER
if __name__ == "__main__":
api = SimpleLitAPI()
server = ls.LitServer(api, accelerator="auto")
server.run(port=8000)
Now run the server via the command-line
python server.py
LitServe automatically generates a client when it starts. Use this client to test the server:
python client.py
Or ping the server yourself directly
import requests
response = requests.post("http://127.0.0.1:8000/predict", json={"input": 4.0})
The server expects the client to send a POST
to the /predict
URL with a JSON payload.
The way the payload is structured is up to the implementation of the LitAPI
subclass.
LitServe supports multiple advanced state-of-the-art features.
Feature | description |
---|---|
Accelerators | CPU, GPU, Multi-GPU, mps |
Auto-GPU | Detects and auto-runs on all GPUs on a machine |
Model types | LLMs, Vision, Time series, any model type... |
ML frameworks | PyTorch, Jax, Tensorflow, numpy, etc... |
Batching | ✅ |
API authentication | ✅ |
Full request/response control | ✅ |
Automatic schema validation | ✅ |
Handle timeouts | ✅ |
Handle disconnects | ✅ |
Streaming | ✅ |
[!NOTE] Our goal is not to jump on every hype train, but instead support features that scale under the most demanding enterprise deployments.
Explore each feature in detail:
Use accelerators automatically (GPUs, CPU, mps)
LitServe automatically detects GPUs on a machine and uses them when available:
import litserve as ls
# Automatically selects the available accelerator
api = SimpleLitAPI() # defined by you with ls.LitAPI
# when running on GPUs these are equivalent. It's best to let Lightning decide by not specifying it!
server = ls.LitServer(api)
server = ls.LitServer(api, accelerator="cuda")
server = ls.LitServer(api, accelerator="auto")
LitServer
accepts an accelerator
argument which defaults to "auto"
. It can also be explicitly set to "cpu"
, "cuda"
, or
"mps"
if you wish to manually control the device placement.
The following example shows how to set the accelerator manually:
import litserve as ls
# Run on CUDA-supported GPUs
server = ls.LitServer(SimpleLitAPI(), accelerator="cuda")
# Run on Apple's Metal-powered GPUs
server = ls.LitServer(SimpleLitAPI(), accelerator="mps")
Serve on multi-GPUs
LitServer
has the ability to coordinate serving from multiple GPUs.
LitServer
accepts a devices
argument which defaults to "auto"
. On multi-GPU machines, LitServe
will run a copy of the model on each device detected on the machine.
The devices
argument can also be explicitly set to the desired number of devices to use on the machine.
import litserve as ls
# Automatically selects the available accelerators
api = SimpleLitAPI() # defined by you with ls.LitAPI
# when running on a 4-GPUs machine these are equivalent.
# It's best to let Lightning decide by not specifying accelerator and devices!
server = ls.LitServer(api)
server = ls.LitServer(api, accelerator="cuda", devices=4)
server = ls.LitServer(api, accelerator="auto", devices="auto")
For example, running the API server on a 4-GPU machine, with a PyTorch model served on each GPU:
import torch, torch.nn as nn
import litserve as ls
class Linear(nn.Module):
def __init__(self):
super().__init__()
self.linear = nn.Linear(1, 1)
self.linear.weight.data.fill_(2.0)
self.linear.bias.data.fill_(1.0)
def forward(self, x):
return self.linear(x)
class SimpleLitAPI(ls.LitAPI):
def setup(self, device):
# move the model to the correct device
# keep track of the device for moving data accordingly
self.model = Linear().to(device)
self.device = device
def decode_request(self, request):
# get the input and create a 1D tensor on the correct device
content = request["input"]
return torch.tensor([content], device=self.device)
def predict(self, x):
# the model expects a batch dimension, so create it
return self.model(x[None, :])
def encode_response(self, output):
# float will take the output value directly onto CPU memory
return {"output": float(output)}
if __name__ == "__main__":
# accelerator="auto" (or "cuda"), devices="auto" (or 4) will lead to 4 workers serving
# the model from "cuda:0", "cuda:1", "cuda:2", "cuda:3" respectively
server = ls.LitServer(SimpleLitAPI(), accelerator="auto", devices="auto")
server.run(port=8000)
The devices
argument can also be an array specifying what device id to
run the model on:
server = LitServer(SimpleLitAPI(), accelerator="cuda", devices=[0, 3])
Last, you can run multiple copies of the same model from the same device, if the model is small. The following will load two copies of the model on each of the 4 GPUs:
server = LitServer(SimpleLitAPI(), accelerator="cuda", devices=4, workers_per_device=2)
Timeouts and disconnections
The server will remove a queued request if the client requesting it disconnects.
You can configure a timeout (in seconds) after which clients will receive a 504
HTTP
response (Gateway Timeout) indicating that their request has timed out.
For example, this is how you can configure the server with a timeout of 30 seconds per response.
server = LitServer(SimpleLitAPI(), accelerator="cuda", devices=4, timeout=30)
This is useful to avoid requests queuing up beyond the ability of the server to respond.
To disable the timeout for long-running tasks, set timeout=False
or timeout=-1
:
server = LitServer(SimpleLitAPI(), timeout=False)
Use API key authentication
In order to secure the API behind an API key, just define the env var when starting the server
LIT_SERVER_API_KEY=supersecretkey python main.py
Clients are expected to auth with the same API key set in the X-API-Key
HTTP header.
Dynamic batching
LitServe can combine individual requests into a batch to improve throughput.
To enable batching, you need to set the max_batch_size
argument to match the batch size that your model can handle
and implement LitAPI.predict
to process batched inputs.
import numpy as np
import litserve as ls
class SimpleStreamAPI(ls.LitAPI):
def setup(self, device) -> None:
self.model = lambda x: x ** 2
def decode_request(self, request):
return np.asarray(request["input"])
def predict(self, x):
result = self.model(x)
return result
def encode_response(self, output):
return {"output": output}
if __name__ == "__main__":
api = SimpleStreamAPI()
server = ls.LitServer(api, max_batch_size=4, batch_timeout=0.05)
server.run(port=8000)
You can control the wait time to aggregate requests into a batch with the batch_timeout
argument.
In the above example, the server will wait for 0.05 seconds to combine 4 requests together.
LitServe automatically stacks NumPy arrays and PyTorch tensors along the batch dimension before calling the
LitAPI.predict
method, and splits the output across requests afterward. You can customize this behavior by overriding the
LitAPI.batch
and LitAPI.unbatch
methods to handle different data types.
class SimpleStreamAPI(ls.LitAPI):
...
def batch(self, inputs):
return np.stack(inputs)
def unbatch(self, output):
return list(output)
...
Stream long responses
LitServe can stream outputs from the model in real-time, such as returning text one word at a time from a language model.
To enable streaming, you need to set LitServer(..., stream=True)
and implement LitAPI.predict
and LitAPI.encode_response
as a generator (a Python function that yields output).
For example, streaming long responses generated over time:
import json
import litserve as ls
class SimpleStreamAPI(ls.LitAPI):
def setup(self, device) -> None:
self.model = lambda x, y: x * y
def decode_request(self, request):
return request["input"]
def predict(self, x):
for i in range(10):
yield self.model(x, i)
def encode_response(self, output):
for out in output:
yield json.dumps({"output": out})
if __name__ == "__main__":
api = SimpleStreamAPI()
server = ls.LitServer(api, stream=True)
server.run(port=8000)
Automatic schema validation
Define the request and response as Pydantic models, to automatically validate the request.
from pydantic import BaseModel
import litserve as ls
class PredictRequest(BaseModel):
input: float
class PredictResponse(BaseModel):
output: float
class SimpleLitAPI(ls.LitAPI):
def setup(self, device):
self.model = lambda x: x**2
def decode_request(self, request: PredictRequest) -> float:
return request.input
def predict(self, x):
return self.model(x)
def encode_response(self, output: float) -> PredictResponse:
return PredictResponse(output=output)
if __name__ == "__main__":
api = SimpleLitAPI()
server = ls.LitServer(api, accelerator="auto")
server.run(port=8888)
LitServe is a community project accepting contributions. Let's make the world's most advanced AI inference engine.
Run tests
Use pytest
to run tests locally.
First, install test dependencies:
pip install -r _requirements/test.txt
Run the tests
pytest tests
litserve is released under the Apache 2.0 license. See LICENSE file for details.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for litserve
Similar Open Source Tools
litserve
LitServe is a high-throughput serving engine for deploying AI models at scale. It generates an API endpoint for a model, handles batching, streaming, autoscaling across CPU/GPUs, and more. Built for enterprise scale, it supports every framework like PyTorch, JAX, Tensorflow, and more. LitServe is designed to let users focus on model performance, not the serving boilerplate. It is like PyTorch Lightning for model serving but with broader framework support and scalability.
paxml
Pax is a framework to configure and run machine learning experiments on top of Jax.
pebblo
Pebblo enables developers to safely load data and promote their Gen AI app to deployment without worrying about the organization’s compliance and security requirements. The project identifies semantic topics and entities found in the loaded data and summarizes them on the UI or a PDF report.
mistral-inference
Mistral Inference repository contains minimal code to run 7B, 8x7B, and 8x22B models. It provides model download links, installation instructions, and usage guidelines for running models via CLI or Python. The repository also includes information on guardrailing, model platforms, deployment, and references. Users can interact with models through commands like mistral-demo, mistral-chat, and mistral-common. Mistral AI models support function calling and chat interactions for tasks like testing models, chatting with models, and using Codestral as a coding assistant. The repository offers detailed documentation and links to blogs for further information.
openedai-speech
OpenedAI Speech is a free, private text-to-speech server compatible with the OpenAI audio/speech API. It offers custom voice cloning and supports various models like tts-1 and tts-1-hd. Users can map their own piper voices and create custom cloned voices. The server provides multilingual support with XTTS voices and allows fixing incorrect sounds with regex. Recent changes include bug fixes, improved error handling, and updates for multilingual support. Installation can be done via Docker or manual setup, with usage instructions provided. Custom voices can be created using Piper or Coqui XTTS v2, with guidelines for preparing audio files. The tool is suitable for tasks like generating speech from text, creating custom voices, and multilingual text-to-speech applications.
react-native-fast-tflite
A high-performance TensorFlow Lite library for React Native that utilizes JSI for power, zero-copy ArrayBuffers for efficiency, and low-level C/C++ TensorFlow Lite core API for direct memory access. It supports swapping out TensorFlow Models at runtime and GPU-accelerated delegates like CoreML/Metal/OpenGL. Easy VisionCamera integration allows for seamless usage. Users can load TensorFlow Lite models, interpret input and output data, and utilize GPU Delegates for faster computation. The library is suitable for real-time object detection, image classification, and other machine learning tasks in React Native applications.
raglite
RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite. It offers configurable options for choosing LLM providers, database types, and rerankers. The toolkit is fast and permissive, utilizing lightweight dependencies and hardware acceleration. RAGLite provides features like PDF to Markdown conversion, multi-vector chunk embedding, optimal semantic chunking, hybrid search capabilities, adaptive retrieval, and improved output quality. It is extensible with a built-in Model Context Protocol server, customizable ChatGPT-like frontend, document conversion to Markdown, and evaluation tools. Users can configure RAGLite for various tasks like configuring, inserting documents, running RAG pipelines, computing query adapters, evaluating performance, running MCP servers, and serving frontends.
clarifai-python-grpc
This is the official Clarifai gRPC Python client for interacting with their recognition API. Clarifai offers a platform for data scientists, developers, researchers, and enterprises to utilize artificial intelligence for image, video, and text analysis through computer vision and natural language processing. The client allows users to authenticate, predict concepts in images, and access various functionalities provided by the Clarifai API. It follows a versioning scheme that aligns with the backend API updates and includes specific instructions for installation and troubleshooting. Users can explore the Clarifai demo, sign up for an account, and refer to the documentation for detailed information.
DeepPavlov
DeepPavlov is an open-source conversational AI library built on PyTorch. It is designed for the development of production-ready chatbots and complex conversational systems, as well as for research in the area of NLP and dialog systems. The library offers a wide range of models for tasks such as Named Entity Recognition, Intent/Sentence Classification, Question Answering, Sentence Similarity/Ranking, Syntactic Parsing, and more. DeepPavlov also provides embeddings like BERT, ELMo, and FastText for various languages, along with AutoML capabilities and integrations with REST API, Socket API, and Amazon AWS.
aicsimageio
AICSImageIO is a Python tool for Image Reading, Metadata Conversion, and Image Writing for Microscopy Images. It supports various file formats like OME-TIFF, TIFF, ND2, DV, CZI, LIF, PNG, GIF, and Bio-Formats. Users can read and write metadata and imaging data, work with different file systems like local paths, HTTP URLs, s3fs, and gcsfs. The tool provides functionalities for full image reading, delayed image reading, mosaic image reading, metadata reading, xarray coordinate plane attachment, cloud IO support, and saving to OME-TIFF. It also offers benchmarking and developer resources.
neural-speed
Neural Speed is an innovative library designed to support the efficient inference of large language models (LLMs) on Intel platforms through the state-of-the-art (SOTA) low-bit quantization powered by Intel Neural Compressor. The work is inspired by llama.cpp and further optimized for Intel platforms with our innovations in NeurIPS' 2023
mflux
MFLUX is a line-by-line port of the FLUX implementation in the Huggingface Diffusers library to Apple MLX. It aims to run powerful FLUX models from Black Forest Labs locally on Mac machines. The codebase is minimal and explicit, prioritizing readability over generality and performance. Models are implemented from scratch in MLX, with tokenizers from the Huggingface Transformers library. Dependencies include Numpy and Pillow for image post-processing. Installation can be done using `uv tool` or classic virtual environment setup. Command-line arguments allow for image generation with specified models, prompts, and optional parameters. Quantization options for speed and memory reduction are available. LoRA adapters can be loaded for fine-tuning image generation. Controlnet support provides more control over image generation with reference images. Current limitations include generating images one by one, lack of support for negative prompts, and some LoRA adapters not working.
mLoRA
mLoRA (Multi-LoRA Fine-Tune) is an open-source framework for efficient fine-tuning of multiple Large Language Models (LLMs) using LoRA and its variants. It allows concurrent fine-tuning of multiple LoRA adapters with a shared base model, efficient pipeline parallelism algorithm, support for various LoRA variant algorithms, and reinforcement learning preference alignment algorithms. mLoRA helps save computational and memory resources when training multiple adapters simultaneously, achieving high performance on consumer hardware.
aiohttp-session
aiohttp_session is a Python library that provides session management for aiohttp.web applications. It allows storing user-specific data in session objects with a dict-like interface. The library offers different session storage options, including SimpleCookieStorage for testing, EncryptedCookieStorage for secure data storage, and RedisStorage for storing data in Redis. Users can easily integrate session management into their aiohttp.web applications by registering the session middleware. The library is designed to simplify session handling and enhance the security of web applications.
LLMDebugger
This repository contains the code and dataset for LDB, a novel debugging framework that enables Large Language Models (LLMs) to refine their generated programs by tracking the values of intermediate variables throughout the runtime execution. LDB segments programs into basic blocks, allowing LLMs to concentrate on simpler code units, verify correctness block by block, and pinpoint errors efficiently. The tool provides APIs for debugging and generating code with debugging messages, mimicking how human developers debug programs.
extractor
Extractor is an AI-powered data extraction library for Laravel that leverages OpenAI's capabilities to effortlessly extract structured data from various sources, including images, PDFs, and emails. It features a convenient wrapper around OpenAI Chat and Completion endpoints, supports multiple input formats, includes a flexible Field Extractor for arbitrary data extraction, and integrates with Textract for OCR functionality. Extractor utilizes JSON Mode from the latest GPT-3.5 and GPT-4 models, providing accurate and efficient data extraction.
For similar tasks
litserve
LitServe is a high-throughput serving engine for deploying AI models at scale. It generates an API endpoint for a model, handles batching, streaming, autoscaling across CPU/GPUs, and more. Built for enterprise scale, it supports every framework like PyTorch, JAX, Tensorflow, and more. LitServe is designed to let users focus on model performance, not the serving boilerplate. It is like PyTorch Lightning for model serving but with broader framework support and scalability.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.
zep-python
Zep is an open-source platform for building and deploying large language model (LLM) applications. It provides a suite of tools and services that make it easy to integrate LLMs into your applications, including chat history memory, embedding, vector search, and data enrichment. Zep is designed to be scalable, reliable, and easy to use, making it a great choice for developers who want to build LLM-powered applications quickly and easily.
AI-in-a-Box
AI-in-a-Box is a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction, while maintaining the highest standards of quality and efficiency. It provides essential guidance on the responsible use of AI and LLM technologies, specific security guidance for Generative AI (GenAI) applications, and best practices for scaling OpenAI applications within Azure. The available accelerators include: Azure ML Operationalization in-a-box, Edge AI in-a-box, Doc Intelligence in-a-box, Image and Video Analysis in-a-box, Cognitive Services Landing Zone in-a-box, Semantic Kernel Bot in-a-box, NLP to SQL in-a-box, Assistants API in-a-box, and Assistants API Bot in-a-box.
NeMo
NeMo Framework is a generative AI framework built for researchers and pytorch developers working on large language models (LLMs), multimodal models (MM), automatic speech recognition (ASR), and text-to-speech synthesis (TTS). The primary objective of NeMo is to provide a scalable framework for researchers and developers from industry and academia to more easily implement and design new generative AI models by being able to leverage existing code and pretrained models.
E2B
E2B Sandbox is a secure sandboxed cloud environment made for AI agents and AI apps. Sandboxes allow AI agents and apps to have long running cloud secure environments. In these environments, large language models can use the same tools as humans do. For example: * Cloud browsers * GitHub repositories and CLIs * Coding tools like linters, autocomplete, "go-to defintion" * Running LLM generated code * Audio & video editing The E2B sandbox can be connected to any LLM and any AI agent or app.
floneum
Floneum is a graph editor that makes it easy to develop your own AI workflows. It uses large language models (LLMs) to run AI models locally, without any external dependencies or even a GPU. This makes it easy to use LLMs with your own data, without worrying about privacy. Floneum also has a plugin system that allows you to improve the performance of LLMs and make them work better for your specific use case. Plugins can be used in any language that supports web assembly, and they can control the output of LLMs with a process similar to JSONformer or guidance.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.