
bedrock-claude-chatbot
Personal chatbot powered by Amazon Bedrock LLMs, with a data analytics feature that provides isolated serverless compute on Athena Spark for code execution.
Stars: 95

Bedrock Claude ChatBot is a Streamlit application that provides a conversational interface for users to interact with various Large Language Models (LLMs) on Amazon Bedrock. Users can ask questions, upload documents, and receive responses from the AI assistant. The app features conversational UI, document upload, caching, chat history storage, session management, model selection, cost tracking, logging, and advanced data analytics tool integration. It can be customized using a config file and is extensible for implementing specialized tools using Docker containers and AWS Lambda. The app requires access to an Amazon Bedrock Anthropic Claude model, an S3 bucket, Amazon DynamoDB, and Amazon Textract, and optionally Amazon Elastic Container Registry and Amazon Athena for the advanced analytics features.
README:
Bedrock Chat App is a Streamlit application that allows users to interact with various LLMs on Amazon Bedrock. It provides a conversational interface where users can ask questions, upload documents, and receive responses from the AI assistant.
READ THE FOLLOWING PREREQUISITES CAREFULLY.
- Conversational UI: The app provides a chat-like interface for seamless interaction with the AI assistant.
- Document Upload: Users can upload various types of documents (PDF, CSV, TXT, PNG, JPG, XLSX, JSON, DOCX, Python scripts etc) to provide context for the AI assistant.
- Caching: Uploaded documents and extracted text are cached in an S3 bucket for improved performance. This serves as the object storage unit for the application as documents are retrieved and loaded into the model to keep conversation context.
- Chat History: The app stores and retrieves chat history (including document metadata) to/from a DynamoDB table, allowing users to continue conversations across sessions.
- Session Store: The application utilizes DynamoDB to store and manage user and session information, enabling isolated conversations and state tracking for each user interaction.
- Model Selection: Users can select from a broad list of LLMs on Amazon Bedrock, including the latest models from Anthropic Claude, Amazon Nova, Meta Llama, DeepSeek etc., for their queries, and can include additional Bedrock models by modifying the `model-id.json` file. It incorporates the Bedrock Converse API, providing a standardized model interface (a minimal example call is sketched after this feature list).
- Cost Tracking: The application calculates and displays the cost associated with each chat session based on the input and output token counts and the pricing model defined in the `pricing.json` file.
- Logging: The items logged in the DynamoDB table include the user ID, session ID, messages, timestamps, uploaded documents' S3 paths, and input and output token counts. This helps isolate user engagement statistics, track the various items being logged, and attribute cost per user.
- Tool Usage: An Advanced Data Analytics tool for processing and analyzing structured data (CSV, XLS and XLSX formats) in an isolated, serverless environment.
- Extensible Tool Integration: This app can be modified to leverage the extensive Domain Specific Language (DSL) knowledge inherent in Large Language Models (LLMs) to implement a wide range of specialized tools. This capability is enhanced by the versatile execution environments provided by Docker containers and AWS Lambda, allowing for dynamic and adaptable implementation of various DSL-based functionalities. This approach enables the system to handle diverse domain-specific tasks efficiently, without the need for hardcoded, specialized modules for each domain.
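The Model Selection and Cost Tracking features above rely on the Bedrock Converse API and its per-turn token counts. As a hedged illustration (the model ID, region, and prompt below are placeholder assumptions, not values read from `model-id.json`), a single chat turn might look like this:

```python
# Minimal sketch of one chat turn via the Bedrock Converse API (boto3).
# The model ID and region below are placeholder assumptions.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize the uploaded report."}]}],
    inferenceConfig={"maxTokens": 512},
)

answer = response["output"]["message"]["content"][0]["text"]
usage = response["usage"]  # inputTokens/outputTokens feed the pricing.json cost estimate
print(answer)
print(usage["inputTokens"], usage["outputTokens"])
```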
There are two files of interest.
- A Jupyter Notebook that walks you through the ChatBot implementation cell by cell (the Advanced Data Analytics feature is only available in the Streamlit chatbot).
- A Streamlit app that can be deployed to create a UI Chatbot.
- Amazon Bedrock Anthropic Claude Model Access
- S3 bucket to store uploaded documents and Textract output.
- Optional:
- Create an Amazon DynamoDB table to store chat history (Run the notebook BedrockChatUI to create a DynamoDB Table). This is optional as there is a local disk storage option, however, I would recommend using Amazon DynamoDB.
- Amazon Textract. This is optional, as there is an option to use the Python libraries `pypdf2` and `pytesseract` for PDF and image processing. However, I would recommend using Amazon Textract for higher-quality PDF and image processing; you will experience latency when using `pytesseract`. (A minimal Textract call is sketched after this list.)
- Amazon Elastic Container Registry to store custom Docker images if using the Advanced Data Analytics feature with the AWS Lambda setup.
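As a hedged sketch of the Textract option above (the bucket and object names are placeholders, and multi-page PDFs would need the asynchronous Textract APIs instead), extracting text from a cached upload could look like this:

```python
# Minimal sketch: synchronous Textract text detection on a cached upload.
# Bucket and key are placeholders; multi-page PDFs require the async APIs.
import boto3

textract = boto3.client("textract", region_name="us-east-1")

result = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "my-chatbot-cache-bucket", "Name": "uploads/scan.png"}}
)

# Join detected LINE blocks into plain text for the model's context.
text = "\n".join(b["Text"] for b in result["Blocks"] if b["BlockType"] == "LINE")
print(text[:500])
```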
To use the Advanced Analytics Feature, this additional step is required (the ChatBot can still be used without enabling the Advanced Analytics Feature):
This feature can be powered by a Python runtime on AWS Lambda and/or a PySpark runtime on Amazon Athena. Expand the appropriate section below to view the set-up instructions.
AWS Lambda Python Runtime Setup
- An AWS Lambda function with a custom Python image to execute Python code for analytics.
- Create a private ECR repository by following the link in step 3.
- On your local machine or any related AWS service, including AWS CloudShell, Amazon Elastic Compute Cloud, Amazon SageMaker Studio etc., run the following CLI commands:
- Install git and clone this git repo: `git clone [github_link]`
- Navigate into the Docker directory: `cd Docker`
- If using a local machine, authenticate with your AWS credentials.
- Install AWS Command Line Interface (AWS CLI) version 2 if not already installed.
- Follow the steps in the Deploying the image section under Using an AWS base image for Python in this documentation guide. Replace the placeholders with the appropriate values. You can skip step 2 if you already created an ECR repository.
- In step 6, in addition to the `AWSLambdaBasicExecutionRole` policy, ONLY grant least-privileged read and write Amazon S3 policies to the execution role. Scope the policy down to only the necessary S3 bucket and S3 directory prefix where uploaded files will be stored and read from, as configured in the `config.json` file below.
- In step 7, I recommend creating the Lambda function in an Amazon Virtual Private Cloud (VPC) without internet access and attaching Amazon S3 and Amazon CloudWatch gateway and interface endpoints accordingly. The step 7 command can be modified to include VPC parameters:
aws lambda create-function \
  --function-name YourFunctionName \
  --package-type Image \
  --code ImageUri=your-account-id.dkr.ecr.your-region.amazonaws.com/your-repo:tag \
  --role arn:aws:iam::your-account-id:role/YourLambdaExecutionRole \
  --vpc-config SubnetIds=subnet-xxxxxxxx,subnet-yyyyyyyy,SecurityGroupIds=sg-zzzzzzzz \
  --memory-size 512 \
  --timeout 300 \
  --region your-region
Modify the placeholders as appropriate. I recommend keeping the `timeout` and `memory-size` parameters conservative, as they affect cost. A good starting point for memory is 512 MB. (A hedged example of invoking the resulting function from Python follows this section.)
- Ignore step 8.
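For orientation only, the sketch below shows how the deployed function might be invoked from Python; the function name and payload shape are assumptions for illustration, not the app's actual contract.

```python
# Hypothetical invocation of the analytics Lambda with generated code.
# FunctionName and the payload keys ("code", "input_path") are assumptions.
import json
import boto3

lam = boto3.client("lambda", region_name="us-east-1")

payload = {
    "code": "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.describe())",
    "input_path": "s3://sandbox-bucket/analytics-uploads/data.csv",
}

resp = lam.invoke(
    FunctionName="YourFunctionName",  # the image-based function created in step 7
    Payload=json.dumps(payload).encode("utf-8"),
)
print(json.loads(resp["Payload"].read()))
```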
Amazon Athena Spark Runtime Setup
- Follow the instructions in Get started with Apache Spark on Amazon Athena to create an Amazon Athena workgroup with Apache Spark. You DO NOT need to select Turn on example notebook. (A sketch of submitting a calculation to this workgroup follows this section.)
- Provide S3 permissions to the workgroup execution role for the S3 buckets configured with this application.
- Note that the Amazon Athena Spark environment comes preinstalled with a select few python libraries.
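As a hedged sketch (the workgroup name and S3 path are placeholders, and a production caller would wait for the session to become idle before submitting code), running a calculation in the Spark workgroup could look like this:

```python
# Minimal sketch: submit a PySpark calculation to the Athena Spark workgroup.
# Workgroup name and S3 path are placeholders; error handling is omitted.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

session = athena.start_session(
    WorkGroup="my-spark-workgroup",
    EngineConfiguration={"MaxConcurrentDpus": 4},
)

calc = athena.start_calculation_execution(
    SessionId=session["SessionId"],
    CodeBlock=(
        "df = spark.read.csv('s3://sandbox-bucket/analytics-uploads/data.csv', header=True)\n"
        "df.show()"
    ),
)

# Simplified polling loop; real code would also handle session readiness and failures.
while True:
    status = athena.get_calculation_execution(
        CalculationExecutionId=calc["CalculationExecutionId"]
    )
    if status["Status"]["State"] in ("COMPLETED", "FAILED", "CANCELED"):
        break
    time.sleep(2)
print(status["Status"]["State"])
```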
⚠ IMPORTANT SECURITY NOTE:
Enabling the Advanced Analytics Feature allows the LLM to generate and execute Python code to analyze your dataset; this code is executed automatically in the Lambda function or Athena Spark environment. To mitigate potential risks:
- VPC Configuration:
- It is recommended to place the Lambda function in an internet-free VPC.
- Use Amazon S3 and CloudWatch gateway/interface endpoints for necessary access.
- IAM Permissions:
- Scope down the AWS Lambda and/or Amazon Athena workgroup execution role to only Amazon S3 and the required S3 resources. This is in addition to the `AWSLambdaBasicExecutionRole` policy if using AWS Lambda. (A policy sketch follows this note.)
- Library Restrictions:
- Only libraries specified in `Docker/requirements.txt` will be available at runtime.
- Modify this list carefully based on your needs.
- Resource Allocation:
- Adjust the AWS Lambda function `timeout` and `memory-size` based on data size and analysis complexity.
- Production Considerations:
- This application is designed for POC use.
- Implement additional security measures before deploying to production.
The goal is to limit the potential impact of generated code execution.
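To make the scoped-down S3 access concrete, here is a hedged sketch of attaching a least-privilege inline policy to the execution role; the role, bucket, and prefix names are placeholders to adapt to your own `config.json` values.

```python
# Sketch: attach a least-privilege inline S3 policy to the execution role.
# RoleName, bucket, and prefix are placeholders, not values from this repo.
import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::sandbox-bucket/analytics-uploads/*",
        }
    ],
}

iam.put_role_policy(
    RoleName="YourLambdaExecutionRole",
    PolicyName="ChatbotSandboxS3Access",
    PolicyDocument=json.dumps(policy),
)
```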
The application's behavior can be customized by modifying the `config.json` file. Here are the available options (an illustrative example follows this list):
- `DynamodbTable`: The name of the DynamoDB table to use for storing chat history. Leave this field empty if you decide to use local storage for chat history.
- `UserId`: The DynamoDB user ID for the application. Leave this field empty if you decide to use local storage for chat history.
- `Bucket_Name`: The name of the S3 bucket used for caching documents and extracted text. This is required.
- `max-output-token`: The maximum number of output tokens allowed for the AI assistant's response.
- `chat-history-loaded-length`: The number of recent chat messages to load from the DynamoDB table or local storage.
- `bedrock-region`: The AWS region where Amazon Bedrock is enabled.
- `load-doc-in-chat-history`: A boolean flag indicating whether to load documents in the chat history. If `true`, all documents are loaded into the chat history as context (provides the AI with more context from previous chat turns, at the cost of additional price and latency). If `false`, only the user query and response are loaded into the chat history, and the AI has no recollection of any document context from those conversations. When setting booleans in JSON, use all lowercase.
- `AmazonTextract`: A boolean indicating whether to use Amazon Textract or Python libraries for PDF and image processing. Set to `false` if you do not have access to Amazon Textract. When setting booleans in JSON, use all lowercase.
- `csv-delimiter`: The delimiter to use when parsing structured content to string. Supported formats are "|", "\t", and ",".
- `document-upload-cache-s3-path`: S3 bucket path to cache uploaded files. Do not include the bucket name, just the prefix without a trailing slash, for example "path/to/files".
- `AmazonTextract-result-cache`: S3 bucket path to cache Amazon Textract results. Do not include the bucket name, just the prefix without a trailing slash, for example "path/to/files".
- `lambda-function`: Name of the Lambda function deployed in the steps above. This is required if using the Advanced Analytics Tool with AWS Lambda.
- `input_s3_path`: S3 directory prefix, without the forward and trailing slash, used to render the S3 objects in the chat UI.
- `input_bucket`: S3 bucket name where the files to be rendered on screen are stored.
- `input_file_ext`: Comma-separated file extension names (without the ".") for files in the S3 buckets to be rendered on screen. By default `xlsx` and `csv` are included.
- `athena-work-group-name`: Name of the Spark-enabled Amazon Athena workgroup created above. This is required if using the Advanced Analytics Tool with Amazon Athena.
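For orientation, the snippet below assembles an illustrative `config.json` from the options above; every value is a placeholder, and the exact key names and types should be confirmed against the `config.json` shipped in the repository.

```python
# Illustrative config.json; all values are placeholders to replace with your resources.
import json

config = {
    "DynamodbTable": "BedrockChatSessions",         # "" to use local chat-history storage
    "UserId": "demo-user",
    "Bucket_Name": "my-chatbot-cache-bucket",       # required
    "max-output-token": 2000,
    "chat-history-loaded-length": 10,
    "bedrock-region": "us-east-1",
    "load-doc-in-chat-history": True,               # serialized as lowercase true in JSON
    "AmazonTextract": True,
    "csv-delimiter": "|",
    "document-upload-cache-s3-path": "uploads",
    "AmazonTextract-result-cache": "textract-cache",
    "lambda-function": "YourFunctionName",          # only if using the Lambda runtime
    "input_s3_path": "datasets",
    "input_bucket": "my-read-only-data-bucket",
    "input_file_ext": "csv,xlsx",
    "athena-work-group-name": "my-spark-workgroup", # only if using the Athena runtime
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```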
⚠ IMPORTANT ADVISORY FOR ADVANCED ANALYTICS FEATURE
When using the Advanced Analytics Feature, take the following precautions:
- Sandbox Environment:
- Set `Bucket_Name` and `document-upload-cache-s3-path` to point to a separate, isolated "sandbox" S3 location.
- Grant read and write access to this bucket/prefix resource to the Lambda execution role, as documented.
- Do NOT use your main storage path for these parameters. This isolation is crucial to avoid potential file overwrites, as the app will execute LLM-generated code.
- Input Data Safety:
- `input_s3_path` and `input_bucket` are used for read-only operations and can safely point to your main data storage. The LLM is not aware of these parameters unless they are explicitly provided by the user during chat.
- Only grant read access to this bucket/prefix resource in the execution role attached to the Lambda function.
- IMPORTANT: Ensure `input_bucket` is different from `Bucket_Name`.
By following these guidelines, you mitigate the potential risk of unintended data modification or loss in your primary storage areas.
If you have a SageMaker AI Studio domain already set up, ignore the first item; however, item 2 is required.
- Set Up SageMaker Studio
- The SageMaker execution role should have access to interact with Bedrock and S3, and optionally Textract, DynamoDB, AWS Lambda and Amazon Athena if these services are used.
- Create a JupyterLab space
- Clone this git repo: `git clone [github_link]`
- Open a terminal by clicking File -> New -> Terminal
- Navigate into the cloned repository directory using the `cd bedrock-claude-chatbot` command and run the following commands to install the application Python libraries:
- sudo apt update
- sudo apt upgrade -y
- chmod +x install_package.sh
- ./install_package.sh
- NOTE: If you run into the error `ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: /opt/conda/lib/python3.10/site-packages/fsspec-2023.6.0.dist-info/METADATA`, I solved it by deleting the `fsspec` package (this is due to having two versions of `fsspec` installed, 2023* and 2024*) with the following command: `rm /opt/conda/lib/python3.10/site-packages/fsspec-2023.6.0.dist-info -rdf`
- pip install -U fsspec # fsspec 2024.9.0 should already be installed.
- If you decide to use the Python libraries for PDF and image processing, this requires tesseract-ocr. Run the following commands:
- sudo apt update -y
- sudo apt-get install tesseract-ocr-all -y
- Run the command `python3 -m streamlit run bedrock-chat.py --server.enableXsrfProtection false` to start the Streamlit server. Do not use the links generated by the command, as they won't work in Studio.
- Copy the URL of the SageMaker JupyterLab. It should look something like https://qukigdtczjsdk.studio.us-east-1.sagemaker.aws/jupyterlab/default/lab/tree/healthlake/app_fhir.py. Replace everything after .../default/ with proxy/8501/, for example https://qukigdtczjsdk.studio.us-east-1.sagemaker.aws/jupyterlab/default/proxy/8501/. Make sure the port number (8501 in this case) matches the port number printed when you run the `python3 -m streamlit run bedrock-chat.py --server.enableXsrfProtection false` command; the port number is the last 4 digits after the colon in the generated URL. (A small helper sketching this URL rewrite follows these steps.)
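The URL edit above is purely mechanical; the tiny helper below (hypothetical, not part of the repo) performs the same rewrite, replacing everything after .../default/ with proxy/<port>/:

```python
# Hypothetical helper mirroring the manual URL edit described above.
def studio_proxy_url(jupyterlab_url: str, port: int = 8501) -> str:
    base = jupyterlab_url.split("/default/")[0] + "/default/"
    return f"{base}proxy/{port}/"

print(studio_proxy_url(
    "https://qukigdtczjsdk.studio.us-east-1.sagemaker.aws/jupyterlab/default/lab/tree/healthlake/app_fhir.py"
))
# -> https://qukigdtczjsdk.studio.us-east-1.sagemaker.aws/jupyterlab/default/proxy/8501/
```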
- Create a new EC2 instance
- Expose TCP port range 8500-8510 for inbound connections on the security group attached to the EC2 instance. TCP port 8501 is needed for Streamlit to work.
- Ensure the EC2 instance profile role has the required permissions to access the services used by this application, as mentioned above.
- Connect to your EC2 instance.
- Run the appropriate commands to update the EC2 instance (`sudo apt update` and `sudo apt upgrade` for Ubuntu).
- Clone this git repo: `git clone [github_link]` and `cd bedrock-claude-chatbot`.
- Install python3 and pip if not already installed: `sudo apt install python3` and `sudo apt install python3-pip`.
- If you decide to use the Python libraries for PDF and image processing, this requires tesseract-ocr. Run the following commands:
- If using CentOS or Amazon Linux:
- sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
- sudo yum -y update
- sudo yum install -y tesseract
- For Ubuntu or Debian:
- sudo apt-get install tesseract-ocr-all -y
- Install the dependencies by running the following commands (use `yum` for CentOS or Amazon Linux):
- sudo apt update
- sudo apt upgrade -y
- chmod +x install_package.sh
- ./install_package.sh
- Run the command `tmux new -s mysession` to create a new session. Then, in the new session, `cd bedrock-claude-chatbot` into the ChatBot directory and run `python3 -m streamlit run bedrock-chat.py` to start the Streamlit app. This allows you to run the Streamlit application in the background and keep it running even if you disconnect from the terminal session.
- Copy the External URL link generated and paste it into a new browser tab.
- ⚠ NOTE: The generated link is not secure! For additional guidance.
To stop the `tmux` session, in your EC2 terminal press `Ctrl+b`, then `d` to detach. To kill the session, run `tmux kill-session -t mysession`.
- Pricing: Pricing is only calculated for the Bedrock models, not including the cost of any other AWS service used. In addition, the pricing information for the models is stored in a static `pricing.json` file; manually update the file to reflect current Bedrock pricing details. Treat the cost implementation in this app as a rough estimate of the actual cost of interacting with the Bedrock models, as the actual cost reported in your account may differ (a minimal sketch of this calculation follows this list).
- Storage Encryption: This application does not implement storing and reading files to and from S3 and/or DynamoDB using KMS keys for data-at-rest encryption.
- Production-Ready: For an enterprise and production-ready chatbot application architecture pattern, check out Generative AI Application Builder on AWS and Bedrock-Claude-Chat for best practices and recommendations.
- Tools Suite: This application only includes a single tool. However, with the many niche applications of LLMs, a library of tools would make this application more robust.
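As referenced in the Pricing note above, a minimal sketch of the rough per-turn cost estimate is shown below; the `pricing.json` schema used here (price per 1K tokens, keyed by model ID) is an assumption, not necessarily the file's actual structure.

```python
# Hedged sketch of a rough per-turn Bedrock cost estimate from token counts.
# Assumed pricing.json shape: {"<model-id>": {"input": <$/1K>, "output": <$/1K>}}.
import json

with open("pricing.json") as f:
    pricing = json.load(f)

def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    rates = pricing[model_id]
    return (input_tokens / 1000) * rates["input"] + (output_tokens / 1000) * rates["output"]

print(estimate_cost("anthropic.claude-3-sonnet-20240229-v1:0", 1200, 350))
```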
Guidelines
- When a document is uploaded (and for every time it stays uploaded), its content is attached to the user's query, and the chatbot's responses are grounded in the document (a separate prompt template is used). That chat conversation is tagged with the document name as metadata, to be used in the chat history.
- If the document is detached, the chat history will only contain the user's queries and the chatbot's responses, unless the `load-doc-in-chat-history` configuration parameter is enabled, in which case the document content will be retained in the chat history.
- You can refer to documents by their names or format (PDF, WORD, IMAGE etc.) when having a conversation with the AI.
- The `chat-history-loaded-length` setting determines how many previous conversations the LLM will be aware of, including any attached documents (if the `load-doc-in-chat-history` option is enabled). A higher value for this setting means the LLM will have access to more historical context, but it may also increase cost and potentially introduce latency, as more tokens will be inputted into the LLM. For optimal performance and cost-effectiveness, it's recommended to set `chat-history-loaded-length` to a value between 5 and 10. This range strikes a balance between providing the LLM with sufficient historical context and minimizing the input payload size and associated costs (a minimal sketch of such a history window follows these guidelines).
- ⚠️ When using the Streamlit app, any uploaded document is persisted for the current chat conversation. This means that subsequent questions, as well as chat histories (if the `load-doc-in-chat-history` option is enabled), will have the document(s) as context, and the responses will be grounded in those document(s). However, this can increase cost and latency, as the input payload will be larger due to the loaded document(s) in every chat turn. Therefore, if you have the `load-doc-in-chat-history` option enabled, after your first question is answered with the uploaded document(s), it is recommended to remove the document(s) by clicking the X sign next to the uploaded file(s). The document(s) will be saved in the chat history, and you can ask follow-up questions about them, as the LLM will have knowledge of the document(s) from the chat history. On the other hand, if the `load-doc-in-chat-history` option is disabled and you want to keep asking follow-up questions about the document(s), leave the document(s) uploaded until you are done. This way, only the current chat turn will have the document(s) loaded, and not the entire chat history. The choice between enabling `load-doc-in-chat-history` or not depends on cost and latency; I would recommend enabling it for a smoother experience, following the aforementioned guidelines.
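As referenced above, a minimal sketch (not the app's actual code) of how a `chat-history-loaded-length` window keeps the input payload bounded:

```python
# Sketch: only the most recent N turns are sent to the model as context.
def load_recent_history(all_turns: list[dict], loaded_length: int) -> list[dict]:
    return all_turns[-loaded_length:]

history = [{"role": "user", "content": f"question {i}"} for i in range(20)]
print(len(load_recent_history(history, 10)))  # -> 10
```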
Alternative AI tools for bedrock-claude-chatbot
Similar Open Source Tools


geti-sdk
The Intel® Geti™ SDK is a python package that enables teams to rapidly develop AI models by easing the complexities of model development and fostering collaboration. It provides tools to interact with an Intel® Geti™ server via the REST API, allowing for project creation, downloading, uploading, deploying for local inference with OpenVINO, configuration management, training job monitoring, media upload, and prediction. The repository also includes tutorial-style Jupyter notebooks demonstrating SDK usage.


cognita
Cognita is an open-source framework to organize your RAG codebase along with a frontend to play around with different RAG customizations. It provides a simple way to organize your codebase so that it becomes easy to test it locally while also being able to deploy it in a production-ready environment. The key issues that arise while productionizing a RAG system from a Jupyter Notebook are: 1. **Chunking and Embedding Job**: The chunking and embedding code usually needs to be abstracted out and deployed as a job. Sometimes the job will need to run on a schedule or be triggered via an event to keep the data updated. 2. **Query Service**: The code that generates the answer from the query needs to be wrapped up in an API server like FastAPI and should be deployed as a service. This service should be able to handle multiple queries at the same time and also autoscale with higher traffic. 3. **LLM / Embedding Model Deployment**: Oftentimes, if we are using open-source models, we load the model in the Jupyter notebook. This will need to be hosted as a separate service in production, and the model will need to be called as an API. 4. **Vector DB deployment**: Most testing happens on vector DBs in memory or on disk. However, in production, the DBs need to be deployed in a more scalable and reliable way. Cognita makes it really easy to customize and experiment with everything about a RAG system and still be able to deploy it in a good way. It also ships with a UI that makes it easier to try out different RAG configurations and see the results in real time. You can use it locally or with/without using any Truefoundry components. However, using Truefoundry components makes it easier to test different models and deploy the system in a scalable way. Cognita allows you to host multiple RAG systems using one app. ### Advantages of using Cognita are: 1. A central reusable repository of parsers, loaders, embedders and retrievers. 2. Ability for non-technical users to play with the UI - upload documents and perform QnA using modules built by the development team. 3. Fully API driven - which allows integration with other systems. > If you use Cognita with Truefoundry AI Gateway, you can get logging, metrics and a feedback mechanism for your user queries. ### Features: 1. Support for multiple document retrievers that use `Similarity Search`, `Query Decomposition`, `Document Reranking`, etc. 2. Support for SOTA open-source embeddings and reranking from `mixedbread-ai` 3. Support for using LLMs via `Ollama` 4. Support for incremental indexing that ingests entire documents in batches (reduces compute burden), keeps track of already indexed documents and prevents re-indexing of those docs.

mosec
Mosec is a high-performance and flexible model serving framework for building ML model-enabled backend and microservices. It bridges the gap between any machine learning models you just trained and the efficient online service API. * **Highly performant** : web layer and task coordination built with Rust 🦀, which offers blazing speed in addition to efficient CPU utilization powered by async I/O * **Ease of use** : user interface purely in Python 🐍, by which users can serve their models in an ML framework-agnostic manner using the same code as they do for offline testing * **Dynamic batching** : aggregate requests from different users for batched inference and distribute results back * **Pipelined stages** : spawn multiple processes for pipelined stages to handle CPU/GPU/IO mixed workloads * **Cloud friendly** : designed to run in the cloud, with the model warmup, graceful shutdown, and Prometheus monitoring metrics, easily managed by Kubernetes or any container orchestration systems * **Do one thing well** : focus on the online serving part, users can pay attention to the model optimization and business logic

agentok
Agentok Studio is a visual tool built for AutoGen, a cutting-edge agent framework from Microsoft and various contributors. It offers intuitive visual tools to simplify the construction and management of complex agent-based workflows. Users can create workflows visually as graphs, chat with agents, and share flow templates. The tool is designed to streamline the development process for creators and developers working on next-generation Multi-Agent Applications.

latex2ai
LaTeX2AI is a plugin for Adobe Illustrator that allows users to use editable text labels typeset in LaTeX inside an Illustrator document. It provides a seamless integration of LaTeX functionality within the Illustrator environment, enabling users to create and edit LaTeX labels, manage item scaling behavior, set global options, and save documents as PDF with included LaTeX labels. The tool simplifies the process of including LaTeX-generated content in Illustrator designs, ensuring accurate scaling and alignment with other elements in the document.

serverless-pdf-chat
The serverless-pdf-chat repository contains a sample application that allows users to ask natural language questions of any PDF document they upload. It leverages serverless services like Amazon Bedrock, AWS Lambda, and Amazon DynamoDB to provide text generation and analysis capabilities. The application architecture involves uploading a PDF document to an S3 bucket, extracting metadata, converting text to vectors, and using LangChain to search for information related to user prompts. The application is not intended for production use and serves as a demonstration and educational tool.

guidellm
GuideLLM is a platform for evaluating and optimizing the deployment of large language models (LLMs). By simulating real-world inference workloads, GuideLLM enables users to assess the performance, resource requirements, and cost implications of deploying LLMs on various hardware configurations. This approach ensures efficient, scalable, and cost-effective LLM inference serving while maintaining high service quality. The tool provides features for performance evaluation, resource optimization, cost estimation, and scalability testing.

unitycatalog
Unity Catalog is an open and interoperable catalog for data and AI, supporting multi-format tables, unstructured data, and AI assets. It offers plugin support for extensibility and interoperates with Delta Sharing protocol. The catalog is fully open with OpenAPI spec and OSS implementation, providing unified governance for data and AI with asset-level access control enforced through REST APIs.

vector-vein
VectorVein is a no-code AI workflow software inspired by LangChain and langflow, aiming to combine the powerful capabilities of large language models and enable users to achieve intelligent and automated daily workflows through simple drag-and-drop actions. Users can create powerful workflows without the need for programming, automating all tasks with ease. The software allows users to define inputs, outputs, and processing methods to create customized workflow processes for various tasks such as translation, mind mapping, summarizing web articles, and automatic categorization of customer reviews.

MARS5-TTS
MARS5 is a novel English speech model (TTS) developed by CAMB.AI, featuring a two-stage AR-NAR pipeline with a unique NAR component. The model can generate speech for various scenarios like sports commentary and anime with just 5 seconds of audio and a text snippet. It allows steering prosody using punctuation and capitalization in the transcript. Speaker identity is specified using an audio reference file, enabling 'deep clone' for improved quality. The model can be used via torch.hub or HuggingFace, supporting both shallow and deep cloning for inference. Checkpoints are provided for AR and NAR models, with hardware requirements of 750M+450M params on GPU. Contributions to improve model stability, performance, and reference audio selection are welcome.

lmql
LMQL is a programming language designed for large language models (LLMs) that offers a unique way of integrating traditional programming with LLM interaction. It allows users to write programs that combine algorithmic logic with LLM calls, enabling model reasoning capabilities within the context of the program. LMQL provides features such as Python syntax integration, rich control-flow options, advanced decoding techniques, powerful constraints via logit masking, runtime optimization, sync and async API support, multi-model compatibility, and extensive applications like JSON decoding and interactive chat interfaces. The tool also offers library integration, flexible tooling, and output streaming options for easy model output handling.

STMP
SillyTavern MultiPlayer (STMP) is an LLM chat interface that enables multiple users to chat with an AI. It features a sidebar chat for users, tools for the Host to manage the AI's behavior and moderate users. Users can change display names, chat in different windows, and the Host can control AI settings. STMP supports Text Completions, Chat Completions, and HordeAI. Users can add/edit APIs, manage past chats, view user lists, and control delays. Hosts have access to various controls, including AI configuration, adding presets, and managing characters. Planned features include smarter retry logic, host controls enhancements, and quality of life improvements like user list fading and highlighting exact usernames in AI responses.

open-source-slack-ai
This repository provides a ready-to-run basic Slack AI solution that allows users to summarize threads and channels using OpenAI. Users can generate thread summaries, channel overviews, channel summaries since a specific time, and full channel summaries. The tool is powered by GPT-3.5-Turbo and an ensemble of NLP models. It requires Python 3.8 or higher, an OpenAI API key, Slack App with associated API tokens, Poetry package manager, and ngrok for local development. Users can customize channel and thread summaries, run tests with coverage using pytest, and contribute to the project for future enhancements.

For similar tasks

Azure-Analytics-and-AI-Engagement
The Azure-Analytics-and-AI-Engagement repository provides packaged Industry Scenario DREAM Demos with ARM templates (containing a demo web application, Power BI reports, Synapse resources, AML Notebooks etc.) that can be deployed in a customer's subscription using the CAPE tool within a matter of hours. Partners can also deploy DREAM Demos in their own subscriptions using DPoC.

sorrentum
Sorrentum is an open-source project that aims to combine open-source development, startups, and brilliant students to build machine learning, AI, and Web3 / DeFi protocols geared towards finance and economics. The project provides opportunities for internships, research assistantships, and development grants, as well as the chance to work on cutting-edge problems, learn about startups, write academic papers, and get internships and full-time positions at companies working on Sorrentum applications.

tidb
TiDB is an open-source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

zep-python
Zep is an open-source platform for building and deploying large language model (LLM) applications. It provides a suite of tools and services that make it easy to integrate LLMs into your applications, including chat history memory, embedding, vector search, and data enrichment. Zep is designed to be scalable, reliable, and easy to use, making it a great choice for developers who want to build LLM-powered applications quickly and easily.

telemetry-airflow
This repository codifies the Airflow cluster that is deployed at workflow.telemetry.mozilla.org (behind SSO) and commonly referred to as "WTMO" or simply "Airflow". Some links relevant to users and developers of WTMO: * The `dags` directory in this repository contains some custom DAG definitions * Many of the DAGs registered with WTMO don't live in this repository, but are instead generated from ETL task definitions in bigquery-etl * The Data SRE team maintains a WTMO Developer Guide (behind SSO)

mojo
Mojo is a new programming language that bridges the gap between research and production by combining Python syntax and ecosystem with systems programming and metaprogramming features. Mojo is still young, but it is designed to become a superset of Python over time.

pandas-ai
PandasAI is a Python library that makes it easy to ask questions to your data in natural language. It helps you to explore, clean, and analyze your data using generative AI.

databend
Databend is an open-source cloud data warehouse that serves as a cost-effective alternative to Snowflake. With its focus on fast query execution and data ingestion, it's designed for complex analysis of the world's largest datasets.
For similar jobs

sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

BricksLLM
BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.