chat-ui
Open source codebase powering the HuggingChat app
Stars: 7407
README:
Find the docs at hf.co/docs/chat-ui.
A chat interface using open source models, eg OpenAssistant or Llama. It is a SvelteKit app and it powers the HuggingChat app on hf.co/chat.
- Quickstart
- No Setup Deploy
- Setup
- Launch
- Web Search
- Text Embedding Models
- Extra parameters
- Common issues
- Deploying to a HF Space
- Building
You can quickly start a locally running chat-ui & LLM text-generation server thanks to chat-ui's llama.cpp server support.
Step 1 (Start llama.cpp server):
Install llama.cpp w/ brew (for Mac):
# install llama.cpp
brew install llama.cpp
or build directly from the source for your target device:
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make
Next, start the server with the LLM of your choice:
# start llama.cpp server (using hf.co/microsoft/Phi-3-mini-4k-instruct-gguf as an example)
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
A local LLaMA.cpp HTTP Server will start on http://localhost:8080. Read more here.
Step 2 (tell chat-ui to use local llama.cpp server):
Add the following to your .env.local:
MODELS=`[
{
"name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
"tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
"preprompt": "",
"parameters": {
"stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
"temperature": 0.7,
"max_new_tokens": 1024,
"truncate": 3071
},
"endpoints": [{
"type" : "llamacpp",
"baseURL": "http://localhost:8080"
}],
},
]`
The tokenizer field will be used to find the appropriate chat template for the model. Make sure to fill in a valid model from the Hugging Face hub.
Read more here.
Step 3 (make sure you have MongoDB running locally):
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
Read more here.
Step 4 (start chat-ui):
git clone https://github.com/huggingface/chat-ui
cd chat-ui
npm install
npm run dev -- --open
Read more here.
If you don't want to configure, setup, and launch your own Chat UI yourself, you can use this option as a fast deploy alternative.
You can deploy your own customized Chat UI instance with any supported LLM of your choice on Hugging Face Spaces. To do so, use the chat-ui template available here.
Set HF_TOKEN in Space secrets to deploy a model with gated access or a model in a private repository. It's also compatible with Inference for PROs curated list of powerful models with higher rate limits. Make sure to create your personal token first in your User Access Tokens settings.
Read the full tutorial here.
The default config for Chat UI is stored in the .env file. You will need to override some values to get Chat UI to run locally. This is done in .env.local.
Start by creating a .env.local file in the root of the repository. The bare minimum config you need to get Chat UI to run locally is the following:
MONGODB_URL=<the URL to your MongoDB instance>
HF_TOKEN=<your access token>
The chat history is stored in a MongoDB instance, and having a DB instance available is needed for Chat UI to work.
You can use a local MongoDB instance. The easiest way is to spin one up using docker:
docker run -d -p 27017:27017 --name mongo-chatui mongo:latest
In which case the URL of your DB will be MONGODB_URL=mongodb://localhost:27017.
Alternatively, you can use a free MongoDB Atlas instance; Chat UI should fit comfortably within their free tier. You can then set the MONGODB_URL variable in .env.local to match your instance.
If you use a remote inference endpoint, you will need a Hugging Face access token to run Chat UI locally. You can get one from your Hugging Face profile.
After you're done with the .env.local file you can run Chat UI locally with:
npm install
npm run dev
Chat UI features a powerful Web Search feature. It works by:
- Generating an appropriate search query from the user prompt.
- Performing web search and extracting content from webpages.
- Creating embeddings from texts using a text embedding model.
- From these embeddings, find the ones that are closest to the user query using a vector similarity search (specifically, inner product distance), as shown in the sketch below.
- Get the corresponding texts for those closest embeddings and perform Retrieval-Augmented Generation (i.e. expand the user prompt by adding those texts so that an LLM can use this information).
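The similarity step can be sketched in TypeScript as follows. This is only a minimal illustration; the TextChunk type and rankChunksByInnerProduct function are hypothetical names, not chat-ui's actual internals:

// Rank text chunks by inner-product similarity to the query embedding.
interface TextChunk {
  text: string;
  embedding: number[]; // produced by the configured text embedding model
}

function innerProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function rankChunksByInnerProduct(queryEmbedding: number[], chunks: TextChunk[], topK = 5): TextChunk[] {
  // Higher inner product = closer to the query; keep the topK chunks.
  return [...chunks]
    .sort((a, b) => innerProduct(queryEmbedding, b.embedding) - innerProduct(queryEmbedding, a.embedding))
    .slice(0, topK);
}
// The top-ranked chunks are then appended to the user prompt (RAG) before the LLM is queried.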
By default (for backward compatibility), when the TEXT_EMBEDDING_MODELS environment variable is not defined, transformers.js embedding models will be used for embedding tasks, specifically the Xenova/gte-small model.
You can customize the embedding model by setting TEXT_EMBEDDING_MODELS in your .env.local file. For example:
TEXT_EMBEDDING_MODELS = `[
{
"name": "Xenova/gte-small",
"displayName": "Xenova/gte-small",
"description": "locally running embedding",
"chunkCharLength": 512,
"endpoints": [
{"type": "transformersjs"}
]
},
{
"name": "intfloat/e5-base-v2",
"displayName": "intfloat/e5-base-v2",
"description": "hosted embedding model",
"chunkCharLength": 768,
"preQuery": "query: ", # See https://huggingface.co/intfloat/e5-base-v2#faq
"prePassage": "passage: ", # See https://huggingface.co/intfloat/e5-base-v2#faq
"endpoints": [
{
"type": "tei",
"url": "http://127.0.0.1:8080/",
"authorization": "TOKEN_TYPE TOKEN" // optional authorization field. Example: "Basic VVNFUjpQQVNT"
}
]
}
]`
The required fields are name, chunkCharLength and endpoints.
Supported text embedding backends are: transformers.js, TEI and OpenAI. transformers.js models run locally as part of chat-ui, whereas TEI models run in a different environment and are accessed through an API endpoint. openai models are accessed through the OpenAI API.
When more than one embedding model is supplied in the .env.local file, the first will be used by default, and the others will only be used for LLMs whose embeddingModel is configured to the name of the model.
The login feature is disabled by default and users are attributed a unique ID based on their browser. But if you want to use OpenID to authenticate your users, you can add the following to your .env.local file:
OPENID_CONFIG=`{
PROVIDER_URL: "<your OIDC issuer>",
CLIENT_ID: "<your OIDC client ID>",
CLIENT_SECRET: "<your OIDC client secret>",
SCOPES: "openid profile",
TOLERANCE: // optional
RESOURCE: // optional
}`
These variables will enable the openID sign-in modal for users.
You can set the env variable TRUSTED_EMAIL_HEADER to point to the header that contains the user's email address. This will allow you to authenticate users from the header. This setup is usually combined with a proxy that sits in front of chat-ui, handles the auth and sets the header.
[!WARNING] Make sure to only allow requests to chat-ui through your proxy which handles authentication, otherwise users could authenticate as anyone by setting the header manually! Only set this up if you understand the implications and know how to do it correctly.
Here is a list of header names for common auth providers:
- Tailscale Serve: Tailscale-User-Login
- Cloudflare Access: Cf-Access-Authenticated-User-Email
- oauth2-proxy: X-Forwarded-Email
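For example, if your authenticating proxy is Cloudflare Access, the corresponding .env.local entry would be:
TRUSTED_EMAIL_HEADER=Cf-Access-Authenticated-User-Email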
You can use a few environment variables to customize the look and feel of chat-ui. These are by default:
PUBLIC_APP_NAME=ChatUI
PUBLIC_APP_ASSETS=chatui
PUBLIC_APP_COLOR=blue
PUBLIC_APP_DESCRIPTION="Making the community's best AI chat models available to everyone."
PUBLIC_APP_DATA_SHARING=
PUBLIC_APP_DISCLAIMER=
- PUBLIC_APP_NAME The name used as a title throughout the app.
- PUBLIC_APP_ASSETS Is used to find logos & favicons in static/$PUBLIC_APP_ASSETS, current options are chatui and huggingchat.
- PUBLIC_APP_COLOR Can be any of the tailwind colors.
- PUBLIC_APP_DATA_SHARING Can be set to 1 to add a toggle in the user settings that lets your users opt-in to data sharing with the model's creator.
- PUBLIC_APP_DISCLAIMER If set to 1, we show a disclaimer about generated outputs on login.
You can enable the web search through an API by adding YDC_API_KEY (docs.you.com) or SERPER_API_KEY (serper.dev) or SERPAPI_KEY (serpapi.com) or SERPSTACK_API_KEY (serpstack.com) or SEARCHAPI_KEY (searchapi.io) to your .env.local.
You can also simply enable the local Google web search by setting USE_LOCAL_WEBSEARCH=true in your .env.local, or specify a SearXNG instance by adding the query URL to SEARXNG_QUERY_URL.
You can enable JavaScript when parsing webpages to improve compatibility by setting WEBSEARCH_JAVASCRIPT=true, at the cost of increased CPU usage. You'll want at least 4 cores when enabling this.
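For example, a minimal .env.local sketch enabling web search through serper.dev (the key value is a placeholder), or locally without an external provider:
SERPER_API_KEY=<your serper.dev API key>
# or, with no external provider:
# USE_LOCAL_WEBSEARCH=true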
You can customize the parameters passed to the model or even use a new model by updating the MODELS variable in your .env.local. The default one can be found in .env and looks like this:
MODELS=`[
{
"name": "mistralai/Mistral-7B-Instruct-v0.2",
"displayName": "mistralai/Mistral-7B-Instruct-v0.2",
"description": "Mistral 7B is a new Apache 2.0 model, released by Mistral AI that outperforms Llama2 13B in benchmarks.",
"websiteUrl": "https://mistral.ai/news/announcing-mistral-7b/",
"preprompt": "",
"chatPromptTemplate" : "<s>{{#each messages}}{{#ifUser}}[INST] {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}}{{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}</s>{{/ifAssistant}}{{/each}}",
"parameters": {
"temperature": 0.3,
"top_p": 0.95,
"repetition_penalty": 1.2,
"top_k": 50,
"truncate": 3072,
"max_new_tokens": 1024,
"stop": ["</s>"]
},
"promptExamples": [
{
"title": "Write an email from bullet list",
"prompt": "As a restaurant owner, write a professional email to the supplier to get these products every week: \n\n- Wine (x10)\n- Eggs (x24)\n- Bread (x12)"
}, {
"title": "Code a snake game",
"prompt": "Code a basic snake game in python, give explanations for each step."
}, {
"title": "Assist in a task",
"prompt": "How do I make a delicious lemon cheesecake?"
}
]
}
]`
You can change things like the parameters, or customize the preprompt to better suit your needs. You can also add more models by adding more objects to the array, with different preprompts for example.
When querying the model for a chat response, the chatPromptTemplate template is used. messages is an array of chat messages in the format [{ content: string }, ...]. To identify whether a message is a user message or an assistant message, the ifUser and ifAssistant block helpers can be used.
The following is the default chatPromptTemplate, although newlines and indentation have been added for readability. You can find the prompts used in production for HuggingChat here.
{{preprompt}}
{{#each messages}}
{{#ifUser}}{{@root.userMessageToken}}{{content}}{{@root.userMessageEndToken}}{{/ifUser}}
{{#ifAssistant}}{{@root.assistantMessageToken}}{{content}}{{@root.assistantMessageEndToken}}{{/ifAssistant}}
{{/each}}
{{assistantMessageToken}}
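For illustration only, assuming hypothetical token values of "<|user|>" for userMessageToken, "<|assistant|>" for assistantMessageToken and "<|end|>" for both end tokens, a user message { content: "Hi" } followed by an assistant message { content: "Hello, how can I help?" } would render roughly as:
{preprompt}<|user|>Hi<|end|><|assistant|>Hello, how can I help?<|end|><|assistant|>
The trailing assistantMessageToken cues the model to produce the next assistant turn.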
We currently support IDEFICS (hosted on TGI), OpenAI and Claude 3 as multimodal models. You can enable multimodal support by setting multimodal: true in your MODELS configuration. For IDEFICS, you must have a PRO HF API token. For OpenAI, see the OpenAI section. For Anthropic, see the Anthropic section.
{
"name": "HuggingFaceM4/idefics-80b-instruct",
"multimodal" : true,
"description": "IDEFICS is the new multimodal model by Hugging Face.",
"preprompt": "",
"chatPromptTemplate" : "{{#each messages}}{{#ifUser}}User: {{content}}{{/ifUser}}<end_of_utterance>\nAssistant: {{#ifAssistant}}{{content}}\n{{/ifAssistant}}{{/each}}",
"parameters": {
"temperature": 0.1,
"top_p": 0.95,
"repetition_penalty": 1.2,
"top_k": 12,
"truncate": 1000,
"max_new_tokens": 1024,
"stop": ["<end_of_utterance>", "User:", "\nUser:"]
}
}
If you want to, instead of hitting models on the Hugging Face Inference API, you can run your own models locally.
A good option is to hit a text-generation-inference endpoint. This is what is done in the official Chat UI Spaces Docker template for instance: both this app and a text-generation-inference server run inside the same container.
To do this, you can add your own endpoints to the MODELS variable in .env.local, by adding an "endpoints" key for each model in MODELS.
{
// rest of the model config here
"endpoints": [{
"type" : "tgi",
"url": "https://HOST:PORT",
}]
}
If endpoints are left unspecified, ChatUI will look for the model on the hosted Hugging Face inference API using the model name.
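For instance, a minimal sketch like the following would make Chat UI query the hosted inference API for that model, since no endpoints are given (you may still want to set parameters and a chatPromptTemplate as in the examples above):
MODELS=`[
  {
    "name": "mistralai/Mistral-7B-Instruct-v0.2"
  }
]`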
Chat UI can be used with any API server that supports OpenAI API compatibility, for example text-generation-webui, LocalAI, FastChat, llama-cpp-python, ialacol, and vLLM.
The following example config makes Chat UI work with text-generation-webui. The endpoint.baseURL is the URL of the OpenAI-API-compatible server; this overrides the base URL used by the OpenAI instance. The endpoint.completion setting determines which endpoint is used: the default is chat_completions, which uses v1/chat/completions; change endpoint.completion to completions to use the v1/completions endpoint.
Parameters not supported by OpenAI (e.g., top_k, repetition_penalty, etc.) must be set in the extraBody of endpoints. Be aware that setting them in parameters will cause them to be omitted.
MODELS=`[
{
"name": "text-generation-webui",
"id": "text-generation-webui",
"parameters": {
"temperature": 0.9,
"top_p": 0.95,
"max_new_tokens": 1024,
"stop": []
},
"endpoints": [{
"type" : "openai",
"baseURL": "http://localhost:8000/v1",
"extraBody": {
"repetition_penalty": 1.2,
"top_k": 50,
"truncate": 1000
}
}]
}
]`
The openai type includes official OpenAI models. You can add, for example, GPT-4 or GPT-3.5 as an "openai" model:
OPENAI_API_KEY=#your openai api key here
MODELS=`[{
"name": "gpt-4",
"displayName": "GPT 4",
"endpoints" : [{
"type": "openai"
}]
},
{
"name": "gpt-3.5-turbo",
"displayName": "GPT 3.5 Turbo",
"endpoints" : [{
"type": "openai"
}]
}]`
You may also consume any model provider that offers a compatible OpenAI API endpoint. For example, you may self-host the Portkey gateway and experiment with Claude or GPT models offered by Azure OpenAI. Example for Claude from Anthropic:
MODELS=`[{
"name": "claude-2.1",
"displayName": "Claude 2.1",
"description": "Anthropic has been founded by former OpenAI researchers...",
"parameters": {
"temperature": 0.5,
"max_new_tokens": 4096,
},
"endpoints": [
{
"type": "openai",
"baseURL": "https://gateway.example.com/v1",
"defaultHeaders": {
"x-portkey-config": '{"provider":"anthropic","api_key":"sk-ant-abc...xyz"}'
}
}
]
}]`
Example for GPT 4 deployed on Azure OpenAI:
MODELS=`[{
"id": "gpt-4-1106-preview",
"name": "gpt-4-1106-preview",
"displayName": "gpt-4-1106-preview",
"parameters": {
"temperature": 0.5,
"max_new_tokens": 4096,
},
"endpoints": [
{
"type": "openai",
"baseURL": "https://{resource-name}.openai.azure.com/openai/deployments/{deployment-id}",
"defaultHeaders": {
"api-key": "{api-key}"
},
"defaultQuery": {
"api-version": "2023-05-15"
}
}
]
}]`
Or try Mistral from DeepInfra:
Note: apiKey can either be set per endpoint, or globally using the OPENAI_API_KEY variable.
MODELS=`[{
"name": "mistral-7b",
"displayName": "Mistral 7B",
"description": "A 7B dense Transformer, fast-deployed and easily customisable. Small, yet powerful for a variety of use cases. Supports English and code, and a 8k context window.",
"parameters": {
"temperature": 0.5,
"max_new_tokens": 4096,
},
"endpoints": [
{
"type": "openai",
"baseURL": "https://api.deepinfra.com/v1/openai",
"apiKey": "abc...xyz"
}
]
}]`
chat-ui also supports the llama.cpp API server directly without the need for an adapter. You can do this using the llamacpp endpoint type.
If you want to run Chat UI with llama.cpp, you can do the following, using microsoft/Phi-3-mini-4k-instruct-gguf as an example model:
# install llama.cpp
brew install llama.cpp
# start llama.cpp server
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096
MODELS=`[
{
"name": "Local Zephyr",
"chatPromptTemplate": "<|system|>\n{{preprompt}}</s>\n{{#each messages}}{{#ifUser}}<|user|>\n{{content}}</s>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}</s>\n{{/ifAssistant}}{{/each}}",
"parameters": {
"temperature": 0.1,
"top_p": 0.95,
"repetition_penalty": 1.2,
"top_k": 50,
"truncate": 1000,
"max_new_tokens": 2048,
"stop": ["</s>"]
},
"endpoints": [
{
"url": "http://127.0.0.1:8080",
"type": "llamacpp"
}
]
}
]`
Start chat-ui with npm run dev and you should be able to chat with Zephyr locally.
We also support the Ollama inference server. Spin up a model with
ollama run mistral
Then specify the endpoints like so:
MODELS=`[
{
"name": "Ollama Mistral",
"chatPromptTemplate": "<s>{{#each messages}}{{#ifUser}}[INST] {{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}} {{content}} [/INST]{{/ifUser}}{{#ifAssistant}}{{content}}</s> {{/ifAssistant}}{{/each}}",
"parameters": {
"temperature": 0.1,
"top_p": 0.95,
"repetition_penalty": 1.2,
"top_k": 50,
"truncate": 3072,
"max_new_tokens": 1024,
"stop": ["</s>"]
},
"endpoints": [
{
"type": "ollama",
"url" : "http://127.0.0.1:11434",
"ollamaName" : "mistral"
}
]
}
]`
We also support Anthropic models (including multimodal ones via multimodal: true) through the official SDK. You may provide your API key via the ANTHROPIC_API_KEY env variable, or alternatively, through endpoints.apiKey as per the following example.
MODELS=`[
{
"name": "claude-3-haiku-20240307",
"displayName": "Claude 3 Haiku",
"description": "Fastest and most compact model for near-instant responsiveness",
"multimodal": true,
"parameters": {
"max_new_tokens": 4096,
},
"endpoints": [
{
"type": "anthropic",
// optionals
"apiKey": "sk-ant-...",
"baseURL": "https://api.anthropic.com",
"defaultHeaders": {},
"defaultQuery": {}
}
]
},
{
"name": "claude-3-sonnet-20240229",
"displayName": "Claude 3 Sonnet",
"description": "Ideal balance of intelligence and speed",
"multimodal": true,
"parameters": {
"max_new_tokens": 4096,
},
"endpoints": [
{
"type": "anthropic",
// optionals
"apiKey": "sk-ant-...",
"baseURL": "https://api.anthropic.com",
"defaultHeaders": {},
"defaultQuery": {}
}
]
},
{
"name": "claude-3-opus-20240229",
"displayName": "Claude 3 Opus",
"description": "Most powerful model for highly complex tasks",
"multimodal": true,
"parameters": {
"max_new_tokens": 4096
},
"endpoints": [
{
"type": "anthropic",
// optionals
"apiKey": "sk-ant-...",
"baseURL": "https://api.anthropic.com",
"defaultHeaders": {},
"defaultQuery": {}
}
]
}
]`
We also support using Anthropic models running on Vertex AI. Authentication is done using Google Application Default Credentials. The project ID can be provided through endpoints.projectId as per the following example:
MODELS=`[
{
"name": "claude-3-sonnet@20240229",
"displayName": "Claude 3 Sonnet",
"description": "Ideal balance of intelligence and speed",
"multimodal": true,
"parameters": {
"max_new_tokens": 4096,
},
"endpoints": [
{
"type": "anthropic-vertex",
"region": "us-central1",
"projectId": "gcp-project-id",
// optionals
"defaultHeaders": {},
"defaultQuery": {}
}
]
},
{
"name": "claude-3-haiku@20240307",
"displayName": "Claude 3 Haiku",
"description": "Fastest, most compact model for near-instant responsiveness",
"multimodal": true,
"parameters": {
"max_new_tokens": 4096
},
"endpoints": [
{
"type": "anthropic-vertex",
"region": "us-central1",
"projectId": "gcp-project-id",
// optionals
"defaultHeaders": {},
"defaultQuery": {}
}
]
}
]`
You can also specify your Amazon SageMaker instance as an endpoint for chat-ui. The config goes like this:
"endpoints": [
{
"type" : "aws",
"service" : "sagemaker"
"url": "",
"accessKey": "",
"secretKey" : "",
"sessionToken": "",
"region": "",
"weight": 1
}
]
You can also set "service" : "lambda" to use a lambda instance.
You can get the accessKey and secretKey from your AWS user, under programmatic access.
You can also use Cloudflare Workers AI to run your own models with serverless inference.
You will need to have a Cloudflare account, then get your account ID as well as your API token for Workers AI.
You can either specify them directly in your .env.local using the CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN variables, or you can set them directly in the endpoint config.
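For example, in .env.local (placeholder values):
CLOUDFLARE_ACCOUNT_ID=<your Cloudflare account ID>
CLOUDFLARE_API_TOKEN=<your Workers AI API token>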
You can find the list of models available on Cloudflare here.
{
"name" : "nousresearch/hermes-2-pro-mistral-7b",
"tokenizer": "nousresearch/hermes-2-pro-mistral-7b",
"parameters": {
"stop": ["<|im_end|>"]
},
"endpoints" : [
{
"type" : "cloudflare"
<!-- optionally specify these
"accountId": "your-account-id",
"authToken": "your-api-token"
-->
}
]
}
You can also use Cohere to run their models directly from chat-ui. You will need to have a Cohere account, then get your API token. You can either specify it directly in your .env.local using the COHERE_API_TOKEN variable, or you can set it in the endpoint config.
Here is an example of a Cohere model config. You can set which model you want to use by setting the id field to the model name.
{
"name" : "CohereForAI/c4ai-command-r-v01",
"id": "command-r",
"description": "C4AI Command-R is a research release of a 35 billion parameter highly performant generative model",
"endpoints": [
{
"type": "cohere",
<!-- optionally specify these, or use COHERE_API_TOKEN
"apiKey": "your-api-token"
-->
}
]
}
Chat UI can connect to the Google Vertex API endpoints (list of supported models).
To enable:
- Select or create a Google Cloud project.
- Enable billing for your project.
- Enable the Vertex AI API.
- Set up authentication with a service account so you can access the API from your local workstation.
The service account credentials file can be imported as an environment variable:
GOOGLE_APPLICATION_CREDENTIALS = clientid.json
Make sure your Docker container has access to the file and that the variable is correctly set. Afterwards, Google Vertex endpoints can be configured as follows:
MODELS=`[
//...
{
"name": "gemini-1.5-pro",
"displayName": "Vertex Gemini Pro 1.5",
"multimodal": true,
"endpoints" : [{
"type": "vertex",
"project": "abc-xyz",
"location": "europe-west3",
"extraBody": {
"model_version": "gemini-1.5-pro-preview-0409",
},
// Optional
"safetyThreshold": "BLOCK_MEDIUM_AND_ABOVE",
"apiEndpoint": "", // alternative api endpoint url,
"tools": [{
"googleSearchRetrieval": {
"disableAttribution": true
}
}],
"multimodal": {
"image": {
"supportedMimeTypes": ["image/png", "image/jpeg", "image/webp"],
"preferredMimeType": "image/png",
"maxSizeInMB": 5,
"maxWidth": 2000,
"maxHeight": 1000,
}
}
}]
},
]`
LangChain applications that are deployed using LangServe can be called with the following config:
MODELS=`[
//...
{
"name": "summarization-chain", //model-name
"endpoints" : [{
"type": "langserve",
"url" : "http://127.0.0.1:8100",
}]
},
]`
Custom endpoints may require authorization, depending on how you configure them. Authentication will usually be set with either Basic or Bearer.
For Basic, we will need to generate a base64 encoding of the username and password.
echo -n "USER:PASS" | base64
VVNFUjpQQVNT
For Bearer you can use a token, which can be grabbed from here.
You can then add the generated information and the authorization parameter to your .env.local.
"endpoints": [
{
"url": "https://HOST:PORT",
"authorization": "Basic VVNFUjpQQVNT",
}
]
Please note that if HF_TOKEN is also set or not empty, it will take precedence.
If the model being hosted will be available on multiple servers/instances, add the weight parameter to your .env.local. The weight will be used to determine the probability of requesting a particular endpoint.
"endpoints": [
{
"url": "https://HOST:PORT",
"weight": 1
},
{
"url": "https://HOST:PORT",
"weight": 2
}
...
]
Custom endpoints may require client certificate authentication, depending on how you configure them. To enable mTLS between Chat UI and your custom endpoint, you will need to set USE_CLIENT_CERTIFICATE to true, and add the CERT_PATH and KEY_PATH parameters to your .env.local. These parameters should point to the location of the certificate and key files on your local machine. The certificate and key files should be in PEM format. The key file can be encrypted with a passphrase, in which case you will also need to add the CLIENT_KEY_PASSWORD parameter to your .env.local.
If you're using a certificate signed by a private CA, you will also need to add the CA_PATH parameter to your .env.local. This parameter should point to the location of the CA certificate file on your local machine.
If you're using a self-signed certificate, e.g. for testing or development purposes, you can set the REJECT_UNAUTHORIZED parameter to false in your .env.local. This will disable certificate validation, and allow Chat UI to connect to your custom endpoint.
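As a sketch, the relevant .env.local entries might look like the following (all paths and values are placeholders):
USE_CLIENT_CERTIFICATE=true
CERT_PATH=/path/to/client-cert.pem
KEY_PATH=/path/to/client-key.pem
# optional:
# CLIENT_KEY_PASSWORD=<passphrase>
# CA_PATH=/path/to/ca-cert.pem
# REJECT_UNAUTHORIZED=false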
A model can use any of the embedding models defined in .env.local (currently used when web searching). By default it will use the first embedding model, but this can be changed with the embeddingModel field:
TEXT_EMBEDDING_MODELS = `[
{
"name": "Xenova/gte-small",
"chunkCharLength": 512,
"endpoints": [
{"type": "transformersjs"}
]
},
{
"name": "intfloat/e5-base-v2",
"chunkCharLength": 768,
"endpoints": [
{"type": "tei", "url": "http://127.0.0.1:8080/", "authorization": "Basic VVNFUjpQQVNT"},
{"type": "tei", "url": "http://127.0.0.1:8081/"}
]
}
]`
MODELS=`[
{
"name": "Ollama Mistral",
"chatPromptTemplate": "...",
"embeddingModel": "intfloat/e5-base-v2"
"parameters": {
...
},
"endpoints": [
...
]
}
]`
Most likely you are running chat-ui over HTTP. The recommended option is to set up something like NGINX to handle HTTPS and proxy the requests to chat-ui. If you really need to run over HTTP you can add ALLOW_INSECURE_COOKIES=true to your .env.local.
Make sure to set your PUBLIC_ORIGIN in your .env.local to the correct URL as well.
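For example (a sketch; use your instance's actual public URL):
ALLOW_INSECURE_COOKIES=true
PUBLIC_ORIGIN=http://localhost:5173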
Create a DOTENV_LOCAL secret in your HF Space with the content of your .env.local, and it will be picked up automatically when you run.
To create a production version of your app:
npm run build
You can preview the production build with npm run preview.
To deploy your app, you may need to install an adapter for your target environment.
The config file for HuggingChat is stored in the chart/env/prod.yaml file. It is the source of truth for the environment variables used for our CI/CD pipeline. For HuggingChat, as we need to customize the app color, as well as the base path, we build a custom docker image. You can find the workflow here.
[!TIP] If you want to make changes to the model config used in production for HuggingChat, you should do so against chart/env/prod.yaml.
If you want to run an exact copy of HuggingChat locally, you will need to do the following first:
- Create an OAuth App on the hub with openid profile email permissions. Make sure to set the callback URL to something like http://localhost:5173/chat/login/callback which matches the right path for your local instance.
- Create an HF Token with your Hugging Face account. You will need a Pro account to be able to access some of the larger models available through HuggingChat.
- Create a free account with serper.dev (you will get 2500 free search queries).
- Run an instance of MongoDB, however you want (local or remote).
You can then create a new .env.SECRET_CONFIG file with the following content:
MONGODB_URL=<link to your mongo DB from step 4>
HF_TOKEN=<your HF token from step 2>
OPENID_CONFIG=`{
PROVIDER_URL: "https://huggingface.co",
CLIENT_ID: "<your client ID from step 1>",
CLIENT_SECRET: "<your client secret from step 1>",
}`
SERPER_API_KEY=<your serper API key from step 3>
MESSAGES_BEFORE_LOGIN=<can be any numerical value, or set to 0 to require login>
You can then run npm run updateLocalEnv in the root of chat-ui. This will create a .env.local file which combines the chart/env/prod.yaml and the .env.SECRET_CONFIG file. You can then run npm run dev to start your local instance of HuggingChat.
[!WARNING] The MONGODB_URL used for this script will be fetched from .env.local. Make sure it's correct! The command runs directly on the database.
You can populate the database with faker data using the populate script:
npm run populate <flags here>
At least one flag must be specified; the following flags are available:
- reset - resets the database
- all - populates all tables
- users - populates the users table
- settings - populates the settings table for existing users
- assistants - populates the assistants table for existing users
- conversations - populates the conversations table for existing users
For example, you could use it like so:
npm run populate reset
to clear out the database. Then log in to the app to create your user and run the following command:
npm run populate users settings assistants conversations
to populate the database with fake data, including fake conversations and assistants for your user.