ai-dial-core

The main component of AI DIAL, which provides a unified API to different chat completion and embedding models, assistants, and applications.

AI DIAL Core is an HTTP Proxy that provides a unified API to different chat completion and embedding models, assistants, and applications. It is written in Java 17 and built on Eclipse Vert.x. The core functionality includes handling static and dynamic settings, deployment on Kubernetes using Helm charts, and storing user data in Blob Storage and Redis. It supports various identity providers, storage providers like AWS S3, Google Cloud Storage, and Azure Blob Store, and features like AI DIAL Addons, Interceptors, Assistants, Applications, and Models with customizable parameters and configurations.

DIAL CORE

Overview

An HTTP proxy that provides a unified API to different chat completion and embedding models, assistants, and applications. It is written in Java 17 and built on top of Eclipse Vert.x.

Build

Build the project with Gradle and Java 17:

./gradlew build

Run

Run the project with Gradle:

./gradlew run

Alternatively, run the com.epam.aidial.core.AIDial class from your favorite IDE.

Helm Deployment

You can deploy AI DIAL Core on a Kubernetes cluster using the umbrella dial Helm chart, which also deploys other AI DIAL components. Alternatively, you can use the dial-core Helm chart to deploy Core alone.

Refer to Examples for guidelines.

In either case, your Helm values file must provide the application configuration described in the Configuration section.

Configuration

Static settings

Static settings are used on startup and cannot be changed while the application is running. Refer to the example configuration file.

Priority order:

  1. Environment variables with an extra "aidial." prefix, e.g. "aidial.server.port", "aidial.config.files".
  2. File specified in the "AIDIAL_SETTINGS" environment variable.
  3. Default resource file: src/main/resources/aidial.settings.json.

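The priority order can be illustrated with a minimal Java sketch. It is not DIAL Core's actual settings loader: settings are simplified to flat maps keyed by dotted names, and the class StaticSettingsResolver and the sample values are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the static settings priority order; not the DIAL Core loader.
// Settings are simplified to flat maps keyed by dotted names such as "server.port".
final class StaticSettingsResolver {
    private static final String ENV_PREFIX = "aidial.";

    static Optional<String> resolve(String key,
                                    Map<String, String> env,
                                    Map<String, String> settingsFile,
                                    Map<String, String> defaults) {
        // 1. An environment variable with the extra "aidial." prefix wins.
        String value = env.get(ENV_PREFIX + key);
        if (value == null) {
            // 2. Fall back to the file named by the AIDIAL_SETTINGS environment variable.
            value = settingsFile.get(key);
        }
        if (value == null) {
            // 3. Finally, use the default resource file aidial.settings.json.
            value = defaults.get(key);
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, String> env = Map.of("aidial.server.port", "9000");
        Map<String, String> settingsFile = Map.of("server.port", "8081", "config.reload", "30000");
        Map<String, String> defaults = Map.of("config.reload", "60000", "config.files", "aidial.config.json");

        System.out.println(resolve("server.port", env, settingsFile, defaults));   // Optional[9000]
        System.out.println(resolve("config.reload", env, settingsFile, defaults)); // Optional[30000]
        System.out.println(resolve("config.files", env, settingsFile, defaults));  // Optional[aidial.config.json]
    }
}
```

A key defined in a higher-priority source shadows the same key in every lower-priority source.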
Setting Default Required Description
config.files aidial.config.json No List of paths to dynamic settings files. Refer to the example of a dynamic settings file.
config.reload 60000 No Config reload interval in milliseconds.
config.jsonMergeStrategy.overwriteArrays false No Specifies a merging strategy for JSON arrays. If it's set to true, arrays will be overwritten. Otherwise, they will be concatenated.
identityProviders - Yes Map of identity providers. Note: At least one identity provider must be provided. Refer to examples to view available providers. Refer to IDP Configuration to view guidelines for configuring supported providers.
identityProviders.*.jwksUrl - Optional URL of the JWKS provider. Required if disableJwtVerification is set to false. Note: Either jwksUrl or userInfoEndpoint must be provided.
identityProviders.*.userInfoEndpoint - Optional URL of the user info endpoint. Note: Either jwksUrl or userInfoEndpoint must be provided; userInfoEndpoint must be unset if disableJwtVerification is set to true. Refer to the Google example.
identityProviders.*.rolePath - Yes Path(s) to the user roles claim in the JWT or user info response, e.g. resource_access.chatbot-ui.roles or just roles. Can be a single String or an Array of Strings. Refer to IDP Configuration to view guidelines for configuring supported providers.
identityProviders.*.projectPath - No Path to the claim in the JWT or user info response from which the project name is taken, e.g. azp, aud, or some.path.client. Can be a single String. Refer to IDP Configuration to view guidelines for configuring supported providers.
identityProviders.*.rolesDelimiter - No Delimiter used to split roles into an array when the list of roles is presented as a single String, e.g. "rolesDelimiter": " ".
identityProviders.*.loggingKey - No User information to look up in the JWT claims for logging. email or sub should be sufficient in most cases. Note: email might be unavailable for some IDPs; check your IDP documentation in this case.
identityProviders.*.loggingSalt - No Salt to hash user information for logging.
identityProviders.*.positiveCacheExpirationMs 600000 No How long, in milliseconds, to retain a successful JWKS response in the cache.
identityProviders.*.negativeCacheExpirationMs 10000 No How long, in milliseconds, to retain a failed JWKS response in the cache.
identityProviders.*.issuerPattern - No Regular expression matched against the "iss" claim to select the identity provider.
identityProviders.*.disableJwtVerification false No Disables JWT verification. Note: userInfoEndpoint must be unset if the flag is set to true.
vertx.* - No Vert.x settings. Refer to vertx.io to learn more.
server.* - No Vert.x HTTP server settings for incoming requests.
client.* - No Vert.x HTTP client settings for outbound requests.
storage.provider filesystem Yes Specifies blob storage provider. Supported providers: s3, aws-s3, azureblob, google-cloud-storage, filesystem. See examples in the sections below.
storage.endpoint - Optional Endpoint URL for S3-compatible storage. Note: Whether this setting is required depends on the provider.
storage.identity - Optional Blob storage access key. Can be optional for filesystem, aws-s3, google-cloud-storage providers. Refer to sections in this document dedicated to specific storage providers.
storage.credential - Optional Blob storage secret key. Can be optional for filesystem, aws-s3, google-cloud-storage providers.
storage.bucket - No Blob storage bucket.
storage.overrides.* - No Key-value pairs to override storage settings. * might be any specific blob storage setting to be overridden. Refer to examples in the sections below.
storage.createBucket false No Indicates whether the bucket should be created on start-up.
storage.prefix - No Base prefix for all stored resources. It allows the same bucket to be shared by different environments, e.g. dev, prod, pre-prod. Must not contain path separators or any invalid characters.
storage.maxUploadedFileSize 536870912 No Maximum size in bytes of an uploaded file. If an uploaded file exceeds the limit, the server returns HTTP code 413.
encryption.secret - No Secret used for AES encryption of a prefix to the bucket blob storage. The value should be a randomly generated string.
encryption.key - No Key used for AES encryption of a prefix to the bucket blob storage. The value should be a randomly generated string.
resources.maxSize 67108864 No Max allowed size in bytes for a resource.
resources.maxSizeToCache 1048576 No Max size in bytes for a resource to cache in Redis.
resources.syncPeriod 60000 No Period in milliseconds at which to check for resources to sync.
resources.syncDelay 120000 No Delay in milliseconds for a resource to be written back in object storage after last modification.
resources.syncBatch 4096 No How many resources to sync in one go.
resources.cacheExpiration 300000 No Expiration in milliseconds for synced resources in Redis.
resources.compressionMinSize 256 No Compress a resource with gzip if its size in bytes is greater than or equal to this value.
redis.singleServerConfig.address - Yes Redis single server address, e.g. "redis://host:port". Either singleServerConfig or clusterServersConfig must be provided.
redis.clusterServersConfig.nodeAddresses - Yes JSON array with Redis cluster server addresses, e.g. ["redis://host1:port1","redis://host2:port2"]. Either singleServerConfig or clusterServersConfig must be provided.
redis.provider.* - No Provider-specific settings.
redis.provider.name - Yes Provider name. Valid values: aws-elasti-cache (see instructions).
redis.provider.userId - Yes IAM-enabled user ID. Note: Applies to aws-elasti-cache.
redis.provider.region - Yes Geo region where the cache is located. Note: Applies to aws-elasti-cache.
redis.provider.clusterName - Yes Redis cluster name. Note: Applies to aws-elasti-cache.
redis.provider.serverless - Yes The flag indicates whether the cache is serverless. Note: Applies to aws-elasti-cache.
invitations.ttlInSeconds 259200 No Invitation time to live in seconds.
access.admin.rules - No Matches claims from identity providers against the rules to determine whether a user is allowed to perform admin actions, such as deleting any resource or approving a publication. Example: [{"source": "roles", "function": "EQUAL", "targets": ["admin"]}]. If roles contain "admin", the actions are allowed.
applications.includeCustomApps false No The flag indicates whether custom applications should be included in the openai listing.
applications.controllerEndpoint - No The endpoint of the Application Controller Web Service that manages deployments for applications with functions.
applications.controllerTimeout 240000 No The timeout for operations on the Application Controller Web Service.
codeInterpreter.sessionImage - No The code interpreter session image to use.
codeInterpreter.sessionTtl 600000 No The session time to live after the last API call.
codeInterpreter.checkPeriod 10000 No The interval at which to check active sessions for expiration.
codeInterpreter.checkSize 256 No The maximum number of active sessions to check in a single check.
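The effect of config.jsonMergeStrategy.overwriteArrays can be sketched in Java, assuming it applies when the dynamic settings files listed in config.files are merged in order. JSON is modeled as Map (object), List (array), and plain values; the key-by-key merge of objects and the last-value-wins rule for scalars are assumptions, and only the array behavior comes from the table. The class JsonMergeSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of config.jsonMergeStrategy.overwriteArrays; not DIAL Core's merger.
// JSON objects are Maps, JSON arrays are Lists, everything else is a scalar.
final class JsonMergeSketch {

    @SuppressWarnings("unchecked")
    static Object merge(Object base, Object override, boolean overwriteArrays) {
        if (base instanceof Map && override instanceof Map) {
            // Assumption: objects are merged recursively, key by key.
            Map<String, Object> result = new LinkedHashMap<>((Map<String, Object>) base);
            for (Map.Entry<String, Object> e : ((Map<String, Object>) override).entrySet()) {
                Object existing = result.get(e.getKey());
                result.put(e.getKey(), existing == null
                        ? e.getValue()
                        : merge(existing, e.getValue(), overwriteArrays));
            }
            return result;
        }
        if (base instanceof List && override instanceof List && !overwriteArrays) {
            // overwriteArrays = false (default): arrays are concatenated.
            List<Object> result = new ArrayList<>((List<Object>) base);
            result.addAll((List<Object>) override);
            return result;
        }
        // overwriteArrays = true, scalars, or mismatched types: the later value wins.
        return override;
    }

    public static void main(String[] args) {
        Map<String, Object> first = Map.of("retriableErrorCodes", List.of(429, 502));
        Map<String, Object> second = Map.of("retriableErrorCodes", List.of(503));
        System.out.println(merge(first, second, false)); // {retriableErrorCodes=[429, 502, 503]}
        System.out.println(merge(first, second, true));  // {retriableErrorCodes=[503]}
    }
}
```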

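Role extraction with rolePath and rolesDelimiter, followed by an access.admin.rules entry that uses the EQUAL function, might look roughly like the following. Claims are modeled as nested Maps, and the class RoleClaimsSketch and its matching semantics (the rule matches if any extracted role equals any target) are assumptions rather than DIAL Core's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of rolePath / rolesDelimiter and an EQUAL admin rule.
final class RoleClaimsSketch {

    // Walks a dotted path such as "resource_access.chatbot-ui.roles"; returns null if absent.
    @SuppressWarnings("unchecked")
    static Object claimAt(Map<String, Object> claims, String path) {
        Object current = claims;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<String, Object>) current).get(part);
        }
        return current;
    }

    // Collects roles from every configured path; a String claim is split by the delimiter.
    static List<String> extractRoles(Map<String, Object> claims, List<String> rolePaths, String delimiter) {
        List<String> roles = new ArrayList<>();
        for (String path : rolePaths) {
            Object value = claimAt(claims, path);
            if (value instanceof List<?> list) {
                list.forEach(r -> roles.add(String.valueOf(r)));
            } else if (value instanceof String s) {
                if (delimiter == null) {
                    roles.add(s);
                } else {
                    roles.addAll(Arrays.asList(s.split(Pattern.quote(delimiter))));
                }
            }
        }
        return roles;
    }

    // Assumed semantics of {"source": "roles", "function": "EQUAL", "targets": [...]}.
    static boolean matchesEqualRule(List<String> sourceValues, List<String> targets) {
        return sourceValues.stream().anyMatch(targets::contains);
    }

    public static void main(String[] args) {
        Map<String, Object> claims = Map.of(
                "resource_access", Map.of("chatbot-ui", Map.of("roles", List.of("user", "admin"))),
                "groups", "dev ops");
        List<String> roles = extractRoles(claims, List.of("resource_access.chatbot-ui.roles", "groups"), " ");
        System.out.println(roles);                                     // [user, admin, dev, ops]
        System.out.println(matchesEqualRule(roles, List.of("admin"))); // true
    }
}
```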
Storage requirements

AI DIAL Core stores user data in the following storages:

  • Blob Storage keeps permanent data.
  • Redis keeps volatile in-memory data for fast access.
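How a resource might be treated on its way to these storages can be sketched with the resources.compressionMinSize and resources.maxSizeToCache defaults from the static settings table. This is a hypothetical illustration: the class ResourceStorageSketch is invented, comparing the uncompressed size against maxSizeToCache is an assumption, and DIAL Core's actual write path is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of two resource thresholds; defaults copied from the settings table.
final class ResourceStorageSketch {
    static final int COMPRESSION_MIN_SIZE = 256;     // resources.compressionMinSize
    static final int MAX_SIZE_TO_CACHE = 1_048_576;  // resources.maxSizeToCache

    // gzip the body only if it is at least compressionMinSize bytes long.
    static byte[] maybeCompress(byte[] body) {
        if (body.length < COMPRESSION_MIN_SIZE) {
            return body;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(body);
        } catch (IOException e) {
            // ByteArrayOutputStream performs no real I/O, so this is not expected.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Assumption: only resources up to maxSizeToCache bytes are cached in Redis.
    static boolean cacheInRedis(byte[] body) {
        return body.length <= MAX_SIZE_TO_CACHE;
    }

    public static void main(String[] args) {
        byte[] small = "{\"name\":\"chat\"}".getBytes(StandardCharsets.UTF_8);
        byte[] large = "x".repeat(10_000).getBytes(StandardCharsets.UTF_8);
        System.out.println(maybeCompress(small).length == small.length); // true: below 256 bytes, kept as-is
        System.out.println(maybeCompress(large).length < large.length);  // true: repetitive data shrinks
        System.out.println(cacheInRedis(large));                         // true: 10000 bytes <= 1 MiB
    }
}
```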

AWS S3 Blob Store

There are two types of credential providers supported:

  • User credentials. You can create an IAM user and authenticate using its access keys from the AWS console.
  • Temporary credentials with IAM roles for service accounts.

User credentials

Set storage.credential to the Secret Access Key and storage.identity to the Access Key ID.

Temporary credentials

Follow the instructions to set up your pod in AWS EKS. storage.credential and storage.identity must be unset.

Google Cloud Storage

There are two types of credential providers supported:

  • User credentials. You can create a service account and authenticate using its private key obtained from the Developer console.
  • Temporary credentials. Application default credentials (ADC).

User credentials

Set storage.credential to the path of the private key JSON file and leave storage.identity unset. Refer to the example below:

{
  "type": "service_account",
  "project_id": "<your_project_id>",
  "private_key_id": "<your_project_key_id>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<your_private_key>\n-----END PRIVATE KEY-----\n",
  "client_email": "gcp-dial-core@<your_project_id>.iam.gserviceaccount.com",
  "client_id": "<client_id>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/gcp-dial-core.iam.gserviceaccount.com",
  "universe_domain": "googleapis.com"
}

Alternatively, set storage.credential to the private key in PEM format and storage.identity to the client email address.

Temporary credentials

Follow the instructions to set up your pod in GKE. storage.credential and storage.identity must be unset. The jclouds property jclouds.oauth.credential-type should be set to bearerTokenCredentials, as in the example below.

{
  "storage": {
    "overrides": {
      "jclouds.oauth.credential-type": "bearerTokenCredentials"
    }
  }
}

Azure Blob Store

There are two types of credential providers supported:

  • User credentials. You can create a service principal and authenticate using its secret from the Azure console.
  • Temporary credentials with Azure AD Workload Identity.

User credentials

Set storage.credential to the service principal secret and storage.identity to the service principal ID.

Temporary credentials

Follow the instructions to set up your pod in Azure Kubernetes Service (AKS). storage.credential and storage.identity must be unset.

This example demonstrates the properties to be overridden:

{
  "storage": {
    "endpoint": "https://<Azure Blob storage account>.blob.core.windows.net",
    "overrides": {
      "jclouds.azureblob.auth": "azureAd",
      "jclouds.oauth.credential-type": "bearerTokenCredentials"
    }
  }
}

Redis

Redis can be used as a cache with volatile-* eviction policies:

maxmemory 4G
maxmemory-policy volatile-lfu

Note: Redis will be strictly required in the upcoming releases 0.8+.

Dynamic settings

Dynamic settings are stored in JSON files specified via the "config.files" static setting and are reloaded at the interval specified via the "config.reload" static setting. Refer to the example.

Dynamic settings can include the following parameters:

Parameter Description
routes Path(s) for specific upstream routing or to respond with a configured body.
addons A list of deployed AI DIAL Addons and their parameters:
<addon_name>: Unique addon name.
addons.<addon_name> endpoint: AI DIAL Addon API for chat completions.
iconUrl: Icon path for the AI DIAL addon on UI.
description: Brief AI DIAL addon description.
displayName: AI DIAL addon name on UI.
inputAttachmentTypes: A list of allowed MIME types for the input attachments.
maxInputAttachments: Maximum number of input attachments (default is zero when inputAttachmentTypes is unset, otherwise, infinity)
forwardAuthToken: If set to true, the HTTP header with the authorization token is forwarded to the chat completion endpoint of the addon.
userRoles: a specific claim value provided by a specific IDP. Refer to IDP Configuration to view examples.
interceptors A list of deployed AI DIAL Interceptors and their parameters:
<interceptor_name>: Unique interceptor name. Refer to Interceptors to learn more.
interceptors.<interceptor_name> endpoint: AI DIAL Interceptor API for chat completions.
iconUrl: Icon path for the AI DIAL Interceptor on UI.
description: Brief AI DIAL interceptor description.
displayName: AI DIAL interceptor name on UI.
forwardAuthToken: If set to true, the HTTP header with the authorization token is forwarded to the chat completion endpoint of the interceptor. Refer to Interceptors to learn more.
assistant A list of deployed AI DIAL Assistants and their parameters:
<assistant_name>: Unique assistant name.
assistant.endpoint Assistant main endpoint
assistant.assistants.<assistant_name> iconUrl: Icon path for the AI DIAL assistant on UI.
description: Brief AI DIAL assistant description.
displayName: AI DIAL assistant name on UI.
inputAttachmentTypes: A list of allowed MIME types for the input attachments.
maxInputAttachments: Maximum number of input attachments (default is zero when inputAttachmentTypes is unset, otherwise, infinity)
forwardAuthToken: If set to true, the HTTP header with the authorization token is forwarded to the chat completion endpoint of the assistant.
userRoles: a specific claim value provided by a specific IDP. Refer to IDP Configuration to view examples.
descriptionKeywords: a list of keywords describing the assistant, e.g. code-gen, text2image.
assistant.assistants.<assistant_name>.defaults Default parameters are applied if a request doesn't contain them in OpenAI chat/completions API call
applications A list of deployed AI DIAL Applications and their parameters:
<application_name>: Unique application name.
applications.<application_name> endpoint: AI DIAL Application API for chat completions.
iconUrl: Icon path for the AI DIAL Application on UI.
description: Brief AI DIAL Application description.
displayName: AI DIAL Application name on UI.
inputAttachmentTypes: A list of allowed MIME types for the input attachments.
maxInputAttachments: Maximum number of input attachments (default is zero when inputAttachmentTypes is unset, otherwise, infinity)
forwardAuthToken: If set to true, the HTTP header with the authorization token is forwarded to the chat completion endpoint of the application.
userRoles: a specific claim value provided by a specific IDP. Refer to IDP Configuration to view examples.
descriptionKeywords: a list of keywords describing the application, e.g. code-gen, text2image.
maxRetryAttempts: max retry attempts to route a single user request to the application's endpoint.
applications.<application_name>.defaults Default parameters are applied if a request doesn't contain them in OpenAI chat/completions API call
applications.<application_name>.interceptors A list of interceptors to be triggered for the given application. Refer to Interceptors to learn more.
applications.<application_name>.features rateEndpoint: endpoint for rate requests (exposed by DIAL Core as <deployment name>/rate).
tokenizeEndpoint: endpoint for requests to the model tokenizer (exposed by DIAL Core as <deployment name>/tokenize).
truncatePromptEndpoint: endpoint for truncating prompt requests (exposed by DIAL Core as <deployment name>/truncate_prompt).
systemPromptSupported: does the application support system prompt (default is true).
toolsSupported: does the application support tools (default is false).
seedSupported: does the application support seed request parameter (default is false).
urlAttachmentsSupported: does the application support attachments with URLs (default is false).
folderAttachmentsSupported: does the application support folder attachments (default is false)
configurationEndpoint: the endpoint to request application configuration parameters as JSON schema (exposed by DIAL Core as <deployment name>/configuration).
accessibleByPerRequestKey: indicates whether the deployment is accessible using a per-request API key (default is true).
contentPartsSupported: indicates whether the deployment supports requests with content parts or not (default is false).
models A list of deployed models and their parameters:
<model_name>: Unique model name.
models.<model_name> type: Model type—chat or embedding.
iconUrl: Icon path for the model on UI.
description: Brief model description.
displayName: Model name on UI.
displayVersion: Model version on UI.
endpoint: Model API for chat completions or embeddings.
tokenizerModel: Identifies the specific model whose tokenization algorithm exactly matches that of the referenced model. This is typically the name of the earliest-released model in a series of models sharing an identical tokenization algorithm (e.g. gpt-3.5-turbo-0301, gpt-4-0314, or gpt-4-1106-vision-preview). This parameter is essential for DIAL clients that reimplement tokenization algorithms on their side, instead of utilizing the tokenizeEndpoint provided by the model.
features: Model features.
limits: Model token limits.
pricing: Model pricing.
upstreams: Used for load balancing. The request is sent to the model endpoint with the X-UPSTREAM-ENDPOINT and X-UPSTREAM-KEY headers.
userRoles: a specific claim value provided by a specific IDP. Refer to IDP Configuration to view examples.
descriptionKeywords: a list of keywords describes the model, e.g. code-gen, text2image.
maxRetryAttempts: max retry attempts to route a single user request to upstreams
models.<model_name>.limits maxPromptTokens: maximum number of tokens in a completion request.
maxCompletionTokens: maximum number of tokens in a completion response.
maxTotalTokens: maximum number of tokens in completion request and response combined.
Typically either maxTotalTokens is specified or maxPromptTokens and maxCompletionTokens.
models.<model_name>.pricing unit: the pricing units (currently token and char_without_whitespace are supported).
prompt: per-unit price for the completion request in USD.
completion: per-unit price for the completion response in USD.
models.<model_name>.features rateEndpoint: endpoint for rate requests (exposed by DIAL Core as <deployment name>/rate).
tokenizeEndpoint: endpoint for requests to the model tokenizer (exposed by DIAL Core as <deployment name>/tokenize).
truncatePromptEndpoint: endpoint for truncating prompt requests (exposed by DIAL Core as <deployment name>/truncate_prompt).
systemPromptSupported: does the model support system prompt (default is true).
toolsSupported: does the model support tools (default is false).
seedSupported: does the model support seed request parameter (default is false).
urlAttachmentsSupported: does the model/application support attachments with URLs (default is false).
folderAttachmentsSupported: does the model/application support folder attachments (default is false)
accessibleByPerRequestKey: indicates whether the deployment is accessible using a per-request API key (default is true).
contentPartsSupported: indicates whether the deployment supports requests with content parts or not (default is false).
models.<model_name>.upstreams endpoint: Model endpoint.
key: Your API key.
weight: Weight of the upstream endpoint. A positive number represents the endpoint's capacity; zero or a negative number excludes the endpoint from routing. Default value: 1.
tier: Specifies the tier group for the endpoint. Only non-negative numbers are allowed. All requests are routed to the endpoints with the highest tier (the lowest tier value); other endpoints (lower tier, i.e. higher tier value) are used only if the highest-tier endpoints are unavailable. Default value: 0 (highest tier). Refer to Load Balancer to learn more.
extraData: Additional metadata containing any information that is passed to the upstream's endpoint. It can be JSON or a String.
models.<model_name>.defaults Default parameters are applied if a request doesn't contain them in OpenAI chat/completions API call
models.<model_name>.interceptors A list of interceptors to be triggered for the given model. Refer to Interceptors to learn more.
keys API Keys parameters:
<core_key>: Your API key. Refer to API Keys to learn more.
keys.<core_key> project: Project name assigned to this key. Required.
role: a role to be assigned to the key. Note: a key is invalid if both role and roles are missing.
roles: a list of roles to be assigned to the key. Note: a key is invalid if both role and roles are missing.
secured: the flag indicates whether the key is secured. If set to true, the user request and the deployment response are not saved to the prompt log storage.
roles API key or user roles. Each role may have limits to be associated with applications, models, assistants or addons. Refer to API Keys to learn more.
roles.<role_name> limits: Limits for models, applications, or assistants. Note: it is necessary to define this for a role.
roles.<role_name>.limits minute: Total tokens per minute limit sent to the model, managed via a floating window approach for well-distributed rate limiting. If not set, the default value is unlimited.
day: Total tokens per day limit sent to the model, managed via a floating window approach for balanced rate limiting.
week: Total tokens per week limit sent to the model, managed via a floating window approach for balanced rate limiting.
month: Total tokens per month limit sent to the model, managed via a floating window approach for balanced rate limiting.
Note: you can skip these parameters to apply their default value - unlimited.
retriableErrorCodes List of retriable error codes for handling outages at LLM providers.
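For models.<model_name>.pricing, the cost of one request is the number of prompt units times the prompt price plus the number of completion units times the completion price. The sketch below assumes prices are held as BigDecimal to avoid floating-point rounding; PricingSketch and the sample prices are hypothetical. For the char_without_whitespace unit it counts characters rejected by Character.isWhitespace, which may not match DIAL Core's exact definition.

```java
import java.math.BigDecimal;

// Hypothetical sketch of per-request cost from models.<model_name>.pricing.
final class PricingSketch {

    // Unit count for "char_without_whitespace": characters that are not whitespace.
    static long charsWithoutWhitespace(String text) {
        return text.codePoints().filter(cp -> !Character.isWhitespace(cp)).count();
    }

    // cost = promptUnits * prompt + completionUnits * completion (prices in USD per unit).
    static BigDecimal cost(BigDecimal promptPrice, long promptUnits,
                           BigDecimal completionPrice, long completionUnits) {
        return promptPrice.multiply(BigDecimal.valueOf(promptUnits))
                .add(completionPrice.multiply(BigDecimal.valueOf(completionUnits)));
    }

    public static void main(String[] args) {
        // Illustrative per-unit prices, not real model prices.
        BigDecimal prompt = new BigDecimal("0.00001");
        BigDecimal completion = new BigDecimal("0.00003");

        // unit = "token": 1000 prompt tokens and 500 completion tokens.
        System.out.println(cost(prompt, 1000, completion, 500)); // 0.02500

        // unit = "char_without_whitespace": 15 and 7 non-whitespace characters.
        long promptChars = charsWithoutWhitespace("Hello, DIAL Core!");
        long completionChars = charsWithoutWhitespace("Hi there");
        System.out.println(cost(prompt, promptChars, completion, completionChars)); // 0.00036
    }
}
```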
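The weight and tier rules for models.<model_name>.upstreams can be illustrated with a simplified selector: endpoints with a non-positive weight are skipped, only the lowest tier value among available endpoints is considered, and one endpoint in that tier is chosen at random in proportion to its weight. The available flag stands in for whatever health tracking the real Load Balancer uses; the record and method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical sketch of tier/weight upstream selection; not DIAL Core's load balancer.
final class UpstreamSelectorSketch {

    record Upstream(String endpoint, int weight, int tier, boolean available) {}

    static Optional<Upstream> select(List<Upstream> upstreams, Random random) {
        // Zero or negative weight excludes an endpoint; unavailable endpoints are skipped too.
        List<Upstream> candidates = upstreams.stream()
                .filter(u -> u.weight() > 0 && u.available())
                .toList();
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        // Highest tier = lowest tier value among the remaining endpoints.
        int bestTier = candidates.stream().mapToInt(Upstream::tier).min().getAsInt();
        List<Upstream> sameTier = candidates.stream().filter(u -> u.tier() == bestTier).toList();

        // Weighted random choice: each endpoint's share is proportional to its weight.
        int total = sameTier.stream().mapToInt(Upstream::weight).sum();
        int r = random.nextInt(total);
        for (Upstream u : sameTier) {
            r -= u.weight();
            if (r < 0) {
                return Optional.of(u);
            }
        }
        throw new IllegalStateException("unreachable: r < total");
    }

    public static void main(String[] args) {
        List<Upstream> upstreams = List.of(
                new Upstream("https://primary-a.example", 3, 0, true),
                new Upstream("https://primary-b.example", 1, 0, false),
                new Upstream("https://fallback.example", 1, 1, true));
        // primary-b is unavailable, so the only tier-0 candidate, primary-a, is chosen.
        System.out.println(select(upstreams, new Random(42)).map(Upstream::endpoint).orElse("none"));
    }
}
```

With two available tier-0 endpoints of weight 3 and 1, the first would receive roughly three quarters of the requests, and the tier-1 endpoint none.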

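The floating-window limits under roles.<role_name>.limits can be approximated with an in-memory sliding-window log. This is a hypothetical single-process sketch, not DIAL Core's implementation: in this sketch the limiter checks the tokens consumed within the last window before forwarding a request and records the actual usage after the response, once the token count is known.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical in-memory sliding-window ("floating window") token limiter,
// e.g. for roles.<role_name>.limits.minute; not DIAL Core's implementation.
final class SlidingWindowLimiterSketch {
    private record Usage(long timestampMs, long tokens) {}

    private final long windowMs;
    private final long limit;
    private final Deque<Usage> usages = new ArrayDeque<>();
    private long tokensInWindow;

    SlidingWindowLimiterSketch(long windowMs, long limit) {
        this.windowMs = windowMs;
        this.limit = limit;
    }

    // Drops records that fell out of the window ending at nowMs (timestamps assumed non-decreasing).
    private void evict(long nowMs) {
        while (!usages.isEmpty() && usages.peekFirst().timestampMs() <= nowMs - windowMs) {
            tokensInWindow -= usages.pollFirst().tokens();
        }
    }

    // Checked before forwarding a request: is there budget left in the current window?
    boolean isAllowed(long nowMs) {
        evict(nowMs);
        return tokensInWindow < limit;
    }

    // Called after the response, once the number of consumed tokens is known.
    void record(long nowMs, long tokens) {
        evict(nowMs);
        usages.addLast(new Usage(nowMs, tokens));
        tokensInWindow += tokens;
    }

    public static void main(String[] args) {
        // Illustrative limit: 1000 tokens per minute.
        SlidingWindowLimiterSketch perMinute = new SlidingWindowLimiterSketch(60_000, 1_000);
        perMinute.record(0, 600);
        perMinute.record(30_000, 500);
        System.out.println(perMinute.isAllowed(45_000)); // false: 1100 tokens in the last minute
        System.out.println(perMinute.isAllowed(60_000)); // true: the usage at t=0 has left the window
    }
}
```

Day, week, and month limits would be further instances with longer windows; sharing the counters across several Core instances would need an external store, which this sketch does not attempt.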
License

Copyright (C) 2024 EPAM Systems

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
