ai-dial-core
The main component of AI DIAL, which provides a unified API to different chat completion and embedding models, assistants, and applications.
Stars: 472
AI DIAL Core is an HTTP Proxy that provides a unified API to different chat completion and embedding models, assistants, and applications. It is written in Java 17 and built on Eclipse Vert.x. The core functionality includes handling static and dynamic settings, deployment on Kubernetes using Helm charts, and storing user data in Blob Storage and Redis. It supports various identity providers, storage providers like AWS S3, Google Cloud Storage, and Azure Blob Store, and features like AI DIAL Addons, Interceptors, Assistants, Applications, and Models with customizable parameters and configurations.
README:
AI DIAL Core is an HTTP proxy that provides a unified API to different chat completion and embedding models, assistants, and applications. It is written in Java 17 and built on top of Eclipse Vert.x.
Build the project with Gradle and Java 17:
./gradlew build
Run the project with Gradle:
./gradlew run
Or run the com.epam.aidial.core.AIDial class from your favorite IDE.
You can deploy AI DIAL Core on a Kubernetes cluster using the umbrella dial Helm chart, which also deploys other AI DIAL components. Alternatively, use the dial-core Helm chart to deploy just the Core.
Refer to Examples for guidelines.
In either case, your Helm values file must provide the application configuration described in the Configuration section.
Static settings are read on startup and cannot be changed while the application is running. Refer to the example configuration file.
Priority order:
- Environment variables with the "aidial." prefix, e.g. "aidial.server.port", "aidial.config.files".
- File specified in "AIDIAL_SETTINGS" environment variable.
- Default resource file: src/main/resources/aidial.settings.json.
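For illustration only, a minimal static settings file might look like the sketch below; it assumes the dotted setting names used in the table that follows map to nested JSON keys, and the values are placeholders rather than recommendations:
{
  "config": {
    "files": ["aidial.config.json"],
    "reload": 60000
  },
  "server": {
    "port": 8080
  }
}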
Setting | Default | Required | Description |
---|---|---|---|
config.files | aidial.config.json | No | List of paths to dynamic settings. Refer to example of the file with dynamic settings. |
config.reload | 60000 | No | Config reload interval in milliseconds. |
config.jsonMergeStrategy.overwriteArrays | false | No | Specifies a merging strategy for JSON arrays. If it's set to true , arrays will be overwritten. Otherwise, they will be concatenated. |
identityProviders | - | Yes | Map of identity providers. Note: At least one identity provider must be provided. Refer to examples to view available providers. Refer to IDP Configuration to view guidelines for configuring supported providers. |
identityProviders.*.jwksUrl | - | Optional | URL of the JWKS provider. Required if disableJwtVerification is set to false. Note: Either jwksUrl or userInfoEndpoint must be provided. |
identityProviders.*.userInfoEndpoint | - | Optional | URL of the user info endpoint. Note: Either jwksUrl or userInfoEndpoint must be provided unless disableJwtVerification is set to true. Refer to the Google example. |
identityProviders.*.rolePath | - | Yes | Path(s) to the claim user roles in JWT token or user info response, e.g. resource_access.chatbot-ui.roles or just roles . Can be single String or Array of Strings. Refer to IDP Configuration to view guidelines for configuring supported providers. |
identityProviders.*.projectPath | - | No | Path(s) to the claim in JWT token or user info response, e.g. azp , aud or some.path.client from which project name can be taken. Can be single String. Refer to IDP Configuration to view guidelines for configuring supported providers. |
identityProviders.*.rolesDelimiter | - | No | Delimiter used to split roles into an array when the list of roles is presented as a single String, e.g. "rolesDelimiter": " ". |
identityProviders.*.loggingKey | - | No | User information to search in claims of JWT token. email or sub should be sufficient in most cases. Note: email might be unavailable for some IDPs. Please check your IDP documentation in this case. |
identityProviders.*.loggingSalt | - | No | Salt to hash user information for logging. |
identityProviders.*.positiveCacheExpirationMs | 600000 | No | How long to retain a JWKS response in the cache after a successful response. |
identityProviders.*.negativeCacheExpirationMs | 10000 | No | How long to retain a JWKS response in the cache after a failed response. |
identityProviders.*.issuerPattern | - | No | Regexp to match the claim "iss" to identity provider. |
identityProviders.*.disableJwtVerification | false | No | The flag disables JWT verification. Note. userInfoEndpoint must be unset if the flag is set to true . |
vertx.* | - | No | Vertx settings. Refer to vertx.io to learn more. |
server.* | - | No | Vertx HTTP server settings for incoming requests. |
client.* | - | No | Vertx HTTP client settings for outbound requests. |
storage.provider | filesystem | Yes | Specifies blob storage provider. Supported providers: s3, aws-s3, azureblob, google-cloud-storage, filesystem. See examples in the sections below. |
storage.endpoint | - | Optional | Specifies the endpoint URL for S3-compatible storages. Note: The setting may be required depending on the specific provider. |
storage.identity | - | Optional | Blob storage access key. Can be optional for filesystem, aws-s3, google-cloud-storage providers. Refer to sections in this document dedicated to specific storage providers. |
storage.credential | - | Optional | Blob storage secret key. Can be optional for filesystem, aws-s3, google-cloud-storage providers. |
storage.bucket | - | No | Blob storage bucket. |
storage.overrides.* | - | No | Key-value pairs to override storage settings. * might be any specific blob storage setting to be overridden. Refer to examples in the sections below. |
storage.createBucket | false | No | Indicates whether bucket should be created on start-up. |
storage.prefix | - | No | Base prefix for all stored resources. It allows using the same bucket for different environments, e.g. dev, prod, pre-prod. Must not contain path separators or other invalid characters. |
storage.maxUploadedFileSize | 536870912 | No | Maximum size in bytes of an uploaded file. If the size of an uploaded file exceeds the limit, the server returns HTTP status 413. |
encryption.secret | - | No | Secret used for AES encryption of the bucket prefix in blob storage. The value should be a randomly generated string. |
encryption.key | - | No | Key used for AES encryption of the bucket prefix in blob storage. The value should be a randomly generated string. |
resources.maxSize | 67108864 | No | Max allowed size in bytes for a resource. |
resources.maxSizeToCache | 1048576 | No | Max size in bytes for a resource to cache in Redis. |
resources.syncPeriod | 60000 | No | Period in milliseconds, how frequently check for resources to sync. |
resources.syncDelay | 120000 | No | Delay in milliseconds for a resource to be written back in object storage after last modification. |
resources.syncBatch | 4096 | No | How many resources to sync in one go. |
resources.cacheExpiration | 300000 | No | Expiration in milliseconds for synced resources in Redis. |
resources.compressionMinSize | 256 | No | Compress a resource with gzip if its size in bytes is greater than or equal to this value. |
redis.singleServerConfig.address | - | Yes | Redis single server addresses, e.g. "redis://host:port". Either singleServerConfig or clusterServersConfig must be provided. |
redis.clusterServersConfig.nodeAddresses | - | Yes | Json array with Redis cluster server addresses, e.g. ["redis://host1:port1","redis://host2:port2"]. Either singleServerConfig or clusterServersConfig must be provided. |
redis.provider.* | - | No | Provider specific settings |
redis.provider.name | - | Yes | Provider name. The valid values are aws-elasti-cache (see instructions). |
redis.provider.userId | - | Yes | IAM-enabled user ID. Note: applies to aws-elasti-cache. |
redis.provider.region | - | Yes | Geo region where the cache is located. Note: applies to aws-elasti-cache. |
redis.provider.clusterName | - | Yes | Redis cluster name. Note: applies to aws-elasti-cache. |
redis.provider.serverless | - | Yes | The flag indicates if the cache is serverless. Note: applies to aws-elasti-cache. |
invitations.ttlInSeconds | 259200 | No | Invitation time to live in seconds. |
access.admin.rules | - | No | Matches claims from identity providers against the rules to determine whether a user is allowed to perform admin actions, like deleting any resource or approving a publication. Example: [{"source": "roles", "function": "EQUAL", "targets": ["admin"]}]. If roles contains "admin", the actions are allowed. |
applications.includeCustomApps | false | No | The flag indicates whether custom applications should be included in the OpenAI model listing. |
applications.controllerEndpoint | - | No | The endpoint to Application Controller Web Service that manages deployments for applications with functions |
applications.controllerTimeout | 240000 | No | The timeout of operations to Application Controller Web Service |
codeInterpreter.sessionImage | - | No | The code interpreter session image to use |
codeInterpreter.sessionTtl | 600000 | No | The session time to live after the last API call |
codeInterpreter.checkPeriod | 10000 | No | The interval at which to check active sessions for expiration |
codeInterpreter.checkSize | 256 | No | The maximum number of active sessions to check in single check |
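As an illustration of the identityProviders settings above, a single hypothetical provider entry might look like this sketch; the provider name, URL, claim path and salt are placeholders, not recommended values:
{
  "identityProviders": {
    "my-idp": {
      "jwksUrl": "https://my-idp.example.com/.well-known/jwks.json",
      "rolePath": "resource_access.chatbot-ui.roles",
      "issuerPattern": "^https://my-idp\\.example\\.com.*$",
      "loggingKey": "sub",
      "loggingSalt": "<random salt>"
    }
  }
}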
AI DIAL Core stores user data in the following storages:
- Blob Storage keeps permanent data.
- Redis keeps volatile in-memory data for fast access.
For the AWS S3 provider, two types of credential providers are supported:
- User credentials. You can create an IAM user access key in the AWS console and authenticate using it.
- Temporary credentials with IAM roles for service accounts.
For user credentials, set storage.credential to the Secret Access Key and storage.identity to the Access Key ID.
For temporary credentials, follow the instructions to set up your pod in AWS EKS; storage.credential and storage.identity must be unset.
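A minimal sketch of the corresponding storage block for the user-credentials case is shown below; the bucket name and keys are placeholders, and with IAM roles for service accounts the identity and credential entries are simply omitted:
{
  "storage": {
    "provider": "aws-s3",
    "bucket": "my-dial-bucket",
    "createBucket": true,
    "identity": "<Access Key ID>",
    "credential": "<Secret Access Key>"
  }
}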
For the Google Cloud Storage provider, two types of credential providers are supported:
- User credentials. You can create a service account and authenticate using its private key obtained from the Developer console.
- Temporary credentials. Application default credentials (ADC).
For user credentials, set storage.credential to the path to the private key JSON file; storage.identity must be unset. Refer to the example below:
{
"type": "service_account",
"project_id": "<your_project_id>",
"private_key_id": "<your_project_key_id>",
"private_key": "-----BEGIN PRIVATE KEY-----\n<your_private_key>\n-----END PRIVATE KEY-----\n",
"client_email": "gcp-dial-core@<your_project_id>.iam.gserviceaccount.com",
"client_id": "<client_id>",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/gcp-dial-core.iam.gserviceaccount.com",
"universe_domain": "googleapis.com"
}
Otherwise, storage.credential is a private key in PEM format and storage.identity is the client's email address.
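For example, a storage block for the key-file flavour of user credentials might look like this sketch (the path and bucket are placeholders):
{
  "storage": {
    "provider": "google-cloud-storage",
    "bucket": "my-dial-bucket",
    "credential": "/path/to/service-account-key.json"
  }
}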
For temporary credentials (ADC), follow the instructions to set up your pod in GKE; storage.credential and storage.identity must be unset. The JClouds property jclouds.oauth.credential-type should be set to bearerTokenCredentials, as shown in the example below:
{
"storage": {
"overrides": {
"jclouds.oauth.credential-type": "bearerTokenCredentials"
}
}
}
For the Azure Blob Store provider, two types of credential providers are supported:
- User credentials. You can create a service principal and authenticate using its secret from the Azure console.
- Temporary credentials with Azure AD Workload Identity.
For user credentials, set storage.credential to the service principal secret and storage.identity to the service principal ID.
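A sketch of the storage block for the user-credentials case might look as follows; the account name, IDs and secret are placeholders, and additional jclouds overrides may be needed depending on your setup:
{
  "storage": {
    "provider": "azureblob",
    "endpoint": "https://<Azure Blob storage account>.blob.core.windows.net",
    "bucket": "my-dial-bucket",
    "identity": "<service principal ID>",
    "credential": "<service principal secret>"
  }
}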
For temporary credentials, follow the instructions to set up your pod in Azure Kubernetes Service; storage.credential and storage.identity must be unset.
This example demonstrates the properties to be overridden:
{
"storage": {
"endpoint": "https://<Azure Blob storage account>.blob.core.windows.net"
"overrides": {
"jclouds.azureblob.auth": "azureAd",
"jclouds.oauth.credential-type": "bearerTokenCredentials"
}
}
}
Redis can be used as a cache with volatile-* eviction policies:
maxmemory 4G
maxmemory-policy volatile-lfu
Note: Redis will be strictly required in the upcoming releases 0.8+.
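On the AI DIAL Core side, the connection to Redis is configured through the redis.* static settings listed above, for example (the address is a placeholder):
{
  "redis": {
    "singleServerConfig": {
      "address": "redis://localhost:6379"
    }
  }
}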
Dynamic settings are stored in JSON files specified via the "config.files" static setting and reloaded at the interval specified via the "config.reload" static setting. Refer to the example.
Dynamic settings can include the following parameters:
Parameter | Description |
---|---|
routes | Path(s) for specific upstream routing or to respond with a configured body. |
addons | A list of deployed AI DIAL Addons and their parameters: <addon_name>: Unique addon name. |
addons.<addon_name> | endpoint: AI DIAL Addon API for chat completions. iconUrl: Icon path for the AI DIAL addon on UI. description: Brief AI DIAL addon description. displayName: AI DIAL addon name on UI. inputAttachmentTypes: A list of allowed MIME types for the input attachments. maxInputAttachments: Maximum number of input attachments (default is zero when inputAttachmentTypes is unset, otherwise infinity). forwardAuthToken: If the flag is set to true, the HTTP header with the authorization token is forwarded to the chat completion endpoint of the addon. userRoles: a specific claim value provided by a specific IDP. Refer to IDP Configuration to view examples. |
interceptors | A list of deployed AI DIAL Interceptors and their parameters: <interceptor_name>: Unique interceptor name. Refer to Interceptors to learn more. |
interceptors.<interceptor_name> | endpoint: AI DIAL Interceptor API for chat completions. iconUrl: Icon path for the AI DIAL Interceptor on UI. description: Brief AI DIAL interceptor description. displayName: AI DIAL interceptor name on UI. forwardAuthToken: If the flag is set to true, the HTTP header with the authorization token is forwarded to the chat completion endpoint of the interceptor. Refer to Interceptors to learn more. |
assistant | A list of deployed AI DIAL Assistants and their parameters: <assistant_name>: Unique assistant name. |
assistant.endpoint | Assistant main endpoint |
assistant.assistants.<assistant_name> | iconUrl: Icon path for the AI DIAL assistant on UI. description: Brief AI DIAL assistant description. displayName: AI DIAL assistant name on UI. inputAttachmentTypes: A list of allowed MIME types for the input attachments. maxInputAttachments: Maximum number of input attachments (default is zero when inputAttachmentTypes is unset, otherwise infinity). forwardAuthToken: If the flag is set to true, the HTTP header with the authorization token is forwarded to the chat completion endpoint of the assistant. userRoles: a specific claim value provided by a specific IDP. Refer to IDP Configuration to view examples. descriptionKeywords: a list of keywords describing the deployment, e.g. code-gen, text2image. |
assistant.assistants.<assistant_name>.defaults | Default parameters are applied if a request doesn't contain them in OpenAI chat/completions API call |
applications | A list of deployed AI DIAL Applications and their parameters: <application_name>: Unique application name. |
applications.<application_name> | endpoint: AI DIAL Application API for chat completions. iconUrl: Icon path for the AI DIAL Application on UI. description: Brief AI DIAL Application description. displayName: AI DIAL Application name on UI. inputAttachmentTypes: A list of allowed MIME types for the input attachments. maxInputAttachments: Maximum number of input attachments (default is zero when inputAttachmentTypes is unset, otherwise infinity). forwardAuthToken: If the flag is set to true, the HTTP header with the authorization token is forwarded to the chat completion endpoint of the application. userRoles: a specific claim value provided by a specific IDP. Refer to IDP Configuration to view examples. descriptionKeywords: a list of keywords describing the deployment, e.g. code-gen, text2image. maxRetryAttempts: max retry attempts to route a single user request to the application's endpoint. |
applications.<application_name>.defaults | Default parameters are applied if a request doesn't contain them in OpenAI chat/completions API call |
applications.<application_name>.interceptors | A list of interceptors to be triggered for the given application. Refer to Interceptors to learn more. |
applications.<application_name>.features | rateEndpoint: endpoint for rate requests (exposed by DIAL Core as <deployment name>/rate). tokenizeEndpoint: endpoint for requests to the model tokenizer (exposed by DIAL Core as <deployment name>/tokenize). truncatePromptEndpoint: endpoint for truncating prompt requests (exposed by DIAL Core as <deployment name>/truncate_prompt). systemPromptSupported: does the application support system prompt (default is true). toolsSupported: does the application support tools (default is false). seedSupported: does the application support the seed request parameter (default is false). urlAttachmentsSupported: does the application support attachments with URLs (default is false). folderAttachmentsSupported: does the application support folder attachments (default is false). configurationEndpoint: the endpoint to request application configuration parameters as JSON schema (exposed by DIAL Core as <deployment name>/configuration). accessibleByPerRequestKey: indicates whether the deployment is accessible using a per-request API key (default is true). contentPartsSupported: indicates whether the deployment supports requests with content parts (default is false). |
models | A list of deployed models and their parameters: <model_name>: Unique model name. |
models.<model_name> | type: Model type, chat or embedding. iconUrl: Icon path for the model on UI. description: Brief model description. displayName: Model name on UI. displayVersion: Model version on UI. endpoint: Model API for chat completions or embeddings. tokenizerModel: Identifies the specific model whose tokenization algorithm exactly matches that of the referenced model. This is typically the name of the earliest-released model in a series of models sharing an identical tokenization algorithm (e.g. gpt-3.5-turbo-0301, gpt-4-0314, or gpt-4-1106-vision-preview). This parameter is essential for DIAL clients that reimplement tokenization algorithms on their side, instead of utilizing the tokenizeEndpoint provided by the model. features: Model features. limits: Model token limits. pricing: Model pricing. upstreams: Used for load balancing; the request is sent to the model endpoint with the X-UPSTREAM-ENDPOINT and X-UPSTREAM-KEY headers. userRoles: a specific claim value provided by a specific IDP. Refer to IDP Configuration to view examples. descriptionKeywords: a list of keywords describing the model, e.g. code-gen, text2image. maxRetryAttempts: max retry attempts to route a single user request to upstreams. |
models.<model_name>.limits | maxPromptTokens: maximum number of tokens in a completion request. maxCompletionTokens: maximum number of tokens in a completion response. maxTotalTokens: maximum number of tokens in completion request and response combined. Typically either maxTotalTokens is specified, or maxPromptTokens and maxCompletionTokens. |
models.<model_name>.pricing | unit: the pricing unit (currently token and char_without_whitespace are supported). prompt: per-unit price for the completion request in USD. completion: per-unit price for the completion response in USD. |
models.<model_name>.features | rateEndpoint: endpoint for rate requests (exposed by DIAL Core as <deployment name>/rate). tokenizeEndpoint: endpoint for requests to the model tokenizer (exposed by DIAL Core as <deployment name>/tokenize). truncatePromptEndpoint: endpoint for truncating prompt requests (exposed by DIAL Core as <deployment name>/truncate_prompt). systemPromptSupported: does the model support system prompt (default is true). toolsSupported: does the model support tools (default is false). seedSupported: does the model support the seed request parameter (default is false). urlAttachmentsSupported: does the model/application support attachments with URLs (default is false). folderAttachmentsSupported: does the model/application support folder attachments (default is false). accessibleByPerRequestKey: indicates whether the deployment is accessible using a per-request API key (default is true). contentPartsSupported: indicates whether the deployment supports requests with content parts (default is false). |
models.<model_name>.upstreams | endpoint: Model endpoint. key: Your API key. weight: Weight for the upstream endpoint; a positive number represents the endpoint capacity, zero or a negative value disables the endpoint from routing. Default value: 1. tier: Specifies the tier group for the endpoint. Only positive numbers are allowed. All requests are routed to the endpoints with the highest tier (the lowest tier value); other endpoints (with a lower tier/higher tier value) may be used only if the highest-tier endpoints are unavailable. Default value: 0, the highest tier. Refer to Load Balancer to learn more. extraData: Additional metadata containing any information that is passed to the upstream's endpoint. It can be a JSON or String. |
models.<model_name>.defaults | Default parameters are applied if a request doesn't contain them in OpenAI chat/completions API call |
models.<model_name>.interceptors | A list of interceptors to be triggered for the given model. Refer to Interceptors to learn more. |
keys | API Keys parameters: <core_key>: Your API key. Refer to API Keys to learn more. |
keys.<core_key> | project: Project name assigned to this key. Required. role: a role to be assigned to the key. Note: a key is invalid if both role and roles are missing. roles: a list of roles to be assigned to the key. Note: a key is invalid if both role and roles are missing. secured: the flag indicates if the key is secured. If it is set to true, the user request and deployment response won't be saved to the prompt log storage. |
roles | API key or user roles. Each role may have limits to be associated with applications, models, assistants or addons. Refer to API Keys to learn more. |
roles.<role_name> | limits: Limits for models, applications, or assistants. Note: it is necessary to define this for a role. |
roles.<role_name>.limits | minute: Total tokens per minute limit sent to the model, managed via a floating window approach for well-distributed rate limiting. If it is not set, the default value is unlimited. day: Total tokens per day limit sent to the model, managed via a floating window approach for balanced rate limiting. week: Total tokens per week limit sent to the model, managed via a floating window approach for balanced rate limiting. month: Total tokens per month limit sent to the model, managed via a floating window approach for balanced rate limiting. Note: you can skip these parameters to apply their default value, unlimited. |
retriableErrorCodes | List of retriable error codes for handling outages at LLM providers. |
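To show how these parameters fit together, here is a minimal, illustrative aidial.config.json sketch; the deployment name, endpoints, key and role are placeholders, and the nesting of role limits under the deployment name is an assumption based on common DIAL configurations rather than a documented default:
{
  "models": {
    "chat-model": {
      "type": "chat",
      "displayName": "Chat Model",
      "endpoint": "http://model-adapter.example.com/openai/deployments/chat-model/chat/completions",
      "upstreams": [
        {
          "endpoint": "https://llm-provider.example.com/v1/chat/completions",
          "key": "<upstream API key>"
        }
      ]
    }
  },
  "keys": {
    "<core API key>": {
      "project": "my-project",
      "role": "default"
    }
  },
  "roles": {
    "default": {
      "limits": {
        "chat-model": {
          "minute": 100000,
          "day": 10000000
        }
      }
    }
  }
}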
Copyright (C) 2024 EPAM Systems
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Similar Open Source Tools
Construction-Hazard-Detection
Construction-Hazard-Detection is an AI-driven tool focused on improving safety at construction sites by utilizing the YOLOv8 model for object detection. The system identifies potential hazards like overhead heavy loads and steel pipes, providing real-time analysis and warnings. Users can configure the system via a YAML file and run it using Docker. The primary dataset used for training is the Construction Site Safety Image Dataset enriched with additional annotations. The system logs are accessible within the Docker container for debugging, and notifications are sent through the LINE messaging API when hazards are detected.
chatgpt-cli
ChatGPT CLI provides a powerful command-line interface for seamless interaction with ChatGPT models via OpenAI and Azure. It features streaming capabilities, extensive configuration options, and supports various modes like streaming, query, and interactive mode. Users can manage thread-based context, sliding window history, and provide custom context from any source. The CLI also offers model and thread listing, advanced configuration options, and supports GPT-4, GPT-3.5-turbo, and Perplexity's models. Installation is available via Homebrew or direct download, and users can configure settings through default values, a config.yaml file, or environment variables.
LEADS
LEADS is a lightweight embedded assisted driving system designed to simplify the development of instrumentation, control, and analysis systems for racing cars. It is written in Python and C/C++ with impressive performance. The system is customizable and provides abstract layers for component rearrangement. It supports hardware components like Raspberry Pi and Arduino, and can adapt to various hardware types. LEADS offers a modular structure with a focus on flexibility and lightweight design. It includes robust safety features, modern GUI design with dark mode support, high performance on different platforms, and powerful ESC systems for traction control and braking. The system also supports real-time data sharing, live video streaming, and AI-enhanced data analysis for driver training. LEADS VeC Remote Analyst enables transparency between the driver and pit crew, allowing real-time data sharing and analysis. The system is designed to be user-friendly, adaptable, and efficient for racing car development.
BodhiApp
Bodhi App runs Open Source Large Language Models locally, exposing LLM inference capabilities as OpenAI API compatible REST APIs. It leverages llama.cpp for GGUF format models and huggingface.co ecosystem for model downloads. Users can run fine-tuned models for chat completions, create custom aliases, and convert Huggingface models to GGUF format. The CLI offers commands for environment configuration, model management, pulling files, serving API, and more.
monacopilot
Monacopilot is a powerful and customizable AI auto-completion plugin for the Monaco Editor. It supports multiple AI providers such as Anthropic, OpenAI, Groq, and Google, providing real-time code completions with an efficient caching system. The plugin offers context-aware suggestions, customizable completion behavior, and framework agnostic features. Users can also customize the model support and trigger completions manually. Monacopilot is designed to enhance coding productivity by providing accurate and contextually appropriate completions in daily spoken language.
runpod-worker-comfy
runpod-worker-comfy is a serverless API tool that allows users to run any ComfyUI workflow to generate an image. Users can provide input images as base64-encoded strings, and the generated image can be returned as a base64-encoded string or uploaded to AWS S3. The tool is built on Ubuntu + NVIDIA CUDA and provides features like built-in checkpoints and VAE models. Users can configure environment variables to upload images to AWS S3 and interact with the RunPod API to generate images. The tool also supports local testing and deployment to Docker hub using Github Actions.
AgentPoison
AgentPoison is a repository that provides the official PyTorch implementation of the paper 'AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning'. It offers tools for red-teaming LLM agents by poisoning memory or knowledge bases. The repository includes trigger optimization algorithms, agent experiments, and evaluation scripts for Agent-Driver, ReAct-StrategyQA, and EHRAgent. Users can fine-tune motion planners, inject queries with triggers, and evaluate red-teaming performance. The codebase supports multiple RAG embedders and provides a unified dataset access for all three agents.
lingua
Meta Lingua is a minimal and fast LLM training and inference library designed for research. It uses easy-to-modify PyTorch components to experiment with new architectures, losses, and data. The codebase enables end-to-end training, inference, and evaluation, providing tools for speed and stability analysis. The repository contains essential components in the 'lingua' folder and scripts that combine these components in the 'apps' folder. Researchers can modify the provided templates to suit their experiments easily. Meta Lingua aims to lower the barrier to entry for LLM research by offering a lightweight and focused codebase.
mergekit
Mergekit is a toolkit for merging pre-trained language models. It uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. Many merging algorithms are supported, with more coming as they catch my attention.
llm2sh
llm2sh is a command-line utility that leverages Large Language Models (LLMs) to translate plain-language requests into shell commands. It provides a convenient way to interact with your system using natural language. The tool supports multiple LLMs for command generation, offers a customizable configuration file, YOLO mode for running commands without confirmation, and is easily extensible with new LLMs and system prompts. Users can set up API keys for OpenAI, Claude, Groq, and Cerebras to use the tool effectively. llm2sh does not store user data or command history, and it does not record or send telemetry by itself, but the LLM APIs may collect and store requests and responses for their purposes.
thepipe
The Pipe is a multimodal-first tool for feeding files and web pages into vision-language models such as GPT-4V. It is best for LLM and RAG applications that require a deep understanding of tricky data sources. The Pipe is available as a hosted API at thepi.pe, or it can be set up locally.
datadreamer
DataDreamer is an advanced toolkit designed to facilitate the development of edge AI models by enabling synthetic data generation, knowledge extraction from pre-trained models, and creation of efficient and potent models. It eliminates the need for extensive datasets by generating synthetic datasets, leverages latent knowledge from pre-trained models, and focuses on creating compact models suitable for integration into any device and performance for specialized tasks. The toolkit offers features like prompt generation, image generation, dataset annotation, and tools for training small-scale neural networks for edge deployment. It provides hardware requirements, usage instructions, available models, and limitations to consider while using the library.
assistant
The WhatsApp AI Assistant repository offers a chatbot named Sydney that serves as an AI-powered personal assistant. It utilizes Language Model (LLM) technology to provide various features such as Google/Bing searching, Google Calendar integration, communication capabilities, group chat compatibility, voice message support, basic text reminders, image recognition, and more. Users can interact with Sydney through natural language queries and voice messages. The chatbot can transcribe voice messages using either the Whisper API or a local method. Additionally, Sydney can be used in group chats by mentioning her username or replying to her last message. The repository welcomes contributions in the form of issue reports, pull requests, and requests for new tools. The creators of the project, Veigamann and Luisotee, are open to job opportunities and can be contacted through their GitHub profiles.
magentic
Easily integrate Large Language Models into your Python code. Simply use the `@prompt` and `@chatprompt` decorators to create functions that return structured output from the LLM. Mix LLM queries and function calling with regular Python code to create complex logic.
octopus-v4
The Octopus-v4 project aims to build the world's largest graph of language models, integrating specialized models and training Octopus models to connect nodes efficiently. The project focuses on identifying, training, and connecting specialized models. The repository includes scripts for running the Octopus v4 model, methods for managing the graph, training code for specialized models, and inference code. Environment setup instructions are provided for Linux with NVIDIA GPU. The Octopus v4 model helps users find suitable models for tasks and reformats queries for effective processing. The project leverages Language Large Models for various domains and provides benchmark results. Users are encouraged to train and add specialized models following recommended procedures.
For similar tasks
alog
ALog is an open-source project designed to facilitate the deployment of server-side code to Cloudflare. It provides a step-by-step guide on creating a Cloudflare worker, configuring environment variables, and updating API base URL. The project aims to simplify the process of deploying server-side code and interacting with OpenAI API. ALog is distributed under the GNU General Public License v2.0, allowing users to modify and distribute the app while adhering to App Store Review Guidelines.
crabml
Crabml is a llama.cpp compatible AI inference engine written in Rust, designed for efficient inference on various platforms with WebGPU support. It focuses on running inference tasks with SIMD acceleration and minimal memory requirements, supporting multiple models and quantization methods. The project is hackable, embeddable, and aims to provide high-performance AI inference capabilities.
chatllm.cpp
ChatLLM.cpp is a pure C++ implementation tool for real-time chatting with RAG on your computer. It supports inference of various models ranging from less than 1B to more than 300B. The tool provides accelerated memory-efficient CPU inference with quantization, optimized KV cache, and parallel computing. It allows streaming generation with a typewriter effect and continuous chatting with virtually unlimited content length. ChatLLM.cpp also offers features like Retrieval Augmented Generation (RAG), LoRA, Python/JavaScript/C bindings, web demo, and more possibilities. Users can clone the repository, quantize models, build the project using make or CMake, and run quantized models for interactive chatting.
coze-js
Coze-js is a monorepo containing packages for Coze API and Realtime API. It provides usage examples for Node.js and React Web, as well as full console and sample call up demos. The tool requires Node.js 18+, pnpm 9.12.0, and Rush 5.140.0 for installation. Developers can start developing projects within the repository by following the provided steps. Each package in the monorepo can be developed and published independently, with documentation on contributing guidelines and publishing. The tool is licensed under MIT.
langstream
LangStream is a tool for natural language processing tasks, providing a CLI for easy installation and usage. Users can try sample applications like Chat Completions and create their own applications using the developer documentation. It supports running on Kubernetes for production-ready deployment, with support for various Kubernetes distributions and external components like Apache Kafka or Apache Pulsar cluster. Users can deploy LangStream locally using minikube and manage the cluster with mini-langstream. Development requirements include Docker, Java 17, Git, Python 3.11+, and PIP, with the option to test local code changes using mini-langstream.
sematic
Sematic is an open-source ML development platform that allows ML Engineers and Data Scientists to write complex end-to-end pipelines with Python. It can be executed locally, on a cloud VM, or on a Kubernetes cluster. Sematic enables chaining data processing jobs with model training into reproducible pipelines that can be monitored and visualized in a web dashboard. It offers features like easy onboarding, local-to-cloud parity, end-to-end traceability, access to heterogeneous compute resources, and reproducibility.
For similar jobs
resonance
Resonance is a framework designed to facilitate interoperability and messaging between services in your infrastructure and beyond. It provides AI capabilities and takes full advantage of asynchronous PHP, built on top of Swoole. With Resonance, you can: * Chat with Open-Source LLMs: Create prompt controllers to directly answer user's prompts. LLM takes care of determining user's intention, so you can focus on taking appropriate action. * Asynchronous Where it Matters: Respond asynchronously to incoming RPC or WebSocket messages (or both combined) with little overhead. You can set up all the asynchronous features using attributes. No elaborate configuration is needed. * Simple Things Remain Simple: Writing HTTP controllers is similar to how it's done in the synchronous code. Controllers have new exciting features that take advantage of the asynchronous environment. * Consistency is Key: You can keep the same approach to writing software no matter the size of your project. There are no growing central configuration files or service dependencies registries. Every relation between code modules is local to those modules. * Promises in PHP: Resonance provides a partial implementation of Promise/A+ spec to handle various asynchronous tasks. * GraphQL Out of the Box: You can build elaborate GraphQL schemas by using just the PHP attributes. Resonance takes care of reusing SQL queries and optimizing the resources' usage. All fields can be resolved asynchronously.
aiogram_bot_template
Aiogram bot template is a boilerplate for creating Telegram bots using Aiogram framework. It provides a solid foundation for building robust and scalable bots with a focus on code organization, database integration, and localization.
pluto
Pluto is a development tool dedicated to helping developers **build cloud and AI applications more conveniently** , resolving issues such as the challenging deployment of AI applications and open-source models. Developers are able to write applications in familiar programming languages like **Python and TypeScript** , **directly defining and utilizing the cloud resources necessary for the application within their code base** , such as AWS SageMaker, DynamoDB, and more. Pluto automatically deduces the infrastructure resource needs of the app through **static program analysis** and proceeds to create these resources on the specified cloud platform, **simplifying the resources creation and application deployment process**.
pinecone-ts-client
The official Node.js client for Pinecone, written in TypeScript. This client library provides a high-level interface for interacting with the Pinecone vector database service. With this client, you can create and manage indexes, upsert and query vector data, and perform other operations related to vector search and retrieval. The client is designed to be easy to use and provides a consistent and idiomatic experience for Node.js developers. It supports all the features and functionality of the Pinecone API, making it a comprehensive solution for building vector-powered applications in Node.js.
aiohttp-pydantic
Aiohttp pydantic is an aiohttp view to easily parse and validate requests. You define using function annotations what your methods for handling HTTP verbs expect, and Aiohttp pydantic parses the HTTP request for you, validates the data, and injects the parameters you want. It provides features like query string, request body, URL path, and HTTP headers validation, as well as Open API Specification generation.
gcloud-aio
This repository contains shared codebase for two projects: gcloud-aio and gcloud-rest. gcloud-aio is built for Python 3's asyncio, while gcloud-rest is a threadsafe requests-based implementation. It provides clients for Google Cloud services like Auth, BigQuery, Datastore, KMS, PubSub, Storage, and Task Queue. Users can install the library using pip and refer to the documentation for usage details. Developers can contribute to the project by following the contribution guide.
aioconsole
aioconsole is a Python package that provides asynchronous console and interfaces for asyncio. It offers asynchronous equivalents to input, print, exec, and code.interact, an interactive loop running the asynchronous Python console, customization and running of command line interfaces using argparse, stream support to serve interfaces instead of using standard streams, and the apython script to access asyncio code at runtime without modifying the sources. The package requires Python version 3.8 or higher and can be installed from PyPI or GitHub. It allows users to run Python files or modules with a modified asyncio policy, replacing the default event loop with an interactive loop. aioconsole is useful for scenarios where users need to interact with asyncio code in a console environment.
aiosqlite
aiosqlite is a Python library that provides a friendly, async interface to SQLite databases. It replicates the standard sqlite3 module but with async versions of all the standard connection and cursor methods, along with context managers for automatically closing connections and cursors. It allows interaction with SQLite databases on the main AsyncIO event loop without blocking execution of other coroutines while waiting for queries or data fetches. The library also replicates most of the advanced features of sqlite3, such as row factories and total changes tracking.