llm-d-inference-scheduler


Inference scheduler for llm-d



Inference Scheduler

This scheduler makes optimized routing decisions for inference requests to the llm-d inference framework.

About

This provides an "Endpoint Picker (EPP)" component to the llm-d inference framework, which schedules incoming inference requests to the platform via a Kubernetes Gateway according to scheduler plugins. For more details on the llm-d inference scheduler architecture, routing logic, and the different plugins (filters and scorers), including plugin configuration, see the Architecture Documentation.
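As a rough illustration of the plugin model, the sketch below shows how scorer plugins might be composed into a scheduling profile. The schema, plugin names, and weights here are assumptions for illustration only; the Architecture Documentation defines the actual configuration format.

```yaml
# Illustrative sketch only -- plugin names, weights, and schema are
# assumptions; see the Architecture Documentation for the real format.
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
  - type: queue-scorer          # favors endpoints with short request queues
  - type: kv-cache-scorer       # favors endpoints with spare KV-cache capacity
  - type: prefix-cache-scorer   # favors endpoints likely to hold a matching prefix
schedulingProfiles:
  - name: default
    plugins:
      - pluginRef: queue-scorer
        weight: 1
      - pluginRef: kv-cache-scorer
        weight: 1
      - pluginRef: prefix-cache-scorer
        weight: 2
```

Each request is scored by every plugin in the active profile, and the weighted scores determine which model-serving endpoint receives the request.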

The EPP extends the Gateway API Inference Extension (GIE) project, which provides the API resources and machinery for scheduling. We add custom features specific to llm-d here, such as P/D (prefill/decode) disaggregation.

A compatible Gateway API implementation is used as the Gateway. The implementation must be Envoy-based and support ext-proc, as this is currently the callback mechanism the EPP relies on to make routing decisions for model-serving workloads.
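To make the wiring concrete, the sketch below shows the typical shape of this setup: an HTTPRoute attached to an Envoy-based Gateway, with a GIE InferencePool as the backend whose endpoints the EPP picks among. The resource names (inference-gateway, vllm-llama-pool) are hypothetical, and field details should be checked against the GIE documentation.

```yaml
# Hedged sketch of the Gateway -> EPP wiring; resource names are
# hypothetical -- check the GIE docs for the exact API versions and fields.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: inference-gateway      # an Envoy-based, ext-proc-capable Gateway
  rules:
    - backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool      # GIE resource fronting the model servers
          name: vllm-llama-pool    # hypothetical pool of vLLM replicas
```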

Contributing

Our community meeting is held weekly on Wednesdays at 10 AM PDT (Google Meet, Meeting Notes).

We currently use the #sig-inference-scheduler channel in the llm-d Slack workspace for communications.

For large changes, please create an issue first describing the change so the maintainers can assess it and work through the details with you. See DEVELOPMENT.md for details on how to work with the codebase.

Note that, in general, features should go to the upstream Gateway API Inference Extension (GIE) project first, if applicable. The GIE is a major dependency of ours and is where most general-purpose inference features live. If you have something you feel is general purpose, it probably belongs in the GIE; if it is llm-d specific, it belongs here. If you're not sure which project your feature belongs in, feel free to create a discussion or ask on Slack.

Contributions are welcome!
