
llm-d-infra Quick Start

This document guides users through using, deploying, and optionally customizing a quickstart. The source of truth for installing a given quickstart will always live in that quickstart's directory; this guide walks through the common steps and explains the decisions that happen at each phase.

Overview

This guide walks you through installing and deploying llm-d on a Kubernetes cluster, using an opinionated flow to get up and running as quickly as possible.

Prerequisites

Tool Dependencies

You will need to install some dependencies (such as helm, yq, and git), and most examples require a HuggingFace token. These requirements and their installation instructions are documented in the dependencies directory. To install the dependencies, use the provided install-deps.sh script.
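For example, from your clone of the repository (the script's exact location is documented in the dependencies directory, so adjust the path to match your checkout):

./install-deps.sh  # run from the directory containing the script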

HuggingFace Token

A HuggingFace token is required to download models from the HuggingFace Hub. Before deploying, you must create a Kubernetes secret containing your HuggingFace token in the target namespace; see the instructions.
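As a minimal sketch, assuming the charts expect a secret named llm-d-hf-token with an HF_TOKEN key (confirm the exact secret name and key in the linked instructions):

export NAMESPACE=llm-d        # your target namespace
export HF_TOKEN=hf_...        # your HuggingFace token
kubectl create secret generic llm-d-hf-token \
  --namespace "${NAMESPACE}" \
  --from-literal=HF_TOKEN="${HF_TOKEN}"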

Gateway Control Plane

Additionally, it is assumed you have configured and deployed your Gateway control plane and its prerequisite CRDs. For more information, see gateway-control-plane-providers.
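For illustration only, installing the upstream Gateway API CRDs might look like the following; your chosen provider (for example Istio or kgateway) has its own install steps, which the linked guide covers:

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.1/standard-install.yaml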

Target Platforms

Since llm-d-infra is based on Helm charts, llm-d can be deployed on a variety of Kubernetes platforms. Requirements, workarounds, and any other documentation relevant to these platforms live in the infra-providers directory.

llm-d-infra Installation

The llm-d-infra repository provides Helm charts to deploy various llm-d components. To install a specific component, navigate to its example directory and follow the instructions in its README.

Install llm-d on an Existing Kubernetes Cluster

To install llm-d components, navigate to the desired example directory and follow its README instructions. For example:

cd examples/inference-scheduling  # Navigate to your desired example directory
# Follow the README.md instructions in the example directory
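Each example documents its own install command. As a hedged sketch, many of the examples drive the install through helmfile against your target namespace; the exact environment names and flags are in each example's README:

export NAMESPACE=llm-d-inference-scheduling  # illustrative namespace; use the one from the README
helmfile apply -n "${NAMESPACE}"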

Install on OpenShift

Before running any installation, ensure you have logged into the cluster as a cluster administrator. For example:

oc login --token=sha256~yourtoken --server=https://api.yourcluster.com:6443

After logging in, follow the same steps as described in the "Install llm-d on an Existing Kubernetes Cluster" section above.

Validation

After executing the install steps from the specific example's README, resources will be created according to the installation options you chose.

First, you should be able to list all Helm releases to view the charts installed into your chosen namespace:

helm list -n ${NAMESPACE}

Out of the box with this example, you can list the resources that were created:

kubectl get all -n ${NAMESPACE}

Note: This assumes no other quickstart deployments exist in your given ${NAMESPACE}.
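To confirm the deployment has converged, you can also wait for all pods in the namespace to become ready (the timeout value is illustrative):

kubectl wait --for=condition=Ready pods --all -n "${NAMESPACE}" --timeout=300s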

Using the Stack

For instructions on getting started with making inference requests, see getting-started-inferencing.md.
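As a quick smoke test, the model servers expose an OpenAI-compatible API, so you can port-forward the gateway service and send a completion request. The service name and model ID below are placeholders; the linked guide describes the real flow:

kubectl port-forward -n "${NAMESPACE}" svc/YOUR-GATEWAY-SERVICE 8000:80  # run in a separate terminal
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "YOUR-MODEL-ID", "prompt": "Hello", "max_tokens": 16}'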

Metrics Collection

llm-d-infra includes support for metrics collection from vLLM pods. When enabled via llm-d-modelservice Helm chart values, llm-d applies PodMonitor resources that register the pods as Prometheus scrape targets. See MONITORING.md for details. On OpenShift, the built-in user workload monitoring Prometheus stack can be used to collect metrics. On Kubernetes, Prometheus and Grafana can be installed from the prometheus-community kube-prometheus-stack Helm charts.
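On a vanilla Kubernetes cluster, a minimal kube-prometheus-stack install might look like this (the release and namespace names are illustrative):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace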

Uninstall

To remove llm-d resources from the cluster, refer to the uninstallation instructions in the README of the specific example you installed.
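In most cases this boils down to removing the Helm releases the example installed; release names vary per example, so identify them first:

helm list -n "${NAMESPACE}"                    # identify the releases the example installed
helm uninstall RELEASE-NAME -n "${NAMESPACE}"  # repeat for each release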

Content Source

This content is automatically synced from quickstart/README.md in the llm-d-incubation/llm-d-infra repository.
