GPUs in vClusters in Kind (Inception, but hotter)

[!note] This is a re-wording of the original gist on this topic
Sometimes the world asks the question: what if you could run a virtual cluster, inside a kind cluster, with real GPU workloads? Actually, no one asked this, ever. But we’re doing it anyway. This setup is particularly useful if you’re doing multi-tenant GPU scheduling or testing GPU-bound workloads in ephemeral environments. It’s also great if you just enjoy watching things break in increasingly novel ways. We’re building off this Substratus post and adding the steps to make it work properly with containerd, vCluster, and ClusterAPI.

Prerequisite: The NVIDIA Container Toolkit

Let’s assume you have an NVIDIA GPU and you're not trying this on a Raspberry Pi. You’ll need the NVIDIA container toolkit installed. This toolkit acts as the bridge between container runtimes and your GPU. Without it, your workloads won’t see the GPU, no matter how much YAML you throw at them.
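For reference, on Ubuntu or Debian the install usually looks something like this (a sketch based on NVIDIA's packaged apt repository; follow the official install guide for other distros):
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
    sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
nvidia-smi   # the host driver itself should already be working before any of this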

Set the Container Runtime Straight

The container runtime is what stands between your GPU and your container, so it needs to be properly configured to use the NVIDIA runtime by default. If you’re using only Docker:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
If you’re using both Docker and containerd (which happens more often than it should):
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo nvidia-ctk runtime configure --runtime=containerd --set-as-default
sudo systemctl restart containerd docker
Without this step, the GPU will be present but ignored — like an unpaid intern in a meeting.
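To confirm the defaults actually changed (a quick sketch; the field and key names assume reasonably recent Docker and containerd builds):
docker info --format '{{.DefaultRuntime}}'              # should print: nvidia
grep default_runtime_name /etc/containerd/config.toml   # should now point at nvidia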

Tell the Runtime to Accept GPU Devices (Even the Fake Ones)

One small but crucial config flag makes the NVIDIA runtime play nicely with how we mount GPU devices inside kind. It tells the runtime to also accept the list of visible GPU devices expressed as volume mounts under /var/run/nvidia-container-devices, rather than only via the NVIDIA_VISIBLE_DEVICES environment variable, which is exactly the hack we’re going to pull off in a minute.
sudo sed -i '/accept-nvidia-visible-devices-as-volume-mounts/c\accept-nvidia-visible-devices-as-volume-mounts = true' /etc/nvidia-container-runtime/config.toml
Without this, kind will throw a tantrum the moment you try to mount anything GPU-like into a node.
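To double-check that the sed actually landed:
grep accept-nvidia-visible-devices-as-volume-mounts /etc/nvidia-container-runtime/config.toml
# expected: accept-nvidia-visible-devices-as-volume-mounts = true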

Creating a Kind Cluster That Lies to Itself

Kind nodes don’t have GPUs. They’re just containers pretending to be machines. So we cheat. By mounting /dev/null to the expected NVIDIA device path, we convince the GPU operator that there's something worth working with. Craft the following kind.yaml file with a single control-plane and 3 worker nodes, each mounting /dev/null at /var/run/nvidia-container-devices/all, the path the NVIDIA runtime watches for the all GPU device.
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
name: gputest
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: /dev/null
        containerPath: /var/run/nvidia-container-devices/all
  - role: worker
    extraMounts:
      - hostPath: /dev/null
        containerPath: /var/run/nvidia-container-devices/all
  - role: worker
    extraMounts:
      - hostPath: /dev/null
        containerPath: /var/run/nvidia-container-devices/all
  - role: worker
    extraMounts:
      - hostPath: /dev/null
        containerPath: /var/run/nvidia-container-devices/all
Then create a new cluster using this config.
kind create cluster --config kind.yaml
What you get is a 4-node cluster, each one confidently pretending to have access to a GPU. It doesn’t, but the GPU operator doesn’t need to know that just yet.
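To verify the deception took, check that all four nodes registered and that the NVIDIA runtime injected the driver userspace into the node containers. A sketch, assuming the gputest cluster name from above (kind names the containers gputest-control-plane, gputest-worker, and so on):
kubectl get nodes
docker exec -ti gputest-worker nvidia-smi   # if the volume-mount trick worked, this shows the host GPU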

(Maybe) Symlink ldconfig

Some older builds of the GPU operator expect to find /sbin/ldconfig.real. If it’s not there, it might crash and burn in ways that don’t make immediate sense. Kind nodes at version 1.29 seem fine, but if you're using an older image or the GPU operator won’t start, you can do this:
for name in $(kubectl get no -o jsonpath="{.items[*].metadata.name}"); do
    docker exec -ti ${name} ln -s /sbin/ldconfig /sbin/ldconfig.real
done
It’s a workaround for legacy expectations — the software equivalent of forging a birth certificate.

Deploy the GPU Operator (For Real This Time)

Once kind is up and lying confidently about its capabilities, we can install NVIDIA’s GPU operator. This will set up device plugins and controller logic — everything but the actual driver.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia || true
helm repo update

helm install --wait --generate-name \
     -n gpu-operator --create-namespace \
     nvidia/gpu-operator --set driver.enabled=false
We're disabling the driver install because the host system (your actual machine) already has one. No point trying to install it again inside a container. Wait a minute or five — the operator can be sluggish on startup. Then confirm that nodes are reporting allocatable GPU resources:
kubectl get node -o yaml | yq '.items[] | [{"name": .metadata.name, "status": .status.allocatable."nvidia.com/gpu"}]'
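No yq on the machine? A plain-kubectl equivalent (a sketch; the backslash-escaped dots are needed because the resource name itself contains dots):
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'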

Now Let's Nest This Thing: vCluster inside Kind

The goal is to run GPU workloads inside a vCluster that lives inside our kind cluster. To make this work, we need vCluster to see the real nodes — not the fake one it normally invents.

Install ClusterAPI for vCluster Support

First install clusterctl, the CLI for ClusterAPI:
VERSION=$(curl --silent "https://api.github.com/repos/kubernetes-sigs/cluster-api/releases/latest" | jq -r .tag_name)
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/${VERSION}/clusterctl-linux-amd64 -o clusterctl
sudo install -o root -g root -m 0755 clusterctl /usr/local/bin/clusterctl
Then initialise ClusterAPI in the kind cluster using the vCluster provider:
clusterctl init --infrastructure vcluster
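Give it a minute, then make sure the provider controllers actually came up. Namespace names vary between versions, so a broad grep is the lazy-but-reliable check:
kubectl get pods -A | grep -Ei 'capi|vcluster'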

Sync the Real Nodes Into the Virtual Cluster

By default, vCluster creates its own virtual node to represent a Kubernetes environment. That’s fine for vanilla workloads, but useless when you need real allocatable GPU resources. To fix this, we tell vCluster to sync nodes from the host (kind) cluster:
sync:
  fromHost:
    nodes:
      enabled: true
This gives the virtual cluster access to real resource data, GPUs included, while still operating in isolation. The snippet goes into the vCluster Helm values carried by your ClusterAPI manifest, cluster.yaml (one way to generate that file is sketched after the apply step below). Then apply it:
kubectl create ns vcluster
kubectl apply -f cluster.yaml
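If you don’t already have a cluster.yaml, one way to produce it is to render the vCluster provider’s template with clusterctl generate cluster, feeding the sync snippet in as Helm values. This is a sketch, not gospel: the HELM_VALUES variable and the newline-escaping trick are what the provider template expected at the time of writing, and vcluster-values.yaml is just a file name picked here, so check the vCluster ClusterAPI provider docs for your versions.
cat > vcluster-values.yaml << EOF
sync:
  fromHost:
    nodes:
      enabled: true
EOF

export CLUSTER_NAME=kind
export CLUSTER_NAMESPACE=vcluster
export KUBERNETES_VERSION=1.29.0
# the template wants the values as a single escaped string
export HELM_VALUES=$(cat vcluster-values.yaml | sed -z 's/\n/\\n/g')

clusterctl generate cluster ${CLUSTER_NAME} \
    --infrastructure vcluster \
    --kubernetes-version ${KUBERNETES_VERSION} \
    --target-namespace ${CLUSTER_NAMESPACE} > cluster.yaml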

Connect to the vCluster

Once the cluster is running:
vcluster connect kind -n vcluster
If you don’t have vcluster installed yet:
VERSION=$(curl --silent "https://api.github.com/repos/loft-sh/vcluster/releases/latest" | jq -r .tag_name)
curl -L -o vcluster "https://github.com/loft-sh/vcluster/releases/download/${VERSION}/vcluster-linux-amd64"
sudo install -c -m 0755 vcluster /usr/local/bin && rm -f vcluster
You should now be operating inside your vCluster, which in turn is running inside your kind cluster, which in turn is sitting on top of your actual GPU.
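A quick sanity check once connected: the node list should now contain real kind node names rather than the single synthetic node vCluster normally fabricates (exactly which host nodes appear depends on the sync settings).
kubectl get nodes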

GPU Operator, Part II: Inside the vCluster

We’ll install the GPU operator again, this time inside the vCluster but with the toolkit and driver install disabled. Those were already handled by the host.
helm install --wait --generate-name \
     -n gpu-operator --create-namespace \
     nvidia/gpu-operator \
     --set driver.enabled=false,toolkit.enabled=false
This gives the vCluster its own GPU operator, which can schedule GPU workloads without interfering with the underlying runtime.
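Same check as before, just run from inside the vCluster this time; the synced nodes should report allocatable nvidia.com/gpu resources once the operator settles:
kubectl get node -o yaml | yq '.items[] | [{"name": .metadata.name, "status": .status.allocatable."nvidia.com/gpu"}]'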

Run a CUDA Test Pod

Time to check whether the setup actually works.
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
Give it a moment, then check logs:
kubectl logs cuda-vectoradd
If everything went right, you’ll see:
[Vector addition of 50000 elements]
...
Test PASSED
And just like that, you’ve got a GPU-enabled vCluster running inside a GPU-enabled kind cluster.

In Summary (or: what have we done?)

  • Pretended /dev/null was a GPU device.
  • Tricked Kind into thinking it had hardware.
  • Nested a virtual cluster inside that lie.
  • Passed real GPU resources through to the imaginary cluster.
  • Watched it all work. Somehow.
You now have a testing setup that is equal parts powerful and ridiculous — a kind of Kubernetes matryoshka doll with CUDA at its heart. Perfect for demos, ephemeral workloads, multi-tenant experiments, or just proving that, yes, you can in fact virtualise a lie and still run inference on it.
