Bootstrapping the Talos Control Plane (Series: Part 2)
Part 1 made the case for Talos: an immutable, API-only node OS where every machine is described by a config file. This post is the how — turning six bare Talos VMs into a working, highly-available control plane, the way I actually bootstrapped mine.
(Addresses below are illustrative doc-range IPs; the Kubernetes-internal CIDRs are the real, conventional defaults.)
The shape: six VMs, fault-isolated
The cluster runs as six VMs across a 3-node Proxmox cluster — one control-plane and one worker per physical node:
  node-1        node-2        node-3
┌─ cp-01 ─┐   ┌─ cp-02 ─┐   ┌─ cp-03 ─┐   ← 3× control plane (etcd quorum)
└─ wk-01 ─┘   └─ wk-02 ─┘   └─ wk-03 ─┘   ← 3× worker (Longhorn)
The point of spreading one control-plane node per host: a three-member etcd cluster needs two members up to keep quorum, and if two control-plane nodes shared a host, losing that host would lose quorum and the cluster would stop accepting changes. One-per-host means any single Proxmox node can die and the cluster keeps making decisions.
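The quorum arithmetic is small enough to check by hand. A minimal sketch, assuming the usual majority rule (more than half of the voting members); the function names are illustrative:

```shell
# Majority quorum for an etcd cluster with N voting members.
quorum() { echo $(( $1 / 2 + 1 )); }

# How many members can fail while a majority is still reachable.
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

for n in 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
# members=3 quorum=2 tolerated_failures=1
# members=4 quorum=3 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```

With three members, losing one host that carries a single control-plane VM leaves two members, which is still a majority. Losing a host that carries two of them would leave one, which is not.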
Control-plane VMs are small (2 vCPU / 4 GB / 40 GB system disk). Workers are bigger (4 vCPU / 16 GB, plus a dedicated data disk for Longhorn).
The config model: secrets once, configs from patches
Talos config generation comes down to two ideas worth internalizing:
# 1. Generate the cluster PKI/secrets ONCE — keep secrets.yaml so you can
# add nodes later with the same trust roots.
talosctl gen secrets -o secrets.yaml
# 2. Generate machine configs from those secrets + your patches
talosctl gen config homelab-k8s "https://192.0.2.50:6443" \
  --with-secrets secrets.yaml \
  --config-patch @controlplane.patch.yaml \
  --config-patch-worker @worker.patch.yaml
# → controlplane.yaml, worker.yaml, talosconfig
secrets.yaml is the cluster’s root of trust — guard it, and reuse it whenever you add nodes. The base configs are generic; everything opinionated lives in patches you keep in Git.
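One way to guard those files, sketched under the assumption that you generate configs inside the same Git working tree that holds the patches. The generated machine configs and talosconfig embed key material too, so this sketch treats them like secrets.yaml. Ignoring them in Git is my assumption, not something Talos enforces:

```shell
# Files that carry cluster key material (names as produced by the commands above).
for f in secrets.yaml talosconfig controlplane.yaml worker.yaml; do
  # Restrict permissions to the owner, but only for files that already exist.
  if [ -f "$f" ]; then chmod 600 "$f"; fi
  # Add the exact file name to .gitignore unless it is already listed.
  grep -qxF "$f" .gitignore 2>/dev/null || echo "$f" >> .gitignore
done
```

The loop is idempotent, so it is safe to rerun after regenerating configs. Where secrets.yaml itself should live long term (password manager, encrypted repo, vault) is a separate decision this sketch does not make.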
What’s in the control-plane patch (the interesting part)
This is where Talos’s declarative model shines. There’s no bash configuring the host — it’s all data:
machine:
  sysctls:
    net.ipv4.ip_forward: "1"
    net.netfilter.nf_conntrack_max: "1048576"   # busy-cluster conntrack
    fs.inotify.max_user_watches: "524288"       # operators watch a LOT
    vm.max_map_count: "262144"                  # Prometheus/ES friendly
  kernel:
    modules:
      - name: nbd         # Longhorn block device
      - name: iscsi_tcp   # Longhorn iSCSI transport
  install:
    disk: /dev/sda
cluster:
  proxy:
    disabled: true        # Cilium will replace kube-proxy
  network:
    cni:
      name: none          # we install Cilium ourselves, via GitOps
    podSubnets: ["10.244.0.0/16"]
    serviceSubnets: ["10.96.0.0/12"]
Three decisions here matter more than they look:
- proxy.disabled: true + cni.name: none. Talos will not install kube-proxy or a CNI. Cilium takes over both jobs (eBPF, kube-proxy replacement). If you forget this, you get a kube-proxy fighting Cilium for the same packets.
- Swap is off and containerd is built in: no action needed. On a normal distro that’s a checklist; on Talos it’s the default, locked down.
- Kernel modules and sysctls are config, not a provisioning script. nbd/iscsi_tcp are there because Longhorn needs them; the conntrack/inotify bumps are because a real platform (Prometheus, operators) exhausts the defaults.
There’s one more piece worth calling out: KubePrism. Talos gives every node a local, load-balanced kube-apiserver endpoint at 127.0.0.1:7445. Cilium points its kube-proxy replacement at that, so nodes reach the API server even if one control-plane node is down. It’s on by default on modern Talos (1.6 and later); I still set machine.features.kubePrism explicitly for clarity.
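Pinning KubePrism could look roughly like this. The file name kubeprism.patch.yaml is hypothetical, and the same keys could just as well be folded into the control-plane patch:

```shell
# Write a small machine-config patch that sets KubePrism explicitly.
# kubeprism.patch.yaml is an illustrative file name, not one from the build above.
cat > kubeprism.patch.yaml <<'EOF'
machine:
  features:
    kubePrism:
      enabled: true
      port: 7445
EOF
```

Since --config-patch can be given more than once, a file like this can be passed to talosctl gen config as an additional --config-patch @kubeprism.patch.yaml so it applies to every node.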
Apply, then bootstrap etcd exactly once
The VMs boot from the Talos ISO into maintenance mode — alive on the network, no config yet. You push config to them:
# Apply over insecure maintenance-mode API (no trust established yet)
for ip in 192.0.2.51 192.0.2.52 192.0.2.53; do
talosctl apply-config --insecure --nodes "$ip" --file controlplane.yaml
done
Then the single most important “do this exactly once” step in the whole build:
# Bootstrap etcd — ONE TIME, against ONE control-plane node. Never repeat this.
talosctl bootstrap --nodes 192.0.2.51
talosctl bootstrap initializes the etcd cluster. Run it twice, or against two different nodes, and you can split-brain etcd. Run it once on one node; the other two control-plane nodes discover and join automatically.
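If the bootstrap lives in a script that might be rerun, a small guard helps. This is a sketch of my own, not a Talos feature: the marker file name, the DRY_RUN variable, and the bootstrap_once function are all hypothetical, and the marker only protects the machine it sits on.

```shell
# Hypothetical guard: a local marker file records that bootstrap was issued.
MARKER="${MARKER:-.etcd-bootstrapped}"

bootstrap_once() {
  node="$1"
  if [ -e "$MARKER" ]; then
    echo "refusing: bootstrap already issued against $(cat "$MARKER")" >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command, and leave no marker behind.
    echo "talosctl bootstrap --nodes $node"
  else
    # Real run: record the node only if talosctl reports success.
    talosctl bootstrap --nodes "$node" || return 1
    echo "$node" > "$MARKER"
  fi
}

bootstrap_once 192.0.2.51   # dry run by default; set DRY_RUN=0 to execute
```

Committing the marker alongside the patches would extend the guard to anyone working from the same repository, at the cost of one more file to keep honest.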
# Pull the kubeconfig and point kubectl at the cluster
talosctl kubeconfig ./kubeconfig --nodes 192.0.2.51
export KUBECONFIG=$PWD/kubeconfig
kubectl get nodes
“NotReady” is the goal, not a bug
Here’s the moment that trips people up. You run kubectl get nodes and every node says NotReady. That’s correct. You told Talos cni.name: none, so there’s no pod networking yet — the kubelet can’t mark nodes Ready without a CNI. Installing Cilium (Part 3) is what flips them to Ready.
A NotReady cluster right after bootstrap means the control plane came up exactly as designed.
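To confirm that all six nodes registered without reading the table by eye, a small filter can tally the STATUS column. A minimal sketch; node_status_summary is a hypothetical helper, and it assumes the default column layout of kubectl get nodes, where STATUS is the second field:

```shell
# Tally node STATUS values from `kubectl get nodes --no-headers` read on stdin.
node_status_summary() {
  awk '{ count[$2]++ } END { for (s in count) printf "%s=%d\n", s, count[s] }' | sort
}

# Usage against a live cluster (not executed here):
#   kubectl get nodes --no-headers | node_status_summary
```

Right after bootstrap, the expectation from the text above is a single NotReady line whose count matches the number of nodes that have joined.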
The gotcha that costs the most time
Before any of this: verify the version matrix. Talos ↔ Kubernetes ↔ Cilium are tightly coupled, and a mismatched trio fails in confusing ways. Pin a mutually-supported TALOS_VERSION / K8S_VERSION / CILIUM_VERSION from the Talos support matrix and Cilium’s Talos guide before you generate a single config. This is the number-one source of pain in the whole build.
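One way to make that pinning hard to skip is to refuse to generate anything until all three versions are set. A sketch, with require_versions as a hypothetical helper. It deliberately ships no version numbers, because the right trio has to come from the support matrices:

```shell
# Fail unless all three version pins are set (values come from the support matrices).
require_versions() {
  for v in TALOS_VERSION K8S_VERSION CILIUM_VERSION; do
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "missing $v: pin it before generating configs" >&2
      return 1
    fi
  done
}

if require_versions; then
  # Print, rather than run, the generation command with the pins applied.
  # Check `talosctl gen config --help` for the exact value format each flag expects.
  echo "talosctl gen config homelab-k8s https://192.0.2.50:6443" \
       "--with-secrets secrets.yaml" \
       "--talos-version $TALOS_VERSION --kubernetes-version $K8S_VERSION"
fi
```

CILIUM_VERSION is not consumed by talosctl at all; it is checked here only so the trio is decided together and is ready for the Cilium install.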
Next
The control plane is up, etcd has quorum, and the nodes are waiting on a network. Part 3 installs Cilium — eBPF dataplane, kube-proxy replacement, and the thing that finally turns those nodes Ready — then hands the rest of the platform to Argo CD.