Cluster Architecture
HA Control Plane
High-availability control plane topologies, load balancer setup, and multi-master kubeadm configuration.
Overview
- A single control plane node is a single point of failure — if it goes down, the cluster cannot be managed
- HA control plane runs multiple control plane nodes (minimum 3) behind a load balancer
- etcd requires an odd number of members (3 or 5) for quorum — majority must agree to commit writes
- Quorum formula: floor(n/2) + 1 — a 3-member cluster tolerates 1 failure, a 5-member cluster tolerates 2
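The quorum arithmetic above can be checked with a quick shell loop (integer division gives the floor):

```shell
# quorum = floor(n/2) + 1; fault tolerance = n - quorum
for n in 1 3 5 7; do
  quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum tolerates=$(( n - quorum )) failures"
done
```

Note that even member counts buy nothing: 4 members still only tolerate 1 failure (quorum is 3), which is why 3 or 5 is the recommendation.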
HA Topologies
| | Stacked etcd | External etcd |
|---|---|---|
| Description | etcd runs on the same nodes as control plane components | etcd runs on dedicated separate nodes |
| Minimum nodes | 3 control plane nodes | 3 control plane + 3 etcd nodes (6 total) |
| Pros | Simpler to set up and manage, fewer nodes needed | etcd failures don't directly impact control plane nodes, can scale etcd independently |
| Cons | Losing a node loses both a CP member and an etcd member | More infrastructure required, more complex setup |
| Use case | Most clusters, default kubeadm HA setup | Large production clusters requiring maximum resilience |
Load Balancer Requirements
- All API server instances must be fronted by a load balancer on port 6443
- The LB address becomes the `--control-plane-endpoint` used by all nodes to reach the API
- Health check endpoint: `https://<api-server>:6443/healthz`
- Must be a TCP or HTTPS (TLS passthrough) load balancer, not a plain HTTP one
Common options:
| Option | Type | Notes |
|---|---|---|
| HAProxy | External LB | Traditional, well-documented for k8s HA |
| kube-vip | Virtual IP | Runs as static pod on CP nodes, no external infra needed |
| Cloud LB | External LB | AWS NLB/ALB, GCP LB, Azure LB — managed by cloud provider |
Setting Up HA with kubeadm
1. Set up the load balancer
Configure your load balancer to point to all control plane node IPs on port 6443.
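As a sketch, a minimal HAProxy configuration doing TCP passthrough to three control plane nodes (the `10.0.0.x` addresses and server names are placeholders for your environment):

```haproxy
frontend kube-apiserver
    bind *:6443
    mode tcp
    option tcplog
    default_backend kube-apiserver-backend

backend kube-apiserver-backend
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 10.0.0.11:6443 check
    server cp2 10.0.0.12:6443 check
    server cp3 10.0.0.13:6443 check
```

TCP mode keeps TLS intact end-to-end, so the API server's own certificate (which must include the LB address in its SANs) is presented to clients unchanged.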
2. Initialize the first control plane node
sudo kubeadm init \
--control-plane-endpoint "LOAD_BALANCER_IP:6443" \
--upload-certs \
--pod-network-cidr=10.244.0.0/16
- `--control-plane-endpoint` — stable address (LB IP/DNS) all nodes use to reach the API server
- `--upload-certs` — encrypts and uploads certificates to a kubeadm-certs Secret so other CP nodes can retrieve them
Save the output — it contains the join commands for both control plane and worker nodes.
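The same init flags can also be expressed declaratively. A sketch of an equivalent kubeadm config file (using kubeadm's v1beta3 API):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "LOAD_BALANCER_IP:6443"
networking:
  podSubnet: "10.244.0.0/16"
```

Then run `sudo kubeadm init --config kubeadm-config.yaml --upload-certs` (note `--upload-certs` stays on the command line; it is a flag, not a ClusterConfiguration field).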
3. Install a CNI plugin
# example: Calico
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml
4. Join additional control plane nodes
sudo kubeadm join LOAD_BALANCER_IP:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane \
--certificate-key <certificate-key>
- `--control-plane` — tells kubeadm this node joins as a control plane member (not a worker)
- `--certificate-key` — key to decrypt the certificates uploaded in step 2
5. Join worker nodes
sudo kubeadm join LOAD_BALANCER_IP:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
Verifying HA Setup
# all control plane nodes should show role control-plane and be Ready
kubectl get nodes
# check etcd cluster health (run on a control plane node)
sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list -w table
# check etcd endpoint status (leader, DB size, etc.)
sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint status -w table
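The repeated TLS flags can be wrapped in a small helper (a convenience sketch; `etcdctl_ha` is a name invented here, and the cert paths are kubeadm's defaults):

```shell
# wrapper that supplies kubeadm's default etcd endpoint and cert paths
# (run as root on a control plane node)
etcdctl_ha() {
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    "$@"
}
```

Usage: `etcdctl_ha member list -w table` or `etcdctl_ha endpoint health`.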
Useful Commands
# regenerate join token (expires after 24h)
kubeadm token create --print-join-command
# regenerate certificate key for adding new control plane nodes
sudo kubeadm init phase upload-certs --upload-certs
# check certificate expiration
sudo kubeadm certs check-expiration
# verify all control plane pods are running
kubectl get pods -n kube-system -l tier=control-plane
# check kube-apiserver endpoints
kubectl get endpoints kubernetes -o yaml