Debugging Networking

Covers DNS debugging, the service connectivity debugging flow, kube-proxy, NetworkPolicy, and CNI plugin troubleshooting.

DNS Debugging

Verify DNS is Working

Deploy a pod with DNS tools for testing (save the manifest as dnsutils.yaml):

apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
spec:
  containers:
    - name: dnsutils
      image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
      command: ["sleep", "3600"]

kubectl apply -f dnsutils.yaml
kubectl wait --for=condition=Ready pod/dnsutils --timeout=60s

# test DNS resolution
kubectl exec dnsutils -- nslookup kubernetes.default
kubectl exec dnsutils -- nslookup <service-name>.<namespace>.svc.cluster.local

Check resolv.conf

With the default dnsPolicy (ClusterFirst), kubelet writes each pod's /etc/resolv.conf so that lookups go to cluster DNS:

kubectl exec <pod-name> -- cat /etc/resolv.conf

# expected output (the nameserver IP varies by cluster; 10.96.0.10 is the kubeadm default):
# nameserver 10.96.0.10
# search <namespace>.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5

  • nameserver: should be the ClusterIP of the kube-dns Service, which CoreDNS serves
  • search: lets short names such as my-svc resolve within the pod's namespace
  • ndots:5: names with fewer than 5 dots are tried with each search domain appended before being tried as-is
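The ndots rule is easiest to see by listing the candidate names in the order they are tried. The sketch below is a simplified model of glibc-style search-list handling, not the resolver itself; expand_query is a hypothetical helper, and other resolvers (musl in Alpine-based images, for example) differ in details.

```shell
#!/usr/bin/env bash
# expand_query NAME NDOTS SEARCH_DOMAIN...
# Hypothetical helper: prints, one per line and in order, the fully qualified
# names a glibc-style resolver would try for NAME (simplified model).
expand_query() {
  local name="$1" ndots="$2"
  shift 2
  local d

  # A trailing dot marks the name as absolute: the search list is skipped.
  if [[ "$name" == *. ]]; then
    printf '%s\n' "$name"
    return
  fi

  # Count the dots by deleting every character that is not a dot.
  local dots="${name//[!.]/}"

  if (( ${#dots} >= ndots )); then
    # Enough dots: try the name as-is first, then each search domain.
    printf '%s.\n' "$name"
    for d in "$@"; do printf '%s.%s.\n' "$name" "$d"; done
  else
    # Too few dots: try every search domain first, the bare name last.
    for d in "$@"; do printf '%s.%s.\n' "$name" "$d"; done
    printf '%s.\n' "$name"
  fi
}
```

For example, expand_query api.example.com 5 default.svc.cluster.local svc.cluster.local cluster.local lists three in-cluster names before api.example.com., because the name has only two dots. Each miss is a round trip to CoreDNS, which is why lowering ndots can speed up pods that mostly talk to external hosts.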

Check CoreDNS

# check CoreDNS pods are running
kubectl get pods -n kube-system -l k8s-app=kube-dns

# check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

# check CoreDNS service
kubectl get svc kube-dns -n kube-system

# check CoreDNS endpoints
kubectl get endpointslices -n kube-system -l kubernetes.io/service-name=kube-dns

CoreDNS ConfigMap

# view CoreDNS configuration
kubectl get configmap coredns -n kube-system -o yaml

# edit to enable query logging (add "log" plugin)
kubectl edit configmap coredns -n kube-system

Add the log plugin inside the .:53 server block of the Corefile:

.:53 {
    log        # <-- add this line to enable query logging
    errors
    health
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
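}

# the reload plugin picks up ConfigMap changes automatically, which can take a minute or two;
# restart CoreDNS to apply them immediately
kubectl rollout restart deployment coredns -n kube-system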
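For a non-interactive edit, the filter below adds the log plugin to a Corefile read from stdin. It is a sketch: add_log_plugin is a hypothetical name, and it assumes the server block opens with a line that is exactly ".:53 {", as in the default Corefile shown above.

```shell
#!/usr/bin/env bash
# add_log_plugin: hypothetical filter. Reads a Corefile on stdin and writes it
# to stdout with "log" added as the first plugin of the ".:53 {" block.
# Leaves the input unchanged if a bare "log" line is already present.
add_log_plugin() {
  awk '
    # Buffer every line and note whether a bare "log" plugin line exists.
    { lines[NR] = $0; if ($1 == "log" && NF == 1) has_log = 1 }
    END {
      for (i = 1; i <= NR; i++) {
        print lines[i]
        # Insert "log" right after the server block opening line.
        if (!has_log && lines[i] ~ /^\.:53 [{] *$/) print "    log"
      }
    }'
}
```

Preview the result with kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' | add_log_plugin, then apply it through kubectl edit. Remove the log line when you are done, since it logs every query.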

CoreDNS RBAC

If the CoreDNS logs show "forbidden" errors when listing Services or EndpointSlices, check its RBAC objects:

# check the CoreDNS ClusterRole
kubectl describe clusterrole system:coredns

# check the ClusterRoleBinding
kubectl describe clusterrolebinding system:coredns

Common DNS Issues

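Issue | Possible cause | Fix
--- | --- | ---
Can't resolve any names | CoreDNS pods not running | Check the CoreDNS deployment and restart its pods
SERVFAIL responses | CoreDNS can't reach upstream DNS | Check the forward directive in the ConfigMap and that the upstream servers are reachable
Forwarding loop detected | The node's /etc/resolv.conf points at a local resolver such as 127.0.0.53, so CoreDNS forwards queries to itself | Point kubelet's resolvConf at a file with real upstreams (e.g. /run/systemd/resolve/resolv.conf) or set explicit upstream IPs in forward
Cross-namespace resolution fails | Short service name used from another namespace | Use <svc>.<namespace> or <svc>.<namespace>.svc.cluster.local
Slow DNS lookups | ndots:5 causing extra search-domain queries | Lower ndots via dnsConfig.options in the pod spec, or use fully qualified names with a trailing dot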
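For the forwarding-loop case: in a kubeadm-style deployment CoreDNS runs with dnsPolicy: Default, so it inherits the resolver file kubelet is configured with, and a loopback nameserver there makes forward . /etc/resolv.conf send queries back into CoreDNS. The hypothetical filter below flags such entries in a resolv.conf file.

```shell
#!/usr/bin/env bash
# find_loopback_nameservers: hypothetical filter. Reads resolv.conf text on
# stdin and prints every nameserver that is a loopback address (127.0.0.0/8
# or ::1). Inside a CoreDNS pod, such an address reaches CoreDNS itself.
find_loopback_nameservers() {
  awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2 }'
}
```

On a node, find_loopback_nameservers < /etc/resolv.conf often prints 127.0.0.53 on hosts running systemd-resolved. Check which file kubelet actually hands to pods via the resolvConf field of its configuration file (commonly /var/lib/kubelet/config.yaml on kubeadm clusters).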

Service Connectivity Debugging

Service Debugging Flow

Service not working?
├── Does the Service exist?              → kubectl get svc
├── Does it have Endpoints?              → kubectl get endpoints
│   └── No endpoints?                    → Check selector matches pod labels
├── Is targetPort correct?               → kubectl describe svc
├── Can you reach pod IPs directly?      → kubectl exec <pod> -- wget -qO- <pod-ip>:<port>
│   └── Pod not responding?              → Debug the application (see application.md)
├── Can you reach the Service ClusterIP? → kubectl exec <pod> -- wget -qO- <cluster-ip>:<port>
│   └── ClusterIP not working?           → Check kube-proxy
└── Does DNS resolve?                    → kubectl exec <pod> -- nslookup <svc-name>
    └── DNS not resolving?               → Check CoreDNS (see DNS section above)
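To run the "Does it have Endpoints?" check across the whole cluster, the filter below lists Services whose Endpoints show <none>. It is a sketch that assumes the default columns of kubectl get endpoints -A --no-headers (NAMESPACE, NAME, ENDPOINTS, AGE); services_without_endpoints is a hypothetical name.

```shell
#!/usr/bin/env bash
# services_without_endpoints: hypothetical filter. Reads the output of
#   kubectl get endpoints -A --no-headers
# and prints namespace/name for every entry whose ENDPOINTS column
# (the third field) is "<none>", i.e. no addresses are listed.
services_without_endpoints() {
  awk '$3 == "<none>" { print $1 "/" $2 }'
}
```

Usage: kubectl get endpoints -A --no-headers | services_without_endpoints. Services without a selector can legitimately show <none>, so treat the output as a list to inspect rather than a list of failures.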

Verifying Endpoints

# check endpoints for a service
kubectl get endpoints <service-name>

# check endpointslices (more detailed)
kubectl get endpointslices -l kubernetes.io/service-name=<service-name>

# describe to see individual endpoint addresses
kubectl describe endpointslice <endpointslice-name>

# if endpoints are empty, check that selectors match pod labels
kubectl get svc <service-name> -o jsonpath='{.spec.selector}'
kubectl get pods --show-labels
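Comparing a selector with pod labels by eye is error-prone. The hypothetical helper selector_matches below checks equality-based matching: every key=value pair in the Service selector must appear among the pod's labels. It assumes both are given in the comma-separated key=value form that kubectl describe svc shows under Selector and kubectl get pods --show-labels shows under LABELS.

```shell
#!/usr/bin/env bash
# selector_matches SELECTOR LABELS
# Hypothetical helper. Succeeds only if every key=value pair in SELECTOR is
# present in LABELS (both comma-separated). Prints the first missing pair
# to stderr on failure.
selector_matches() {
  local selector="$1" labels="$2" pair
  local -a wanted

  # A Service without a selector does not pick pods at all.
  if [[ -z "$selector" ]]; then
    echo "service has no selector; endpoints are managed manually" >&2
    return 1
  fi

  IFS=',' read -r -a wanted <<< "$selector"
  for pair in "${wanted[@]}"; do
    # Wrap both sides in commas so "app=web" cannot match "app=web2".
    if [[ ",$labels," != *",$pair,"* ]]; then
      echo "missing label: $pair" >&2
      return 1
    fi
  done
  return 0
}
```

For a single pod: selector_matches app=web "$(kubectl get pod <pod-name> --show-labels --no-headers | awk '{print $NF}')", since LABELS is the last column of that output.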

Testing Pod-to-Pod Connectivity

# get pod IPs
kubectl get pods -o wide

# test connectivity from a temp pod
kubectl run tmp --image=busybox --rm -it --restart=Never -- wget -qO- -T 5 <pod-ip>:<port>

# or use curl
kubectl run tmp --image=curlimages/curl --rm -it --restart=Never -- curl -s -m 5 <pod-ip>:<port>

Testing Pod-to-Service

# test using ClusterIP
kubectl run tmp --image=busybox --rm -it --restart=Never -- wget -qO- -T 5 <cluster-ip>:<port>

# test using DNS name
kubectl run tmp --image=busybox --rm -it --restart=Never -- wget -qO- -T 5 <service-name>.<namespace>.svc.cluster.local:<port>

# test from within the same namespace (short name)
kubectl run tmp --image=busybox --rm -it --restart=Never -- wget -qO- -T 5 <service-name>:<port>

kube-proxy

# check kube-proxy pods
kubectl get pods -n kube-system -l k8s-app=kube-proxy

# check kube-proxy logs
kubectl logs -n kube-system -l k8s-app=kube-proxy

# check kube-proxy mode
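kubectl logs -n kube-system <kube-proxy-pod> | grep "Using .* Proxier"
# or, on a node, ask kube-proxy directly (metrics endpoint, default 127.0.0.1:10249)
curl -s http://localhost:10249/proxyMode

# check iptables rules (on a node); -n skips slow reverse DNS lookups
sudo iptables -t nat -L -n | grep <service-name>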

# check IPVS rules (if using IPVS mode)
sudo ipvsadm -L -n
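The configured mode can also be read from the kube-proxy ConfigMap. The filter below is a sketch assuming a kubeadm-style ConfigMap whose config.conf key holds a KubeProxyConfiguration with a top-level mode field; proxy_mode is a hypothetical name. An empty or missing mode selects the platform default, which is iptables on Linux.

```shell
#!/usr/bin/env bash
# proxy_mode: hypothetical filter. Reads a KubeProxyConfiguration (YAML) on
# stdin and prints the configured mode, or "iptables" (the Linux default)
# when the mode is empty or absent.
proxy_mode() {
  local mode
  # Take the first top-level "mode:" line and strip quotes from its value.
  mode="$(awk '/^mode:/ { print $2; exit }' | tr -d "\"'")"
  printf '%s\n' "${mode:-iptables}"
}
```

Usage: kubectl get configmap kube-proxy -n kube-system -o jsonpath='{.data.config\.conf}' | proxy_mode. The ConfigMap shows what was configured; the kube-proxy logs show what the running process actually uses.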

Network Policy Debugging

# list all network policies across namespaces
kubectl get networkpolicies -A

# describe a specific policy
kubectl describe networkpolicy <policy-name> -n <namespace>

# list the pods a policy's podSelector matches (substitute its matchLabels)
kubectl get pods -n <namespace> -l <key>=<value>

# test connectivity between pods
kubectl exec <source-pod> -- wget -qO- -T 5 <target-pod-ip>:<port>

Common mistakes with NetworkPolicies:

  • Missing egress rules: once a pod is selected by any policy that lists Egress in policyTypes, all egress from that pod that no policy explicitly allows is denied (Ingress works the same way)
  • Forgetting to allow DNS egress (UDP and TCP port 53): pods then can't resolve service names
  • Wrong label selectors in podSelector or namespaceSelector
  • Expecting one policy to block what another allows: policies are additive and contain only allow rules, so traffic is allowed if any policy selecting the pod allows it
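The isolation and additive rules combine as follows for a single connection: pod-to-pod traffic must be allowed on the source pod's egress side and on the destination pod's ingress side. This is a toy model for reasoning, not a query against the cluster; direction_allowed and connection_allowed are hypothetical helpers, and their inputs are facts you work out by reading the policies.

```shell
#!/usr/bin/env bash
# direction_allowed ISOLATED ALLOWED_BY_ANY  (toy model, hypothetical helper)
#   ISOLATED=1 if at least one policy selecting the pod lists this direction
#   in policyTypes; ALLOWED_BY_ANY=1 if at least one such policy has a rule
#   matching the traffic. Policies only add allows, so one match is enough.
direction_allowed() {
  local isolated="$1" allowed_by_any="$2"
  (( isolated == 0 || allowed_by_any == 1 ))
}

# connection_allowed SRC_ISOLATED SRC_ALLOWED DST_ISOLATED DST_ALLOWED
# Traffic needs egress allowed on the source AND ingress allowed on the
# destination.
connection_allowed() {
  direction_allowed "$1" "$2" && direction_allowed "$3" "$4"
}
```

For example, connection_allowed 1 0 0 0 fails: the source pod is isolated for egress and no policy allows this flow, even though the destination accepts all ingress.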

Example: a policy that allows DNS egress for every pod in its namespace, to pair with a restrictive egress policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to: []
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

CNI Plugin Issues

# check CNI configuration (run these on the affected node)
ls /etc/cni/net.d/
cat /etc/cni/net.d/*.conflist

# check CNI binaries
ls /opt/cni/bin/

# check kubelet logs for CNI errors
sudo journalctl -u kubelet | grep -i cni

# check that the CNI plugin's pods are running (e.g., Calico, Flannel, Weave, Cilium);
# search all namespaces, since some plugins install outside kube-system
kubectl get pods -A | grep -E "calico|flannel|weave|cilium"

Common CNI issues:

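  • Pod stuck in ContainerCreating: often a CNI misconfiguration or missing plugin; kubectl describe pod usually shows a FailedCreatePodSandBox event with the CNI error
  • No CNI config in /etc/cni/net.d/: the CNI plugin is not installed (or has not finished installing) on that node
  • CNI binary missing from /opt/cni/bin/: install the CNI plugin binaries
  • IP address exhaustion: the node's pod CIDR (or the CNI's IP pool) has no free addresses left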
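To judge whether a pod CIDR can run out, compare its size with the number of pods expected on the node. With kube-controller-manager's default IPv4 node mask of /24, each node gets 256 addresses, above kubelet's default maxPods of 110; a smaller mask can be exhausted first. The helper cidr_size below is hypothetical and only does the arithmetic (2^(32 - prefix length)); how many of those addresses a CNI actually hands out depends on the plugin.

```shell
#!/usr/bin/env bash
# cidr_size CIDR: hypothetical helper that prints how many IPv4 addresses a
# prefix such as 10.244.1.0/24 contains.
cidr_size() {
  local prefix="${1#*/}"
  # Reject input without a numeric prefix length (e.g. a bare address).
  if [[ "$1" != */* || ! "$prefix" =~ ^[0-9]+$ ]]; then
    echo "invalid IPv4 CIDR: $1" >&2
    return 1
  fi
  # Force base 10 so a prefix such as "08" is not read as octal.
  local bits=$(( 10#$prefix ))
  if (( bits > 32 )); then
    echo "invalid IPv4 CIDR: $1" >&2
    return 1
  fi
  echo $(( 1 << (32 - bits) ))
}
```

Usage: cidr_size "$(kubectl get node <node-name> -o jsonpath='{.spec.podCIDR}')". Some CNIs (Calico and Cilium in some IPAM modes) allocate from their own pools instead of .spec.podCIDR, so check the plugin's IPAM status as well.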

Useful Commands

# deploy a debug pod with networking tools
kubectl run netdebug --image=nicolaka/netshoot --rm -it --restart=Never -- /bin/bash

# test DNS resolution
kubectl exec <pod-name> -- nslookup <service-name>

# test HTTP connectivity
kubectl run tmp --image=busybox --rm -it --restart=Never -- wget -qO- -T 5 <url>

# check all services and their endpoints
kubectl get svc,endpoints -A

# check kube-proxy config
kubectl get configmap kube-proxy -n kube-system -o yaml

# trace iptables rules for a service
sudo iptables-save | grep <service-name>

# check node networking
ip addr show
ip route show

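# check if a port is listening inside a pod (the image must include netstat or ss)
kubectl exec <pod-name> -- netstat -tlnp
kubectl exec <pod-name> -- ss -tlnp

# if it doesn't, attach an ephemeral debug container; it shares the pod's network namespace
kubectl debug <pod-name> -it --image=nicolaka/netshoot -- ss -tln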
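To script the listening-port check, the filter below reads ss -tln or netstat -tln output, where the local address is the fourth column in both formats. It is a sketch; port_listening is a hypothetical name, and it only looks at TCP listeners.

```shell
#!/usr/bin/env bash
# port_listening PORT: hypothetical filter. Reads "ss -tln" or "netstat -tln"
# output on stdin and succeeds if some socket listens on PORT on any address.
port_listening() {
  local port="$1"
  awk -v port="$port" '
    # Compare the part after the last ":" of the local address, so that
    # 0.0.0.0:8080, [::]:8080 and *:8080 match but :18080 does not.
    { n = split($4, parts, ":"); if (n > 1 && parts[n] == port) found = 1 }
    END { exit(found ? 0 : 1) }'
}
```

Usage: kubectl exec <pod-name> -- ss -tln | port_listening 8080 && echo "8080 is listening". The pipe runs port_listening on your machine, so only ss needs to exist in the container.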