Kubernetes is a container orchestration platform designed to automate the deployment, scaling, and management of containerized applications. Its architecture cleanly separates the control plane (the management components) from the data plane (the worker nodes that run applications).
Control Plane Components:
The control plane is the brain of Kubernetes, making global decisions about the cluster and responding to cluster events.
API Server: Acts as the frontend for Kubernetes control plane. All communications, both internal and external, go through the API server. It validates and processes RESTful requests and updates the corresponding objects in etcd.
etcd: A consistent and highly-available key-value store used as Kubernetes’ backing store for all cluster data. Consider it the source of truth for the cluster state.
Scheduler: Watches for newly created Pods with no assigned node and selects a node for them to run on. Factors considered include individual and collective resource requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, and more.
Controller Manager: Runs the controller processes that regulate the state of the cluster. Logically, each controller is a separate control loop, but to reduce complexity they are all compiled into a single binary and run in a single process. Examples include the Node Controller, Job Controller, and Replication Controller.
Node Components:
These components run on every node in the cluster, maintaining running pods and providing the Kubernetes runtime environment.
kubelet: An agent that runs on each node and ensures containers are running in a Pod as expected. It takes a set of PodSpecs provided through various mechanisms and ensures the containers described in those PodSpecs are running and healthy.
kube-proxy: A network proxy that runs on each node, implementing part of the Kubernetes Service concept. It maintains network rules that allow network communication to Pods from network sessions inside or outside of the cluster.
Container Runtime: The software responsible for running containers. Kubernetes supports container runtimes such as containerd, CRI-O, and any implementation of the Kubernetes CRI (Container Runtime Interface).
Kubernetes objects are persistent entities in the system that represent the state of your cluster. They describe what containerized applications are running, the resources available to them, and the policies affecting their behavior.
Essential Resources:
Pods: The smallest deployable units in Kubernetes that can be created and managed. A Pod represents a single instance of a running process in your cluster and can contain one or more containers that share storage, network, and a specification for how to run the containers.
ReplicaSets: Ensure that a specified number of Pod replicas are running at any given time. They are often used to guarantee the availability of a specified number of identical Pods.
Deployments: Provide declarative updates for Pods and ReplicaSets. You describe a desired state in a Deployment, and the Deployment controller changes the actual state to match the desired state at a controlled rate.
Configuration Resources:
ConfigMaps: Allow you to decouple configuration artifacts from image content to keep containerized applications portable. ConfigMaps hold configuration data as key-value pairs.
Secrets: Similar to ConfigMaps but designed for storing sensitive information such as passwords, OAuth tokens, and ssh keys. They can be mounted as data volumes or exposed as environment variables.
Networking Resources:
Services: An abstraction which defines a logical set of Pods and a policy by which to access them. As Pods are ephemeral, Services allow clients to reliably discover and connect to containers running in the Pods.
Endpoints: A list of all the IP addresses and ports that provide a Service. Kubernetes automatically creates and manages Endpoint objects for Services except for Services of type ExternalName.
Storage Resources:
PersistentVolumes (PV): Pieces of storage in the cluster that have been provisioned by an administrator or dynamically provisioned using Storage Classes.
PersistentVolumeClaims (PVC): A request for storage by a user that can be fulfilled by a PersistentVolume.
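As a rough sketch of how the two fit together (the resource names, the 1Gi size, and the hostPath backend are illustrative only; hostPath is suitable for single-node testing, not production), a PV and a matching PVC might look like this:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /mnt/data        # node-local path, for demonstration only
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
A claim binds to a volume whose capacity and access modes satisfy the request; Pods then reference the claim by name in their volumes section.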
Understanding how Kubernetes components communicate is essential for troubleshooting and securing your cluster.
Component Communication:
API Server as Central Hub: All communications between components flow through the API server. The API server is the only component that communicates with etcd.
Controller to API Server: Controllers watch the API server for changes to their respective resources and take action to ensure the current state matches the desired state.
Scheduler to API Server: The scheduler watches for newly created Pods with no assigned node and selects a node for them to run on.
Kubelet to API Server: Kubelets on worker nodes register themselves with the API server and report the status of containers they are running.
Authentication and Authorization Flow:
Authentication: When a request reaches the API server, it goes through the authentication step first, which identifies the client making the request.
Authorization: Once authenticated, the request is authorized to ensure the client has permissions to perform the requested operation on the specified resource.
Admission Control: If the request creates or modifies a resource, it passes through admission controllers, which can modify the resource or reject the request based on custom criteria.
Watch and Reconciliation Patterns:
Watch API: Kubernetes components use the Watch API to efficiently observe changes to resources they are interested in.
Reconciliation Loop: Controllers implement a reconciliation loop that continuously compares the desired state (specification) with the current state, making changes as necessary to achieve the desired state.
Level-Triggered Logic: Kubernetes uses level-triggered logic (rather than edge-triggered), meaning it continues to process objects until they reach the desired state, making the system more resilient to failures.
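To make the reconciliation idea concrete, the toy shell loop below (an illustration only; real controllers are Go programs built on watches and informers) keeps observing a hypothetical nginx Deployment and corrects any drift toward the desired replica count, regardless of which event caused the drift:
DESIRED=3
while true; do
  CURRENT=$(kubectl get deployment nginx -o jsonpath='{.spec.replicas}')
  if [ "${CURRENT:-0}" -ne "$DESIRED" ]; then
    # Level-triggered: act on the observed state, not on a specific event
    kubectl scale deployment nginx --replicas="$DESIRED"
  fi
  sleep 5
done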
When planning a Kubernetes deployment, several architectural decisions must be made based on your organization’s requirements.
Single-node vs Multi-node Architectures:
Single-node: All components run on a single machine. This is suitable for development and testing but lacks high availability and scalability for production.
Multi-node: Components are distributed across multiple machines, with dedicated control plane nodes and worker nodes. This setup provides better resilience, scalability, and resource isolation.
High-availability Considerations:
Control Plane Redundancy: For production environments, running multiple control plane nodes (typically 3 or 5) ensures that the cluster remains operational even if some control plane nodes fail.
etcd Clustering: etcd should be deployed as a cluster with an odd number of members (typically 3, 5, or 7) to maintain quorum and ensure data consistency.
Load Balancing: A load balancer should be placed in front of the API servers to distribute client requests across multiple API server instances.
Control Plane vs Worker Node Requirements:
Control Plane Nodes: Require less CPU and memory compared to worker nodes but benefit from faster storage (SSD) for etcd. Typically need 2 CPUs and 4GB RAM at minimum.
Worker Nodes: Resource requirements depend on the workloads they will run. Need sufficient CPU, memory, and storage to accommodate all pods scheduled on the node.
Networking Prerequisites:
Node Connectivity: All nodes must be able to communicate with each other through a network.
Port Requirements: Specific ports must be open for Kubernetes components to communicate.
Network Policies: Consideration for how to implement network policies to control pod-to-pod communication.
kubeadm is a tool built to provide best-practice “fast paths” for creating Kubernetes clusters. It performs the necessary actions to get a minimum viable cluster up and running quickly.
kubeadm Phases and Workflow:
preflight: Runs a set of checks to validate the system state before making changes.
certs: Generates the self-signed CA and certificates for all Kubernetes components.
kubeconfig: Generates kubeconfig files for the controller manager, scheduler, admin, and kubelet.
kubelet-start: Configures and starts the kubelet.
control-plane: Creates control plane components as static Pod manifests.
etcd: Creates a local etcd instance as a static Pod.
upload-config: Uploads the kubeadm and kubelet configurations to a ConfigMap.
mark-control-plane: Applies labels and taints to the control plane node.
bootstrap-token: Creates a bootstrap token to join additional nodes.
addon: Installs required addons like kube-proxy and CoreDNS.
Certificate Management:
PKI Infrastructure: kubeadm sets up a PKI infrastructure using self-signed certificates stored in /etc/kubernetes/pki.
Certificate Authority (CA): kubeadm creates a root CA that issues certificates for each component.
Component Certificates: Each component (API server, kubelet, etc.) receives its own certificate with appropriate SANs (Subject Alternative Names).
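These certificates can be inspected directly, which is useful both for the exam and for real troubleshooting, for example to check expiry dates and the SANs on the API server certificate:
# Show expiration dates for all kubeadm-managed certificates
sudo kubeadm certs check-expiration
# Inspect the API server certificate, including its Subject Alternative Names
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'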
kubelet Configuration:
kubelet-config: kubeadm generates a kubelet configuration file with appropriate settings.
Systemd Service: kubelet runs as a systemd service on each node.
Authorization Mode: kubeadm configures the kubelet to use the Node Authorizer, which restricts kubelet access to only the resources it needs.
Container Runtime Interfaces (CRI):
CRI Standard: Kubernetes uses the Container Runtime Interface to interact with different container runtimes.
Supported Runtimes: Common runtimes include containerd and CRI-O. Docker Engine can still be used through the external cri-dockerd adapter; the built-in dockershim was removed in Kubernetes v1.24.
Runtime Configuration: kubeadm detects and configures the available container runtime.
Kubernetes has specific requirements for networking, and understanding these fundamentals is crucial for a successful deployment.
Pod Network Fundamentals:
Pod IP Addressing: Each Pod receives its own IP address. All containers within a Pod share the same network namespace and IP address.
Pod-to-Pod Communication: Pods should be able to communicate with all other pods in the cluster without NAT.
Flat Network Space: The Pod network is typically implemented as a flat, layer 3 network.
Container Network Interface (CNI):
CNI Specification: A standard for configuring network interfaces in Linux containers, used by Kubernetes to set up Pod networking.
CNI Plugins: Software components that implement the CNI specification, such as Calico, Flannel, Cilium, and Weave Net.
Plugin Selection: Factors to consider include performance, network policy support, encryption, and integration with existing infrastructure.
Service Networking and kube-proxy Modes:
Service IP Range: A separate CIDR block allocated for Service ClusterIPs. This range is different from the Pod network range.
kube-proxy Modes: kube-proxy can program Service traffic in iptables mode (the default on Linux), in IPVS mode (which scales better with very large numbers of Services), or, on older clusters, in the legacy userspace mode.
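On a kubeadm cluster the active mode can be checked in the kube-proxy ConfigMap (the ConfigMap name shown is what kubeadm creates; other installers may differ):
kubectl -n kube-system get configmap kube-proxy -o yaml | grep -E '^\s*mode:'
# An empty value means the default (iptables) mode is in use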
DNS Architecture with CoreDNS:
Cluster DNS: CoreDNS serves as the DNS server for the cluster, providing name resolution for Services and Pods.
DNS Records: CoreDNS creates DNS records for Services and (optionally) for Pods.
Service Discovery: Containers can discover Services through DNS names like service-name.namespace.svc.cluster.local.
DNS Configuration: Pod DNS settings are configured through the kubelet, which sets up /etc/resolv.conf in each container.
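A quick way to confirm cluster DNS is working is to resolve a Service name from a short-lived Pod; the Pod name and image below are arbitrary:
kubectl run dns-test --image=busybox:1.36 --restart=Never --rm -it -- \
  nslookup kubernetes.default.svc.cluster.local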
Kubernetes has a comprehensive security model that addresses various aspects of cluster and application security.
Authentication Methods:
X.509 Client Certificates: Using client certificates for authentication to the API server.
Service Account Tokens: JWT tokens that are automatically mounted into Pods.
OpenID Connect (OIDC): Integration with external identity providers like Google, Azure AD, or Okta.
Webhook Token Authentication: Allowing an external service to determine authentication decisions.
Authentication Proxy: Using a proxy in front of the API server to handle authentication.
Authorization Mechanisms:
RBAC (Role-Based Access Control): Fine-grained control over what actions users and service accounts can perform on which resources.
ABAC (Attribute-Based Access Control): Policy-based access control using attributes of users, resources, and environment.
Node Authorization: Special-purpose authorizer that grants permissions to kubelets based on the Pods they are scheduled to run.
Webhook Authorization: Delegates authorization decisions to an external service.
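Whatever authorizer is in use, kubectl auth can-i is a quick way to see what a given identity is allowed to do:
# Can the current user create Deployments in the default namespace?
kubectl auth can-i create deployments --namespace default
# Check on behalf of a service account
kubectl auth can-i list pods --as system:serviceaccount:default:default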
Admission Controllers:
Validating Admission Controllers: Validate requests against a set of rules before processing them.
Mutating Admission Controllers: Can modify objects before they are persisted.
Common Admission Controllers: Include ResourceQuota, LimitRanger, ServiceAccount, and NodeRestriction; the older PodSecurityPolicy admission controller has since been removed in favor of Pod Security Admission.
Pod Security Standards:
Privileged: No restrictions on Pod capabilities, equivalent to root on the host.
Baseline: Minimally restrictive policy, preventing known privilege escalations while allowing the default (minimally specified) Pod configuration.
Restricted: Heavily restricted policy for hardened environments, requiring Pods to run with a more restrictive security context.
RBAC is a method of regulating access to computer or network resources based on the roles of individual users.
Roles and ClusterRoles:
Role: Defines permissions within a specific namespace.
ClusterRole: Defines permissions across the entire cluster.
Permission Structure: Each role contains rules that specify which API groups, resources, and verbs (actions) are allowed.
RoleBindings and ClusterRoleBindings:
RoleBinding: Binds a Role to users, groups, or service accounts within a namespace.
ClusterRoleBinding: Binds a ClusterRole to users, groups, or service accounts cluster-wide.
Binding Subjects: Can reference users, groups, or service accounts.
Service Accounts and Their Uses:
Default Service Account: Every namespace has a default service account that is automatically assigned to Pods if no service account is specified.
Custom Service Accounts: Created for applications that need specific permissions.
Token Mounting: Service account tokens are automatically mounted into Pods at /var/run/secrets/kubernetes.io/serviceaccount/token.
Usage Scenarios: Used for Pods that need to interact with the Kubernetes API or for external applications that need cluster access.
Best Practices for Least Privilege:
Granular Permissions: Grant only the permissions needed for the task at hand.
Namespace Isolation: Use namespaces to isolate resources and limit the scope of permissions.
Avoid Cluster-Admin: Minimize the use of cluster-admin role, which grants unrestricted access.
Regular Auditing: Regularly review and audit RBAC policies to ensure they follow the principle of least privilege.
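For the auditing step, a few commands cover most reviews; the service account referenced below is just an example identity:
# Summarize everything an identity is permitted to do in a namespace
kubectl auth can-i --list --as system:serviceaccount:default:default -n default
# Review bindings in all namespaces and at cluster scope
kubectl get rolebindings --all-namespaces -o wide
kubectl get clusterrolebindings -o wide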
Security contexts define privilege and access control settings for Pods and containers.
Container Security Contexts:
User and Group Settings: Specify the user and group IDs under which the container processes run.
Capabilities: Add or drop Linux capabilities. For example, adding NET_ADMIN allows network administration operations.
Privilege Settings: Control whether the container can gain additional privileges beyond its parent process.
Read-Only Root Filesystem: Prevent writes to the container’s root filesystem for added security.
Pod Security Contexts:
Pod-level Settings: Apply to all containers in the Pod.
runAsUser/runAsGroup: Set the user and group IDs for all containers.
fsGroup: Set the group ID for volumes mounted in the Pod.
supplementalGroups: Additional groups that apply to all containers.
SecurityContext Fields:
allowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process.
privileged: Gives the container nearly all the same access as processes on the host.
readOnlyRootFilesystem: Forces the container’s root filesystem to be read-only.
seccompProfile: Specifies the seccomp profile for syscall restriction.
seLinuxOptions: Configures SELinux options for the container.
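As a sketch of how these fields combine in practice (the field names are the real API fields; the particular values are simply one reasonable hardened combination), a Pod might set:
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault     # restrict syscalls to the runtime's default profile
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]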
Pod Security Standards:
Migration from PodSecurityPolicies: Pod Security Standards are the successor to the deprecated PodSecurityPolicy.
Enforcement Levels: Pod Security Admission Controller can enforce standards at different levels: warn, audit, or enforce.
Implementation: Enabled through the Pod Security Admission Controller, which is built into Kubernetes from v1.22 onwards.
Security Levels: Define different levels of restrictions (privileged, baseline, restricted) based on the security requirements of your workloads.
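Enforcement is configured per namespace through labels consumed by the Pod Security Admission controller; the namespace name here is an example:
kubectl create namespace secure-apps
kubectl label namespace secure-apps \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=baseline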
This comprehensive coverage of key concepts provides you with the foundational knowledge needed to understand Kubernetes architecture, deployment, and security. These concepts directly align with the hands-on labs that follow, where you’ll apply this knowledge in practical scenarios that mirror the CKA exam.
Set up your local environment with kubectl and essential tools for interacting with Kubernetes clusters.
# For Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl
# Verify installation
kubectl version --client
# For bash
echo 'source <(kubectl completion bash)' >>~/.bashrc
echo 'alias k=kubectl' >>~/.bashrc
echo 'complete -o default -F __start_kubectl k' >>~/.bashrc
source ~/.bashrc
# For zsh
echo 'source <(kubectl completion zsh)' >>~/.zshrc
echo 'alias k=kubectl' >>~/.zshrc
echo 'complete -o default -F __start_kubectl k' >>~/.zshrc
source ~/.zshrc
# Get a list of all API resources
kubectl api-resources
# Explore kubectl help
kubectl --help
# Explore a specific command help
kubectl create --help
# Use kubectl explain to understand resource definitions
kubectl explain pod
kubectl explain pod.spec
kubectl explain pod.spec.containers
# Create a sample kubeconfig file for practice
mkdir -p ~/.kube
cat > ~/.kube/config-practice << EOF
apiVersion: v1
kind: Config
clusters:
- cluster:
    server: https://example-cluster:6443
  name: example-cluster
contexts:
- context:
    cluster: example-cluster
    user: example-user
    namespace: default
  name: example-context
current-context: example-context
users:
- name: example-user
  user:
    token: example-token
EOF
# Set KUBECONFIG to use this file (temporary)
export KUBECONFIG=~/.kube/config-practice
# View the contexts
kubectl config get-contexts
# Switch back to your regular config
export KUBECONFIG=~/.kube/config
Create and explore Kubernetes objects using YAML manifests and kubectl commands.
# Create a directory for your manifests
mkdir -p ~/k8s-practice/pods
# Create a Pod manifest
cat > ~/k8s-practice/pods/nginx-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
EOF
# Create the Pod
kubectl apply -f ~/k8s-practice/pods/nginx-pod.yaml
# Get Pod information
kubectl get pods
kubectl get pod nginx -o wide
# Describe the Pod
kubectl describe pod nginx
# Get the Pod YAML with kubectl
kubectl get pod nginx -o yaml > ~/k8s-practice/pods/nginx-pod-full.yaml
# Examine the full YAML to see all the fields Kubernetes adds
cat ~/k8s-practice/pods/nginx-pod-full.yaml
# Create a Pod with multiple labels
cat > ~/k8s-practice/pods/nginx-labels.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-labels
  labels:
    app: nginx
    environment: test
    tier: frontend
spec:
  containers:
  - name: nginx
    image: nginx:1.21
EOF
# Apply the manifest
kubectl apply -f ~/k8s-practice/pods/nginx-labels.yaml
# Query Pods using different label selectors
kubectl get pods -l app=nginx
kubectl get pods -l 'environment in (test,prod)'
kubectl get pods -l 'app=nginx,tier=frontend'
# Add a new label to an existing Pod
kubectl label pod nginx version=v1
# Remove a label
kubectl label pod nginx version-
# List Pods with custom columns showing labels
kubectl get pods -L app,environment,tier
# Create a Pod that will complete quickly
cat > ~/k8s-practice/pods/oneshot-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: oneshot
spec:
  restartPolicy: Never
  containers:
  - name: busybox
    image: busybox
    command: ['sh', '-c', 'echo Hello Kubernetes && sleep 5']
EOF
# Create the Pod
kubectl apply -f ~/k8s-practice/pods/oneshot-pod.yaml
# Watch the Pod status change
kubectl get pods -w
# Once completed, describe to see completion information
kubectl describe pod oneshot
# View Pod logs
kubectl logs oneshot
# Create a Pod that fails
cat > ~/k8s-practice/pods/failing-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: failing-pod
spec:
  restartPolicy: OnFailure
  containers:
  - name: busybox
    image: busybox
    command: ['sh', '-c', 'echo Starting && exit 1']
EOF
# Create the Pod
kubectl apply -f ~/k8s-practice/pods/failing-pod.yaml
# Watch the Pod restart
kubectl get pods -w
# Check the restart count and last state
kubectl describe pod failing-pod
Examine Kubernetes control plane components and their functions.
# Get control plane Pods (on a kubeadm-based cluster)
kubectl get pods -n kube-system
# Describe the API server Pod
kubectl describe pod -n kube-system -l component=kube-apiserver
# Examine the scheduler
kubectl describe pod -n kube-system -l component=kube-scheduler
# Look at the controller manager
kubectl describe pod -n kube-system -l component=kube-controller-manager
# Check etcd configuration
kubectl describe pod -n kube-system -l component=etcd
# Examine the leader-election records (current releases use Lease objects rather than Endpoints annotations)
kubectl -n kube-system get lease kube-scheduler -o yaml
kubectl -n kube-system get lease kube-controller-manager -o yaml
# In a multi-control-plane setup, the holderIdentity field shows which instance is currently the leader
# Generate a verbose API request to see the complete flow
kubectl get pods -v=8
# Look at the API server audit logs (if enabled)
kubectl logs -n kube-system -l component=kube-apiserver | grep audit
# List admission controllers enabled in the API server
kubectl describe pod -n kube-system -l component=kube-apiserver | grep enable-admission-plugins
# List mutating admission webhook configurations
kubectl get mutatingwebhookconfigurations
# List validating admission webhook configurations
kubectl get validatingwebhookconfigurations
Prepare Linux nodes to serve as Kubernetes cluster nodes.
# Run on all nodes
# Update system
sudo apt-get update
sudo apt-get upgrade -y
# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
# Load required kernel modules
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Set kernel parameters
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
# Verify the modules are loaded
lsmod | grep overlay
lsmod | grep br_netfilter
# Verify kernel parameters
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
# Run on all nodes
# Install prerequisites
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Set up the repository
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install containerd
sudo apt-get update
sudo apt-get install -y containerd.io
# Configure containerd to use systemd cgroup driver
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
# Restart containerd
sudo systemctl restart containerd
sudo systemctl enable containerd
# Run on all nodes
# Add the Kubernetes apt repository signing key (the legacy apt.kubernetes.io repository has been retired)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Add the Kubernetes apt repository
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
# Install kubeadm, kubelet, and kubectl
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
# Pin their versions to prevent accidental upgrades
sudo apt-mark hold kubelet kubeadm kubectl
# Check versions
kubeadm version
kubectl version --client
kubelet --version
# Run on all nodes
# Check if the node is ready for Kubernetes networking
ip link
ip addr
# Make sure all nodes can reach each other
# Replace <IP> with the IP of other nodes
ping -c 3 <IP>
# Check firewall rules or disable firewall for testing
sudo ufw status
sudo ufw disable # For testing only
# Ensure required ports are open (using netcat or telnet)
# Replace <IP> with the IP of control plane node
nc -zv <IP> 6443 # API server
Initialize a Kubernetes control plane using kubeadm and set up networking.
# Run on control plane node only
# Create a kubeadm configuration file
cat > ~/kubeadm-config.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: "unix:///run/containerd/containerd.sock"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: "192.168.0.0/16" # Required for Calico
EOF
# Initialize the control plane
sudo kubeadm init --config="$HOME/kubeadm-config.yaml" --upload-certs | tee ~/kubeadm-init.log
# Set up kubectl for the current user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Verify the control plane is running
kubectl get nodes
kubectl get pods -n kube-system
# Run on control plane node only
# Install Calico operator
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml
# Install Calico custom resources
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/custom-resources.yaml
# Verify that Calico pods are running
kubectl get pods -n calico-system -w
# Verify that CoreDNS pods are running (they will transition to Running once the network is ready)
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Get the join command from the control plane node
# Look for the line starting with "kubeadm join" in the kubeadm-init.log file
cat ~/kubeadm-init.log | grep -A 2 "kubeadm join"
# Run the join command on each worker node
# Example (the actual command will be different):
sudo kubeadm join 192.168.1.100:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
# If the token has expired, generate a new one on the control plane node
kubeadm token create --print-join-command
# Run on control plane node
# Check node status (wait for all nodes to be Ready)
kubectl get nodes -w
# Check that system pods are running
kubectl get pods --all-namespaces
# Verify cluster info
kubectl cluster-info
# Test pod networking by creating a test deployment
kubectl create deployment nginx --image=nginx
kubectl scale deployment nginx --replicas=3
# Check that pods are distributed across nodes
kubectl get pods -o wide
Learn how to manage a Kubernetes cluster, including node operations, upgrades, and backups.
# Check node status
kubectl get nodes
kubectl describe nodes
# Check resource usage on nodes
kubectl top nodes
# View node labels
kubectl get nodes --show-labels
# On control plane node:
# Check current versions
kubectl version
kubelet --version
# Install the new kubeadm version
sudo apt-get update
sudo apt-mark unhold kubeadm
sudo apt-get install -y kubeadm='1.28.1-*'
sudo apt-mark hold kubeadm
# Plan the upgrade
sudo kubeadm upgrade plan
# Apply the upgrade
sudo kubeadm upgrade apply v1.28.1
# Upgrade kubelet and kubectl on control plane
sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y kubelet='1.28.1-*' kubectl='1.28.1-*'
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# On worker nodes:
# Drain the node (run this from a machine with kubectl access, such as the control plane node)
kubectl drain <node-name> --ignore-daemonsets
# Upgrade kubeadm
sudo apt-mark unhold kubeadm
sudo apt-get update
sudo apt-get install -y kubeadm='1.28.1-*'
sudo apt-mark hold kubeadm
# Upgrade node configuration
sudo kubeadm upgrade node
# Upgrade kubelet and kubectl
sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y kubelet='1.28.1-*' kubectl='1.28.1-*'
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# Uncordon the node
kubectl uncordon <node-name>
# Backup etcd
# First, find the etcd pod
kubectl get pods -n kube-system | grep etcd
# Get etcd certificates location
sudo ls -la /etc/kubernetes/pki/etcd/
# Create a backup of etcd
sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /tmp/etcd-backup.db
# Verify the backup
sudo ETCDCTL_API=3 etcdctl --write-out=table snapshot status /tmp/etcd-backup.db
# To restore from a backup (in a disaster scenario)
# First, stop the API server
sudo systemctl stop kubelet
# With containerd as the runtime, use crictl rather than docker to stop the running API server container
sudo crictl ps --name kube-apiserver -q | xargs -r sudo crictl stop
# Then restore from backup
sudo ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--data-dir=/var/lib/etcd-backup \
snapshot restore /tmp/etcd-backup.db
# Update etcd manifest to use restored data
sudo sed -i 's/\/var\/lib\/etcd/\/var\/lib\/etcd-backup/g' /etc/kubernetes/manifests/etcd.yaml
# Restart kubelet
sudo systemctl start kubelet
Create and manage RBAC configurations for different users and service accounts.
# Create a namespace for our experiments
kubectl create namespace rbac-test
# Create a service account
kubectl create serviceaccount restricted-sa -n rbac-test
# View the service account
kubectl get serviceaccount -n rbac-test
kubectl describe serviceaccount restricted-sa -n rbac-test
# Create a Role that allows read-only access to pods
mkdir -p ~/k8s-practice/rbac
cat > ~/k8s-practice/rbac/read-pods-role.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: rbac-test
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
EOF
kubectl apply -f ~/k8s-practice/rbac/read-pods-role.yaml
# Create a RoleBinding to bind the Role to the service account
cat > ~/k8s-practice/rbac/read-pods-rolebinding.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: rbac-test
subjects:
- kind: ServiceAccount
  name: restricted-sa
  namespace: rbac-test
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF
kubectl apply -f ~/k8s-practice/rbac/read-pods-rolebinding.yaml
# Create a ClusterRole that allows read-only access to nodes
cat > ~/k8s-practice/rbac/node-reader-clusterrole.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "watch", "list"]
EOF
kubectl apply -f ~/k8s-practice/rbac/node-reader-clusterrole.yaml
# Create a ClusterRoleBinding to bind the ClusterRole to the service account
cat > ~/k8s-practice/rbac/node-reader-clusterrolebinding.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-nodes
subjects:
- kind: ServiceAccount
  name: restricted-sa
  namespace: rbac-test
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io
EOF
kubectl apply -f ~/k8s-practice/rbac/node-reader-clusterrolebinding.yaml
# Create a Pod to test the permissions
cat > ~/k8s-practice/rbac/auth-tester-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: auth-tester
  namespace: rbac-test
spec:
  serviceAccountName: restricted-sa
  containers:
  - name: auth-tester
    image: curlimages/curl
    command: ["sleep", "3600"]
EOF
kubectl apply -f ~/k8s-practice/rbac/auth-tester-pod.yaml
# Execute commands in the Pod to test permissions
kubectl exec -it -n rbac-test auth-tester -- sh
# Inside the Pod, try to access different resources
# Get a token
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
# Try to list pods in the namespace (should succeed)
curl -s -k -H "Authorization: Bearer $TOKEN" \
https://kubernetes.default.svc/api/v1/namespaces/rbac-test/pods/
# Try to list services in the namespace (should fail)
curl -s -k -H "Authorization: Bearer $TOKEN" \
https://kubernetes.default.svc/api/v1/namespaces/rbac-test/services/
# Try to list nodes (should succeed)
curl -s -k -H "Authorization: Bearer $TOKEN" \
https://kubernetes.default.svc/api/v1/nodes/
# Exit the pod
exit
Configure security contexts for Pods and containers to enhance security posture.
# Create a Pod with a security context that sets user and group IDs
mkdir -p ~/k8s-practice/security
cat > ~/k8s-practice/security/security-context-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
  namespace: rbac-test
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: sec-ctx-container
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    securityContext:
      allowPrivilegeEscalation: false
    volumeMounts:
    - name: sec-ctx-vol
      mountPath: /data/demo
  volumes:
  - name: sec-ctx-vol
    emptyDir: {}
EOF
kubectl apply -f ~/k8s-practice/security/security-context-pod.yaml
# Verify the Pod is running
kubectl get pod security-context-demo -n rbac-test
# Check the security context settings
kubectl exec -it security-context-demo -n rbac-test -- sh
# Inside the Pod, check the user and group IDs
id
# Check file permissions on the mounted volume
ls -la /data/demo
# Try to create a file
touch /data/demo/file1
ls -la /data/demo/file1
# Exit the Pod
exit
# Create a Pod with specific Linux capabilities
cat > ~/k8s-practice/security/capabilities-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: capabilities-demo
namespace: rbac-test
spec:
containers:
- name: capabilities-container
image: ubuntu
command: ["sleep", "3600"]
securityContext:
capabilities:
add: ["NET_ADMIN", "SYS_TIME"]
drop: ["ALL"]
EOF
kubectl apply -f ~/k8s-practice/security/capabilities-pod.yaml
# Verify the Pod is running
kubectl get pod capabilities-demo -n rbac-test
# Create a Pod with a read-only root filesystem
cat > ~/k8s-practice/security/readonly-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: readonly-demo
namespace: rbac-test
spec:
containers:
- name: readonly-container
image: nginx
securityContext:
readOnlyRootFilesystem: true
volumeMounts:
- name: var-run
mountPath: /var/run
- name: var-cache-nginx
mountPath: /var/cache/nginx
- name: var-log-nginx
mountPath: /var/log/nginx
volumes:
- name: var-run
emptyDir: {}
- name: var-cache-nginx
emptyDir: {}
- name: var-log-nginx
emptyDir: {}
EOF
kubectl apply -f ~/k8s-practice/security/readonly-pod.yaml
# Verify the Pod is running
kubectl get pod readonly-demo -n rbac-test
# Test the read-only filesystem
kubectl exec -it readonly-demo -n rbac-test -- sh
# Try to create a file in the root filesystem (should fail)
touch /test-file
# Try to create a file in a mounted volume (should succeed)
touch /var/run/test-file
ls -la /var/run/test-file
# Exit the Pod
exit
Examine and configure API server security settings.
# View API server Pod definition
kubectl get pod -n kube-system -l component=kube-apiserver -o yaml
# Examine the command-line arguments
kubectl describe pod -n kube-system -l component=kube-apiserver | grep "\--"
# Check the currently enabled admission controllers
kubectl describe pod -n kube-system -l component=kube-apiserver | grep enable-admission-plugins
# Create an audit policy file on the control plane node
sudo mkdir -p /etc/kubernetes/audit
# (tee is used because the > redirection is performed by the unprivileged shell, not by sudo)
cat << EOF | sudo tee /etc/kubernetes/audit/policy.yaml > /dev/null
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["pods", "services"]
- level: RequestResponse
  resources:
  - group: ""
    resources: ["secrets"]
  namespaces: ["rbac-test"]
- level: None
  users: ["system:serviceaccount:kube-system:default"]
  resources:
  - group: ""
    resources: ["configmaps"]
    resourceNames: ["controller-leader"]
- level: None
  nonResourceURLs:
  - /healthz*
  - /version
  - /swagger*
EOF
# Modify the API server manifest to enable audit logging
sudo cp /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/kube-apiserver.yaml.bak
# Edit the API server manifest (manually with an editor)
sudo nano /etc/kubernetes/manifests/kube-apiserver.yaml
# Add the following parameters to the command section:
# - --audit-policy-file=/etc/kubernetes/audit/policy.yaml
# - --audit-log-path=/var/log/kubernetes/audit/audit.log
# - --audit-log-maxage=30
# - --audit-log-maxbackup=3
# - --audit-log-maxsize=100
# Add the following volumeMounts to the container section:
#   - mountPath: /etc/kubernetes/audit
#     name: audit-policy
#     readOnly: true
#   - mountPath: /var/log/kubernetes/audit
#     name: audit-log
# Add the following volumes to the volumes section:
#   - hostPath:
#       path: /etc/kubernetes/audit
#       type: DirectoryOrCreate
#     name: audit-policy
#   - hostPath:
#       path: /var/log/kubernetes/audit
#       type: DirectoryOrCreate
#     name: audit-log
# Create the audit log directory
sudo mkdir -p /var/log/kubernetes/audit
# The kubelet will automatically pick up the changes and restart the API server
# Wait for the API server to restart
sleep 30
kubectl get pods -n kube-system
# Check that audit logs are being generated
sudo ls -la /var/log/kubernetes/audit/
# Examine existing admission controllers
kubectl describe pod -n kube-system -l component=kube-apiserver | grep enable-admission-plugins
# Edit the API server manifest to add additional admission controllers
sudo nano /etc/kubernetes/manifests/kube-apiserver.yaml
# Add or modify the --enable-admission-plugins flag, for example:
# --enable-admission-plugins=NodeRestriction
# (PodSecurityPolicy was removed in Kubernetes v1.25; EventRateLimit additionally requires an
#  --admission-control-config-file, so only enable it once that configuration is in place)
# The kubelet will automatically pick up the changes and restart the API server
# Wait for the API server to restart
sleep 30
kubectl get pods -n kube-system
# Create a directory for certificates
mkdir -p ~/k8s-practice/certs
# Create a private key for a new user
openssl genrsa -out ~/k8s-practice/certs/jane.key 2048
# Create a certificate signing request
openssl req -new -key ~/k8s-practice/certs/jane.key -out ~/k8s-practice/certs/jane.csr -subj "/CN=jane/O=development"
# Get the CSR in base64 format
cat ~/k8s-practice/certs/jane.csr | base64 | tr -d '\n'
# Create a Kubernetes CSR object
cat > ~/k8s-practice/certs/jane-csr.yaml << EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: jane
spec:
  request: $(cat ~/k8s-practice/certs/jane.csr | base64 | tr -d '\n')
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400 # one day
  usages:
  - client auth
EOF
kubectl apply -f ~/k8s-practice/certs/jane-csr.yaml
# Approve the CSR
kubectl certificate approve jane
# Get the signed certificate
kubectl get csr jane -o jsonpath='{.status.certificate}' | base64 --decode > ~/k8s-practice/certs/jane.crt
# Create a role and role binding for the new user
cat > ~/k8s-practice/certs/developer-role.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: rbac-test
  name: developer
rules:
- apiGroups: [""]
  resources: ["pods", "services"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jane-developer
  namespace: rbac-test
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
EOF
kubectl apply -f ~/k8s-practice/certs/developer-role.yaml
# Create a new kubeconfig for the user
CLUSTER_NAME=$(kubectl config view --minify -o jsonpath='{.clusters[0].name}')
SERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
CA_CERT=$(kubectl config view --raw --minify --flatten -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
# Write the cluster CA certificate to a file so it can be referenced below
echo "${CA_CERT}" | base64 --decode > ~/k8s-practice/certs/ca.crt
kubectl config set-cluster ${CLUSTER_NAME} \
  --server=${SERVER} \
  --certificate-authority=$HOME/k8s-practice/certs/ca.crt \
  --embed-certs=true \
  --kubeconfig=$HOME/k8s-practice/certs/jane-kubeconfig
kubectl config set-credentials jane \
  --client-certificate=$HOME/k8s-practice/certs/jane.crt \
  --client-key=$HOME/k8s-practice/certs/jane.key \
  --embed-certs=true \
  --kubeconfig=$HOME/k8s-practice/certs/jane-kubeconfig
kubectl config set-context jane-context \
  --cluster=${CLUSTER_NAME} \
  --namespace=rbac-test \
  --user=jane \
  --kubeconfig=$HOME/k8s-practice/certs/jane-kubeconfig
kubectl config use-context jane-context --kubeconfig=$HOME/k8s-practice/certs/jane-kubeconfig
# Test the new kubeconfig ($HOME is used because the shell does not expand ~ after an = sign)
kubectl --kubeconfig=$HOME/k8s-practice/certs/jane-kubeconfig get pods -n rbac-test
The following practice exercises cover important CKA topics that complement our lab work. Each exercise focuses on key skills you’ll need for the certification exam.
You’re an administrator of a Kubernetes cluster and need to investigate why the scheduler is not functioning properly.
# 1. List all control plane components
kubectl get pods -n kube-system
# 2. Check the scheduler logs
kubectl logs -n kube-system kube-scheduler-<control-plane-node-name>
# If you see error logs, examine the scheduler manifest
sudo cat /etc/kubernetes/manifests/kube-scheduler.yaml
# 3. Check for configuration issues (common problems include):
# - Incorrect flags or configuration
# - Path issues for certificates
# - Permission problems
# 4. To fix a broken scheduler, you might need to:
sudo cp /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/kube-scheduler-backup.yaml
sudo nano /etc/kubernetes/manifests/kube-scheduler.yaml
# Fix any identified issues
# The kubelet will automatically recreate the Pod
# Verify the scheduler is functioning
kubectl get pods -n kube-system | grep scheduler
Your organization requires regular backups of etcd data. You need to create a backup strategy and successfully perform a restore operation.
# 1. Create a proper backup
# First, identify the etcd container
kubectl -n kube-system get pods | grep etcd
# Find the endpoint and certificate locations from the etcd pod definition
kubectl -n kube-system describe pod etcd-<control-plane-node-name>
# Create a backup with proper certificates
sudo ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup-$(date +%Y%m%d-%H%M%S).db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
# Verify the backup is valid
sudo ETCDCTL_API=3 etcdctl --write-out=table snapshot status \
/tmp/etcd-backup-*.db
# 2. Simulate data loss (in a real scenario, you'd never do this in production)
# For practice purposes, you might create a test namespace and some resources, then "break" something
# 3. To restore from backup, first stop the API server and etcd
sudo systemctl stop kubelet
# With containerd as the runtime, use crictl rather than docker to stop the running etcd container
sudo crictl ps --name etcd -q | xargs -r sudo crictl stop
# Restore the etcd data
sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup-*.db \
--data-dir=/var/lib/etcd-restore \
--name=<control-plane-node-name> \
--initial-cluster=<control-plane-node-name>=https://127.0.0.1:2380 \
--initial-cluster-token=etcd-cluster-1 \
--initial-advertise-peer-urls=https://127.0.0.1:2380
# Update etcd manifest to use the restored data directory
sudo sed -i.bak "s|/var/lib/etcd|/var/lib/etcd-restore|g" /etc/kubernetes/manifests/etcd.yaml
# Restart the kubelet
sudo systemctl restart kubelet
# 4. Verify the cluster is functioning
kubectl get nodes
kubectl get pods --all-namespaces
Your team has decided to use Helm to manage applications in the Kubernetes cluster. You need to set up Helm and deploy an application.
# 1. Install Helm
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
# Verify installation
helm version
# 2. Add a chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# 3. Install a chart
# First create a namespace for your Helm deployment
kubectl create namespace helm-test
# Install Nginx
helm install my-nginx bitnami/nginx --namespace helm-test
# Check status
helm status my-nginx -n helm-test
kubectl get all -n helm-test
# 4. Upgrade a release
helm upgrade my-nginx bitnami/nginx --set replicaCount=2 -n helm-test
# Verify the change
kubectl get pods -n helm-test
# Rollback to previous version
helm rollback my-nginx 1 -n helm-test
# List all releases
helm list --all-namespaces
# Uninstall a release
helm uninstall my-nginx -n helm-test
You need to deploy the same application with different configurations across development, staging, and production environments.
# 1. Create a base configuration
mkdir -p ~/kustomize-practice/base
cd ~/kustomize-practice/base
# Create a deployment.yaml file
cat > deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
EOF
# Create a service.yaml file
cat > service.yaml << EOF
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP
EOF
# Create a kustomization.yaml file
cat > kustomization.yaml << EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
EOF
# 2. Create overlays for different environments
mkdir -p ~/kustomize-practice/overlays/{dev,staging,prod}
# Development overlay
cd ~/kustomize-practice/overlays/dev
cat > kustomization.yaml << EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namePrefix: dev-
resources:
- ../../base
namespace: development
patchesStrategicMerge:
- deployment-patch.yaml
EOF
cat > deployment-patch.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
EOF
# Staging overlay
cd ~/kustomize-practice/overlays/staging
cat > kustomization.yaml << EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namePrefix: staging-
resources:
- ../../base
namespace: staging
patchesStrategicMerge:
- deployment-patch.yaml
EOF
cat > deployment-patch.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
EOF
# Production overlay
cd ~/kustomize-practice/overlays/prod
cat > kustomization.yaml << EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namePrefix: prod-
resources:
- ../../base
namespace: production
patchesStrategicMerge:
- deployment-patch.yaml
EOF
cat > deployment-patch.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
EOF
# 3. Preview and apply the configurations
# Create namespaces
kubectl create namespace development
kubectl create namespace staging
kubectl create namespace production
# Preview development configuration
kubectl kustomize ~/kustomize-practice/overlays/dev
# Apply development configuration
kubectl apply -k ~/kustomize-practice/overlays/dev
# Apply other environments
kubectl apply -k ~/kustomize-practice/overlays/staging
kubectl apply -k ~/kustomize-practice/overlays/prod
# Verify deployments
kubectl get deployments --all-namespaces | grep nginx
You need to understand how Custom Resource Definitions work in Kubernetes and how to interact with custom resources.
# 1. Create a Custom Resource Definition
mkdir -p ~/crd-practice
cat > ~/crd-practice/crontab-crd.yaml << EOF
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.stable.example.com
spec:
  group: stable.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              cronSpec:
                type: string
              image:
                type: string
              replicas:
                type: integer
            required: ["cronSpec", "image"]
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
    shortNames:
    - ct
EOF
kubectl apply -f ~/crd-practice/crontab-crd.yaml
# 2. Create a Custom Resource
cat > ~/crd-practice/my-crontab.yaml << EOF
apiVersion: stable.example.com/v1
kind: CronTab
metadata:
  name: my-cron
spec:
  cronSpec: "* * * * */5"
  image: my-cron-image:v1
  replicas: 3
EOF
kubectl apply -f ~/crd-practice/my-crontab.yaml
# 3. Interact with the Custom Resource
kubectl get crontabs
kubectl get ct # Using the short name
kubectl describe crontab my-cron
# Edit the custom resource
kubectl edit crontab my-cron
# Delete the custom resource
kubectl delete crontab my-cron
# Clean up
kubectl delete -f ~/crd-practice/crontab-crd.yaml
Your organization needs a highly available Kubernetes control plane to minimize downtime. You need to understand the requirements and steps for setting up an HA control plane.
# 1. Identify requirements for HA
# - Multiple control plane nodes (typically 3 or 5)
# - Load balancer for API server
# - Shared etcd cluster or stacked etcd
# 2. Set up a load balancer (conceptual - actual implementation depends on environment)
# This could be HAProxy, Nginx, cloud provider load balancer, etc.
# Example HAProxy configuration for Kubernetes API server:
cat > haproxy.cfg << EOF
frontend kubernetes-frontend
    bind *:6443
    mode tcp
    option tcplog
    default_backend kubernetes-backend

backend kubernetes-backend
    mode tcp
    option tcp-check
    balance roundrobin
    server control-plane1 192.168.1.101:6443 check fall 3 rise 2
    server control-plane2 192.168.1.102:6443 check fall 3 rise 2
    server control-plane3 192.168.1.103:6443 check fall 3 rise 2
EOF
# 3. Initialize the first control plane node with the load balancer endpoint
# When initializing with kubeadm, you'd use:
sudo kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT" \
--upload-certs \
--pod-network-cidr=192.168.0.0/16
# This will output a command for joining additional control plane nodes, like:
# kubeadm join 192.168.1.100:6443 --token abcdef.0123456789abcdef \
# --discovery-token-ca-cert-hash sha256:1234... \
# --control-plane --certificate-key 0123456789abcdef...
# 4. Verify the control plane components
kubectl get pods -n kube-system
kubectl get nodes
# 5. Test failover (conceptual)
# In a real HA setup, you would:
# - Shutdown one control plane node
# - Verify the cluster still functions
# - Verify API requests still work
# - Bring the node back online
You’re investigating networking issues in your Kubernetes cluster. Pod-to-pod communication is failing, and you need to diagnose and fix the problems.
# 1. Verify the CNI configuration
# Check if CNI plugins are installed
ls -la /opt/cni/bin/
# Check CNI configuration
sudo cat /etc/cni/net.d/*
# Verify CNI pods are running
kubectl get pods -n kube-system | grep -E 'calico|flannel|cilium|weave'
# Check CNI logs
kubectl logs -n kube-system calico-node-xxxxx
# 2. Check node-to-node connectivity
# Check node IPs
kubectl get nodes -o wide
# Test connectivity between nodes (run on each node)
ping <node-ip>
# Check if required ports are open
sudo netstat -tulpn | grep -E '6443|10250|10257|10259|2379|2380'
# 3. Diagnose pod-to-pod communication issues
# Create test pods in different namespaces
kubectl create namespace network-test-1
kubectl create namespace network-test-2
kubectl run -n network-test-1 nginx --image=nginx
kubectl run -n network-test-1 busybox --image=busybox -- sleep 3600
kubectl run -n network-test-2 busybox --image=busybox -- sleep 3600
# Get pod IPs
kubectl get pods -n network-test-1 -o wide
kubectl get pods -n network-test-2 -o wide
# Test connectivity from one pod to another
kubectl exec -n network-test-1 busybox -- ping -c 4 <nginx-pod-ip>
kubectl exec -n network-test-2 busybox -- ping -c 4 <nginx-pod-ip>
# 4. Test service resolution
# Create a service
kubectl expose -n network-test-1 pod nginx --port=80
# Test DNS resolution from pods
kubectl exec -n network-test-1 busybox -- nslookup nginx.network-test-1.svc.cluster.local
kubectl exec -n network-test-2 busybox -- nslookup nginx.network-test-1.svc.cluster.local
# Test connecting to the service
kubectl exec -n network-test-1 busybox -- wget -O- nginx.network-test-1.svc.cluster.local
kubectl exec -n network-test-2 busybox -- wget -O- nginx.network-test-1.svc.cluster.local
# Clean up
kubectl delete namespace network-test-1
kubectl delete namespace network-test-2
Deployments are one of the most common Kubernetes resources, providing declarative updates for Pods and ReplicaSets. They represent a set of identical Pods, and the Deployment controller ensures that the actual state matches the desired state at a controlled rate. This makes Deployments ideal for stateless applications.
Key features of Deployments include rolling updates, rollbacks to earlier revisions, declarative scaling, and the ability to pause and resume a rollout.
The Deployment controller continuously monitors the health of Pods and replaces any that fail, ensuring high availability of your applications.
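The rollout behavior described above maps directly onto kubectl subcommands; a typical update-and-rollback sequence (the deployment and image names are examples) looks like this:
kubectl create deployment web --image=nginx:1.21 --replicas=3
kubectl set image deployment/web nginx=nginx:1.25   # triggers a rolling update
kubectl rollout status deployment/web
kubectl rollout history deployment/web
kubectl rollout undo deployment/web                 # roll back to the previous revision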
StatefulSets are specialized workload controllers designed for applications that require one or more of the following: stable, unique network identifiers; stable, persistent storage; ordered, graceful deployment and scaling; and ordered, automated rolling updates.
Unlike Deployments, StatefulSets maintain a sticky identity for each Pod. These Pods are created from the same specification but are not interchangeable—each has a persistent identifier that it maintains across any rescheduling.
StatefulSets are valuable for applications that require stable network identities, dedicated persistent storage per replica, and strict ordering during deployment, scaling, and updates.
Examples include databases (MySQL, PostgreSQL), distributed systems (Kafka, ZooKeeper), and other stateful applications.
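A minimal StatefulSet sketch, assuming a headless Service named web already exists and a default StorageClass can satisfy the volumeClaimTemplates:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web              # headless Service providing stable network identities
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
Each replica gets a stable name (web-0, web-1, web-2) and its own PersistentVolumeClaim (data-web-0, and so on).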
DaemonSets ensure that all (or some) nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected.
Use cases for DaemonSets include running a cluster storage daemon, a log collection agent, or a node monitoring agent on every node.
DaemonSets are perfect for node-level operations, monitoring, and infrastructure components that need to run on each node.
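A minimal DaemonSet sketch (the busybox image stands in for a real node agent such as a log collector):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane   # also schedule onto control plane nodes
        operator: Exists
        effect: NoSchedule
      containers:
      - name: agent
        image: busybox
        command: ["sh", "-c", "while true; do sleep 3600; done"]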
Jobs create one or more Pods and ensure that a specified number of them successfully terminate. Jobs track the successful completions of Pods, and when a specified number of successful completions is reached, the Job is complete.
CronJobs build on Jobs, allowing you to run Jobs on a time-based schedule. They are useful for creating periodic and recurring tasks, like running backups, generating reports, or sending emails.
Key aspects of Jobs and CronJobs: completions and parallelism control how many Pods must succeed and how many run concurrently, backoffLimit caps retries, restartPolicy must be Never or OnFailure, and CronJobs add a cron-format schedule, a concurrency policy, and history limits.
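Both can be created imperatively for quick experiments; the schedule below runs every five minutes, and the workloads are placeholders:
kubectl create job pi --image=perl:5.34 -- perl -Mbignum=bpi -wle 'print bpi(200)'
kubectl create cronjob hello --image=busybox --schedule='*/5 * * * *' -- /bin/sh -c 'date; echo hello'
kubectl get jobs,cronjobs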
ConfigMaps and Secrets are resources for decoupling configuration from container images, allowing for greater portability.
ConfigMaps provide a way to inject configuration data into Pods. The data stored in a ConfigMap can be referenced in a volume of type configMap and then consumed by containerized applications running in a Pod, or provided as environment variables.
Secrets are similar to ConfigMaps but are specifically intended for confidential data such as passwords, OAuth tokens, or SSH keys. Kubernetes Secrets are, by default, stored as unencrypted base64-encoded strings in etcd. For improved security, consider using encryption at rest for etcd, RBAC rules that restrict who can read Secret objects, and external secret-management tooling (such as a dedicated vault or a secrets operator).
ConfigMaps and Secrets can be consumed as environment variables (individually or via envFrom) or mounted as files in a volume; Secrets can additionally be referenced as imagePullSecrets for private registries.
Pod scheduling in Kubernetes determines where Pods are placed within the cluster. Key concepts include:
Node Affinity/Anti-Affinity: Constrain which nodes your Pod can be scheduled on based on node labels.
requiredDuringSchedulingIgnoredDuringExecution: Hard requirement that must be met
preferredDuringSchedulingIgnoredDuringExecution: Soft preference that the scheduler will try to enforce
Pod Affinity/Anti-Affinity: Control how Pods are scheduled relative to other Pods.
Taints and Tolerations: Taints are applied to nodes, allowing them to repel certain Pods. Tolerations are applied to Pods, allowing them to schedule onto nodes with matching taints.
Node Selectors: A simple, straightforward way to constrain Pods to nodes with specific labels.
Resource Limits and Requests: Requests declare the minimum CPU and memory the scheduler must reserve for a container (and drive scheduling decisions); limits cap what the container may actually consume.
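The scheduling concepts above can be combined in a single Pod spec; the node name, taint key, and resource figures below are placeholders:
# Taint a node so only Pods that tolerate the taint can land there
kubectl taint nodes worker-1 dedicated=gpu:NoSchedule
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: scheduling-demo
spec:
  nodeSelector:
    kubernetes.io/os: linux
  tolerations:
  - key: dedicated
    operator: Equal
    value: gpu
    effect: NoSchedule
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "250m"
        memory: "64Mi"
      limits:
        cpu: "500m"
        memory: "128Mi"
EOF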
The Kubernetes networking model requires that every Pod gets its own IP address, that Pods can communicate with all other Pods without NAT, and that agents on a node (such as the kubelet) can communicate with all Pods on that node.
This model is implemented through Container Network Interface (CNI) plugins like Calico, Flannel, Cilium, and others.
Services in Kubernetes are an abstraction that defines a logical set of Pods and a policy to access them. They enable loose coupling between dependent Pods.
There are four types of Services:
ClusterIP: Exposes the Service on a cluster-internal IP. This makes the Service only reachable from within the cluster.
NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created.
LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
ExternalName: Maps the Service to the contents of the externalName field (e.g., foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up.
Endpoints track the IP addresses of Pods that match a Service’s selector. They are automatically created and updated by Kubernetes whenever Pods that match the Service’s selector come online or go offline.
Endpoints are especially useful for integrating external services. By manually creating Endpoint objects (instead of relying on selectors), you can point a Service to specific IP addresses outside your cluster.
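A minimal sketch of this pattern, assuming a hypothetical external database reachable at 192.0.2.10 (a documentation IP): a Service with no selector plus a manually created Endpoints object whose name matches the Service name.
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  ports:
  - port: 5432
    targetPort: 5432
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-db
subsets:
- addresses:
  - ip: 192.0.2.10
  ports:
  - port: 5432
Pods can then connect to external-db as if it were an in-cluster Service.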
Network Policies are specifications of how groups of Pods are allowed to communicate with each other and other network endpoints. They use labels to select Pods and define rules which specify what traffic is allowed to and from the selected Pods.
Key aspects of Network Policies: they are namespaced, they select Pods with label selectors, their rules are additive (allow-only), and a Pod becomes isolated only once at least one policy selects it; until then, all traffic is allowed.
Note that to enforce Network Policies, you need a networking solution that supports them. Not all CNI plugins do—Calico, Cilium, and some others have good Network Policy support.
The Gateway API is a newer, more expressive API for Kubernetes networking, offering significant improvements over Ingress. It provides a more flexible and extensible model for exposing services outside the cluster.
Key components of the Gateway API include GatewayClass (which controller implements Gateways), Gateway (a request for a listener managed by that controller), and Route resources such as HTTPRoute that attach to a Gateway and define how traffic is matched and forwarded.
The Gateway API is designed to support various layer 7 protocols, not just HTTP, and provides better support for multi-team environments through clear role separation.
Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Traffic routing is controlled by rules defined on the Ingress resource.
An Ingress controller is responsible for fulfilling the Ingress, usually with a load balancer. Various Ingress controllers are available, including ingress-nginx, Traefik, HAProxy Ingress, Contour, and cloud-specific controllers such as the AWS Load Balancer Controller and GKE Ingress.
Ingress resources define rules for routing external HTTP/HTTPS traffic to internal services, including host-based routing, path-based routing, and TLS termination.
CoreDNS is the DNS server used within Kubernetes clusters. It provides name resolution for Services and Pods, forwarding of queries for external names to upstream resolvers, and caching of DNS responses.
CoreDNS is configured through a Corefile, which defines how DNS requests are processed. It supports various plugins for different functionality, including kubernetes (cluster DNS records), forward (upstream resolution), cache, errors, health, ready, and prometheus (metrics).
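For reference, CoreDNS reads its Corefile from the coredns ConfigMap in kube-system (kubectl -n kube-system get configmap coredns -o yaml). A representative ConfigMap, close to the kubeadm default (the exact contents vary by distribution), looks like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }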
Understanding how CoreDNS works is essential for troubleshooting networking issues in Kubernetes, especially those related to service discovery.
Create, configure, and manage Deployments with ConfigMaps and Secrets for application configuration.
# Create a directory for our exercise files
mkdir -p ~/k8s-labs/configmaps-secrets
# Create a ConfigMap from literal values
cat > ~/k8s-labs/configmaps-secrets/app-config.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: webapp-config
namespace: default
data:
DB_HOST: "mysql-service"
DB_PORT: "3306"
APP_COLOR: "blue"
APP_MODE: "production"
CACHE_ENABLED: "true"
EOF
# Apply the ConfigMap
kubectl apply -f ~/k8s-labs/configmaps-secrets/app-config.yaml
# Verify the ConfigMap
kubectl get configmap webapp-config
kubectl describe configmap webapp-config
# Create a Secret for database credentials
cat > ~/k8s-labs/configmaps-secrets/app-secret.yaml << EOF
apiVersion: v1
kind: Secret
metadata:
name: webapp-secret
namespace: default
type: Opaque
data:
DB_USER: $(echo -n "admin" | base64)
DB_PASSWORD: $(echo -n "password123" | base64)
  API_KEY: $(echo -n "secret-api-key-1234567890" | base64)
EOF
# Apply the Secret
kubectl apply -f ~/k8s-labs/configmaps-secrets/app-secret.yaml
# Verify the Secret
kubectl get secret webapp-secret
kubectl describe secret webapp-secret
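# As an alternative to hand-encoding base64 values, kubectl can generate an equivalent Secret
# imperatively; --dry-run=client -o yaml only prints the manifest (the values here are the same
# illustrative ones used above):
kubectl create secret generic webapp-secret \
  --from-literal=DB_USER=admin \
  --from-literal=DB_PASSWORD=password123 \
  --from-literal=API_KEY=secret-api-key-1234567890 \
  --dry-run=client -o yaml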
# Create a Deployment that uses ConfigMap and Secret
cat > ~/k8s-labs/configmaps-secrets/webapp-deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp
namespace: default
spec:
replicas: 3
selector:
matchLabels:
app: webapp
template:
metadata:
labels:
app: webapp
spec:
containers:
- name: webapp
image: nginx:1.21
ports:
- containerPort: 80
envFrom:
- configMapRef:
name: webapp-config
- secretRef:
name: webapp-secret
volumeMounts:
- name: config-volume
mountPath: /etc/config
- name: secret-volume
mountPath: /etc/secret
readOnly: true
volumes:
- name: config-volume
configMap:
name: webapp-config
- name: secret-volume
secret:
secretName: webapp-secret
EOF
# Apply the Deployment
kubectl apply -f ~/k8s-labs/configmaps-secrets/webapp-deployment.yaml
# Verify the Deployment
kubectl get deployments
kubectl get pods -l app=webapp
# Get one of the Pod names
POD_NAME=$(kubectl get pods -l app=webapp -o jsonpath='{.items[0].metadata.name}')
# Verify ConfigMap values as environment variables
kubectl exec $POD_NAME -- env | grep -E 'DB_HOST|APP_COLOR'
# Verify ConfigMap values as mounted files
kubectl exec $POD_NAME -- ls -la /etc/config
kubectl exec $POD_NAME -- cat /etc/config/APP_COLOR
# Verify Secret values as environment variables (they will be decoded)
kubectl exec $POD_NAME -- env | grep -E 'DB_USER|DB_PASSWORD'
# Verify Secret values as mounted files
kubectl exec $POD_NAME -- ls -la /etc/secret
kubectl exec $POD_NAME -- cat /etc/secret/DB_USER
# Update the ConfigMap
kubectl edit configmap webapp-config
# Change APP_COLOR from "blue" to "green"
# Save and exit
# Wait for the changes to propagate to the mounted files (kubelet sync can take a minute or more)
sleep 60
# Check if the mounted file was updated
kubectl exec $POD_NAME -- cat /etc/config/APP_COLOR
# Note: Environment variables from ConfigMaps don't update automatically
# You need to restart the Pod to pick up changes
kubectl exec $POD_NAME -- env | grep APP_COLOR
# Restart the deployment to pick up environment variable changes
kubectl rollout restart deployment webapp
# Wait for the rollout to complete
kubectl rollout status deployment webapp
# Get a new Pod name
POD_NAME=$(kubectl get pods -l app=webapp -o jsonpath='{.items[0].metadata.name}')
# Verify the updated environment variable
kubectl exec $POD_NAME -- env | grep APP_COLOR
# Update the Deployment image (--record is deprecated in recent kubectl releases but still records the change cause)
kubectl set image deployment/webapp webapp=nginx:1.22 --record
# Watch the rolling update
kubectl rollout status deployment webapp
# View the rollout history
kubectl rollout history deployment webapp
# Verify the updated image version
kubectl describe deployment webapp | grep Image
# Roll back to the previous version
kubectl rollout undo deployment webapp
# Verify the rollback
kubectl rollout status deployment webapp
kubectl describe deployment webapp | grep Image
# Delete the resources we created
kubectl delete deployment webapp
kubectl delete configmap webapp-config
kubectl delete secret webapp-secret
Configure Services for Pod discovery, implement NetworkPolicies, and understand DNS resolution in Kubernetes.
# Create namespaces for our networking lab
kubectl create namespace frontend
kubectl create namespace backend
kubectl create namespace restricted
# Label the namespaces for better visibility
kubectl label namespace frontend purpose=networking-lab tier=frontend
kubectl label namespace backend purpose=networking-lab tier=backend
kubectl label namespace restricted purpose=networking-lab tier=restricted
# Create a working directory for the networking lab
mkdir -p ~/k8s-labs/networking
# Create a backend API deployment and service
cat > ~/k8s-labs/networking/backend-deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
namespace: backend
spec:
replicas: 2
selector:
matchLabels:
app: api
template:
metadata:
labels:
app: api
spec:
containers:
- name: api
image: nginx:1.21
ports:
- containerPort: 80
volumeMounts:
- name: html
mountPath: /usr/share/nginx/html
initContainers:
- name: init-html
image: busybox
command: ['/bin/sh', '-c', 'echo "<h1>Backend API</h1>" > /html/index.html']
volumeMounts:
- name: html
mountPath: /html
volumes:
- name: html
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: api-service
namespace: backend
spec:
selector:
app: api
ports:
- port: 80
targetPort: 80
type: ClusterIP
EOF
kubectl apply -f ~/k8s-labs/networking/backend-deployment.yaml
# Create a frontend web deployment and service
cat > ~/k8s-labs/networking/frontend-deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: web
namespace: frontend
spec:
replicas: 2
selector:
matchLabels:
app: web
template:
metadata:
labels:
app: web
spec:
containers:
- name: web
image: nginx:1.21
ports:
- containerPort: 80
volumeMounts:
- name: html
mountPath: /usr/share/nginx/html
initContainers:
- name: init-html
image: busybox
command: ['/bin/sh', '-c', 'echo "<h1>Frontend Web</h1>" > /html/index.html']
volumeMounts:
- name: html
mountPath: /html
volumes:
- name: html
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: web-service
namespace: frontend
spec:
selector:
app: web
ports:
- port: 80
targetPort: 80
type: ClusterIP
EOF
kubectl apply -f ~/k8s-labs/networking/frontend-deployment.yaml
# Create a restricted deployment for testing network policies
cat > ~/k8s-labs/networking/restricted-deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: tester
namespace: restricted
spec:
replicas: 1
selector:
matchLabels:
app: tester
template:
metadata:
labels:
app: tester
spec:
containers:
- name: tester
image: busybox
command: ['sh', '-c', 'while true; do sleep 3600; done']
EOF
kubectl apply -f ~/k8s-labs/networking/restricted-deployment.yaml
# Get Pod names
FRONTEND_POD=$(kubectl get pods -n frontend -l app=web -o jsonpath='{.items[0].metadata.name}')
BACKEND_POD=$(kubectl get pods -n backend -l app=api -o jsonpath='{.items[0].metadata.name}')
RESTRICTED_POD=$(kubectl get pods -n restricted -l app=tester -o jsonpath='{.items[0].metadata.name}')
# Test frontend to backend connectivity (should work)
kubectl exec -n frontend $FRONTEND_POD -- curl -s api-service.backend.svc.cluster.local
# Test backend to frontend connectivity (should work)
kubectl exec -n backend $BACKEND_POD -- curl -s web-service.frontend.svc.cluster.local
# Test restricted to backend connectivity (should work)
kubectl exec -n restricted $RESTRICTED_POD -- wget -qO- api-service.backend.svc.cluster.local
# Test restricted to frontend connectivity (should work)
kubectl exec -n restricted $RESTRICTED_POD -- wget -qO- web-service.frontend.svc.cluster.local
# Create default deny policy for the backend namespace
cat > ~/k8s-labs/networking/backend-default-deny.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny
namespace: backend
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
EOF
kubectl apply -f ~/k8s-labs/networking/backend-default-deny.yaml
# Test connectivity again (should fail)
kubectl exec -n frontend $FRONTEND_POD -- curl -s --connect-timeout 5 api-service.backend.svc.cluster.local
kubectl exec -n restricted $RESTRICTED_POD -- wget -qO- --timeout=5 api-service.backend.svc.cluster.local
# Create a NetworkPolicy to allow frontend to backend traffic
cat > ~/k8s-labs/networking/backend-allow-frontend.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-frontend
namespace: backend
spec:
podSelector:
matchLabels:
app: api
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
tier: frontend
podSelector:
matchLabels:
app: web
ports:
- protocol: TCP
port: 80
EOF
kubectl apply -f ~/k8s-labs/networking/backend-allow-frontend.yaml
# Create a NetworkPolicy to allow outbound DNS
cat > ~/k8s-labs/networking/backend-allow-dns.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-dns
namespace: backend
spec:
podSelector: {}
policyTypes:
- Egress
egress:
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
- protocol: TCP
port: 53
EOF
kubectl apply -f ~/k8s-labs/networking/backend-allow-dns.yaml
# Test frontend to backend connectivity (should work)
kubectl exec -n frontend $FRONTEND_POD -- curl -s api-service.backend.svc.cluster.local
# Test restricted to backend connectivity (should still fail)
kubectl exec -n restricted $RESTRICTED_POD -- wget -qO- --timeout=5 api-service.backend.svc.cluster.local || echo "Connection timed out as expected"
# Test backend Pod's ability to resolve DNS
kubectl exec -n backend $BACKEND_POD -- nslookup kubernetes.default.svc.cluster.local
# Create a temporary debugging Pod to explore DNS
kubectl run dns-test --image=busybox -n default -- sleep 3600
# Wait for the Pod to be running
kubectl wait --for=condition=Ready pod/dns-test -n default
# Test DNS resolution for services
kubectl exec -it dns-test -- nslookup kubernetes.default.svc.cluster.local
kubectl exec -it dns-test -- nslookup api-service.backend.svc.cluster.local
kubectl exec -it dns-test -- nslookup web-service.frontend.svc.cluster.local
# Check DNS configuration
kubectl exec -it dns-test -- cat /etc/resolv.conf
# Test Pod DNS (if enabled in your cluster); Pod DNS records use the Pod IP with dashes instead of dots
BACKEND_POD_IP=$(kubectl get pod $BACKEND_POD -n backend -o jsonpath='{.status.podIP}')
kubectl exec -it dns-test -- nslookup $(echo $BACKEND_POD_IP | tr '.' '-').backend.pod.cluster.local
# Delete the debugging Pod
kubectl delete pod dns-test -n default
# Create a NodePort service for frontend
cat > ~/k8s-labs/networking/frontend-nodeport.yaml << EOF
apiVersion: v1
kind: Service
metadata:
name: web-nodeport
namespace: frontend
spec:
selector:
app: web
ports:
- port: 80
targetPort: 80
nodePort: 30080
type: NodePort
EOF
kubectl apply -f ~/k8s-labs/networking/frontend-nodeport.yaml
# Check the NodePort service
kubectl get service web-nodeport -n frontend
# If in a cloud environment, create a LoadBalancer service
# (Skip this step if not in a cloud environment)
cat > ~/k8s-labs/networking/frontend-loadbalancer.yaml << EOF
apiVersion: v1
kind: Service
metadata:
name: web-lb
namespace: frontend
spec:
selector:
app: web
ports:
- port: 80
targetPort: 80
type: LoadBalancer
EOF
kubectl apply -f ~/k8s-labs/networking/frontend-loadbalancer.yaml
# Check the LoadBalancer service
kubectl get service web-lb -n frontend -w
Install and configure Ingress controllers, create Ingress resources, and explore the Gateway API features.
# Create a namespace for the ingress controller
kubectl create namespace ingress-nginx
# Install the NGINX Ingress Controller using Helm or Manifests
# Option 1: Using Helm
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx
# Option 2: Using Manifests
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.0/deploy/static/provider/cloud/deploy.yaml
# Wait for the Ingress controller to be ready
kubectl wait --namespace ingress-nginx \
--for=condition=ready pod \
--selector=app.kubernetes.io/component=controller \
--timeout=120s
# Create a working directory for the Ingress exercises
mkdir -p ~/k8s-labs/ingress
# Create an Ingress resource for the frontend application
cat > ~/k8s-labs/ingress/frontend-ingress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: frontend-ingress
namespace: frontend
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
ingressClassName: nginx
rules:
- host: frontend.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: web-service
port:
number: 80
EOF
kubectl apply -f ~/k8s-labs/ingress/frontend-ingress.yaml
# Create an Ingress resource for the backend application
cat > ~/k8s-labs/ingress/backend-ingress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: backend-ingress
namespace: backend
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
ingressClassName: nginx
rules:
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-service
port:
number: 80
EOF
kubectl apply -f ~/k8s-labs/ingress/backend-ingress.yaml
# Get the Ingress controller's external IP or hostname
INGRESS_HOST=$(kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# If the above command doesn't work, try this one (for hostname-based load balancers)
if [ -z "$INGRESS_HOST" ]; then
INGRESS_HOST=$(kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
fi
# If running locally or on minikube, you may need to use the NodePort
if [ -z "$INGRESS_HOST" ]; then
INGRESS_HOST=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}')
INGRESS_PORT=$(kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}')
fi
# Test the Ingress for frontend (the port suffix is only added when the NodePort fallback above was used)
curl -H "Host: frontend.example.com" http://$INGRESS_HOST${INGRESS_PORT:+:$INGRESS_PORT}
# Test the Ingress for backend
curl -H "Host: api.example.com" http://$INGRESS_HOST${INGRESS_PORT:+:$INGRESS_PORT}
# Generate a self-signed certificate
mkdir -p ~/k8s-labs/ingress/certs
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout ~/k8s-labs/ingress/certs/tls.key \
-out ~/k8s-labs/ingress/certs/tls.crt \
-subj "/CN=*.example.com"
# Create TLS Secret for frontend
kubectl create secret tls frontend-tls \
--key ~/k8s-labs/ingress/certs/tls.key \
--cert ~/k8s-labs/ingress/certs/tls.crt \
-n frontend
# Create TLS Secret for backend
kubectl create secret tls backend-tls \
--key ~/k8s-labs/ingress/certs/tls.key \
--cert ~/k8s-labs/ingress/certs/tls.crt \
-n backend
# Update the Ingress resources to use TLS
cat > ~/k8s-labs/ingress/frontend-ingress-tls.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: frontend-ingress
namespace: frontend
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
ingressClassName: nginx
tls:
- hosts:
- frontend.example.com
secretName: frontend-tls
rules:
- host: frontend.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: web-service
port:
number: 80
EOF
kubectl apply -f ~/k8s-labs/ingress/frontend-ingress-tls.yaml
cat > ~/k8s-labs/ingress/backend-ingress-tls.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: backend-ingress
namespace: backend
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
ingressClassName: nginx
tls:
- hosts:
- api.example.com
secretName: backend-tls
rules:
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-service
port:
number: 80
EOF
kubectl apply -f ~/k8s-labs/ingress/backend-ingress-tls.yaml
# Check if Gateway API CRDs are installed
kubectl get crd gateways.gateway.networking.k8s.io &> /dev/null
if [ $? -ne 0 ]; then
echo "Gateway API CRDs not found. Installing..."
# Install Gateway API CRDs
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v0.6.2/standard-install.yaml
# Wait for CRDs to be established
kubectl wait --for=condition=established crd/gateways.gateway.networking.k8s.io --timeout=60s
kubectl wait --for=condition=established crd/httproutes.gateway.networking.k8s.io --timeout=60s
fi
# Create a working directory for the Gateway API exercises
mkdir -p ~/k8s-labs/gateway-api
# Create a GatewayClass
cat > ~/k8s-labs/gateway-api/gateway-class.yaml << EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
name: example-gateway-class
spec:
controllerName: example.com/gateway-controller
EOF
kubectl apply -f ~/k8s-labs/gateway-api/gateway-class.yaml
# Create a Gateway
cat > ~/k8s-labs/gateway-api/gateway.yaml << EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
name: example-gateway
namespace: default
spec:
gatewayClassName: example-gateway-class
listeners:
- name: http
protocol: HTTP
port: 80
allowedRoutes:
namespaces:
from: All
EOF
kubectl apply -f ~/k8s-labs/gateway-api/gateway.yaml
# Create HTTPRoute resources for our services
cat > ~/k8s-labs/gateway-api/frontend-route.yaml << EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
name: frontend-route
namespace: frontend
spec:
parentRefs:
- name: example-gateway
namespace: default
hostnames:
- "frontend.example.com"
rules:
- matches:
- path:
type: PathPrefix
value: /
backendRefs:
- name: web-service
port: 80
EOF
kubectl apply -f ~/k8s-labs/gateway-api/frontend-route.yaml
cat > ~/k8s-labs/gateway-api/backend-route.yaml << EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
name: backend-route
namespace: backend
spec:
parentRefs:
- name: example-gateway
namespace: default
hostnames:
- "api.example.com"
rules:
- matches:
- path:
type: PathPrefix
value: /
backendRefs:
- name: api-service
port: 80
EOF
kubectl apply -f ~/k8s-labs/gateway-api/backend-route.yaml
# Note: To actually use these Gateway API resources, you need a gateway controller
# that implements the Gateway API (e.g., Contour, Istio, etc.)
# Delete Gateway API resources (if created)
kubectl delete httproute frontend-route -n frontend --ignore-not-found
kubectl delete httproute backend-route -n backend --ignore-not-found
kubectl delete gateway example-gateway -n default --ignore-not-found
kubectl delete gatewayclass example-gateway-class --ignore-not-found
# Delete Ingress resources
kubectl delete ingress frontend-ingress -n frontend --ignore-not-found
kubectl delete ingress backend-ingress -n backend --ignore-not-found
kubectl delete secret frontend-tls -n frontend --ignore-not-found
kubectl delete secret backend-tls -n backend --ignore-not-found
# Delete the NGINX Ingress Controller
helm uninstall ingress-nginx -n ingress-nginx --ignore-not-found
# Or if installed via manifests:
# kubectl delete -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.0/deploy/static/provider/cloud/deploy.yaml
# Delete namespace
kubectl delete namespace ingress-nginx --ignore-not-found
# Delete deployments and services
kubectl delete -f ~/k8s-labs/networking/frontend-deployment.yaml --ignore-not-found
kubectl delete -f ~/k8s-labs/networking/backend-deployment.yaml --ignore-not-found
kubectl delete -f ~/k8s-labs/networking/restricted-deployment.yaml --ignore-not-found
kubectl delete -f ~/k8s-labs/networking/frontend-nodeport.yaml --ignore-not-found
kubectl delete -f ~/k8s-labs/networking/frontend-loadbalancer.yaml --ignore-not-found
# Delete NetworkPolicies
kubectl delete -f ~/k8s-labs/networking/backend-default-deny.yaml --ignore-not-found
kubectl delete -f ~/k8s-labs/networking/backend-allow-frontend.yaml --ignore-not-found
kubectl delete -f ~/k8s-labs/networking/backend-allow-dns.yaml --ignore-not-found
# Delete namespaces
kubectl delete namespace frontend --ignore-not-found
kubectl delete namespace backend --ignore-not-found
kubectl delete namespace restricted --ignore-not-found
The following exercises build upon our lab work to cover additional workload and networking scenarios that commonly appear in the CKA exam.
You need to deploy a stateful application that requires stable network identities and persistent storage. The pods must maintain their identity across rescheduling and restarts.
# Create a namespace for our StatefulSet exercise
kubectl create namespace stateful-demo
# Create a working directory for the StatefulSet exercise
mkdir -p ~/k8s-practice/statefulset
# Create a headless service (service with no cluster IP)
cat > ~/k8s-practice/statefulset/headless-service.yaml << EOF
apiVersion: v1
kind: Service
metadata:
name: nginx-headless
namespace: stateful-demo
spec:
selector:
app: nginx-stateful
ports:
- port: 80
name: web
clusterIP: None # This makes it a headless service
EOF
kubectl apply -f ~/k8s-practice/statefulset/headless-service.yaml
# Create a StatefulSet with 3 replicas
cat > ~/k8s-practice/statefulset/nginx-statefulset.yaml << EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
namespace: stateful-demo
spec:
serviceName: "nginx-headless"
replicas: 3
selector:
matchLabels:
app: nginx-stateful
template:
metadata:
labels:
app: nginx-stateful
spec:
containers:
- name: nginx
image: nginx:1.21
ports:
- containerPort: 80
name: web
volumeMounts:
- name: data
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
EOF
kubectl apply -f ~/k8s-practice/statefulset/nginx-statefulset.yaml
# Wait for all pods to be created
kubectl rollout status statefulset/web -n stateful-demo
# Verify pod creation and naming convention
kubectl get pods -n stateful-demo
# Check the DNS names of the pods
kubectl run -n stateful-demo dns-test --image=busybox --restart=Never -- sleep 3600
kubectl wait --for=condition=Ready pod/dns-test -n stateful-demo
kubectl exec -n stateful-demo dns-test -- nslookup web-0.nginx-headless.stateful-demo.svc.cluster.local
kubectl exec -n stateful-demo dns-test -- nslookup web-1.nginx-headless.stateful-demo.svc.cluster.local
kubectl exec -n stateful-demo dns-test -- nslookup web-2.nginx-headless.stateful-demo.svc.cluster.local
# Scale the StatefulSet to 5 replicas
kubectl scale statefulset web -n stateful-demo --replicas=5
# Verify the new pods follow the naming convention
kubectl get pods -n stateful-demo -w
# (Press Ctrl+C to exit the watch when all pods are ready)
# Delete one pod and observe it being recreated with the same name
kubectl delete pod web-2 -n stateful-demo
kubectl get pods -n stateful-demo -w
# (Press Ctrl+C to exit the watch when all pods are ready)
# Check that the PVCs are preserved
kubectl get pvc -n stateful-demo
# Clean up
kubectl delete statefulset web -n stateful-demo
kubectl delete service nginx-headless -n stateful-demo
kubectl delete pod dns-test -n stateful-demo
kubectl delete pvc --all -n stateful-demo
kubectl delete namespace stateful-demo
You need to deploy a monitoring agent that must run on all nodes in your cluster. Additionally, you want to deploy a specialized logging agent that should only run on nodes with an SSD.
# Create a namespace for our DaemonSet exercise
kubectl create namespace daemon-demo
# Create a working directory for the DaemonSet exercise
mkdir -p ~/k8s-practice/daemonset
# Create a DaemonSet for a monitoring agent
# (the heredoc delimiter is quoted so $(hostname) runs inside the container, not in your shell)
cat > ~/k8s-practice/daemonset/monitoring-agent.yaml << 'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: monitoring-agent
namespace: daemon-demo
spec:
selector:
matchLabels:
app: monitoring-agent
template:
metadata:
labels:
app: monitoring-agent
spec:
containers:
- name: agent
image: busybox
command: ['sh', '-c', 'while true; do echo "Monitoring node $(hostname)..."; sleep 300; done']
EOF
kubectl apply -f ~/k8s-practice/daemonset/monitoring-agent.yaml
# Verify the monitoring agent is deployed on all nodes
kubectl get pods -n daemon-demo -o wide
# Label some nodes to indicate they have SSDs
# Get a list of nodes and label half of them with disk=ssd
NODE_LIST=$(kubectl get nodes -o jsonpath='{.items[*].metadata.name}')
NODE_ARRAY=($NODE_LIST)
HALF_COUNT=$((${#NODE_ARRAY[@]} / 2))
for ((i=0; i<HALF_COUNT; i++)); do
kubectl label node ${NODE_ARRAY[$i]} disk=ssd
done
# Verify the labels
kubectl get nodes --show-labels | grep disk=ssd
# Create a DaemonSet for a logging agent that only runs on SSD nodes
cat > ~/k8s-practice/daemonset/logging-agent.yaml << 'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: logging-agent
namespace: daemon-demo
spec:
selector:
matchLabels:
app: logging-agent
template:
metadata:
labels:
app: logging-agent
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: disk
operator: In
values:
- ssd
containers:
- name: agent
image: busybox
command: ['sh', '-c', 'while true; do echo "Logging from SSD node $(hostname)..."; sleep 300; done']
EOF
kubectl apply -f ~/k8s-practice/daemonset/logging-agent.yaml
# Verify the logging agent is only deployed on nodes with the SSD label
kubectl get pods -n daemon-demo -l app=logging-agent -o wide
# Clean up
kubectl delete daemonset monitoring-agent logging-agent -n daemon-demo
kubectl delete namespace daemon-demo
# Remove the disk=ssd label from nodes
for ((i=0; i<HALF_COUNT; i++)); do
kubectl label node ${NODE_ARRAY[$i]} disk-
done
You need to deploy an application that requires a sidecar container for log processing. The main application will write logs to a shared volume, and the sidecar container will process these logs.
# Create a namespace for our multi-container exercise
kubectl create namespace multicontainer-demo
# Create a working directory for the multi-container exercise
mkdir -p ~/k8s-practice/multicontainer
# Create a multi-container Pod with a shared volume
# (the heredoc delimiter is quoted so $(date) and $log are evaluated inside the containers, not by your shell)
cat > ~/k8s-practice/multicontainer/sidecar-pod.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
name: log-processor
namespace: multicontainer-demo
spec:
containers:
- name: main-app
image: busybox
command: ['sh', '-c', 'while true; do echo "$(date): Application log entry $(RANDOM)" >> /var/log/app.log; sleep 5; done']
volumeMounts:
- name: log-volume
mountPath: /var/log
- name: log-processor
image: busybox
command: ['sh', '-c', 'tail -f /var/log/app.log | while read log; do echo "Processed: $log" >> /var/log/processed.log; done']
volumeMounts:
- name: log-volume
mountPath: /var/log
volumes:
- name: log-volume
emptyDir: {}
EOF
kubectl apply -f ~/k8s-practice/multicontainer/sidecar-pod.yaml
# Wait for the Pod to be running
kubectl get pod log-processor -n multicontainer-demo -w
# (Press Ctrl+C to exit the watch when the pod is ready)
# Verify that logs are being generated
kubectl exec -n multicontainer-demo log-processor -c main-app -- cat /var/log/app.log | tail -5
# Verify that logs are being processed
kubectl exec -n multicontainer-demo log-processor -c log-processor -- cat /var/log/processed.log | tail -5
# Check the logs of the sidecar container
kubectl logs -n multicontainer-demo log-processor -c log-processor
# Clean up
kubectl delete pod log-processor -n multicontainer-demo
kubectl delete namespace multicontainer-demo
You need to create a one-time job that processes data and a recurring job that performs a backup every 5 minutes.
# Create a namespace for our jobs exercise
kubectl create namespace jobs-demo
# Create a working directory for the Jobs exercise
mkdir -p ~/k8s-practice/jobs
# Create a Job to process data
cat > ~/k8s-practice/jobs/data-processor.yaml << EOF
apiVersion: batch/v1
kind: Job
metadata:
name: data-processor
namespace: jobs-demo
spec:
completions: 5 # Run 5 successful pod completions
parallelism: 2 # Run 2 pods in parallel
backoffLimit: 3 # Retry 3 times before marking job as failed
template:
spec:
containers:
- name: processor
image: busybox
command: ['sh', '-c', 'echo "Processing data chunk $RANDOM..."; sleep 10; echo "Data processing complete"']
restartPolicy: Never
EOF
kubectl apply -f ~/k8s-practice/jobs/data-processor.yaml
# Watch the job progress
kubectl get jobs -n jobs-demo -w
# (Press Ctrl+C to exit the watch when the job completes)
# View the pods created by the job
kubectl get pods -n jobs-demo
# View the logs from one of the completed pods
POD_NAME=$(kubectl get pods -n jobs-demo -l job-name=data-processor -o jsonpath='{.items[0].metadata.name}')
kubectl logs $POD_NAME -n jobs-demo
# Create a CronJob for periodic backups
# (the heredoc delimiter is quoted so $(date) runs inside the backup container, not in your shell)
cat > ~/k8s-practice/jobs/backup-cronjob.yaml << 'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
name: backup-job
namespace: jobs-demo
spec:
schedule: "*/5 * * * *" # Run every 5 minutes
concurrencyPolicy: Forbid # Don't allow concurrent executions
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
jobTemplate:
spec:
template:
spec:
containers:
- name: backup
image: busybox
command: ['sh', '-c', 'echo "Backup started at $(date)"; sleep 30; echo "Backup completed at $(date)"']
restartPolicy: OnFailure
EOF
kubectl apply -f ~/k8s-practice/jobs/backup-cronjob.yaml
# Check the CronJob
kubectl get cronjobs -n jobs-demo
# Wait for the first execution (this may take up to 5 minutes)
echo "Waiting for the first CronJob execution..."
sleep 300
kubectl get jobs -n jobs-demo
# View the logs from the CronJob's most recent execution
CRONJOB_POD=$(kubectl get pods -n jobs-demo -o name | grep backup-job | head -1 | sed 's|pod/||')
kubectl logs $CRONJOB_POD -n jobs-demo
# Clean up
kubectl delete job data-processor -n jobs-demo
kubectl delete cronjob backup-job -n jobs-demo
kubectl delete namespace jobs-demo
You are securing a multi-tier application with network policies and need to expose different components using various service types.
# Create a namespace for our networking exercise
kubectl create namespace advanced-networking
# Create a working directory for this exercise
mkdir -p ~/k8s-practice/networking
# Deploy a three-tier application
# 1. Database tier
cat > ~/k8s-practice/networking/database.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: database
namespace: advanced-networking
spec:
replicas: 1
selector:
matchLabels:
app: database
tier: db
template:
metadata:
labels:
app: database
tier: db
spec:
containers:
- name: mysql
image: mysql:5.7
env:
- name: MYSQL_ROOT_PASSWORD
value: "password"
- name: MYSQL_DATABASE
value: "testdb"
ports:
- containerPort: 3306
---
apiVersion: v1
kind: Service
metadata:
name: database-service
namespace: advanced-networking
spec:
selector:
app: database
tier: db
ports:
- port: 3306
targetPort: 3306
type: ClusterIP
EOF
kubectl apply -f ~/k8s-practice/networking/database.yaml
# 2. Backend API tier
cat > ~/k8s-practice/networking/backend.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend-api
namespace: advanced-networking
spec:
replicas: 2
selector:
matchLabels:
app: backend-api
tier: backend
template:
metadata:
labels:
app: backend-api
tier: backend
spec:
containers:
- name: api
image: nginx:1.21
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: backend-service
namespace: advanced-networking
spec:
selector:
app: backend-api
tier: backend
ports:
- port: 80
targetPort: 80
type: ClusterIP
EOF
kubectl apply -f ~/k8s-practice/networking/backend.yaml
# 3. Frontend tier
cat > ~/k8s-practice/networking/frontend.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
namespace: advanced-networking
spec:
replicas: 3
selector:
matchLabels:
app: frontend
tier: frontend
template:
metadata:
labels:
app: frontend
tier: frontend
spec:
containers:
- name: web
image: nginx:1.21
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: frontend-service
namespace: advanced-networking
spec:
selector:
app: frontend
tier: frontend
ports:
- port: 80
targetPort: 80
nodePort: 30080
type: NodePort
EOF
kubectl apply -f ~/k8s-practice/networking/frontend.yaml
# 4. Create an ExternalName service for an external API
cat > ~/k8s-practice/networking/external-service.yaml << EOF
apiVersion: v1
kind: Service
metadata:
name: external-api
namespace: advanced-networking
spec:
type: ExternalName
externalName: api.example.com
EOF
kubectl apply -f ~/k8s-practice/networking/external-service.yaml
# Create Network Policies
# 1. Default deny all ingress and egress traffic
cat > ~/k8s-practice/networking/default-deny.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: advanced-networking
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
EOF
kubectl apply -f ~/k8s-practice/networking/default-deny.yaml
# 2. Allow backend to access database
cat > ~/k8s-practice/networking/backend-to-db.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: backend-to-db
namespace: advanced-networking
spec:
podSelector:
matchLabels:
app: database
tier: db
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: backend-api
tier: backend
ports:
- protocol: TCP
port: 3306
EOF
kubectl apply -f ~/k8s-practice/networking/backend-to-db.yaml
# 3. Allow frontend to access backend
cat > ~/k8s-practice/networking/frontend-to-backend.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: frontend-to-backend
namespace: advanced-networking
spec:
podSelector:
matchLabels:
app: backend-api
tier: backend
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: frontend
tier: frontend
ports:
- protocol: TCP
port: 80
EOF
kubectl apply -f ~/k8s-practice/networking/frontend-to-backend.yaml
# 4. Allow external access to frontend
cat > ~/k8s-practice/networking/external-to-frontend.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: external-to-frontend
namespace: advanced-networking
spec:
podSelector:
matchLabels:
app: frontend
tier: frontend
policyTypes:
- Ingress
ingress:
- from: []
ports:
- protocol: TCP
port: 80
EOF
kubectl apply -f ~/k8s-practice/networking/external-to-frontend.yaml
# 5. Allow DNS resolution
cat > ~/k8s-practice/networking/allow-dns.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-dns
namespace: advanced-networking
spec:
podSelector: {}
policyTypes:
- Egress
egress:
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
ports:
- protocol: UDP
port: 53
- protocol: TCP
port: 53
EOF
kubectl apply -f ~/k8s-practice/networking/allow-dns.yaml
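# Because the default-deny policy above also blocks egress, the tier-to-tier tests below
# only succeed if matching egress rules exist. The following is a sketch that allows
# frontend pods to reach the backend on port 80 and backend pods to reach the database
# on port 3306 (the file name and policy names are our own choice; deleting the namespace
# at the end of the exercise removes these policies as well):
cat > ~/k8s-practice/networking/allow-egress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-egress-to-backend
  namespace: advanced-networking
spec:
  podSelector:
    matchLabels:
      app: frontend
      tier: frontend
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: backend-api
          tier: backend
    ports:
    - protocol: TCP
      port: 80
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-egress-to-db
  namespace: advanced-networking
spec:
  podSelector:
    matchLabels:
      app: backend-api
      tier: backend
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
          tier: db
    ports:
    - protocol: TCP
      port: 3306
EOF
kubectl apply -f ~/k8s-practice/networking/allow-egress.yaml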
# Test connectivity
# Create a debugging pod
kubectl run -n advanced-networking debug-pod --image=nicolaka/netshoot --restart=Never -- sleep 3600
# Wait for the pod to be ready
kubectl wait --for=condition=Ready pod/debug-pod -n advanced-networking
# Test connectivity from debug pod to database (should fail due to network policy)
kubectl exec -n advanced-networking debug-pod -- curl -s --connect-timeout 5 database-service:3306
# Expected result: Connection times out
# Test connectivity from debug pod to backend (should fail due to network policy)
kubectl exec -n advanced-networking debug-pod -- curl -s --connect-timeout 5 backend-service
# Expected result: Connection times out
# Test connectivity from debug pod to frontend (should fail due to network policy)
kubectl exec -n advanced-networking debug-pod -- curl -s --connect-timeout 5 frontend-service
# Expected result: Connection times out
# Test connectivity from frontend pod to backend (should work)
FRONTEND_POD=$(kubectl get pods -n advanced-networking -l app=frontend -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n advanced-networking $FRONTEND_POD -- curl -s backend-service
# Expected result: Returns the NGINX welcome page
# Test connectivity from backend pod to database (should work; we only check that the MySQL port accepts connections)
BACKEND_POD=$(kubectl get pods -n advanced-networking -l app=backend-api -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n advanced-networking $BACKEND_POD -- bash -c "timeout 5 bash -c '</dev/tcp/database-service/3306' && echo 'database-service:3306 is reachable'"
# Expected result: Connection succeeds to the MySQL port
# Test ExternalName service resolution (use the debug pod, which ships with DNS tools)
kubectl exec -n advanced-networking debug-pod -- nslookup external-api
# Expected result: Shows a CNAME record pointing to api.example.com
# Clean up
kubectl delete -f ~/k8s-practice/networking/default-deny.yaml
kubectl delete -f ~/k8s-practice/networking/backend-to-db.yaml
kubectl delete -f ~/k8s-practice/networking/frontend-to-backend.yaml
kubectl delete -f ~/k8s-practice/networking/external-to-frontend.yaml
kubectl delete -f ~/k8s-practice/networking/allow-dns.yaml
kubectl delete -f ~/k8s-practice/networking/database.yaml
kubectl delete -f ~/k8s-practice/networking/backend.yaml
kubectl delete -f ~/k8s-practice/networking/frontend.yaml
kubectl delete -f ~/k8s-practice/networking/external-service.yaml
kubectl delete pod debug-pod -n advanced-networking
kubectl delete namespace advanced-networking
You need to deploy an application with appropriate resource limits and health checks to ensure it’s scheduled correctly and remains healthy.
# Create a namespace for our resource management exercise
kubectl create namespace resource-demo
# Create a working directory for this exercise
mkdir -p ~/k8s-practice/resources
# Deploy an application with resource constraints and health checks
cat > ~/k8s-practice/resources/app-deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: resource-app
namespace: resource-demo
spec:
replicas: 2
selector:
matchLabels:
app: resource-app
template:
metadata:
labels:
app: resource-app
spec:
containers:
- name: nginx
image: nginx:1.21
ports:
- containerPort: 80
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "200m"
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 2
failureThreshold: 3
readinessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 5
periodSeconds: 3
volumeMounts:
- name: health-toggle
mountPath: /usr/share/nginx/html/health
volumes:
- name: health-toggle
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: resource-app-service
namespace: resource-demo
spec:
selector:
app: resource-app
ports:
- port: 80
targetPort: 80
type: ClusterIP
EOF
kubectl apply -f ~/k8s-practice/resources/app-deployment.yaml
# Check pod status and resource assignments
kubectl get pods -n resource-demo -o wide
kubectl describe pods -n resource-demo | grep -A 8 "Limits"
# Test health checks
POD_NAME=$(kubectl get pods -n resource-demo -o jsonpath='{.items[0].metadata.name}')
# Check if probes are working properly
kubectl describe pod $POD_NAME -n resource-demo | grep -A 15 "Liveness"
kubectl describe pod $POD_NAME -n resource-demo | grep -A 15 "Readiness"
# Cause the liveness probe to fail by removing the page the probe requests
kubectl exec $POD_NAME -n resource-demo -- rm /usr/share/nginx/html/index.html
# Watch the pod being restarted by the kubelet due to failing liveness probe
kubectl get pods -n resource-demo -w
# (Wait for ~15-20 seconds until you see the pod restart, then press Ctrl+C)
# Verify the restart count increased
kubectl get pod $POD_NAME -n resource-demo -o jsonpath='{.status.containerStatuses[0].restartCount}'
# Create a second Pod with excessive resource requests to test scheduling
cat > ~/k8s-practice/resources/large-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: large-pod
namespace: resource-demo
spec:
containers:
- name: memory-hog
image: ubuntu
command: ["sleep", "3600"]
resources:
requests:
memory: "4Gi"
cpu: "2000m"
limits:
memory: "4Gi"
cpu: "2000m"
EOF
kubectl apply -f ~/k8s-practice/resources/large-pod.yaml
# Check if the pod was scheduled or is pending
kubectl get pods -n resource-demo
# Check the pod's events to understand scheduling decisions
kubectl describe pod large-pod -n resource-demo | grep -A 10 "Events"
# Clean up
kubectl delete -f ~/k8s-practice/resources/app-deployment.yaml
kubectl delete -f ~/k8s-practice/resources/large-pod.yaml
kubectl delete namespace resource-demo
Persistent Volumes are cluster resources that provide storage in Kubernetes. They are provisioned by an administrator or dynamically using Storage Classes. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV.
Key characteristics of Persistent Volumes include a defined capacity, a set of supported access modes, a reclaim policy, an optional storage class, and a lifecycle that is managed at the cluster level rather than by any single Pod.
The PV subsystem provides an API abstraction that hides how storage is provided from how it is consumed. This separation enables administrators to manage the storage resources independently from applications that use them.
Persistent Volume Claims are requests for storage by a user or application. They express the desired characteristics of storage, similar to how Pods consume node resources. PVCs request specific size, access modes, and storage classes.
Important aspects of PVCs include the requested storage size, the required access modes, an optional storage class name, and the binding process that matches each claim to a suitable Persistent Volume.
The PVC acts as an abstraction layer between the Pod and the underlying storage, allowing developers to request storage without needing to know the specifics of the underlying infrastructure.
Storage Classes provide a way to describe different “classes” of storage offered in a cluster. They help to abstract the underlying storage provider, provisioner, and parameters, enabling dynamic provisioning of storage.
Key features of Storage Classes include the provisioner that creates volumes, provisioner-specific parameters, a reclaim policy, a volume binding mode, optional volume expansion support, and the ability to mark one class as the cluster default.
Storage Classes make it possible for administrators to offer various storage options without users needing to know the details of how they are implemented.
Dynamic provisioning allows storage volumes to be created on-demand, eliminating the need for cluster administrators to pre-provision storage. When a user creates a PVC that requests a Storage Class, a new PV is automatically created to fulfill the claim.
Benefits of dynamic provisioning include on-demand volume creation, no need for administrators to pre-provision storage, volumes sized to match each claim, and less manual work overall.
Dynamic provisioning requires a properly configured Storage Class and a storage backend that supports it. Many cloud providers offer Storage Class implementations that integrate with their storage services.
Kubernetes supports many volume types, each providing different storage capabilities: ephemeral volumes such as emptyDir, node-local volumes such as hostPath and local, configuration volumes such as configMap and secret, network storage such as nfs and iscsi, and cloud or CSI-backed storage consumed through PersistentVolumeClaims.
The selection of volume type depends on factors like data persistence requirements, sharing needs, performance characteristics, and the underlying infrastructure.
Access modes determine how a volume can be mounted on hosts. The three access modes are ReadWriteOnce (mounted read-write by a single node), ReadOnlyMany (mounted read-only by many nodes), and ReadWriteMany (mounted read-write by many nodes).
Not all volume types support all access modes. For example, AWS EBS volumes only support ReadWriteOnce, while NFS volumes typically support all three modes.
The access mode is a property of the PV and is requested in the PVC. During binding, the PV and PVC must have compatible access modes.
Reclaim policies determine what happens to a Persistent Volume when its claim is deleted. The available policies are Retain (the volume and its data are kept and must be reclaimed manually), Delete (the volume and the underlying storage are removed), and Recycle (a basic scrub of the volume).
The Recycle policy is deprecated and will be removed in a future release. The recommended alternatives are dynamic provisioning or a custom operator for volume lifecycle management.
The reclaim policy is specified when a PV is created or can be part of a Storage Class definition for dynamically provisioned volumes.
The Container Storage Interface (CSI) is a standard for exposing arbitrary block and file storage systems to containerized workloads in Kubernetes. It allows storage vendors to develop a plugin once and have it work across multiple container orchestration systems.
Key aspects of CSI include a standard, vendor-neutral driver interface, drivers that run as ordinary workloads in the cluster rather than as in-tree code, and support for advanced features such as volume snapshots, cloning, and expansion.
CSI has become the preferred way to integrate storage systems with Kubernetes, replacing the older in-tree volume plugins which required changes to the core Kubernetes code.
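As a sketch, a StorageClass backed by a CSI driver looks much like any other; here the provisioner is the AWS EBS CSI driver (ebs.csi.aws.com), and the class name and parameters are illustrative and driver-specific:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true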
Node selectors provide a simple way to constrain pods to nodes with specific labels. This basic form of scheduling ensures that pods run only on nodes that have required labels.
Key aspects include: the nodeSelector field in the Pod specification lists the labels a node must carry, and every listed label must match for the Pod to be scheduled there.
While node selectors are simple to use, they lack the expressiveness of node affinity for more complex scheduling requirements.
Node affinity is a more expressive way to constrain which nodes a pod can be scheduled on, based on node labels. It provides more control than simple node selectors, including “soft” preferences and complex matching expressions.
Node affinity types include requiredDuringSchedulingIgnoredDuringExecution (a hard rule the scheduler must satisfy) and preferredDuringSchedulingIgnoredDuringExecution (a weighted preference the scheduler tries to satisfy).
Node anti-affinity can be achieved by using the NotIn or DoesNotExist operators in node affinity rules to prevent pods from being scheduled on specific nodes.
The “IgnoredDuringExecution” part means that if node labels change after pod scheduling, the pod will not be evicted.
Pod affinity and anti-affinity allow you to constrain which nodes pods can be scheduled on based on the labels of pods already running on the node, rather than the node labels themselves.
This enables scenarios such as co-locating Pods that communicate frequently on the same node or in the same zone, and spreading replicas of the same application across different nodes or zones.
Like node affinity, pod affinity comes in required and preferred varieties and uses the same “DuringSchedulingIgnoredDuringExecution” suffix.
Pod anti-affinity is particularly useful for ensuring high availability by spreading replicas across failure domains like nodes, racks, or availability zones.
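As a sketch (the labels are illustrative), the following fragment of a Deployment's Pod template uses podAntiAffinity to keep replicas of the same app off the same node:
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web
        topologyKey: kubernetes.io/hostname
  containers:
  - name: web
    image: nginx:1.21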
Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. Taints are applied to nodes, and tolerations are applied to pods.
Key concepts include: taint effects (NoSchedule, PreferNoSchedule, and NoExecute), tolerations that match taints using the Equal or Exists operators, and an optional tolerationSeconds for NoExecute taints.
Taints and tolerations are particularly useful for dedicating nodes to specific workloads, creating specialized nodes, or gracefully draining nodes for maintenance.
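For example (the node name and key are illustrative), a node can be dedicated to GPU workloads with a taint, and only Pods carrying a matching toleration will schedule there:
# Taint the node (remove it later with: kubectl taint nodes node1 dedicated-)
kubectl taint nodes node1 dedicated=gpu:NoSchedule
Pods that are allowed on that node then carry a matching toleration in their spec:
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"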
Resource limits and requests are specifications for CPU and memory usage that help the scheduler make better decisions about pod placement and resource allocation.
Key concepts include: requests, which reserve capacity and drive scheduling decisions; limits, which cap what a container may consume; and the resulting Quality of Service classes (Guaranteed, Burstable, BestEffort) that influence eviction order under node pressure.
Properly configuring resource requests and limits is essential for cluster stability and efficient resource utilization. Under-requesting resources can lead to resource contention, while over-requesting can lead to resource waste.
Pod priority and preemption allow higher-priority pods to be scheduled before lower-priority pods, and if necessary, by preempting (evicting) lower-priority pods.
Key aspects include: PriorityClass objects that map a name to an integer priority, the priorityClassName field in the Pod spec, and preemption, in which the scheduler may evict lower-priority Pods to make room for a pending higher-priority Pod.
This feature is crucial for ensuring that critical workloads get resources even in a constrained cluster, implementing service-level objectives for different workloads, and providing graceful degradation under pressure.
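A minimal sketch (the name, value, and description are illustrative):
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 100000
globalDefault: false
description: "For business-critical workloads only."
Pods opt in by setting spec.priorityClassName: high-priority.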
Pod Disruption Budgets protect applications from voluntary disruptions by limiting how many pods can be down simultaneously due to voluntary disruptions like node drains or scaling operations.
Key concepts include: a selector for the Pods the budget protects, and either minAvailable or maxUnavailable to bound how many of those Pods may be disrupted at once; PDBs apply only to voluntary disruptions, not to node failures.
PDBs are essential for maintaining application availability during cluster maintenance operations, ensuring high availability and meeting service-level objectives.
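A minimal sketch (the selector is illustrative) that keeps at least two matching Pods available during voluntary disruptions:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web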
Topology Spread Constraints allow control over how pods are spread across the cluster among failure domains such as regions, zones, nodes, or other user-defined topology domains.
Important aspects include: maxSkew (the allowed imbalance between domains), topologyKey (the node label that defines the domains), whenUnsatisfiable (DoNotSchedule or ScheduleAnyway), and a labelSelector for the Pods being counted.
When properly configured, topology spread constraints ensure that replicas of an application are distributed to maximize resilience to domain-level failures.
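A sketch of a Pod template fragment (the labels are illustrative) that spreads replicas evenly across zones:
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: web
    image: nginx:1.21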
Create and manage Persistent Volumes, Persistent Volume Claims, and understand the binding process between them.
# Create a directory for our storage exercises
mkdir -p ~/k8s-labs/storage
# Create a simple Persistent Volume using hostPath (for demonstration purposes)
cat > ~/k8s-labs/storage/local-pv.yaml << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
name: local-pv
spec:
capacity:
storage: 1Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: manual
hostPath:
path: /mnt/data
EOF
# Before applying, ensure the directory exists on your node
# Note: In a real cluster environment, you would need to ensure this directory exists on the target node
# For this lab, we'll create it on the control plane node
sudo mkdir -p /mnt/data
# Apply the PV configuration
kubectl apply -f ~/k8s-labs/storage/local-pv.yaml
# Verify the PV has been created and is in Available status
kubectl get pv
kubectl describe pv local-pv
# Create a PVC that matches our PV
cat > ~/k8s-labs/storage/my-pvc.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-pvc
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 1Gi
storageClassName: manual
EOF
# Apply the PVC configuration
kubectl apply -f ~/k8s-labs/storage/my-pvc.yaml
# Verify the PVC has been created and bound to our PV
kubectl get pvc
kubectl describe pvc my-pvc
# Check the PV status to confirm it's now bound
kubectl get pv
# Create a Pod that uses our PVC
cat > ~/k8s-labs/storage/pv-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: pv-pod
spec:
containers:
- name: task-pv-container
image: nginx
ports:
- containerPort: 80
name: "http-server"
volumeMounts:
- mountPath: "/usr/share/nginx/html"
name: task-pv-storage
volumes:
- name: task-pv-storage
persistentVolumeClaim:
claimName: my-pvc
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/storage/pv-pod.yaml
# Verify the Pod is running
kubectl get pods
kubectl describe pod pv-pod
# Wait for the Pod to be Running
kubectl wait --for=condition=Ready pod/pv-pod
# Write data to the persistent volume
kubectl exec pv-pod -- sh -c "echo 'Hello from Kubernetes storage' > /usr/share/nginx/html/index.html"
# Verify the data is written
kubectl exec pv-pod -- cat /usr/share/nginx/html/index.html
# Delete the Pod
kubectl delete pod pv-pod
# Create another Pod using the same PVC
cat > ~/k8s-labs/storage/pv-pod2.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: pv-pod2
spec:
containers:
- name: task-pv-container
image: nginx
ports:
- containerPort: 80
name: "http-server"
volumeMounts:
- mountPath: "/usr/share/nginx/html"
name: task-pv-storage
volumes:
- name: task-pv-storage
persistentVolumeClaim:
claimName: my-pvc
EOF
# Apply the second Pod configuration
kubectl apply -f ~/k8s-labs/storage/pv-pod2.yaml
# Wait for the Pod to be Running
kubectl wait --for=condition=Ready pod/pv-pod2
# Verify the data is still there (demonstrating persistence)
kubectl exec pv-pod2 -- cat /usr/share/nginx/html/index.html
# Delete the PVC
kubectl delete pvc my-pvc
# Check the status of the PV (should be Released, but not deleted due to Retain policy)
kubectl get pv
# Change the reclaim policy to Delete
kubectl patch pv local-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
# Create a new PVC
cat > ~/k8s-labs/storage/new-pvc.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: new-pvc
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 1Gi
storageClassName: manual
EOF
# Since our PV is in Released state, we need to manually make it Available again
# In a real environment, an admin would reclaim the volume
kubectl patch pv local-pv -p '{"spec":{"claimRef": null}}'
# Apply the new PVC
kubectl apply -f ~/k8s-labs/storage/new-pvc.yaml
# Verify binding
kubectl get pvc
kubectl get pv
# Delete the PVC to see the Delete policy in action
kubectl delete pvc new-pvc
# Check if the PV is also deleted
kubectl get pv
# Delete any remaining pods
kubectl delete pod pv-pod2 --ignore-not-found
# Delete any remaining PVCs
kubectl delete pvc --all
# Delete the PV
kubectl delete pv local-pv
Configure Storage Classes for dynamic provisioning of Persistent Volumes and understand how to use different provisioners.
# List the StorageClasses in your cluster
kubectl get storageclass
# Examine the default StorageClass (if any)
kubectl get storageclass -o yaml
# Create a StorageClass based on your environment
# Run only the cat block below that matches your cluster; each variant writes the same file and may need adaptation
# For GKE (Google Kubernetes Engine)
cat > ~/k8s-labs/storage/sc-standard.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: standard
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-standard
replication-type: none
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
EOF
# For AKS (Azure Kubernetes Service)
cat > ~/k8s-labs/storage/sc-standard.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: standard
provisioner: kubernetes.io/azure-disk
parameters:
storageaccounttype: Standard_LRS
kind: Managed
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
EOF
# For EKS (Amazon Elastic Kubernetes Service)
cat > ~/k8s-labs/storage/sc-standard.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
type: gp2
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
EOF
# For Minikube or local development (with hostpath provisioner)
cat > ~/k8s-labs/storage/sc-standard.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: standard
provisioner: k8s.io/minikube-hostpath
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF
# Apply the StorageClass that matches your environment
kubectl apply -f ~/k8s-labs/storage/sc-standard.yaml
# Verify the StorageClass
kubectl get storageclass
kubectl describe storageclass standard
# Create a PVC that uses the StorageClass
cat > ~/k8s-labs/storage/dynamic-pvc.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: dynamic-pvc
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 1Gi
storageClassName: standard
EOF
# Apply the PVC
kubectl apply -f ~/k8s-labs/storage/dynamic-pvc.yaml
# Verify the PVC is bound and a PV is dynamically created
kubectl get pvc
kubectl get pv
# Create a Pod that uses our dynamic PVC
cat > ~/k8s-labs/storage/dynamic-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: dynamic-pod
spec:
containers:
- name: dynamic-container
image: nginx
ports:
- containerPort: 80
name: "http-server"
volumeMounts:
- mountPath: "/usr/share/nginx/html"
name: dynamic-storage
volumes:
- name: dynamic-storage
persistentVolumeClaim:
claimName: dynamic-pvc
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/storage/dynamic-pod.yaml
# Verify the Pod is running
kubectl get pods
kubectl describe pod dynamic-pod
# Write data to the dynamically provisioned volume
kubectl wait --for=condition=Ready pod/dynamic-pod
kubectl exec dynamic-pod -- sh -c "echo 'This is from a dynamically provisioned volume' > /usr/share/nginx/html/index.html"
# Verify the data is written
kubectl exec dynamic-pod -- cat /usr/share/nginx/html/index.html
# Create a StorageClass with different parameters (example for a premium class)
# For GKE
cat > ~/k8s-labs/storage/sc-premium.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: premium
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
replication-type: none
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
EOF
# For AKS
cat > ~/k8s-labs/storage/sc-premium.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: premium
provisioner: kubernetes.io/azure-disk
parameters:
storageaccounttype: Premium_LRS
kind: Managed
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
EOF
# For EKS
cat > ~/k8s-labs/storage/sc-premium.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: premium
provisioner: kubernetes.io/aws-ebs
parameters:
type: io1
iopsPerGB: "10"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
EOF
# For local development (with different reclaim policy)
cat > ~/k8s-labs/storage/sc-premium.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: premium
provisioner: k8s.io/minikube-hostpath
reclaimPolicy: Retain
volumeBindingMode: Immediate
EOF
# Apply the premium StorageClass
kubectl apply -f ~/k8s-labs/storage/sc-premium.yaml
# Create a PVC using the premium StorageClass
cat > ~/k8s-labs/storage/premium-pvc.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: premium-pvc
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 1Gi
storageClassName: premium
EOF
kubectl apply -f ~/k8s-labs/storage/premium-pvc.yaml
# Verify the PVC is bound and a PV is dynamically created
kubectl get pvc
kubectl get pv
# Check current default StorageClass
kubectl get storageclass
# Remove the default annotation from any existing default StorageClass
CURRENT_DEFAULT=$(kubectl get storageclass -o jsonpath='{.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}')
if [ -n "$CURRENT_DEFAULT" ]; then
kubectl patch storageclass $CURRENT_DEFAULT -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
fi
# Set our standard StorageClass as the default
kubectl patch storageclass standard -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
# Verify the change
kubectl get storageclass
# Create a PVC without specifying a StorageClass (should use the default)
cat > ~/k8s-labs/storage/default-pvc.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: default-pvc
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 1Gi
EOF
kubectl apply -f ~/k8s-labs/storage/default-pvc.yaml
# Verify which StorageClass was used
kubectl get pvc default-pvc -o jsonpath='{.spec.storageClassName}'
# Delete all the pods
kubectl delete pod dynamic-pod --ignore-not-found
# Delete all the PVCs
kubectl delete pvc dynamic-pvc premium-pvc default-pvc --ignore-not-found
# Delete the StorageClasses
kubectl delete storageclass standard premium --ignore-not-found
Configure advanced scheduling constraints for Pods using node selectors, node affinity, pod affinity, and anti-affinity.
# List nodes in the cluster
kubectl get nodes
# Add labels to your nodes (adjust the node names to match your cluster)
# If you have only one node, both label sets land on the same node (the second overwrites the
# first), so some of the Pods below will stay Pending
NODE1=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
NODE2=$(kubectl get nodes -o jsonpath='{.items[1].metadata.name}' 2>/dev/null || echo $NODE1)
kubectl label nodes $NODE1 disktype=ssd zone=us-east-1a --overwrite
kubectl label nodes $NODE2 disktype=hdd zone=us-east-1b --overwrite
# Verify node labels
kubectl get nodes --show-labels
# Create a directory for scheduling exercises
mkdir -p ~/k8s-labs/scheduling
# Create a Pod that uses nodeSelector
cat > ~/k8s-labs/scheduling/nodeselector-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: nginx-nodeselector
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
nodeSelector:
disktype: ssd
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/nodeselector-pod.yaml
# Verify the Pod got scheduled on the node with disktype=ssd
kubectl get pod nginx-nodeselector -o wide
# Create a Pod with required node affinity
cat > ~/k8s-labs/scheduling/nodeaffinity-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: nginx-nodeaffinity
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: zone
operator: In
values:
- us-east-1a
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: disktype
operator: In
values:
- ssd
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/nodeaffinity-pod.yaml
# Verify the Pod got scheduled on the node with zone=us-east-1a
kubectl get pod nginx-nodeaffinity -o wide
# First, create a Pod that other Pods will reference
cat > ~/k8s-labs/scheduling/webapp.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: webapp
labels:
app: webapp
spec:
containers:
- name: webapp
image: nginx
ports:
- containerPort: 80
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/webapp.yaml
# Wait for the webapp Pod to be Running
kubectl wait --for=condition=Ready pod/webapp
# Create a Pod with pod affinity (wants to run on the same node as the webapp)
cat > ~/k8s-labs/scheduling/podaffinity-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: redis-cache
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- webapp
topologyKey: kubernetes.io/hostname
containers:
- name: redis
image: redis
ports:
- containerPort: 6379
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/podaffinity-pod.yaml
# Create a Pod with pod anti-affinity (doesn't want to run on the same node as the webapp)
cat > ~/k8s-labs/scheduling/podantiaffinity-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: analytics
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- webapp
topologyKey: kubernetes.io/hostname
containers:
- name: analytics
image: nginx
ports:
- containerPort: 80
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/podantiaffinity-pod.yaml
# Verify where the Pods got scheduled
kubectl get pods -o wide
# The redis-cache Pod should be on the same node as webapp
# The analytics Pod should be on a different node than webapp
# If you have only one node, the analytics Pod will remain in Pending state
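A compact way to compare the placements is to print each Pod alongside the node it landed on; this is a small sketch using standard kubectl output options (no extra tooling assumed):
# Show which node each of the three Pods ended up on (analytics shows <none> while Pending)
kubectl get pods webapp redis-cache analytics -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName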
# Create a Deployment that spreads Pods across nodes with preferred anti-affinity
cat > ~/k8s-labs/scheduling/deployment-antiaffinity.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-server
spec:
replicas: 3
selector:
matchLabels:
app: web-server
template:
metadata:
labels:
app: web-server
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- web-server
topologyKey: kubernetes.io/hostname
containers:
- name: web-app
image: nginx
ports:
- containerPort: 80
EOF
# Apply the Deployment configuration
kubectl apply -f ~/k8s-labs/scheduling/deployment-antiaffinity.yaml
# Verify the Pods distribution across nodes
kubectl get pods -l app=web-server -o wide
# The Pods should be distributed across different nodes if possible
# If you have only one node, all Pods will still be scheduled but on the same node
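To see how evenly the replicas were spread, count the Pods per node; a quick jsonpath sketch against the Deployment created above:
# Count web-server Pods per node
kubectl get pods -l app=web-server -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort | uniq -c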
# Delete all the created Pods and Deployments
kubectl delete -f ~/k8s-labs/scheduling/nodeselector-pod.yaml --ignore-not-found
kubectl delete -f ~/k8s-labs/scheduling/nodeaffinity-pod.yaml --ignore-not-found
kubectl delete -f ~/k8s-labs/scheduling/webapp.yaml --ignore-not-found
kubectl delete -f ~/k8s-labs/scheduling/podaffinity-pod.yaml --ignore-not-found
kubectl delete -f ~/k8s-labs/scheduling/podantiaffinity-pod.yaml --ignore-not-found
kubectl delete -f ~/k8s-labs/scheduling/deployment-antiaffinity.yaml --ignore-not-found
# Remove the labels from the nodes
kubectl label nodes $NODE1 disktype- zone-
kubectl label nodes $NODE2 disktype- zone-
Understand how taints and tolerations work to control which Pods can be scheduled on which nodes.
# List nodes in the cluster
kubectl get nodes
# Add a taint to a node (adjust the node name to match your cluster)
NODE1=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl taint nodes $NODE1 key=value:NoSchedule
# Verify the taint
kubectl describe node $NODE1 | grep Taints
# Create a Pod without tolerations
cat > ~/k8s-labs/scheduling/pod-without-toleration.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-without-toleration
spec:
containers:
- name: nginx
image: nginx
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/pod-without-toleration.yaml
# Check if the Pod was scheduled
kubectl get pods
kubectl describe pod pod-without-toleration
# If you have multiple nodes, the Pod should be scheduled on a node without the taint
# If you have only one node, the Pod will remain in Pending state
# Create a Pod with a toleration that matches the taint
cat > ~/k8s-labs/scheduling/pod-with-toleration.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-toleration
spec:
tolerations:
- key: "key"
operator: "Equal"
value: "value"
effect: "NoSchedule"
containers:
- name: nginx
image: nginx
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/pod-with-toleration.yaml
# Check if the Pod was scheduled
kubectl get pods
kubectl describe pod pod-with-toleration
# The Pod should be scheduled, even on the tainted node
# Create a Pod with a toleration that doesn't match the taint
cat > ~/k8s-labs/scheduling/pod-with-different-toleration.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-different-toleration
spec:
tolerations:
- key: "different-key"
operator: "Equal"
value: "different-value"
effect: "NoSchedule"
containers:
- name: nginx
image: nginx
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/pod-with-different-toleration.yaml
# Check if the Pod was scheduled
kubectl get pods
kubectl describe pod pod-with-different-toleration
# If you have multiple nodes, the Pod should be scheduled on a node without the taint
# If you have only one node, the Pod will remain in Pending state
# Add a NoExecute taint to the node (this will evict existing Pods without matching tolerations)
kubectl taint nodes $NODE1 another-key=another-value:NoExecute
# Create a Pod with a toleration for the NoExecute taint
cat > ~/k8s-labs/scheduling/pod-with-noexecute-toleration.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: pod-with-noexecute-toleration
spec:
tolerations:
- key: "another-key"
operator: "Equal"
value: "another-value"
effect: "NoExecute"
- key: "key"
operator: "Equal"
value: "value"
effect: "NoSchedule"
containers:
- name: nginx
image: nginx
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/pod-with-noexecute-toleration.yaml
# Check if the Pod was scheduled and other Pods were evicted
kubectl get pods
kubectl describe pod pod-with-noexecute-toleration
# The new Pod should be scheduled, while Pods without the NoExecute toleration
# should have been evicted (or never scheduled if they were created after the taint)
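If you want to confirm the evictions from the event stream rather than infer them from the Pod list, a simple grep over recent events works (the exact event reasons vary between versions, so the filter is kept loose):
# Look for taint- and eviction-related events
kubectl get events --sort-by=.lastTimestamp | grep -i -E "taint|evict"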
# Delete all the created Pods
kubectl delete pod pod-without-toleration pod-with-toleration pod-with-different-toleration pod-with-noexecute-toleration --ignore-not-found
# Remove the taints from the nodes
kubectl taint nodes $NODE1 key=value:NoSchedule- another-key=another-value:NoExecute-
Configure resource requests and limits for Pods to ensure proper scheduling and resource allocation.
# Create a Pod with minimal resource requests
cat > ~/k8s-labs/scheduling/small-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: small-pod
spec:
containers:
- name: small-container
image: nginx
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "200m"
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/small-pod.yaml
# Create a Pod with medium resource requests
cat > ~/k8s-labs/scheduling/medium-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: medium-pod
spec:
containers:
- name: medium-container
image: nginx
resources:
requests:
memory: "256Mi"
cpu: "500m"
limits:
memory: "512Mi"
cpu: "1000m"
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/medium-pod.yaml
# Create a Pod with large resource requests
cat > ~/k8s-labs/scheduling/large-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: large-pod
spec:
containers:
- name: large-container
image: nginx
resources:
requests:
memory: "1Gi"
cpu: "1"
limits:
memory: "2Gi"
cpu: "2"
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/large-pod.yaml
# Check the status of the Pods
kubectl get pods
# Check where the Pods were scheduled
kubectl get pods -o wide
# Get detailed information on each Pod
kubectl describe pod small-pod
kubectl describe pod medium-pod
kubectl describe pod large-pod
# Check resource usage on the nodes
kubectl describe nodes | grep -A 5 "Allocated resources"
# Create a Pod that tries to use more memory than its limit
cat > ~/k8s-labs/scheduling/memory-stress-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: memory-stress-pod
spec:
containers:
- name: memory-stress
image: polinux/stress
resources:
requests:
memory: "100Mi"
cpu: "100m"
limits:
memory: "200Mi"
cpu: "200m"
command: ["stress"]
args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/memory-stress-pod.yaml
# Watch the Pod status (it should be OOMKilled)
kubectl get pods -w
# (Press Ctrl+C to exit the watch after a while)
# Check the events associated with the Pod
kubectl describe pod memory-stress-pod
# Create a Pod with resource requests that exceed node capacity
cat > ~/k8s-labs/scheduling/impossible-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: impossible-pod
spec:
containers:
- name: impossible-container
image: nginx
resources:
requests:
memory: "100Gi"
cpu: "100"
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/impossible-pod.yaml
# Check the status of the Pod (should remain in Pending state)
kubectl get pods
kubectl describe pod impossible-pod
# You should see events indicating that the Pod cannot be scheduled due to insufficient resources
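To pull only the scheduling events for that Pod instead of scanning the full describe output, filter the event stream by the involved object; a short sketch:
# Show only events for impossible-pod (look for FailedScheduling with Insufficient cpu/memory)
kubectl get events --field-selector involvedObject.name=impossible-pod --sort-by=.lastTimestamp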
# Create a Pod with multiple containers, each with its own resource specifications
cat > ~/k8s-labs/scheduling/multi-container-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: multi-container-pod
spec:
containers:
- name: nginx
image: nginx
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "200m"
- name: redis
image: redis
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-labs/scheduling/multi-container-pod.yaml
# Check the status of the Pod
kubectl get pods
kubectl describe pod multi-container-pod
# The total resource request for the Pod is the sum of all containers' requests
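You can read the per-container requests directly and add them up yourself; a short jsonpath sketch that prints each container's name next to its requests:
# Print each container's requests (the Pod's effective request is their sum)
kubectl get pod multi-container-pod -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\n"}{end}'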
# Delete all the created Pods
kubectl delete pod small-pod medium-pod large-pod memory-stress-pod impossible-pod multi-container-pod --ignore-not-found
You need to set up a log collection system where one container generates logs and another container processes them. Both containers need to share a volume for this purpose.
# Create a namespace for our exercises
kubectl create namespace storage-exercises
# Create a working directory and a multi-container Pod with an emptyDir volume
mkdir -p ~/k8s-practice/storage
cat > ~/k8s-practice/storage/logging-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: logging-pod
  namespace: storage-exercises
spec:
  containers:
  - name: log-generator
    image: busybox
    command: ["/bin/sh", "-c"]
    # Dollar signs are escaped so they are evaluated inside the container, not by the heredoc
    args:
    - >
      i=0;
      while true;
      do
      echo "\$(date) - Log entry \$i" >> /var/log/shared/app.log;
      i=\$((i+1));
      sleep 1;
      done
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/shared
  - name: log-processor
    image: busybox
    command: ["/bin/sh", "-c"]
    args:
    - >
      tail -f /var/log/shared/app.log | while read line;
      do
      echo "Processed: \$line" >> /var/log/shared/processed.log;
      done
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/shared
  volumes:
  - name: shared-logs
    emptyDir: {}
EOF
# Apply the Pod configuration
kubectl apply -f ~/k8s-practice/storage/logging-pod.yaml
# Wait for the Pod to be running
kubectl wait --for=condition=Ready pod/logging-pod -n storage-exercises
# Verify logs are being generated
kubectl exec -n storage-exercises logging-pod -c log-generator -- ls -la /var/log/shared
kubectl exec -n storage-exercises logging-pod -c log-generator -- cat /var/log/shared/app.log | tail -5
# Verify logs are being processed
kubectl exec -n storage-exercises logging-pod -c log-processor -- ls -la /var/log/shared
kubectl exec -n storage-exercises logging-pod -c log-processor -- cat /var/log/shared/processed.log | tail -5
# You can also check container logs
kubectl logs -n storage-exercises logging-pod -c log-generator
kubectl logs -n storage-exercises logging-pod -c log-processor
You need to understand the different access modes for PersistentVolumes and how they affect your applications.
# First, create directories on the node for our volumes
# Run these on the node itself (for example via 'minikube ssh' or SSH to the node)
# Note: In a multi-node cluster, hostPath like this only really works for RWO volumes;
# RWX would require a proper network filesystem like NFS
sudo mkdir -p /mnt/data-rwo /mnt/data-rox /mnt/data-rwx
# Create PVs with different access modes
cat > ~/k8s-practice/storage/access-modes-pvs.yaml << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-rwo
spec:
capacity:
storage: 1Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Recycle
storageClassName: manual
hostPath:
path: /mnt/data-rwo
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-rox
spec:
capacity:
storage: 1Gi
accessModes:
- ReadOnlyMany
persistentVolumeReclaimPolicy: Recycle
storageClassName: manual
hostPath:
path: /mnt/data-rox
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-rwx
spec:
capacity:
storage: 1Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Recycle
storageClassName: manual
hostPath:
path: /mnt/data-rwx
EOF
kubectl apply -f ~/k8s-practice/storage/access-modes-pvs.yaml
# Create PVCs with matching access modes
cat > ~/k8s-practice/storage/access-modes-pvcs.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-rwo
namespace: storage-exercises
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: manual
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-rox
namespace: storage-exercises
spec:
accessModes:
- ReadOnlyMany
resources:
requests:
storage: 1Gi
storageClassName: manual
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-rwx
namespace: storage-exercises
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
storageClassName: manual
EOF
kubectl apply -f ~/k8s-practice/storage/access-modes-pvcs.yaml
# Verify the PVCs are bound
kubectl get pvc -n storage-exercises
kubectl get pv
# Create Pods that use the PVCs
cat > ~/k8s-practice/storage/rwo-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: rwo-pod
namespace: storage-exercises
spec:
containers:
- name: container
image: nginx
volumeMounts:
- name: rwo-volume
mountPath: /usr/share/nginx/html
volumes:
- name: rwo-volume
persistentVolumeClaim:
claimName: pvc-rwo
EOF
kubectl apply -f ~/k8s-practice/storage/rwo-pod.yaml
cat > ~/k8s-practice/storage/rox-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: rox-pod
namespace: storage-exercises
spec:
containers:
- name: container
image: nginx
volumeMounts:
- name: rox-volume
mountPath: /usr/share/nginx/html
readOnly: true
volumes:
- name: rox-volume
persistentVolumeClaim:
claimName: pvc-rox
EOF
kubectl apply -f ~/k8s-practice/storage/rox-pod.yaml
cat > ~/k8s-practice/storage/rwx-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: rwx-pod
namespace: storage-exercises
spec:
containers:
- name: container
image: nginx
volumeMounts:
- name: rwx-volume
mountPath: /usr/share/nginx/html
volumes:
- name: rwx-volume
persistentVolumeClaim:
claimName: pvc-rwx
EOF
kubectl apply -f ~/k8s-practice/storage/rwx-pod.yaml
# Wait for the Pods to be running
kubectl wait --for=condition=Ready pod/rwo-pod -n storage-exercises
kubectl wait --for=condition=Ready pod/rox-pod -n storage-exercises
kubectl wait --for=condition=Ready pod/rwx-pod -n storage-exercises
# Test writing to each volume
kubectl exec -n storage-exercises rwo-pod -- sh -c 'echo "RWO test" > /usr/share/nginx/html/test.txt'
kubectl exec -n storage-exercises rox-pod -- sh -c 'echo "ROX test" > /usr/share/nginx/html/test.txt' || echo "Failed as expected - volume is read-only"
kubectl exec -n storage-exercises rwx-pod -- sh -c 'echo "RWX test" > /usr/share/nginx/html/test.txt'
# Verify the content
kubectl exec -n storage-exercises rwo-pod -- cat /usr/share/nginx/html/test.txt
kubectl exec -n storage-exercises rox-pod -- cat /usr/share/nginx/html/test.txt 2>/dev/null || echo "No file found in ROX volume"
kubectl exec -n storage-exercises rwx-pod -- cat /usr/share/nginx/html/test.txt
You need to deploy a highly available application that should be distributed evenly across different topology domains (nodes, zones, etc.) to ensure resilience.
# Create a namespace for topology exercises
kubectl create namespace topology-exercises
# Label nodes with topology information
# Get list of nodes
NODES=$(kubectl get nodes -o jsonpath='{.items[*].metadata.name}')
NODE_ARRAY=($NODES)
NODE_COUNT=${#NODE_ARRAY[@]}
# If you have multiple nodes, label them with different zones
# If you have only one node, we'll simulate multiple zones on the same node
if [ "$NODE_COUNT" -gt 1 ]; then
# Label half the nodes as zone-a and half as zone-b
HALF_COUNT=$((NODE_COUNT / 2))
for ((i=0; i<HALF_COUNT; i++)); do
kubectl label node ${NODE_ARRAY[$i]} topology.kubernetes.io/zone=zone-a
done
for ((i=HALF_COUNT; i<NODE_COUNT; i++)); do
kubectl label node ${NODE_ARRAY[$i]} topology.kubernetes.io/zone=zone-b
done
else
# With only one node, just label it as zone-a
kubectl label node ${NODE_ARRAY[0]} topology.kubernetes.io/zone=zone-a
fi
# Verify node labels
kubectl get nodes --show-labels | grep zone
# Create a working directory and a Deployment with topology spread constraints
mkdir -p ~/k8s-practice/scheduling
cat > ~/k8s-practice/scheduling/topology-deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp-topology
namespace: topology-exercises
spec:
replicas: 6
selector:
matchLabels:
app: webapp-topology
template:
metadata:
labels:
app: webapp-topology
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: webapp-topology
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: webapp-topology
containers:
- name: nginx
image: nginx:1.21
EOF
kubectl apply -f ~/k8s-practice/scheduling/topology-deployment.yaml
# Wait for the deployment
kubectl rollout status deployment/webapp-topology -n topology-exercises
# Verify the Pod distribution
kubectl get pods -n topology-exercises -o wide
# Analyze the distribution across nodes (column 7 of 'kubectl get pods -o wide' is the node name)
echo "Distribution across nodes:"
kubectl get pods -n topology-exercises -o wide --no-headers | awk '{print $7}' | sort | uniq -c
# Analyze the distribution across zones by mapping each Pod's node to its zone label
echo "Distribution across zones:"
kubectl get pods -n topology-exercises -o wide --no-headers | awk '{print $7}' | while read node; do
kubectl get node "$node" -L topology.kubernetes.io/zone --no-headers | awk '{print $NF}'
done | sort | uniq -c
# Examine a Pod's topology spread constraints
kubectl get pods -n topology-exercises -o name | head -1 | xargs kubectl describe -n topology-exercises
You have a critical application that requires high availability. You need to ensure that voluntary disruptions (like node drains or Pod evictions) don’t cause service outages.
# Create a namespace for PDB exercises
kubectl create namespace pdb-exercises
# Create a Deployment with multiple replicas
cat > ~/k8s-practice/scheduling/ha-deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: ha-webapp
namespace: pdb-exercises
spec:
replicas: 5
selector:
matchLabels:
app: ha-webapp
template:
metadata:
labels:
app: ha-webapp
spec:
containers:
- name: nginx
image: nginx:1.21
ports:
- containerPort: 80
EOF
kubectl apply -f ~/k8s-practice/scheduling/ha-deployment.yaml
# Wait for the deployment to be ready
kubectl rollout status deployment/ha-webapp -n pdb-exercises
# Create a Pod Disruption Budget
cat > ~/k8s-practice/scheduling/webapp-pdb.yaml << EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ha-webapp-pdb
namespace: pdb-exercises
spec:
minAvailable: 4
selector:
matchLabels:
app: ha-webapp
EOF
kubectl apply -f ~/k8s-practice/scheduling/webapp-pdb.yaml
# Verify the PDB
kubectl get pdb -n pdb-exercises
kubectl describe pdb ha-webapp-pdb -n pdb-exercises
# Simulate node maintenance by cordoning and draining a node
# First, find a node running one of our Pods
POD_NODE=$(kubectl get pods -n pdb-exercises -l app=ha-webapp -o jsonpath='{.items[0].spec.nodeName}')
echo "Cordoning node $POD_NODE"
# Cordon the node (mark it as unschedulable)
kubectl cordon $POD_NODE
# Try to evict Pods from the node. The PDB allows at most one Pod to be unavailable at a time,
# so the drain proceeds slowly and times out if replacement Pods cannot be scheduled elsewhere
echo "Attempting to drain node $POD_NODE"
kubectl drain $POD_NODE --ignore-daemonsets --delete-emptydir-data --force --timeout=120s
# Check the status of our Pods
kubectl get pods -n pdb-exercises -o wide
# Verify the PDB status
kubectl get pdb -n pdb-exercises
# Uncordon the node
kubectl uncordon $POD_NODE
You have different types of workloads with varying levels of importance. You need to ensure that high-priority workloads get scheduled even when the cluster is under resource pressure.
# Create a namespace for priority exercises
kubectl create namespace priority-exercises
# Create PriorityClasses
cat > ~/k8s-practice/scheduling/priority-classes.yaml << EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for high priority service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: medium-priority
value: 100000
globalDefault: false
description: "This priority class should be used for medium priority service pods."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: low-priority
value: 10000
globalDefault: true
description: "This priority class should be used for low priority service pods."
EOF
kubectl apply -f ~/k8s-practice/scheduling/priority-classes.yaml
# Verify PriorityClasses
kubectl get priorityclasses
# Create a low-priority deployment that consumes resources
cat > ~/k8s-practice/scheduling/low-priority-deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: low-priority-app
namespace: priority-exercises
spec:
replicas: 10
selector:
matchLabels:
app: low-priority-app
template:
metadata:
labels:
app: low-priority-app
spec:
priorityClassName: low-priority
containers:
- name: nginx
image: nginx
resources:
requests:
memory: "100Mi"
cpu: "100m"
limits:
memory: "200Mi"
cpu: "200m"
EOF
kubectl apply -f ~/k8s-practice/scheduling/low-priority-deployment.yaml
# Wait for the deployment
kubectl rollout status deployment/low-priority-app -n priority-exercises
# Create a high-priority Pod with significant resource requests
cat > ~/k8s-practice/scheduling/high-priority-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: high-priority-pod
namespace: priority-exercises
spec:
priorityClassName: high-priority
containers:
- name: nginx
image: nginx
resources:
requests:
memory: "500Mi"
cpu: "500m"
limits:
memory: "1Gi"
cpu: "1"
EOF
kubectl apply -f ~/k8s-practice/scheduling/high-priority-pod.yaml
# Watch what happens to the Pods
kubectl get pods -n priority-exercises -o wide -w
# (Press Ctrl+C after observing for a while)
# Check which Pods were preempted
kubectl get events -n priority-exercises | grep -i preempt
# Check the status of the high-priority Pod
kubectl describe pod high-priority-pod -n priority-exercises
You need to provide scratch space for an application that requires more temporary storage than the container’s writable layer can offer, without persisting the data beyond the Pod’s lifecycle.
# Create a namespace for ephemeral volume exercises
kubectl create namespace ephemeral-exercises
# Create a Pod with an emptyDir volume with a size limit
cat > ~/k8s-practice/storage/ephemeral-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: ephemeral-pod
namespace: ephemeral-exercises
spec:
containers:
- name: main-container
image: ubuntu
command: ["sleep", "3600"]
volumeMounts:
- name: scratch-volume
mountPath: /scratch
volumes:
- name: scratch-volume
emptyDir:
sizeLimit: 500Mi # Limit the size of the volume
EOF
kubectl apply -f ~/k8s-practice/storage/ephemeral-pod.yaml
# Wait for the Pod to be running
kubectl wait --for=condition=Ready pod/ephemeral-pod -n ephemeral-exercises
# Store data in the ephemeral volume
kubectl exec -n ephemeral-exercises ephemeral-pod -- sh -c 'dd if=/dev/zero of=/scratch/testfile bs=1M count=100'
kubectl exec -n ephemeral-exercises ephemeral-pod -- ls -lh /scratch/testfile
# Verify the data exists
kubectl exec -n ephemeral-exercises ephemeral-pod -- sh -c 'ls -la /scratch'
# Delete the Pod
kubectl delete pod ephemeral-pod -n ephemeral-exercises
# Recreate the Pod
kubectl apply -f ~/k8s-practice/storage/ephemeral-pod.yaml
# Wait for the Pod to be running
kubectl wait --for=condition=Ready pod/ephemeral-pod -n ephemeral-exercises
# Verify the data is gone
kubectl exec -n ephemeral-exercises ephemeral-pod -- sh -c 'ls -la /scratch'
After completing all exercises, clean up the resources to avoid unnecessary resource usage:
# Delete namespaces
kubectl delete namespace storage-exercises
kubectl delete namespace topology-exercises
kubectl delete namespace pdb-exercises
kubectl delete namespace priority-exercises
kubectl delete namespace ephemeral-exercises
# Delete PriorityClasses
kubectl delete priorityclass high-priority medium-priority low-priority
# Remove labels from nodes
kubectl label nodes --all topology.kubernetes.io/zone-
# Delete PVs
kubectl delete pv pv-rwo pv-rox pv-rwx
Troubleshooting Kubernetes clusters requires a systematic approach to identify and resolve issues across the distributed system. When problems occur, you need to methodically investigate the cluster’s health from multiple perspectives.
The first step in troubleshooting a cluster is to verify its overall health. This includes the following checks; example commands follow the list:
Control Plane Status: The control plane is the brain of your Kubernetes cluster. Checking its components ensures that the cluster can properly manage workloads.
Node Health: Worker nodes are where your applications run. Unhealthy nodes can cause application disruptions and scheduling issues.
API Server Accessibility: Since all operations go through the API server, ensuring it’s accessible is critical to troubleshooting efforts.
etcd Data Store Integrity: As the persistent store for all cluster data, etcd’s health directly impacts the stability of your entire cluster.
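These checks map onto a handful of quick commands. The sketch below is a minimal first pass and assumes a kubeadm-style cluster where etcd runs as a static Pod in kube-system (the etcdctl certificate paths are the kubeadm defaults; adjust them for other layouts):
# Overall node and control plane health
kubectl get nodes
kubectl get pods -n kube-system
# API server health endpoints
kubectl get --raw /healthz
kubectl get --raw /readyz
# etcd health (kubeadm certificate layout assumed)
ETCD_POD=$(kubectl get pods -n kube-system -l component=etcd -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system $ETCD_POD -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key endpoint health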
When troubleshooting node-related issues, you should investigate:
Kubelet Status: The kubelet is the primary node agent that ensures containers are running in a Pod. If it fails, the node cannot host workloads.
Container Runtime Health: Issues with container runtimes like containerd or CRI-O can prevent containers from starting or functioning correctly.
System Resource Utilization: CPU, memory, disk space, and I/O pressure can all affect node performance and stability.
Operating System Issues: Kernel parameters, file descriptor limits, network configuration, and other OS-level settings can impact node functionality.
Certificate Problems: Expired or invalid certificates can cause communication failures between nodes and the control plane.
Common problems that occur at the node level include the following (example commands for investigating them appear after the list):
Node NotReady Status: This indicates the kubelet is not reporting status to the API server, which can be caused by network issues, kubelet failures, or resource exhaustion.
Unexpected Pod Evictions: High resource utilization can trigger the kubelet to evict Pods to free up resources.
Node Pressure Conditions: These include DiskPressure, MemoryPressure, PIDPressure, and NetworkUnavailable, each requiring specific remediation approaches.
Container Runtime Issues: Problems such as image pull failures, container crash loops, or filesystem errors.
Networking Misconfigurations: Issues with the CNI plugin, Pod networking, or node-level firewall rules can disrupt cluster communications.
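Most of these symptoms can be narrowed down with a few standard commands before you ever log in to the node; a minimal sketch (the node name is looked up here, substitute your own as needed):
NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
# Node conditions: Ready, MemoryPressure, DiskPressure, PIDPressure
kubectl describe node $NODE_NAME | grep -A 10 "Conditions:"
# Recent node-related events, including evictions and pressure transitions
kubectl get events --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp
# Current resource headroom per node (requires metrics-server)
kubectl top nodes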
Each component in the Kubernetes control plane serves a specific purpose, and understanding their roles is essential for effective troubleshooting.
The API server is the front-end for the Kubernetes control plane and the gateway for all REST operations.
Common API Server Problems:
The controller manager runs controller processes that regulate the state of the cluster, such as node controller, replication controller, and endpoint controller.
Common Controller Manager Problems:
The scheduler watches for newly created Pods with no assigned node and selects a node for them to run on.
Common Scheduler Problems:
etcd is the distributed key-value store used by Kubernetes to store all cluster data.
Common etcd Problems:
The kubelet is an agent that runs on each node and ensures that containers are running in a Pod.
Common kubelet Problems:
kube-proxy maintains network rules on nodes to allow network communication to Pods from inside and outside the cluster.
Common kube-proxy Problems:
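As a starting point for any of these components, pull their Pod status and recent logs. The loop below is a small sketch that assumes a kubeadm-style cluster, where the static control plane Pods carry a component label and kube-proxy runs as a DaemonSet labeled k8s-app=kube-proxy:
for component in kube-apiserver kube-controller-manager kube-scheduler etcd; do
echo "=== $component ==="
kubectl get pods -n kube-system -l component=$component -o wide
kubectl logs -n kube-system -l component=$component --tail=20
done
# kube-proxy uses a different label
kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=20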
Effective monitoring is essential for both troubleshooting immediate issues and preventing future problems through proactive management.
Cluster monitoring provides insights into the health and performance of the Kubernetes cluster infrastructure.
Key Cluster Metrics:
Application monitoring focuses on the resource usage and performance of your workloads running in the cluster.
Key Application Metrics:
Understanding and properly configuring resource constraints is vital for cluster stability.
Quota and Limit Concepts:
Kubernetes provides built-in tools for basic monitoring, while ecosystem solutions offer more comprehensive capabilities.
Built-in Monitoring Tools:
Ecosystem Monitoring Solutions:
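Whichever ecosystem stack you eventually adopt, the built-in pipeline (metrics-server plus kubectl top) and the quota objects give a quick first read; a minimal sketch:
# Node and Pod resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory
# Quotas and limit ranges currently constraining workloads
kubectl get resourcequota,limitrange --all-namespaces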
Effective log management is crucial for debugging applications and understanding system behavior in a Kubernetes environment.
Understanding how logs flow in Kubernetes helps troubleshoot logging-related issues effectively.
Logging Flow:
Kubernetes provides multiple methods to access container logs, each with different capabilities.
Log Access Methods:
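On the kubectl side, log access comes down to a handful of standard flags; replace the angle-bracket placeholders with your own Pod, container, and label names:
kubectl logs <pod>                                # current logs from a single-container Pod
kubectl logs <pod> -c <container>                 # a specific container in a multi-container Pod
kubectl logs <pod> -f                             # stream logs in real time
kubectl logs <pod> --previous                     # logs from the previous (crashed) container instance
kubectl logs <pod> --since=10m --timestamps --tail=100
kubectl logs -l app=<value> --all-containers=true # logs from all Pods matching a label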
Properly formatted logs are easier to parse, filter, and analyze.
Log Format Considerations:
In production environments, centralized logging systems are essential for effective troubleshooting.
Aggregation Solutions:
Kubernetes networking is complex, with multiple layers that can cause issues. Understanding these layers is key to effective troubleshooting.
Pod networking provides the foundation for application connectivity.
Common Pod Networking Problems:
Services provide stable endpoints for pods, but they can encounter various issues.
Common Service Issues:
Ingress resources manage external access to services, adding another layer of complexity.
Common Ingress Issues:
Network Policies provide security controls for pod communication, but can also cause connectivity issues when misconfigured.
Common Network Policy Issues:
DNS is critical for service discovery in Kubernetes but can encounter various problems.
Common DNS Issues:
External access to cluster services can be complex to troubleshoot.
Common External Connectivity Issues:
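A first-pass checklist for most of the issues above fits in a few commands run from a Pod inside the cluster (the exercise below creates such a debug Pod with the nicolaka/netshoot image); the angle-bracket names are placeholders:
# DNS: can Pods resolve cluster-internal names?
kubectl exec <debug-pod> -- nslookup kubernetes.default.svc.cluster.local
# Service wiring: does the Service have endpoints behind it?
kubectl get endpoints <service-name>
kubectl describe service <service-name>
# Connectivity: can a Pod reach the Service?
kubectl exec <debug-pod> -- curl -I <service-name>.<namespace>.svc.cluster.local --connect-timeout 5
# Network policies that may be restricting traffic
kubectl get networkpolicy --all-namespaces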
Learn how to identify and resolve common issues with Kubernetes nodes, including investigating node status, checking system resources, and diagnosing kubelet problems.
# Create a directory for our troubleshooting exercises
mkdir -p ~/k8s-labs/troubleshooting
# List all nodes and their status
kubectl get nodes
# Get detailed information about a specific node
NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
kubectl describe node $NODE_NAME
# Check node conditions (each type in CONDITIONS pairs with the True/False/Unknown value in STATUS)
kubectl get nodes -o custom-columns=NAME:.metadata.name,CONDITIONS:.status.conditions[*].type,STATUS:.status.conditions[*].status
# Check Kubernetes events related to nodes
kubectl get events --field-selector involvedObject.kind=Node
# Output node resource usage statistics
kubectl top nodes
# SSH into the node (only if you have direct access to nodes)
# Alternatively, you can use a privileged debugging pod
# Create a debugging pod that runs on a specific node
cat > ~/k8s-labs/troubleshooting/node-debugger.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: node-debugger
spec:
  containers:
  - name: debugger
    image: nicolaka/netshoot
    command: ["sleep", "3600"]
    securityContext:
      privileged: true
    volumeMounts:
    # Mount the node's root filesystem so host files and systemd can be reached via /host
    - name: host-root
      mountPath: /host
  volumes:
  - name: host-root
    hostPath:
      path: /
  nodeName: $NODE_NAME
  hostNetwork: true
  hostPID: true
  hostIPC: true
  # Needed so cluster DNS names resolve from a hostNetwork Pod
  dnsPolicy: ClusterFirstWithHostNet
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/node-debugger.yaml
# Wait for the debugging pod to be running
kubectl wait --for=condition=Ready pod/node-debugger
# Check kubelet process status (through the host root mounted at /host)
kubectl exec node-debugger -- chroot /host systemctl status kubelet
# Check kubelet logs
kubectl exec node-debugger -- chroot /host journalctl -u kubelet -n 100
# Check kubelet configuration
kubectl exec node-debugger -- cat /host/var/lib/kubelet/config.yaml
# Verify kubelet certificates
kubectl exec node-debugger -- ls -la /host/var/lib/kubelet/pki/
# Check if kubelet can reach the API server
kubectl exec node-debugger -- curl -k https://kubernetes.default.svc:443/healthz
# Check CPU, memory, and disk usage on the node
kubectl exec node-debugger -- top -b -n 1
# Check disk usage
kubectl exec node-debugger -- chroot /host df -h
# Check memory usage details
kubectl exec node-debugger -- free -h
# Check for processes consuming high resources
kubectl exec node-debugger -- ps aux --sort=-%cpu | head -10
kubectl exec node-debugger -- ps aux --sort=-%mem | head -10
# Check system load
kubectl exec node-debugger -- cat /proc/loadavg
# Check if there are any disk I/O issues
kubectl exec node-debugger -- iostat -x 1 5
# Simulate a "NotReady" node scenario (Only do this in a test environment!)
echo "Note: This will temporarily make your node unavailable - only do this in a test environment"
echo "Press Enter to continue or Ctrl+C to abort"
read
# Option 1: Stop the kubelet service
# kubectl exec is served by the kubelet, so once it stops you cannot exec into this node's
# pods to restart it; stop, wait, and restart in a single detached command instead
kubectl exec node-debugger -- chroot /host sh -c "nohup sh -c 'systemctl stop kubelet; sleep 120; systemctl start kubelet' >/dev/null 2>&1 &"
# Verify the node status (should become NotReady after about 30-60 seconds)
kubectl get nodes
kubectl describe node $NODE_NAME
# The kubelet is restarted automatically after the 120-second sleep above
# Verify the node returns to Ready status
kubectl get nodes -w
# (Press Ctrl+C when the node shows Ready)
# Option 2: Simulate disk pressure
echo "Simulating disk pressure..."
kubectl exec node-debugger -- mkdir -p /host/tmp/large-files
kubectl exec node-debugger -- dd if=/dev/zero of=/host/tmp/large-files/big-file bs=1M count=1000
# Check if the node reports disk pressure (1 GB may not be enough to trigger it on nodes with plenty of free disk)
kubectl describe node $NODE_NAME | grep DiskPressure
# Resolve the disk pressure issue
kubectl exec node-debugger -- rm -f /host/tmp/large-files/big-file
# Delete the debugging pod
kubectl delete pod node-debugger
# Ensure the node is in a healthy state
kubectl get nodes
Learn how to identify and resolve issues with Kubernetes control plane components, and understand how component failures affect cluster operations.
# List all pods in the kube-system namespace
kubectl get pods -n kube-system
# Check the status of control plane components
kubectl get pods -n kube-system -l tier=control-plane -o wide
# Create a function to quickly check control plane logs
view_cp_logs() {
local component=$1
local lines=${2:-50}
local pod=$(kubectl get pods -n kube-system -l component=$component -o jsonpath='{.items[0].metadata.name}')
echo "Viewing logs for $component ($pod)..."
kubectl logs -n kube-system $pod --tail=$lines
}
# Check logs for each control plane component
view_cp_logs kube-apiserver
view_cp_logs kube-controller-manager
view_cp_logs kube-scheduler
view_cp_logs etcd
# Get detailed information about the API server pod
APISERVER_POD=$(kubectl get pods -n kube-system -l component=kube-apiserver -o jsonpath='{.items[0].metadata.name}')
kubectl describe pod $APISERVER_POD -n kube-system
# Check API server's health
kubectl get --raw /healthz
kubectl get --raw /healthz/etcd
kubectl get --raw /livez
kubectl get --raw /readyz
# Examine API server metrics
kubectl get --raw /metrics | grep apiserver_request_total
# Create a debugging pod on the control plane node (if running kubeadm)
CONTROL_PLANE_NODE=$(kubectl get pods -n kube-system $APISERVER_POD -o jsonpath='{.spec.nodeName}')
cat > ~/k8s-labs/troubleshooting/control-plane-debugger.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: control-plane-debugger
spec:
  containers:
  - name: debugger
    image: nicolaka/netshoot
    command: ["sleep", "3600"]
    securityContext:
      privileged: true
    volumeMounts:
    # Mount the node's root filesystem so static Pod manifests and PKI files can be read via /host
    - name: host-root
      mountPath: /host
  volumes:
  - name: host-root
    hostPath:
      path: /
  nodeName: $CONTROL_PLANE_NODE
  hostNetwork: true
  hostPID: true
  hostIPC: true
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/control-plane-debugger.yaml
# Wait for the debugging pod to be running
kubectl wait --for=condition=Ready pod/control-plane-debugger
# Check if the API server manifest has correct settings (for kubeadm clusters, via the host mount)
kubectl exec control-plane-debugger -- cat /host/etc/kubernetes/manifests/kube-apiserver.yaml
# Check API server certificates
kubectl exec control-plane-debugger -- ls -la /host/etc/kubernetes/pki/
# Get detailed information about the etcd pod
ETCD_POD=$(kubectl get pods -n kube-system -l component=etcd -o jsonpath='{.items[0].metadata.name}')
kubectl describe pod $ETCD_POD -n kube-system
# Check etcd health using etcdctl
kubectl exec -n kube-system $ETCD_POD -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key \
endpoint health
# Check etcd member list
kubectl exec -n kube-system $ETCD_POD -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key \
member list
# Check etcd disk usage
kubectl exec -n kube-system $ETCD_POD -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key \
endpoint status -w table
# Get information about the controller manager
CONTROLLER_POD=$(kubectl get pods -n kube-system -l component=kube-controller-manager -o jsonpath='{.items[0].metadata.name}')
kubectl describe pod $CONTROLLER_POD -n kube-system
# Check controller manager logs
kubectl logs -n kube-system $CONTROLLER_POD
# Get information about the scheduler
SCHEDULER_POD=$(kubectl get pods -n kube-system -l component=kube-scheduler -o jsonpath='{.items[0].metadata.name}')
kubectl describe pod $SCHEDULER_POD -n kube-system
# Check scheduler logs
kubectl logs -n kube-system $SCHEDULER_POD
# For kubeadm clusters, you can check the manifest files through the host mount
kubectl exec control-plane-debugger -- cat /host/etc/kubernetes/manifests/kube-controller-manager.yaml
kubectl exec control-plane-debugger -- cat /host/etc/kubernetes/manifests/kube-scheduler.yaml
# IMPORTANT: Only perform these steps in a test environment!
echo "Note: This will temporarily disrupt your control plane - only do this in a test environment"
echo "Press Enter to continue or Ctrl+C to abort"
read
# Simulate API server failure by moving its manifest aside and restoring it after 60 seconds.
# Once the API server is down, kubectl (including kubectl exec) stops working, so the move and
# the restore are issued together as a single detached command.
kubectl exec control-plane-debugger -- sh -c "nohup sh -c 'mv /host/etc/kubernetes/manifests/kube-apiserver.yaml /host/tmp/ && sleep 60 && mv /host/tmp/kube-apiserver.yaml /host/etc/kubernetes/manifests/' >/dev/null 2>&1 &"
# Wait for the API server to go down (this command should eventually fail)
kubectl get pods -n kube-system
# The manifest is restored automatically after the 60-second sleep above
# Wait for the API server to recover
echo "Waiting for API server to recover..."
sleep 90
kubectl get pods -n kube-system
# Delete the debugging pod
kubectl delete pod control-plane-debugger
# Ensure all control plane components are running
kubectl get pods -n kube-system -l tier=control-plane
Learn how to monitor resource usage in a Kubernetes cluster, understand resource constraints, and troubleshoot resource-related issues.
# Check if metrics-server is installed
kubectl get deployments -n kube-system metrics-server 2>/dev/null || echo "metrics-server not found"
# If metrics-server is not installed, install it
if ! kubectl get deployment metrics-server -n kube-system >/dev/null 2>&1; then
echo "Installing metrics-server..."
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Note: on local clusters (minikube, kind) metrics-server may also need the --kubelet-insecure-tls
# argument added to its container args before it becomes Ready
# Wait for metrics-server to be ready
kubectl rollout status deployment metrics-server -n kube-system
fi
# Verify metrics-server is working
kubectl top nodes || echo "Waiting for metrics-server to collect data..."
sleep 15
kubectl top nodes
# Create a namespace for our monitoring exercise
kubectl create namespace monitoring-demo
# Deploy resource-consuming pods
cat > ~/k8s-labs/troubleshooting/resource-consumer.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: resource-consumer
namespace: monitoring-demo
spec:
replicas: 3
selector:
matchLabels:
app: resource-consumer
template:
metadata:
labels:
app: resource-consumer
spec:
containers:
- name: stress
image: polinux/stress
resources:
requests:
memory: "50Mi"
cpu: "100m"
limits:
memory: "100Mi"
cpu: "200m"
command: ["stress"]
args: ["--cpu", "1", "--vm", "1", "--vm-bytes", "50M", "--timeout", "600s"]
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/resource-consumer.yaml
# Wait for pods to be running
kubectl rollout status deployment/resource-consumer -n monitoring-demo
# Monitor node resource usage
kubectl top nodes
# Monitor pod resource usage
kubectl top pods -n monitoring-demo
# Get detailed resource usage for pods
kubectl top pods -n monitoring-demo --containers
# Check resource usage trends (run multiple times)
echo "Checking resource usage trends..."
for i in {1..5}; do
echo "Check $i:"
kubectl top pods -n monitoring-demo
sleep 5
done
# Deploy pods with different QoS classes
cat > ~/k8s-labs/troubleshooting/qos-pods.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: guaranteed-pod
namespace: monitoring-demo
spec:
containers:
- name: guaranteed
image: nginx
resources:
requests:
memory: "100Mi"
cpu: "100m"
limits:
memory: "100Mi"
cpu: "100m"
---
apiVersion: v1
kind: Pod
metadata:
name: burstable-pod
namespace: monitoring-demo
spec:
containers:
- name: burstable
image: nginx
resources:
requests:
memory: "50Mi"
cpu: "50m"
limits:
memory: "100Mi"
cpu: "100m"
---
apiVersion: v1
kind: Pod
metadata:
name: besteffort-pod
namespace: monitoring-demo
spec:
containers:
- name: besteffort
image: nginx
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/qos-pods.yaml
# Wait for pods to be running
kubectl wait --for=condition=Ready pod/guaranteed-pod -n monitoring-demo
kubectl wait --for=condition=Ready pod/burstable-pod -n monitoring-demo
kubectl wait --for=condition=Ready pod/besteffort-pod -n monitoring-demo
# Check QoS classes for each pod
kubectl get pod guaranteed-pod -n monitoring-demo -o jsonpath='{.status.qosClass}'; echo
kubectl get pod burstable-pod -n monitoring-demo -o jsonpath='{.status.qosClass}'; echo
kubectl get pod besteffort-pod -n monitoring-demo -o jsonpath='{.status.qosClass}'; echo
# Create a namespace with resource quota
kubectl create namespace quota-demo
# Create a resource quota
cat > ~/k8s-labs/troubleshooting/resource-quota.yaml << EOF
apiVersion: v1
kind: ResourceQuota
metadata:
name: compute-quota
namespace: quota-demo
spec:
hard:
pods: "5"
requests.cpu: "500m"
requests.memory: "500Mi"
limits.cpu: "1"
limits.memory: "1Gi"
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/resource-quota.yaml
# Check the quota status
kubectl describe resourcequota compute-quota -n quota-demo
# Create pods that fit within the quota
cat > ~/k8s-labs/troubleshooting/quota-pods.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: quota-test
namespace: quota-demo
spec:
replicas: 2
selector:
matchLabels:
app: quota-test
template:
metadata:
labels:
app: quota-test
spec:
containers:
- name: nginx
image: nginx
resources:
requests:
memory: "100Mi"
cpu: "100m"
limits:
memory: "200Mi"
cpu: "200m"
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/quota-pods.yaml
# Check the quota usage
kubectl describe resourcequota compute-quota -n quota-demo
# Try to exceed the quota
kubectl scale deployment quota-test -n quota-demo --replicas=6
# Check if the scaling was limited by the quota
kubectl get pods -n quota-demo
kubectl describe deployment quota-test -n quota-demo
kubectl get events -n quota-demo | grep -i quota
# Create a pod that exceeds its memory limits
cat > ~/k8s-labs/troubleshooting/memory-hogger.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: memory-hogger
namespace: monitoring-demo
spec:
containers:
- name: memory-hogger
image: polinux/stress
resources:
requests:
memory: "50Mi"
limits:
memory: "100Mi"
command: ["stress"]
args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "60"]
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/memory-hogger.yaml
# Watch the pod status
kubectl get pod memory-hogger -n monitoring-demo -w
# (Wait until you see the pod restart, then press Ctrl+C)
# Check the pod events and status
kubectl describe pod memory-hogger -n monitoring-demo
# Check the container restart count
kubectl get pod memory-hogger -n monitoring-demo -o jsonpath='{.status.containerStatuses[0].restartCount}'; echo
# Check the last state termination reason
kubectl get pod memory-hogger -n monitoring-demo -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'; echo
# Delete the namespaces and resources
kubectl delete namespace monitoring-demo
kubectl delete namespace quota-demo
Learn how to access, analyze, and troubleshoot application logs and container output streams in Kubernetes.
# Create a namespace for our logging exercises
kubectl create namespace logging-demo
# Deploy a simple application that generates logs
cat > ~/k8s-labs/troubleshooting/logging-app.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logging-app
  namespace: logging-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logging-app
  template:
    metadata:
      labels:
        app: logging-app
    spec:
      containers:
      - name: counter
        image: busybox
        command: ["/bin/sh", "-c"]
        # Dollar signs are escaped so they expand inside the container, not when the heredoc is written
        args:
        - >
          i=0;
          while true;
          do
          echo "\$(date) - INFO: Counter: \$i";
          if [ \$((i % 10)) -eq 0 ]; then
          echo "\$(date) - ERROR: Simulated error at count \$i" >&2;
          fi;
          i=\$((i+1));
          sleep 1;
          done
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/logging-app.yaml
# Wait for the pod to be running
kubectl rollout status deployment/logging-app -n logging-demo
# Get the pod name
LOGGING_POD=$(kubectl get pods -n logging-demo -l app=logging-app -o jsonpath='{.items[0].metadata.name}')
# View the most recent logs
kubectl logs -n logging-demo $LOGGING_POD
# Follow the logs in real-time
kubectl logs -n logging-demo $LOGGING_POD -f
# (Press Ctrl+C after a few seconds)
# View only error logs
kubectl logs -n logging-demo $LOGGING_POD | grep ERROR
# View logs with timestamps
kubectl logs -n logging-demo $LOGGING_POD --timestamps
# Specify the number of lines to show
kubectl logs -n logging-demo $LOGGING_POD --tail=20
# Deploy a pod with multiple containers
cat > ~/k8s-labs/troubleshooting/multi-container-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: multi-container-pod
  namespace: logging-demo
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
  - name: logger
    image: busybox
    command: ["/bin/sh", "-c"]
    args:
    - >
      while true;
      do
      echo "\$(date) - Logger container is running";
      sleep 5;
      done
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/multi-container-pod.yaml
# Wait for the pod to be running
kubectl wait --for=condition=Ready pod/multi-container-pod -n logging-demo
# View logs from a specific container
kubectl logs -n logging-demo multi-container-pod -c web
kubectl logs -n logging-demo multi-container-pod -c logger
# Follow logs from multiple containers
kubectl logs -n logging-demo multi-container-pod --all-containers=true -f
# (Press Ctrl+C after a few seconds)
# Deploy a pod with a startup problem
cat > ~/k8s-labs/troubleshooting/startup-problem.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: startup-problem
namespace: logging-demo
spec:
containers:
- name: nginx
image: nginx
command: ["/bin/sh", "-c"]
args:
- >
echo "Starting up...";
echo "Looking for configuration...";
cat /config/nginx.conf;
echo "Failed to find configuration, exiting!";
exit 1;
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/startup-problem.yaml
# Check the pod status
kubectl get pod startup-problem -n logging-demo
# View logs from the failing container
kubectl logs -n logging-demo startup-problem
# Get detailed information about the pod
kubectl describe pod startup-problem -n logging-demo
# Check the container status and restart count
kubectl get pod startup-problem -n logging-demo -o jsonpath='{.status.containerStatuses[0].restartCount}'; echo
# Fix the startup problem by providing the missing configuration
cat > ~/k8s-labs/troubleshooting/startup-fixed.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: nginx-config
namespace: logging-demo
data:
nginx.conf: |
events {
worker_connections 1024;
}
http {
server {
listen 80;
location / {
return 200 'Hello, Kubernetes!';
}
}
}
---
apiVersion: v1
kind: Pod
metadata:
name: startup-fixed
namespace: logging-demo
spec:
containers:
- name: nginx
image: nginx
command: ["/bin/sh", "-c"]
args:
- >
echo "Starting up...";
echo "Looking for configuration...";
cat /config/nginx.conf;
echo "Found configuration, starting nginx!";
nginx -c /config/nginx.conf -g "daemon off;";
volumeMounts:
- name: config-volume
mountPath: /config
volumes:
- name: config-volume
configMap:
name: nginx-config
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/startup-fixed.yaml
# Check the pod status
kubectl get pod startup-fixed -n logging-demo
# View logs from the fixed container
kubectl logs -n logging-demo startup-fixed
# Deploy a pod that crashes after a while
cat > ~/k8s-labs/troubleshooting/crashing-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: crashing-pod
namespace: logging-demo
spec:
containers:
- name: crasher
image: busybox
command: ["/bin/sh", "-c"]
args:
- >
echo "Starting up...";
for i in \$(seq 1 5); do
echo "Running iteration \$i";
sleep 2;
done;
echo "Simulating a crash now!";
exit 1;
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/crashing-pod.yaml
# Watch the pod status
kubectl get pod crashing-pod -n logging-demo -w
# (Wait until you see the pod restart a few times, then press Ctrl+C)
# View previous container logs
kubectl logs -n logging-demo crashing-pod --previous
# Get detailed information about the pod
kubectl describe pod crashing-pod -n logging-demo
# Check the container status, state, and reason
kubectl get pod crashing-pod -n logging-demo -o jsonpath='{.status.containerStatuses[0].state}'; echo
# Deploy a pod with init containers
cat > ~/k8s-labs/troubleshooting/init-container-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
name: init-container-pod
namespace: logging-demo
spec:
initContainers:
- name: init-db-check
image: busybox
command: ["/bin/sh", "-c"]
args:
- >
echo "Checking if database is available...";
if nc -z database-service 3306; then
echo "Database is available!";
exit 0;
else
echo "Database is not available, failing...";
exit 1;
fi
containers:
- name: app
image: nginx
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/init-container-pod.yaml
# Check the pod status
kubectl get pod init-container-pod -n logging-demo
# View logs from the init container
kubectl logs -n logging-demo init-container-pod -c init-db-check
# Get detailed information about the pod
kubectl describe pod init-container-pod -n logging-demo
# Create the missing service to fix the init container
cat > ~/k8s-labs/troubleshooting/database-service.yaml << EOF
apiVersion: v1
kind: Service
metadata:
name: database-service
namespace: logging-demo
spec:
ports:
- port: 3306
targetPort: 3306
selector:
app: mysql
---
apiVersion: v1
kind: Pod
metadata:
name: mysql-pod
namespace: logging-demo
labels:
app: mysql
spec:
containers:
- name: mysql
image: mysql:5.7
ports:
- containerPort: 3306
env:
- name: MYSQL_ROOT_PASSWORD
value: "password"
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/database-service.yaml
# Delete and recreate the init container pod
kubectl delete pod init-container-pod -n logging-demo
kubectl apply -f ~/k8s-labs/troubleshooting/init-container-pod.yaml
# Watch the pod status
kubectl get pod init-container-pod -n logging-demo -w
# (Wait until the pod is Running, then press Ctrl+C)
# Delete the namespace and resources
kubectl delete namespace logging-demo
Learn how to identify and resolve common networking and service-related issues in Kubernetes.
# Create a namespace for our networking exercises
kubectl create namespace networking-demo
# Deploy a simple web application
cat > ~/k8s-labs/troubleshooting/web-app.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-app
namespace: networking-demo
spec:
replicas: 2
selector:
matchLabels:
app: web-app
template:
metadata:
labels:
app: web-app
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: web-service
namespace: networking-demo
spec:
selector:
app: web-app
ports:
- port: 80
targetPort: 80
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/web-app.yaml
# Wait for the deployment to be ready
kubectl rollout status deployment/web-app -n networking-demo
# Create a debugging pod to test service connectivity
cat > ~/k8s-labs/troubleshooting/debug-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
  namespace: networking-demo
spec:
  containers:
  - name: debug-container
    image: nicolaka/netshoot
    command: ["sleep", "3600"]
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/debug-pod.yaml
# Wait for the debug pod to be running
kubectl wait --for=condition=Ready pod/debug-pod -n networking-demo
# Test DNS resolution
kubectl exec -n networking-demo debug-pod -- nslookup kubernetes.default.svc.cluster.local
kubectl exec -n networking-demo debug-pod -- nslookup web-service.networking-demo.svc.cluster.local
# Test service connectivity
kubectl exec -n networking-demo debug-pod -- curl -I web-service.networking-demo.svc.cluster.local
# Check service details
kubectl get service web-service -n networking-demo
kubectl describe service web-service -n networking-demo
# Check endpoints
kubectl get endpoints web-service -n networking-demo
# Deploy a service with incorrect selector
cat > ~/k8s-labs/troubleshooting/mismatched-service.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-app
  namespace: networking-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend-app
  template:
    metadata:
      labels:
        app: backend-app
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: networking-demo
spec:
  selector:
    app: backend  # Mismatched selector (should be backend-app)
  ports:
  - port: 80
    targetPort: 80
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/mismatched-service.yaml
# Check service and endpoints
kubectl get service backend-service -n networking-demo
kubectl get endpoints backend-service -n networking-demo
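# One quick way to spot a label mismatch (a supplementary check using standard fields):
# print the service's selector and compare it with the labels on the backend pods
kubectl get service backend-service -n networking-demo -o jsonpath='{.spec.selector}'; echo
kubectl get pods -n networking-demo -l app=backend-app --show-labels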
# Test connectivity to the mismatched service
kubectl exec -n networking-demo debug-pod -- curl -I backend-service.networking-demo.svc.cluster.local --connect-timeout 5 || echo "Connection failed"
# Fix the service selector
kubectl patch service backend-service -n networking-demo -p '{"spec":{"selector":{"app":"backend-app"}}}'
# Verify the endpoints are now correctly populated
kubectl get endpoints backend-service -n networking-demo
# Test connectivity again
kubectl exec -n networking-demo debug-pod -- curl -I backend-service.networking-demo.svc.cluster.local
# Create pods in different namespaces to test cross-namespace communication
kubectl create namespace secondary-namespace
# Create a pod in the secondary namespace
cat > ~/k8s-labs/troubleshooting/secondary-pod.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: secondary-pod
  namespace: secondary-namespace
  labels:
    app: secondary-app
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: secondary-service
  namespace: secondary-namespace
spec:
  selector:
    app: secondary-app
  ports:
  - port: 80
    targetPort: 80
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/secondary-pod.yaml
# Wait for the pod to be running
kubectl wait --for=condition=Ready pod/secondary-pod -n secondary-namespace
# Test cross-namespace DNS resolution and connectivity
kubectl exec -n networking-demo debug-pod -- nslookup secondary-service.secondary-namespace.svc.cluster.local
kubectl exec -n networking-demo debug-pod -- curl -I secondary-service.secondary-namespace.svc.cluster.local
# Create a default deny network policy
cat > ~/k8s-labs/troubleshooting/default-deny.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: networking-demo
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/default-deny.yaml
# Test connectivity after applying the deny policy
# (with Egress denied, even DNS lookups from debug-pod fail, so expect name-resolution errors or timeouts)
kubectl exec -n networking-demo debug-pod -- curl -I web-service.networking-demo.svc.cluster.local --connect-timeout 5 || echo "Connection failed"
kubectl exec -n networking-demo debug-pod -- curl -I secondary-service.secondary-namespace.svc.cluster.local --connect-timeout 5 || echo "Connection failed"
# Create a policy to allow specific communication
cat > ~/k8s-labs/troubleshooting/allow-web-policy.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-access
  namespace: networking-demo
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}
    ports:
    - protocol: TCP
      port: 80
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/allow-web-policy.yaml
cat > ~/k8s-labs/troubleshooting/allow-debug-egress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-debug-egress
  namespace: networking-demo
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 80
    - protocol: TCP
      port: 443
    - protocol: TCP
      port: 53
    - protocol: UDP
      port: 53
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/allow-debug-egress.yaml
# Label the debug pod so it matches the egress policy's pod selector
# (note: this also makes debug-pod match the web-service selector, which is acceptable for this demo
# but would be avoided in a real cluster by targeting the policy at a dedicated label)
kubectl label pod debug-pod -n networking-demo app=web-app
# Test connectivity after applying the allow policies
kubectl exec -n networking-demo debug-pod -- curl -I web-service.networking-demo.svc.cluster.local
kubectl exec -n networking-demo debug-pod -- curl -I secondary-service.secondary-namespace.svc.cluster.local
# Deploy an ingress controller if not already installed
# Note: This uses nginx-ingress controller, but the actual controller may vary
# based on your environment
kubectl get namespace ingress-nginx 2>/dev/null || kubectl create namespace ingress-nginx
if ! kubectl get deployment -n ingress-nginx ingress-nginx-controller 2>/dev/null; then
  echo "Installing NGINX Ingress Controller..."
  kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.0/deploy/static/provider/cloud/deploy.yaml
  # Wait for the ingress controller to be ready
  kubectl wait --namespace ingress-nginx \
    --for=condition=ready pod \
    --selector=app.kubernetes.io/component=controller \
    --timeout=120s
fi
# Create an ingress resource
cat > ~/k8s-labs/troubleshooting/web-ingress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  namespace: networking-demo
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
  - host: web-app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/web-ingress.yaml
# Check the ingress status
kubectl get ingress -n networking-demo
kubectl describe ingress web-ingress -n networking-demo
# Test the ingress (requires external connectivity, may not work in all environments)
INGRESS_IP=$(kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
if [ -n "$INGRESS_IP" ]; then
  curl -H "Host: web-app.example.com" http://$INGRESS_IP/
else
  echo "Ingress IP not available. If using Minikube, use 'minikube tunnel' in another terminal."
fi
# Create an ingress with a service that doesn't exist
cat > ~/k8s-labs/troubleshooting/broken-ingress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: broken-ingress
  namespace: networking-demo
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
  - host: broken.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nonexistent-service
            port:
              number: 80
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/broken-ingress.yaml
# Check the ingress status
kubectl get ingress broken-ingress -n networking-demo
kubectl describe ingress broken-ingress -n networking-demo
# Check CoreDNS deployment status
kubectl get deployment -n kube-system coredns
kubectl describe deployment -n kube-system coredns
# Check CoreDNS pod logs
COREDNS_POD=$(kubectl get pods -n kube-system -l k8s-app=kube-dns -o jsonpath='{.items[0].metadata.name}')
kubectl logs -n kube-system $COREDNS_POD
# Check CoreDNS ConfigMap
kubectl get configmap -n kube-system coredns -o yaml
# Test DNS resolution from a pod
kubectl exec -n networking-demo debug-pod -- nslookup kubernetes.default.svc.cluster.local
kubectl exec -n networking-demo debug-pod -- nslookup web-service.networking-demo.svc.cluster.local
# Test DNS resolution for a non-existent service
kubectl exec -n networking-demo debug-pod -- nslookup nonexistent-service.networking-demo.svc.cluster.local || echo "Not found, as expected"
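# If lookups behave unexpectedly, it can also help to inspect the pod's resolver configuration
# (these are standard supplementary checks, not part of the original flow; the nameserver should
# point at the cluster DNS service and the search domains should include the pod's namespace)
kubectl exec -n networking-demo debug-pod -- cat /etc/resolv.conf
# Compare against the cluster DNS service IP (the service is named kube-dns even when CoreDNS serves it)
kubectl get service -n kube-system kube-dns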
# Create a custom DNS server block using a ConfigMap
# (note: the coredns-custom pattern only takes effect if the Corefile imports custom *.server blocks,
# as some managed distributions do; on a vanilla kubeadm cluster, edit the coredns ConfigMap directly instead)
cat > ~/k8s-labs/troubleshooting/custom-dns.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  example.server: |
    example.com:53 {
        errors
        cache 30
        forward . 8.8.8.8
    }
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/custom-dns.yaml
# Restart CoreDNS to apply the custom config (only in test environments)
kubectl rollout restart deployment -n kube-system coredns
# Wait for CoreDNS to be ready
kubectl rollout status deployment -n kube-system coredns
# Delete all the resources
kubectl delete namespace networking-demo
kubectl delete namespace secondary-namespace
The following practice exercises build upon our troubleshooting labs to help you develop the diagnostic skills essential for the CKA exam. Each exercise presents a realistic scenario you might encounter as a Kubernetes administrator.
You’re the Kubernetes administrator for a production cluster. Users report they can’t access the cluster with kubectl commands, receiving connection refused errors. You need to diagnose and fix the issue with the API server.
# First, verify the cluster is indeed inaccessible
kubectl get nodes
# Check the API server pod specifically
# If kubectl isn't working, SSH to the control plane node
ssh user@control-plane-node
# Check if the API server pod is running (for kubeadm-based clusters)
sudo crictl ps | grep kube-apiserver
# If the API server container isn't running, check static pod manifests
ls -la /etc/kubernetes/manifests/kube-apiserver.yaml
# Check API server logs
sudo crictl logs $(sudo crictl ps -a | grep kube-apiserver | head -1 | awk '{print $1}')
# Common issues to look for:
# 1. Certificate problems
sudo ls -la /etc/kubernetes/pki/apiserver*
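# On kubeadm clusters (v1.19+), certificate expiry can also be checked in one step (supplementary check):
sudo kubeadm certs check-expiration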
# 2. Configuration issues in the manifest
sudo cat /etc/kubernetes/manifests/kube-apiserver.yaml
# 3. Check if etcd is running (API server depends on it)
sudo crictl ps | grep etcd
sudo crictl logs $(sudo crictl ps | grep etcd | awk '{print $1}')
# 4. Check process status at the node level
sudo systemctl status kubelet
# Fix common issues:
# If the manifest has syntax errors, correct them
sudo vi /etc/kubernetes/manifests/kube-apiserver.yaml
# If certificates are expired or corrupt, you may need to regenerate them
# For kubeadm clusters:
sudo kubeadm init phase certs apiserver
# After fixing the issue, restart the kubelet to pick up changes
sudo systemctl restart kubelet
# Verify that the API server is now running
sudo crictl ps | grep kube-apiserver
# Exit the control plane node and test from your workstation
exit
kubectl get nodes
Your application team has requested to scale a critical deployment from 3 to 10 replicas, but the scaling operation seems stuck. The deployment shows the desired count as 10, but the current count remains at 3.
# First, confirm the issue by checking the deployment
kubectl get deployment critical-app -n production
# Check the replica status and events
kubectl describe deployment critical-app -n production
# Look for events indicating why pods aren't being scheduled
kubectl get events -n production | grep -i critical-app
# Check if any pods are stuck in Pending state
kubectl get pods -n production | grep critical-app
# If pods are pending, examine one to see the exact reason
PENDING_POD=$(kubectl get pods -n production | grep critical-app | grep Pending | head -1 | awk '{print $1}')
kubectl describe pod $PENDING_POD -n production
# Check for resource constraints at the node level
kubectl get nodes
kubectl describe nodes | grep -A 5 "Allocated resources"
# Check for resource quotas at the namespace level
kubectl get resourcequota -n production
kubectl describe resourcequota -n production
# Possible solutions based on cause:
# 1. If there's a node resource constraint, add more nodes or resize existing ones
# For cloud environments:
# - Scale your node group or cluster
# - For managed K8s: Use cloud console or CLI to add nodes
# 2. If there's a quota issue, increase the quota
kubectl edit resourcequota compute-quota -n production
# Modify the spec.hard values as needed
# 3. If there's a scheduling issue due to node selectors, taints, or affinity rules
# Check the deployment's pod template:
kubectl get deployment critical-app -n production -o yaml | grep -A 20 template:
# You might need to modify node selectors, tolerations, or affinity rules:
kubectl edit deployment critical-app -n production
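# (Related to item 3) To list taints across all nodes at a glance (an illustrative custom-columns query):
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints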
# 4. If there are pod disruption budgets preventing scale-down
kubectl get pdb -n production
kubectl describe pdb -n production
# After fixing the issue, check if pods are now scaling properly
kubectl get deployment critical-app -n production -w
# (Press Ctrl+C after confirming)
A newly deployed application is continuously crashing and restarting. The deployment shows all pods are running, but users can’t access the application. You need to diagnose and fix the root cause of the crashes.
# First, check the status of the pods in the deployment
kubectl get pods -n application
# Look for pods with high restart counts
kubectl get pods -n application -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,RESTARTS:.status.containerStatuses[0].restartCount
# Identify a problematic pod
PROBLEM_POD=$(kubectl get pods -n application | grep -v NAME | sort -rk4 | head -1 | awk '{print $1}')
# Check the pod details
kubectl describe pod $PROBLEM_POD -n application
# Check the current container logs
kubectl logs $PROBLEM_POD -n application
# If the container just restarted, check the previous container's logs
kubectl logs $PROBLEM_POD -n application --previous
# Check container startup and liveness probe configurations
kubectl get pod $PROBLEM_POD -n application -o yaml | grep -A 10 livenessProbe
kubectl get pod $PROBLEM_POD -n application -o yaml | grep -A 10 startupProbe
kubectl get pod $PROBLEM_POD -n application -o yaml | grep -A 10 readinessProbe
# Common issues to look for:
# 1. Configuration errors (missing files or environment variables)
# 2. Incorrect startup commands
# 3. Resource constraints (OOMKilled)
# 4. Incorrect image versions
# 5. Liveness/readiness probes that fail
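# For the resource-constraint case in particular, an OOM kill appears as the last termination reason
# (standard status fields; shown here as an extra check, not part of the original runbook):
kubectl describe pod $PROBLEM_POD -n application | grep -i -A 3 'Last State'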
# Sample fixes based on potential issues:
# If it's a configuration issue, create or fix a ConfigMap:
kubectl create configmap app-config -n application --from-file=config.json
# If it's a resource issue, increase limits:
kubectl edit deployment app-deployment -n application
# Modify the resources.limits section
# If it's a probe issue, modify the probe configuration:
kubectl edit deployment app-deployment -n application
# Adjust the livenessProbe or readinessProbe parameters
# If it's an image issue, update to a correct image:
kubectl set image deployment/app-deployment app=myapp:stable -n application
# After fixing, monitor the pods to ensure they're stable
kubectl get pods -n application -w
# (Press Ctrl+C after confirming stability)
# Verify application functionality
kubectl port-forward service/app-service 8080:80 -n application
# In another terminal: curl localhost:8080
Applications in your cluster are reporting connection timeouts when trying to reach a backend service. The service and pods appear to be running correctly, but connectivity is failing.
# First, check the service configuration
kubectl get service backend-service -n application
# Examine the service details, particularly the selector and endpoints
kubectl describe service backend-service -n application
# Check if endpoints are configured correctly
kubectl get endpoints backend-service -n application
# If no endpoints are found, verify pod labels match the service selector
kubectl get pods -n application --show-labels | grep backend
# Check for network policies that might block traffic
kubectl get networkpolicy -n application
kubectl describe networkpolicy -n application
# Create a debugging pod to test connectivity
kubectl run -n application debug-pod --image=nicolaka/netshoot --restart=Never -- sleep 3600
# Wait for the pod to be ready
kubectl wait --for=condition=Ready pod/debug-pod -n application
# Test DNS resolution
kubectl exec -n application debug-pod -- nslookup backend-service
kubectl exec -n application debug-pod -- nslookup backend-service.application.svc.cluster.local
# Test direct connection to endpoints
kubectl exec -n application debug-pod -- curl -v backend-service:80
ENDPOINT_IP=$(kubectl get endpoints backend-service -n application -o jsonpath='{.subsets[0].addresses[0].ip}')
ENDPOINT_PORT=$(kubectl get endpoints backend-service -n application -o jsonpath='{.subsets[0].ports[0].port}')
kubectl exec -n application debug-pod -- curl -v $ENDPOINT_IP:$ENDPOINT_PORT
# Check kube-proxy status on the nodes
kubectl get pods -n kube-system | grep kube-proxy
PROXY_POD=$(kubectl get pods -n kube-system | grep kube-proxy | head -1 | awk '{print $1}')
kubectl logs $PROXY_POD -n kube-system
# Possible fixes based on identified issues:
# 1. If the service selector doesn't match pod labels, update the service:
kubectl patch service backend-service -n application -p '{"spec":{"selector":{"app":"backend"}}}'
# 2. If network policies are blocking traffic, create or modify a policy:
cat > ~/allow-traffic.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-service-traffic
  namespace: application
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector: {}
    ports:
    - protocol: TCP
      port: 80
EOF
kubectl apply -f ~/allow-traffic.yaml
# 3. If kube-proxy has issues, restart it:
kubectl rollout restart daemonset kube-proxy -n kube-system
# After fixing, verify connectivity again
kubectl exec -n application debug-pod -- curl -v backend-service:80
# Clean up the debug pod
kubectl delete pod debug-pod -n application
Your development team has created a StatefulSet that requires persistent storage, but the Persistent Volume Claims (PVCs) are stuck in a Pending state, preventing the application from starting properly.
# First, check the status of the PVCs
kubectl get pvc -n development
# Examine the details of a pending PVC
PENDING_PVC=$(kubectl get pvc -n development | grep Pending | head -1 | awk '{print $1}')
kubectl describe pvc $PENDING_PVC -n development
# Check available Persistent Volumes
kubectl get pv
# Check storage classes
kubectl get storageclass
kubectl describe storageclass standard
# Check if there's a default storage class
kubectl get storageclass -o custom-columns=NAME:.metadata.name,DEFAULT:.metadata.annotations.storageclass\\.kubernetes\\.io/is-default-class
# Common issues to investigate:
# 1. No matching PVs available for the PVC
# Check the PVC access mode, storage size, and other requirements
# 2. No storage class specified and no default class exists
# 3. Storage class exists but provisioner isn't working
# 4. volumeBindingMode is set to WaitForFirstConsumer (the PVC stays Pending until a Pod uses it)
# Possible fixes based on identified issues:
# 1. If no matching PVs exist, create a PV manually:
cat > ~/k8s-labs/troubleshooting/manual-pv.yaml << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  hostPath:
    path: /mnt/data
EOF
kubectl apply -f ~/k8s-labs/troubleshooting/manual-pv.yaml
# 2. If no default storage class exists, create one or mark an existing one as default:
kubectl patch storageclass standard -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
# 3. If the storage class provisioner isn't working, check provisioner pods:
kubectl get pods -n kube-system | grep provisioner
PROVISIONER_POD=$(kubectl get pods -n kube-system | grep provisioner | head -1 | awk '{print $1}')
kubectl logs $PROVISIONER_POD -n kube-system
# 4. If the issue is WaitForFirstConsumer binding mode, create a pod that uses the PVC:
kubectl get statefulset -n development
kubectl describe statefulset data-store -n development
# After fixing the issues, check if PVCs are now bound
kubectl get pvc -n development
# If the StatefulSet isn't automatically recovering, you might need to restart it
kubectl rollout restart statefulset data-store -n development
# Verify the pods are now running correctly
kubectl get pods -n development -l app=data-store
You manage a high-availability Kubernetes cluster with multiple control plane nodes. Users report intermittent API server failures and slow response times. You need to diagnose and resolve the issues with the control plane components.
# Get a list of all control plane nodes
kubectl get nodes -l node-role.kubernetes.io/control-plane=
# Check the status of control plane pods across all nodes
kubectl get pods -n kube-system -o wide | grep -E 'apiserver|controller|scheduler|etcd'
# Examine events related to control plane components
kubectl get events -n kube-system | grep -E 'apiserver|controller|scheduler|etcd'
# Check API server health across all control plane nodes
for node in $(kubectl get nodes -l node-role.kubernetes.io/control-plane= -o jsonpath='{.items[*].metadata.name}'); do
  echo "Checking API server on $node"
  kubectl get pods -n kube-system -o wide | grep apiserver | grep $node
done
# Check control plane component logs for errors
# For a specific API server pod:
API_POD=$(kubectl get pods -n kube-system | grep apiserver | head -1 | awk '{print $1}')
kubectl logs $API_POD -n kube-system | grep -i error
# Check etcd cluster health (critical for API server)
ETCD_POD=$(kubectl get pods -n kube-system | grep etcd | head -1 | awk '{print $1}')
kubectl exec -n kube-system $ETCD_POD -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key \
endpoint health --cluster
# Check etcd member list
kubectl exec -n kube-system $ETCD_POD -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key \
member list -w table
# Common HA cluster issues to look for:
# 1. Leader election conflicts
# 2. Etcd quorum issues
# 3. Network connectivity between control plane nodes
# 4. Certificate expiration on some nodes but not others
# 5. Resource constraints on specific nodes
# Check certificate expiration
CERT_DIR="/etc/kubernetes/pki"
for node in $(kubectl get nodes -l node-role.kubernetes.io/control-plane= -o jsonpath='{.items[*].metadata.name}'); do
  echo "Checking certificates on $node"
  # You'd need to SSH to each node to check this
  # ssh $node "sudo openssl x509 -in $CERT_DIR/apiserver.crt -text -noout | grep -A2 Validity"
done
# Check for leader election status (current releases record leader election in Lease objects;
# older clusters used Endpoints annotations)
kubectl get lease -n kube-system kube-scheduler -o yaml
kubectl get lease -n kube-system kube-controller-manager -o yaml
# Remediation steps based on identified issues:
# 1. If a specific control plane node is problematic, cordon and drain it:
kubectl cordon control-plane-2
kubectl drain control-plane-2 --ignore-daemonsets
# 2. For etcd member issues, you might need to remove and re-add the problematic member:
# First, get the member ID to remove
MEMBER_ID=$(kubectl exec -n kube-system $ETCD_POD -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key \
member list | grep control-plane-2 | cut -d',' -f1)
# Remove the problematic etcd member
kubectl exec -n kube-system $ETCD_POD -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key \
member remove $MEMBER_ID
# 3. For certificate issues, renew certificates:
# On the problematic node:
# sudo kubeadm certs renew all
# 4. If resource constraints are the issue, check and adjust limits:
kubectl top nodes
# After implementing fixes, verify control plane health
kubectl get pods -n kube-system -o wide | grep -E 'apiserver|controller|scheduler|etcd'
# Check API server performance
time kubectl get nodes
The Certified Kubernetes Administrator (CKA) exam is a hands-on, performance-based test that assesses your ability to perform the tasks of a Kubernetes administrator. Understanding the exam format is crucial for effective preparation.
The CKA exam consists of a series of performance-based tasks that must be completed within a 2-hour timeframe. These tasks involve managing real Kubernetes clusters through the command line interface. Unlike traditional multiple-choice exams, the CKA requires you to solve actual problems by implementing solutions in live environments.
The exam includes approximately 15-20 questions of varying difficulty and point values. Questions are weighted differently based on their complexity, with more challenging tasks carrying higher point values. The questions cover the full range of topics in the CKA curriculum, with particular emphasis on troubleshooting (30%), cluster architecture (25%), and services/networking (20%).
You will work in a browser-based terminal environment with access to multiple clusters. Each question specifies which cluster to use, and you must ensure you’re working in the correct context for each task.
During the exam, you have access to the official Kubernetes documentation (kubernetes.io/docs and related official pages). You cannot access other external resources, personal notes, or additional browser tabs beyond those provided. However, the official documentation should be sufficient for any reference needs during the exam.
The CKA exam requires a passing score of 66%. This means you need to correctly complete enough tasks to earn at least 66% of the total possible points. The certification is valid for three years from the date of certification.
Effective time management is critical for success in the CKA exam, given the 2-hour time constraint and the number of tasks to complete.
Begin by quickly reviewing all questions to understand their complexity and point values. Prioritize tasks based on their point value, their difficulty, and how confident you are that you can complete them quickly.
This approach ensures you accumulate as many points as possible within the time constraints. If you encounter a particularly challenging question, it’s sometimes better to temporarily skip it and return later rather than spend too much time on a single task.
Allocate your time based on the weight of each question. As a general guideline, with roughly 15-20 questions in a 2-hour window, budget about 6-8 minutes for an average question and proportionally more for heavily weighted tasks.
This proportional allocation helps ensure you have enough time to attempt all questions. Set a mental timer for each question, and if you exceed your allocated time, consider moving on and returning later if time permits.
The exam interface allows you to bookmark questions for later review. Use this feature to mark questions you skipped, completed only partially, or want to double-check.
In the last 15 minutes of the exam, return to your bookmarked questions to ensure you haven’t missed any tasks and to verify your solutions.
Mastering kubectl and Linux command line techniques will significantly improve your speed and accuracy during the exam.
Become familiar with these time-saving shortcuts:
k instead of kubectl (often pre-configured in the exam environment)
po for pods, svc for services, deploy for deployments
-o output flag with various formats like yaml, json, wide, name
--dry-run=client -o yaml to generate resource manifests quickly
kubectl explain for quick field reference
Creating YAML manifests efficiently is essential for many exam tasks:
kubectl create deployment nginx --image=nginx --dry-run=client -o yaml > deployment.yaml
kubectl get deployment existing-deploy -o yaml > new-deploy.yaml
kubectl explain deployment.spec.template.spec.containers
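If the k alias and shell completion are not already configured in the exam terminal, a minimal setup along these lines covers the shortcuts above (standard kubectl completion commands for a bash shell; the do variable is just a convenient convention, not a requirement):
alias k=kubectl
source <(kubectl completion bash)
complete -o default -F __start_kubectl k
export do='--dry-run=client -o yaml'
k run nginx --image=nginx $do > pod.yaml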
The exam environment provides vim as the primary text editor. Knowing these vim commands will save valuable time:
h (left), j (down), k (up), l (right)
dd (delete line), yy (copy line), p (paste)
/search_term then n for next occurrence
:%s/old/new/g to replace all occurrences
:wq (save and quit), :q! (quit without saving)
:line_number (jump to a specific line)
The exam involves working with multiple clusters. Master these techniques:
kubectl config current-context
kubectl config get-contexts
kubectl config use-context context_name
alias k1='kubectl --context=cluster1'
alias k2='kubectl --context=cluster2'
Understanding typical exam scenarios and developing systematic approaches will help you tackle tasks efficiently.
Develop a structured approach to troubleshooting issues:
kubectl get all -n namespace
kubectl describe resource resource_name
kubectl logs pod_name
Examine related resources that could impact the issue (services, network policies, etc.)
kubectl get events -n namespace --sort-by='.lastTimestamp'
Verify configurations and relationships between components
For cluster installation and configuration tasks:
Review requirements carefully, noting specific versions, networking plugins, etc.
Use kubeadm config files when possible for repeatable configurations
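As a minimal sketch of that approach (the version and pod subnet below are placeholder assumptions, not values taken from this guide), a kubeadm configuration file can be written once and passed to kubeadm init:
cat > kubeadm-config.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.25.0
networking:
  podSubnet: 10.244.0.0/16
EOF
sudo kubeadm init --config kubeadm-config.yaml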
When deploying applications:
Determine the appropriate workload controller (Deployment, StatefulSet, DaemonSet)
Generate the basic YAML template using imperative commands
Apply the configurations and verify functionality
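For example, a typical flow for a simple stateless workload might look like this (names and image are illustrative):
kubectl create deployment web --image=nginx --replicas=3 --dry-run=client -o yaml > web.yaml
# Edit web.yaml if the task needs extra fields (resources, probes, labels), then apply and verify
kubectl apply -f web.yaml
kubectl rollout status deployment/web
kubectl get pods -l app=web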
Proper preparation immediately before and during the exam can significantly impact your performance.
Ensure a stable internet connection, preferably wired
Use a computer with adequate resources (minimum 4GB RAM, modern processor)
Configure a quiet, distraction-free environment
Have a backup device and internet connection available if possible
Test your webcam, microphone, and screen sharing capabilities beforehand
Get adequate rest the night before the exam
Review key concepts and commands, but avoid cramming new material
Practice with a full-length mock exam under timed conditions
Develop a positive mindset and confidence in your preparation
Plan to arrive at your exam setup 15 minutes early to complete proctoring procedures
Stay calm if you encounter difficult questions; remember your strategy to prioritize and allocate time
Use the notepad feature to track your progress and remember which questions you’ve bookmarked
Take short 30-second breaks if needed to maintain focus and reduce stress
Double-check your work, especially context switching between clusters
Submit your exam once complete or when time is nearly expired
This comprehensive understanding of the exam format, time management strategies, command efficiency techniques, and structured approaches to common scenarios will position you for success on the CKA exam. Next, we’ll proceed with a full-length mock exam to apply these concepts in practice.
This mock exam simulates the actual CKA exam format and difficulty level. It consists of 15 questions covering all domains of the curriculum. The total point value is 100%, and you need 66% to pass.
Time: 2 hours
Passing Score: 66%
For maximum benefit, complete it in a single timed two-hour sitting, rely only on the official documentation for reference, and verify each solution before moving on.
Task: You have a Kubernetes cluster running version 1.24.0 that needs to be upgraded to version 1.25.0. The cluster consists of a control plane node named cp-node and two worker nodes named worker-1 and worker-2. Perform the upgrade with minimal disruption to the workloads.
Task: The API server on the cluster k8s-cluster-2 is not functioning properly. Users are unable to deploy new workloads. Identify the issue and fix it to restore full functionality.
Task: Create a new StorageClass named fast-storage with the provisioner kubernetes.io/no-provisioner and volumeBindingMode set to WaitForFirstConsumer. Then create a 2Gi Persistent Volume (PV) with the name pv-fast that uses this StorageClass. The PV should use a hostPath of /mnt/fast-data and have AccessMode ReadWriteOnce. Finally, create a PersistentVolumeClaim named pvc-fast in the namespace project-tiger that requests 1Gi from this StorageClass.
Task: A Pod named web-app in the production namespace is showing as CrashLoopBackOff. Investigate the issue and fix it so that the Pod runs properly. The Pod should be running an nginx container on port 80.
Task: Create a NetworkPolicy named db-policy in the namespace project-snake that allows Pods with the label role=frontend to connect to Pods with the label role=db only on port 3306. The policy should also allow all outbound traffic from the database pods.
Task: Create a Pod named sidecar-pod in the namespace project-tiger with two containers. The first container, named main, should use the nginx image and expose port 80. The second container, named logger, should use the busybox image and run the command sh -c "while true; do echo 'Logging at $(date)' >> /var/log/output.log; sleep 10; done". Both containers should mount a shared volume named log-volume at /var/log.
Task: Create a new ServiceAccount named deployment-sa in the namespace project-ram. Then, create a ClusterRole named deployment-manager that allows creating, listing, and updating deployments. Create a RoleBinding in the namespace project-ram that binds the ClusterRole to the ServiceAccount. Finally, create a Pod named deploy-manager in the namespace project-ram that uses this ServiceAccount, with the image bitnami/kubectl.
Task: Create a Deployment named frontend in the namespace project-tiger with 3 replicas, using the image nginx. The Pods should be scheduled on nodes with the label disk=ssd. Additionally, configure the Pods to have a node anti-affinity rule that prevents multiple Pods from the deployment from running on the same node.
Task: Perform a backup of the etcd database on the cluster and save it to /opt/etcd-backup.db. Then simulate a failure by stopping the etcd service. Restore the etcd database from the backup file and ensure the cluster is functioning properly.
Task: A Deployment named web-server is running in the namespace project-snake with 3 replicas. Create a Service named web-service that exposes the deployment on port 80 using a ClusterIP. Then create an additional Service named web-nodeport that exposes the same deployment using a NodePort on port 32000.
Task: Analyze the logs for the Pod api-server in the namespace project-lion. Find all ERROR level log entries and save them to the file /opt/api-errors.log. Then identify the most common error message and create a new ConfigMap named error-config in the namespace project-lion with the data most-common-error: <error-message>.
Task: Create a ResourceQuota named project-quota in the namespace project-tiger with the following constraints: pods=10, requests.cpu=2, requests.memory=2Gi, limits.cpu=4, limits.memory=4Gi. Then create a LimitRange named default-limits in the same namespace with default request values of cpu=100m and memory=256Mi, and default limit values of cpu=200m and memory=512Mi.
Task: The development team has requested persistent storage for a new database application. Create a PersistentVolume named db-pv with capacity of 5Gi, using hostPath storage at /mnt/db-data. It should use the ReadWriteOnce access mode. Then create a PersistentVolumeClaim named db-pvc in the namespace project-snake that requests 3Gi from this PersistentVolume. Finally, create a Pod named db-pod in the namespace project-snake that uses the mysql:5.7 image and mounts the PersistentVolumeClaim at /var/lib/mysql. Set the environment variable MYSQL_ROOT_PASSWORD to password.
Task: One of your worker nodes, worker-1, needs maintenance. Safely drain the node to ensure that workloads are moved to other nodes. After the maintenance is completed (simulated), make the node schedulable again. During the process, ensure that Pods without PersistentVolumeClaims are rescheduled to other nodes.
Task: Pods in the namespace project-cobra are unable to communicate with the service auth-service in the namespace project-tiger. Investigate the issue and fix the problem so that Pods in project-cobra can successfully connect to the service. Do not modify the service or the pods in the project-tiger namespace.
The CKA exam requires not just technical knowledge but also strategic time management. Here are specific techniques to maximize your efficiency:
When you first access the exam, spend the first 5 minutes scanning all questions. Create a quick prioritization plan:
Moving between different clusters and contexts can be time-consuming. Optimize this process by:
alias kc1='kubectl --context=cluster1'
alias kc2='kubectl --context=cluster2'
Completing all tasks for one context before switching to another when possible
kubectl config current-context
Develop a consistent process to verify your solutions before moving to the next question:
Re-read the question requirements to ensure you’ve addressed all parts
kubectl get <resource> -n <namespace>
kubectl describe <resource> <name> -n <namespace>
Check resource functionality when applicable (test service connectivity, pod functionality)
Beyond basic shortcuts, these advanced techniques can save critical time during the exam:
Create a library of command templates in a notepad at the start of the exam:
k run nginx --image=nginx --dry-run=client -o yaml > pod.yaml
k create deployment nginx --image=nginx --replicas=3 --dry-run=client -o yaml > deploy.yaml
k expose deployment nginx --port=80 --target-port=80 --dry-run=client -o yaml > svc.yaml
k create configmap my-config --from-literal=key1=value1 --from-literal=key2=value2 --dry-run=client -o yaml > cm.yaml
(Note: there is no kubectl create generator for PersistentVolumeClaims; copy a PVC template from the documentation, or export an existing one with k get pvc <name> -o yaml > pvc.yaml.)
Master these commands to quickly extract and filter information:
k get pods -o jsonpath='{.items[*].metadata.name}'
k get pods -l app=nginx,environment=production
k get pods --sort-by='.status.containerStatuses[0].restartCount'
k get pods --all-namespaces
k get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName
These additional vim techniques can significantly speed up editing:
gg - Go to the first line
G - Go to the last line
0 - Go to the beginning of the line
$ - Go to the end of the line
ciw - Change inner word
di( - Delete everything inside parentheses
yi{ - Copy everything inside curly braces
>G - Indent from current line to end of file
:e filename - Edit a new file
:bn - Switch to the next buffer
:bp - Switch to the previous buffer
Being aware of frequent mistakes can help you prevent them during the exam:
Problem: Working in the wrong context or namespace is a common source of errors.
Solution:
kubectl config use-context <context-name>
kubectl config set-context --current --namespace=<namespace>
PS1='$(kubectl config current-context):$(kubectl config view --minify --output "jsonpath={..namespace}") \$ '
Problem: Creating resources without verifying they’re running correctly.
Solution:
# Create resource
kubectl apply -f resource.yaml
# Verify creation
kubectl get <resource-type> <resource-name>
# Check details for correctness
kubectl describe <resource-type> <resource-name>
# For pods, check logs if applicable
kubectl logs <pod-name>
Problem: Incorrect indentation or syntax in YAML files causes resource creation failures.
Solution:
Use spaces, not tabs, for indentation (2 spaces per level is the common Kubernetes convention)
Maintain consistent indentation level increments
When copying from existing resources, carefully adjust indentation of new fields
For complex resources, generate template YAML with kubectl and modify rather than writing from scratch
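As a small illustration of consistent two-space indentation (a generic Pod snippet, not tied to any exam task):
apiVersion: v1
kind: Pod
metadata:
  name: indent-example
  labels:
    app: indent-example
spec:
  containers:
  - name: app
    image: nginx
    ports:
    - containerPort: 80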
Problem: Incorrectly configured RBAC settings that either grant too much or too little access.
Solution:
kubectl auth can-i <verb> <resource> --as=system:serviceaccount:<namespace>:<serviceaccount>
kubectl get rolebinding <binding-name> -o yaml
The day before your exam, focus on these activities:
Review your notes on frequently used commands and workflows
Practice kubectl command generation and resource creation without referring to documentation
Mentally walk through the troubleshooting process for common issues
Focus on understanding the relationship between different Kubernetes components
Test your computer, internet connection, webcam, and microphone
Ensure your exam space is quiet, well-lit, and free from unauthorized materials
Clear your desk except for allowed items
Close all unnecessary applications on your computer
Check that your browser allows for the exam portal requirements
Get a good night’s sleep before the exam
Have a light meal before the exam starts
Prepare water and any other permitted items
Plan to be ready 15 minutes before the scheduled start time
Use relaxation techniques if you feel anxious