1 Kubernetes Autoscaling¶
Objectives
- Resources & Limits
- Scheduling
- HPA
- VPA
- Cluster Autoscaling
- Node Auto-Provisioning (NAP)
0 Create GKE Cluster¶
Step 1 Enable the Google Kubernetes Engine API.
gcloud services enable container.googleapis.com
Step 2 From Cloud Shell, run the following command to create a cluster with 2 nodes and Vertical Pod Autoscaling enabled:
gcloud container clusters create k8s-scaling \
--zone us-central1-c \
--enable-vertical-pod-autoscaling \
--num-nodes 2
Output:
NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
k8s-scaling us-central1-c 1.19.9-gke.1400 34.121.222.83 e2-medium 1.19.9-gke.1400 2 RUNNING
Step 3 Authenticate to the cluster.
gcloud container clusters get-credentials k8s-scaling --zone us-central1-c
1.1 Resources and Limits¶
Step 1: Inspecting a node’s capacity
kubectl describe nodes | grep -A15 Capacity:
The output shows two sets of amounts for the node's resources: its capacity and its allocatable resources. Capacity represents the node's total resources, not all of which are available to pods; some are reserved for Kubernetes and system components. The scheduler bases its decisions only on the allocatable amounts.
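If you want to compare capacity and allocatable side by side, the following optional command prints both for every node (it relies only on the standard .status.capacity and .status.allocatable fields):
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU_CAPACITY:.status.capacity.cpu,CPU_ALLOCATABLE:.status.allocatable.cpu,MEM_CAPACITY:.status.capacity.memory,MEM_ALLOCATABLE:.status.allocatable.memory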
Step 2: Show metrics for a given node
kubectl top nodes
kubectl top pods -n kube-system
Result
CPU and memory metrics are available for pods and nodes through the metrics API.
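If you are curious where kubectl top gets its numbers, you can query the metrics API directly (this assumes metrics-server is installed, which is the default on GKE):
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"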
Step 3 Create a deployment manifest best_effort.yaml as shown below. This is a regular deployment with no resource requests or limits configured.
cat <<EOF > best_effort.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubia
spec:
  selector:
    matchLabels:
      app: kubia
  replicas: 3
  template:
    metadata:
      name: kubia
      labels:
        app: kubia
    spec:
      containers:
      - image: luksa/kubia:v1
        name: nodejs
EOF
Step 4 Deploy application
kubectl create -f best_effort.yaml
Step 5 Verify the QoS class assigned to these pods:
kubectl describe pods | grep QoS
Result
If you don't specify requests or limits, Kubernetes assigns the BestEffort QoS class.
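As an optional check, you can read the QoS class straight from the pod status instead of grepping describe output (uses the standard status.qosClass field):
kubectl get pods -l app=kubia -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.qosClass}{"\n"}{end}'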
Step 6 Cleanup
kubectl delete -f best_effort.yaml
Step 7 Create a deployment manifest guaranteed.yaml as shown below. This is a regular deployment with resource requests equal to limits.
cat <<EOF > guaranteed.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubia
spec:
  selector:
    matchLabels:
      app: kubia
  replicas: 3
  template:
    metadata:
      name: kubia
      labels:
        app: kubia
    spec:
      containers:
      - image: luksa/kubia:v1
        name: nodejs
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
          limits:
            cpu: 100m
            memory: 200Mi
EOF
Step 8 Deploy application
kubectl create -f guaranteed.yaml
Step 9 Verify the QoS class assigned to these pods:
kubectl describe pods | grep QoS
Result
If requests equal limits, Kubernetes assigns the Guaranteed QoS class.
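If you want a cluster-wide view, this optional command lists the QoS class of every pod in all namespaces:
kubectl get pods -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,QOS:.status.qosClass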
Step 10 Cleanup
kubectl delete -f guaranteed.yaml
Step 11 Create a deployment manifest burstable.yaml as shown below. This is a regular deployment with only a CPU request and no limits configured.
cat <<EOF > burstable.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubia
spec:
  selector:
    matchLabels:
      app: kubia
  replicas: 3
  template:
    metadata:
      name: kubia
      labels:
        app: kubia
    spec:
      containers:
      - image: luksa/kubia:v1
        name: nodejs
        resources:
          requests:
            cpu: 3000
EOF
Step 12 Deploy application
kubectl create -f burstable.yaml
Step 13 Verify the QoS class assigned to these pods:
kubectl describe pods | grep QoS
Result
If you specify requests that differ from limits, or omit limits altogether, Kubernetes assigns the Burstable QoS class.
Step 14 Check the status of the pods
kubectl get pods
The pods are stuck in Pending. Why did the scheduler fail to place them? (Hint: compare the CPU request in burstable.yaml with the node's allocatable CPU from Step 1.)
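Before answering, you can let the scheduler tell you. The Events section of the pending pods records the FailedScheduling reason (for example, insufficient CPU):
kubectl describe pods -l app=kubia | grep -A5 Events: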
Step 15 Cleanup
kubectl delete -f burstable.yaml
1.2 Creating a Horizontal Pod Autoscaler based on CPU usage¶
Prerequisites: Ensure the metrics API is running in your cluster.
kubectl get pod -n kube-system
Check that the metrics-server-***** pod is in the Running state.
kubectl top nodes
kubectl top pods -n kube-system
Result
CPU and memory metrics are available for pods and nodes through the metrics API.
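The HPA consumes this same API at the pod level. If you want to see the raw data it works with, you can query the pod metrics endpoint directly (output is JSON; truncated here with head):
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods" | head -c 800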
Let’s create a horizontal pod autoscaler now and configure it to scale pods based on their CPU utilization.
Step 1 Create a deployment.yaml as shown below. This is a regular deployment with a CPU request configured.
cat <<EOF > deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubia
spec:
  selector:
    matchLabels:
      app: kubia
  replicas: 3
  template:
    metadata:
      name: kubia
      labels:
        app: kubia
    spec:
      containers:
      - image: luksa/kubia:v1
        name: nodejs
        resources:
          requests:
            cpu: 100m
EOF
Step 2 Deploy application
kubectl create -f deployment.yaml
Step 3 After creating the deployment, to enable horizontal autoscaling of its pods, you need to create a HorizontalPodAutoscaler (HPA) object and point it to the deployment.
kubectl autoscale deployment kubia --cpu-percent=30 --min=1 --max=5
Note
This creates the HPA object and sets the kubia deployment as the scaling target. We set the target CPU utilization of the pods to 30% and specify the minimum and maximum number of replicas. The autoscaler will continuously adjust the replica count to keep average CPU utilization around 30%, but it will never scale below 1 or above 5 replicas.
Step 4 Inspect the definition of the HorizontalPodAutoscaler resource to understand it better:
kubectl get hpa kubia -o yaml
Result:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
...
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kubia
  targetCPUUtilizationPercentage: 30
status:
  currentReplicas: 0
  desiredReplicas: 0
Step 5 Take a closer look at the HPA and notice that it is still not ready to do the autoscaling.
kubectl describe hpa kubia
Results
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 2m29s horizontal-pod-autoscaler unable to get metrics for resource cpu: no metrics returned from resource metrics API
Warning FailedComputeMetricsReplicas 2m29s horizontal-pod-autoscaler failed to compute desired number of replicas based on listed metrics for Deployment/default/kubia: invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Warning FailedGetResourceMetric 118s (x3 over 2m14s) horizontal-pod-autoscaler did not receive metrics for any ready pods
Warning FailedComputeMetricsReplicas 118s (x3 over 2m14s) horizontal-pod-autoscaler failed to compute desired number of replicas based on listed metrics for Deployment/default/kubia: invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: did not receive metrics for any ready pods
Because metrics for the newly created pods are not available yet, you will see the warnings above in the Events section.
Give it a minute or so and try again. Eventually, you will see the following in the Events section.
Normal SuccessfulRescale 41s horizontal-pod-autoscaler New size: 1; reason: All metrics below target
If you take a look at the kubia
deployment, you will see it was scaled down from 3 pods to 1 pod.
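The scale-down follows the HPA's core calculation, roughly desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization); the controller also applies a tolerance and a stabilization window. For example, if the 3 idle replicas averaged about 5% CPU against the 30% target, the result is ceil(3 × 5 / 30) = 1 replica, which matches the scale-down you just observed (the 5% figure is only an illustration).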
Step 6 Create a service
kubectl expose deployment kubia --port=80 --target-port=8080
Step 7 Start another terminal session and run:
watch -n 1 kubectl get hpa,deployment
Step 8 Generate load to the Application
kubectl run -it --rm --restart=Never loadgenerator --image=busybox \
-- sh -c "while true; do wget -O - -q http://kubia.default; done"
Step 9 Observe the autoscaling. In the other terminal, you will notice the deployment being scaled up.
Step 10 Terminate both sessions by pressing Ctrl+C
1.3 Scale size of pods with Vertical Pod Autoscaling¶
Step 1 Verify that Vertical Pod Autoscaling is already enabled on the cluster. We enabled VPA at cluster creation with --enable-vertical-pod-autoscaling. The command below is also handy if you want to check VPA on an existing cluster.
gcloud container clusters describe k8s-scaling --zone us-central1-c | grep ^verticalPodAutoscaling -A 1
Step 2 Apply the hello-server deployment to your cluster
kubectl create deployment hello-server --image=gcr.io/google-samples/hello-app:2.0
Step 3 Ensure the deployment was successfully created
kubectl get deployment hello-server
Step 4 Assign a CPU resource request of 100m to the deployment
kubectl set resources deployment hello-server --requests=cpu=100m
Step 5 Inspect the container spec of the hello-server pod, find the Requests section, and notice that the pod is currently requesting the 100m CPU we just assigned.
kubectl describe pod hello-server | sed -n "/Containers:$/,/Conditions:/p"
Output
Containers:
  hello-app:
    Container ID:   containerd://e9bb428186f5d6a6572e81a5c0a9c37118fd2855f22173aa791d8429f35169a6
    Image:          gcr.io/google-samples/hello-app:2.0
    Image ID:       gcr.io/google-samples/hello-app@sha256:37e5287945774f27b418ce567cd77f4bbc9ef44a1bcd1a2312369f31f9cce567
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 09 Jun 2021 11:34:15 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-rw2gr (ro)
Conditions:
Step 6 Create a manifest for your VerticalPodAutoscaler
cat << EOF > hello-vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: hello-server-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: hello-server
  updatePolicy:
    updateMode: "Off"
EOF
Step 7 Apply the manifest for hello-vpa
kubectl apply -f hello-vpa.yaml
Step 8 Wait a minute, and then view the VerticalPodAutoscaler
kubectl describe vpa hello-server-vpa
Step 9 Locate the "Container Recommendations" at the end of the output from the describe
command. If you don't see it, wait a little longer and try the previous command again. When it appears, you'll see several different recommendation types, each with values for CPU and memory:
- Lower Bound: this is the lower bound number VPA looks at for triggering a resize. If your pod utilization goes below this, VPA will delete the pod and scale it down.
- Target: this is the value VPA will use when resizing the pod.
- Uncapped Target: if no minimum or maximum capacity is assigned to the VPA, this will be the target utilization for VPA.
- Upper Bound: this is the upper bound number VPA looks at for triggering a resize. If your pod utilization goes above this, VPA will delete the pod and scale it up.
Notice that the VPA recommends different CPU values from what we set, and also suggests how much memory should be requested. At this point we can apply these suggestions manually, or let VPA apply them.
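If you prefer to script against these values rather than read describe output, the target recommendation is also exposed in the VPA object's status (assuming the current autoscaling.k8s.io/v1 schema):
kubectl get vpa hello-server-vpa -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'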
Step 10 Update the manifest to set the policy to Auto and apply the configuration
sed -i 's/Off/Auto/g' hello-vpa.yaml
kubectl apply -f hello-vpa.yaml
In order to resize a pod, Vertical Pod Autoscaler will need to delete that pod and recreate it with the new size. By default, to avoid downtime, VPA will not delete and resize the last active pod. Because of this, you will need at least 2 replicas to see VPA make any changes.
Step 11 Scale hello-server deployment to 2 replicas:
kubectl scale deployment hello-server --replicas=2
Step 12 Watch your pods
kubectl get pods -w
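In yet another terminal you can also watch for the evictions VPA performs when it resizes pods (the exact event wording depends on the VPA version, so we filter broadly):
kubectl get events --sort-by=.lastTimestamp | grep -i evict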
Step 13 The VPA should have resized your pods in the hello-server deployment. Inspect your pods:
kubectl describe pod hello-server | sed -n "/Containers:$/,/Conditions:/p"
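Alternatively, print just the applied requests with jsonpath (this assumes the default app=hello-server label that kubectl create deployment adds):
kubectl get pods -l app=hello-server -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources.requests}{"\n"}{end}'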
1.7 Cleaning Up¶
Step 1 Delete the cluster
gcloud container clusters delete k8s-scaling --zone us-central1-c