Improve Kubernetes Posture: Limit Container Memory and CPU resources
Jul 12, 2024
3 min read

Requests and limits are how Kubernetes allocates resources to your workloads. Defining them properly helps you make full use of your cluster's capacity, minimize idle resources, and ensure that your workloads are scheduled according to what they actually need. To set the stage, let's define what requests and limits are in Kubernetes.
A request defines the resources reserved for your workload, such as a pod, and is what the scheduler uses to decide whether the workload fits on a given node. The scheduler picks a node that can satisfy the requested CPU and memory, places the workload there, and from then on the workload is guaranteed that amount. In short, a request reserves the resources your workload needs.
A limit, on the other hand, defines how much of a resource your workload may consume. For instance, a workload might request 512Mi of memory with a limit of 1Gi: it can burst up to the limit if the node it runs on has capacity to spare. A limit is a way to restrict what your workload is allowed to do. If a container uses more memory than its limit, the kernel kills the process and the container is reported with an OOMKilled status.
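These numbers map directly onto a pod spec. A minimal sketch, with placeholder names and image, using the values from the example above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app           # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25    # placeholder image
      resources:
        requests:
          cpu: 100m        # reserved; used for scheduling decisions
          memory: 512Mi    # guaranteed to the container
        limits:
          cpu: 200m        # CPU above this is throttled
          memory: 1Gi      # exceeding this gets the container OOMKilled
```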
Now that we understand what requests and limits are, let's look at how they interact with the Horizontal Pod Autoscaler (HPA). The HPA automatically scales your pod replicas once utilization crosses a target threshold: you define a target percentage, and the HPA scales out or scales in to keep utilization near it. The HPA also defines a maximum number of replicas to prevent runaway scaling. One point that often causes confusion: the utilization percentage is measured against the requested resources you have defined, not the limits.
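As a sketch, an autoscaling/v2 HPA targeting 80% of the requested CPU might look like this (the Deployment name and replica bounds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app-hpa       # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app         # hypothetical target workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # percent of the pod's requested CPU
```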
Let's see that in an example:
Say you have a workload with 2 replicas, each requesting 100 millicores of CPU and 512Mi of memory, and an HPA with a target utilization of 80% for both CPU and memory. Once a pod's CPU usage reaches around 80 millicores (80% of its request), the HPA scales out and adds a replica to share the load. Conversely, once usage falls well below the target, say CPU drops to 40 millicores (40% of the request), the HPA scales in and removes a replica.
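The arithmetic behind this is the HPA's core formula: desired replicas = ceil(current replicas × current utilization ÷ target utilization). A small sketch, using the example's numbers (80% target against a 100m request):

```python
import math

def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float) -> int:
    """Core HPA rule: desired = ceil(current * usage / target)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# 2 replicas, each at ~80m of a 100m request (80%), versus an 80% target:
print(desired_replicas(2, 80, 80))   # -> 2, no change
# Usage climbs to 120% of the request:
print(desired_replicas(2, 120, 80))  # -> 3, scale out
# Usage falls to 40% of the request:
print(desired_replicas(2, 40, 80))   # -> 1, scale in
```

The real controller adds tolerances and stabilization windows on top of this rule, so it won't flap between replica counts on small fluctuations.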
What happens when the HPA hits its ceiling? Say peak traffic arrives and the HPA scales out to its defined maximum of 10 replicas. The existing pods can still consume resources beyond their requests, up to the amounts defined in their resource limits.
Requests and limits play an important role in using your cluster efficiently and keeping your workloads schedulable. Leaving them undefined will cause you trouble. Without a request, your workload has no guaranteed resources on the node it lands on, and its processes may be CPU-throttled or killed under memory pressure.
Not defining limits results in unbounded resource consumption: a single workload can eat up node resources and cause scheduling problems for the other workloads in your cluster. Limits also help contain the fallout from security breaches. For instance, if malware or a command-and-control process runs inside your pod, a limit ensures it cannot consume excessive cluster resources and degrade the workloads around it.
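Kubernetes can also enforce sane values namespace-wide, so that even workloads deployed without explicit requests and limits receive defaults. A LimitRange sketch (the values here are illustrative, not recommendations):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources   # hypothetical name
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container omits requests
        cpu: 100m
        memory: 256Mi
      default:               # applied when a container omits limits
        cpu: 500m
        memory: 512Mi
```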
How can you improve your Kubernetes posture and define requests and limits accurately? The most effective approach is to examine your current workload usage. Monitoring tools from your cloud provider or from the Kubernetes ecosystem, such as Prometheus and Grafana, let you assess consumption patterns and tune your workload's resource requirements accordingly. At Cloudeights, we can assist you in this process: together, we will determine the optimal model for your workload and fine-tune its requests and limits to suit your specific requirements. Get in touch to start the conversation.