In Kubernetes, a container might get terminated with an “out of memory” (OOM) error. We use the cute word “OOMKilled” for that, as shown here:
$ kubectl get pods
NAME                                  READY   STATUS      RESTARTS   AGE
conbench-deployment-8786544b9-2nfsk   0/1     OOMKilled   0          15h
conbench-deployment-8786544b9-57s5z   0/1     Completed   0          26h
conbench-deployment-8786544b9-brrpm   0/1     OOMKilled   0          2d17h
conbench-deployment-8786544b9-btqhb   0/1     OOMKilled   0          2d17h
conbench-deployment-8786544b9-d798x   0/1     Completed   0          15h
conbench-deployment-8786544b9-drb4m   0/1     Completed   0          26h
conbench-deployment-8786544b9-fpnmb   0/1     Completed   0          3d8h
conbench-deployment-8786544b9-gffz6   0/1     Completed   0          2d5h
conbench-deployment-8786544b9-nsq5h   0/1     OOMKilled   0          2d17h
conbench-deployment-8786544b9-szjj6   0/1     OOMKilled   0          2d18h
A prominent reason for a container to be OOMKilled is exceeding its designated memory limit. I probably don’t need to tell you about that one :).
But what if you didn’t set a memory limit? What if you only set a memory request? We might think that a memory request is only relevant for scheduling.
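To make “request but no limit” concrete, here is a minimal sketch of the kind of spec this post is about. Everything in it (names, image, numbers) is a placeholder, not the actual deployment from the output above:

# sketch: a Deployment whose container declares a memory request but no limit
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: conbench-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: conbench
  template:
    metadata:
      labels:
        app: conbench
    spec:
      containers:
      - name: conbench
        image: example.com/conbench:latest   # placeholder image
        resources:
          requests:
            memory: "2500Mi"   # request only; no limits block, so no hard memory cap
EOF

The scheduler uses the 2500Mi request to pick a node with enough free memory, but nothing stops the container from using more than that once it is running.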
A container might in fact get OOMKilled based on the amount of memory requested. Quoting from the docs:
If a container exceeds its memory request and the node that it runs on becomes short of memory overall, it is likely that the Pod the container belongs to will be evicted.
(emphasis mine).
So: there is a special condition under which it suddenly matters how much memory the container actually uses versus how much it requested.
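One way to see whether a node is getting “short of memory overall” is the MemoryPressure condition the kubelet reports for each node. A quick way to list it for all nodes (a sketch; you could also read it off kubectl describe node):

# print each node's name and its MemoryPressure condition status (True/False)
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'

If the metrics-server is installed, kubectl top nodes additionally shows actual memory usage per node.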
More interesting detail: when this condition is met, you will not always see an OOMKilled status. The pod’s status might also show as Completed in the kubectl get pods output. Let’s have a look:
$ kubectl get pods
NAME                                  READY   STATUS      RESTARTS   AGE
conbench-deployment-f7f7b74bc-4j5cg   0/1     Completed   0          43m
Let’s look in more detail:
$ kubectl describe pods/conbench-deployment-f7f7b74bc-4j5cg
Name: conbench-deployment-f7f7b74bc-4j5cg
...
Status: Failed
Reason: Evicted
Message: The node was low on resource: memory. Container conbench was using 3975828Ki, which exceeds its request of 2500Mi.
...
Okay, that summary is useful. Here is more output from the describe command above (I removed parts of the output and marked those places with ...):
Containers:
  conbench:
    Container ID:   docker://491ba29680bc1ca30c6d9140d56f6387f227d8f2c8f1cb2e96e5a0788ae2d727
    ...
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 06 Jun 2023 14:37:02 +0000
      Finished:     Tue, 06 Jun 2023 14:45:15 +0000
    Ready:          False
    Restart Count:  0
Here we see the slightly confusing Terminated/Completed.
Then:
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Yes, well, Ready being False makes sense.
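By the way, if you would rather pull these fields out of the API than scan kubectl describe output, the same information is reachable via jsonpath. A sketch using the pod name from above; given the describe output we just saw, the two commands should print Failed Evicted and Completed, respectively:

# overall pod status and the reason for it
$ kubectl get pod conbench-deployment-f7f7b74bc-4j5cg -o jsonpath='{.status.phase} {.status.reason}'
# terminal state of the (first) container
$ kubectl get pod conbench-deployment-f7f7b74bc-4j5cg -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}'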
Now maybe the most interesting bit: the chain of events, which shows the memory pressure and a Started -> Evicted -> Killing sequence:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 38m (x6 over 43m) default-scheduler 0/2 nodes are available: 1 Insufficient memory, 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate.
Warning FailedScheduling 36m (x2 over 37m) default-scheduler 0/2 nodes are available: 2 Insufficient memory.
Normal Scheduled 36m default-scheduler Successfully assigned default/conbench-deployment-f7f7b74bc-4j5cg to ip-172-31-43-148.us-east-2.compute.internal
Normal Pulling 36m kubelet Pulling image "41xxx99.dkr.ecr.us-east-2.amazonaws.com/conbench:067f1645b061ffb742e0aac7a138072a5b000cbb"
Normal Pulled 36m kubelet Successfully pulled image "41xxx99.dkr.ecr.us-east-2.amazonaws.com/conbench:067f1645b061ffb742e0aac7a138072a5b000cbb" in 185.704537ms
Normal Created 36m kubelet Created container conbench
Normal Started 36m kubelet Started container conbench
Warning Evicted 28m kubelet The node was low on resource: memory. Container conbench was using 3975828Ki, which exceeds its request of 2500Mi.
Normal Killing 28m kubelet Stopping container conbench
Warning ExceededGracePeriod 28m kubelet Container runtime did not kill the pod within specified grace period.
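So what can you do? That depends on the workload, but a common mitigation (a sketch, not specific advice for this deployment) is to size the memory request realistically and set a limit equal to it. In this example the container was using roughly 3.8Gi against a 2500Mi request, so the request was clearly too low to begin with. Something along these lines (deployment and container names taken from the output above; the 4Gi figure is made up):

# bump the request to a realistic value and cap the container at the same value
$ kubectl set resources deployment conbench-deployment -c conbench \
    --requests=memory=4Gi --limits=memory=4Gi

With request and limit equal, the container can never use more memory than it requested: it gets OOM-killed at its own limit instead of landing in the “exceeds its request” bucket that makes it a preferred eviction candidate when the node runs low on memory.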