Kubernetes: OOMKilled but no limit set?

In Kubernetes, a container might get terminated with an “out of memory” (OOM) error. We use the cute word “OOMKilled” for that, as shown here:

$ kubectl get pods
NAME                                           READY   STATUS      RESTARTS         AGE
conbench-deployment-8786544b9-2nfsk            0/1     OOMKilled   0                15h
conbench-deployment-8786544b9-57s5z            0/1     Completed   0                26h
conbench-deployment-8786544b9-brrpm            0/1     OOMKilled   0                2d17h
conbench-deployment-8786544b9-btqhb            0/1     OOMKilled   0                2d17h
conbench-deployment-8786544b9-d798x            0/1     Completed   0                15h
conbench-deployment-8786544b9-drb4m            0/1     Completed   0                26h
conbench-deployment-8786544b9-fpnmb            0/1     Completed   0                3d8h
conbench-deployment-8786544b9-gffz6            0/1     Completed   0                2d5h
conbench-deployment-8786544b9-nsq5h            0/1     OOMKilled   0                2d17h
conbench-deployment-8786544b9-szjj6            0/1     OOMKilled   0                2d18h

A prominent reason for a container to be OOMKilled is that it exceeded its configured memory limit. I probably don’t need to tell you about that one :).

But what if you didn’t set a memory limit? What if you only set a memory request? One might think that a memory request matters only for scheduling.

A container might in fact get OOMKilled based on the amount of memory requested. Quoting from the docs:

If a container exceeds its memory request and the node that it runs on becomes short of memory overall, it is likely that the Pod the container belongs to will be evicted.

(emphasis mine).

So there’s a special condition, the node running short of memory, under which it suddenly becomes important how much memory a container uses compared to how much memory it requested.
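
To make that condition more concrete, here is a rough sketch of how pods might be ranked for eviction once a node is under memory pressure. It is a simplification based on my reading of the node-pressure eviction documentation (pods using more than their request go first, then lower pod priority, then larger usage relative to the request). It is not the kubelet’s actual code. The PodUsage type, the eviction_order function, and the pod names and numbers are all hypothetical.

```python
from dataclasses import dataclass

MIB = 1024 ** 2


@dataclass
class PodUsage:
    name: str
    usage_bytes: int      # current memory usage of the pod
    request_bytes: int    # memory request from the pod spec
    priority: int = 0     # pod priority; higher means more important


def eviction_order(pods):
    """Return pods sorted from most to least likely to be evicted.

    Simplified model: pods above their request come first, then
    lower-priority pods, then pods with the largest excess over request.
    """
    def key(pod):
        over_request = pod.usage_bytes > pod.request_bytes
        excess = pod.usage_bytes - pod.request_bytes
        # False sorts before True, so negate over_request to put
        # pods that exceed their request at the front.
        return (not over_request, pod.priority, -excess)

    return sorted(pods, key=key)


# Hypothetical pods, for illustration only.
pods = [
    PodUsage("within-request", 1000 * MIB, 2000 * MIB),
    PodUsage("slightly-over", 2100 * MIB, 2000 * MIB),
    PodUsage("far-over", 3900 * MIB, 2500 * MIB),
]
print([p.name for p in eviction_order(pods)])
# ['far-over', 'slightly-over', 'within-request']
```

The point of the sketch: a pod that stays within its memory request is at the back of the queue, and a pod far above its request is at the front, even though no limit was ever set.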

Another interesting detail: when this condition is met, you will not always see an OOMKilled status. The pod’s status might instead show as Completed in the kubectl get pods output. Let’s have a look:

$ kubectl get pods
NAME                                           READY   STATUS      RESTARTS         AGE
conbench-deployment-f7f7b74bc-4j5cg            0/1     Completed   0                43m

Let’s look in more detail:

$ kubectl describe pods/conbench-deployment-f7f7b74bc-4j5cg
Name:             conbench-deployment-f7f7b74bc-4j5cg
Status:           Failed
Reason:           Evicted
Message:          The node was low on resource: memory. Container conbench was using 3975828Ki, which exceeds its request of 2500Mi. 

Okay, that summary is useful. Here is more output from the same describe command (I removed parts of the output and marked those places with ...):

    Container ID:  docker://491ba29680bc1ca30c6d9140d56f6387f227d8f2c8f1cb2e96e5a0788ae2d727
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 06 Jun 2023 14:37:02 +0000
      Finished:     Tue, 06 Jun 2023 14:45:15 +0000
    Ready:          False
    Restart Count:  0

Here we see the slightly confusing combination of State Terminated and Reason Completed, with exit code 0, even though the pod was evicted.
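
Since the container state reads Completed, a script that wants to spot evictions has to look at the pod-level status instead. The following is a minimal sketch, assuming the input has the shape of `kubectl get pods -o json` output (a list object with an `items` array, where each pod carries `status.phase`, `status.reason` and `status.message`). The function name evicted_pods and the second pod in the sample data are hypothetical. The first pod mirrors the describe output above.

```python
def evicted_pods(pod_list):
    """Yield (name, message) for pods whose pod-level status says Evicted."""
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            yield pod["metadata"]["name"], status.get("message", "")


# Trimmed sample input; only the fields used above are included.
sample = {
    "items": [
        {
            "metadata": {"name": "conbench-deployment-f7f7b74bc-4j5cg"},
            "status": {
                "phase": "Failed",
                "reason": "Evicted",
                "message": (
                    "The node was low on resource: memory. Container conbench "
                    "was using 3975828Ki, which exceeds its request of 2500Mi. "
                ),
            },
        },
        {
            # Hypothetical healthy pod, for contrast.
            "metadata": {"name": "some-other-pod"},
            "status": {"phase": "Running"},
        },
    ]
}

for name, message in evicted_pods(sample):
    print(f"{name}: {message}")
```

In practice the dictionary would come from parsing the JSON that kubectl prints, for example with `json.loads`.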


  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 

Yes, well, Ready being False makes sense.

Now maybe the most interesting bit: the chain of events, which shows the memory pressure and a Started -> Evicted -> Killing sequence:

  Type     Reason               Age                From               Message
  ----     ------               ----               ----               -------
  Warning  FailedScheduling     38m (x6 over 43m)  default-scheduler  0/2 nodes are available: 1 Insufficient memory, 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate.
  Warning  FailedScheduling     36m (x2 over 37m)  default-scheduler  0/2 nodes are available: 2 Insufficient memory.
  Normal   Scheduled            36m                default-scheduler  Successfully assigned default/conbench-deployment-f7f7b74bc-4j5cg to ip-172-31-43-148.us-east-2.compute.internal
  Normal   Pulling              36m                kubelet            Pulling image "41xxx99.dkr.ecr.us-east-2.amazonaws.com/conbench:067f1645b061ffb742e0aac7a138072a5b000cbb"
  Normal   Pulled               36m                kubelet            Successfully pulled image "41xxx99.dkr.ecr.us-east-2.amazonaws.com/conbench:067f1645b061ffb742e0aac7a138072a5b000cbb" in 185.704537ms
  Normal   Created              36m                kubelet            Created container conbench
  Normal   Started              36m                kubelet            Started container conbench
  Warning  Evicted              28m                kubelet            The node was low on resource: memory. Container conbench was using 3975828Ki, which exceeds its request of 2500Mi.
  Normal   Killing              28m                kubelet            Stopping container conbench
  Warning  ExceededGracePeriod  28m                kubelet            Container runtime did not kill the pod within specified grace period.
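
The Evicted message reports usage in Ki and the request in Mi, which makes the gap hard to judge at a glance. Here is a small sketch that puts both on the same scale. The helper parse_binary_quantity is hypothetical and only handles integer quantities with the binary suffixes Ki, Mi, Gi and Ti, not the full Kubernetes quantity syntax.

```python
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4}
MIB = 1024 ** 2


def parse_binary_quantity(text):
    """Convert a quantity such as '2500Mi' to bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)  # no suffix: treat as a plain byte count


# Values taken from the Evicted message above.
usage = parse_binary_quantity("3975828Ki")
request = parse_binary_quantity("2500Mi")

print(f"usage:   {usage / MIB:.1f} MiB")
print(f"request: {request / MIB:.1f} MiB")
print(f"excess:  {(usage - request) / MIB:.1f} MiB ({usage / request:.2f}x the request)")
```

So the container was using roughly 3883 MiB against a request of 2500 MiB, about 1.55 times what it asked for.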
