What the OOMKilled Error Means
OOMKilled (Out Of Memory Killed) is a container termination state in Kubernetes indicating that a process inside the container was forcibly killed by the Linux kernel because it exceeded the allocated memory limit (the cgroup memory limit). In the `kubectl get pods` output, the pod typically transitions to a `CrashLoopBackOff` or `Error` status, and in the events and state shown by `kubectl describe pod` you will see the last termination reason:

```
State:          Terminated
  Reason:       OOMKilled
```
This is a system-level error, not an application error. Kubernetes (via the container runtime, e.g., containerd) is simply enforcing the resource limit policy you set in the manifest.
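A quick way to confirm the termination reason from the command line (the pod name `myapp-xxxx` below is a placeholder):

```shell
# Show the last termination state of the container inside the pod
kubectl describe pod myapp-xxxx | grep -A 3 "Last State"

# Or extract the reason directly; an OOM-killed container reports "OOMKilled"
kubectl get pod myapp-xxxx \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```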
Common Causes
- Memory limit (`limits.memory`) set too low. The most frequent cause. The manifest specifies, for example, `512Mi`, while the application peaks at `700Mi`.
- Memory leak in the application. The application gradually consumes more RAM until it hits the limit.
- Mismatch between `requests` and `limits`. If `requests` is significantly lower than `limits`, the pod might be scheduled on a node with little free memory, leading to rapid limit exhaustion.
- Unaccounted processes inside the container. Background processes (e.g., `cron` jobs, sidecar containers) can consume additional memory.
- JVM/runtime configuration issues. For Java applications, an incorrectly set `-Xmx` can cause the JVM to attempt to reserve more memory than the container limit allows, resulting in immediate termination.
- Kernel memory or page cache usage. In some configurations, the cgroup accounts not just for user-space memory (RSS) but also for page cache. Intensive file operations can "eat" into the limit.
Solutions
Solution 1: Increase Memory Limit in the Manifest
This is the most direct and often fastest fix.
- Locate the manifest managing the pod (Deployment, StatefulSet, DaemonSet).
- In the `spec.template.spec.containers[].resources` section, increase both `limits.memory` and, importantly, `requests.memory`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: myapp-container
          image: myapp:latest
          resources:
            requests:
              memory: "512Mi"  # Increase this value
            limits:
              memory: "1Gi"    # Increase this value
```

- Apply the changes: `kubectl apply -f deployment.yaml`.
- Monitor the pod restart: `kubectl rollout status deployment/myapp`.
💡 Tip: Set `limits` to 1.5-2x higher than the peak usage you estimated via `kubectl top`. `requests` should be close to average usage for proper scheduling.
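To estimate that peak, `kubectl top` (backed by metrics-server) can be sampled while the application is under load; the label selector `app=myapp` below is an assumption about how your pods are labeled:

```shell
# Sample current memory usage of the app's pods (requires metrics-server)
kubectl top pod -l app=myapp --containers

# Repeat under load, or watch continuously:
watch -n 15 'kubectl top pod -l app=myapp'
```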
Solution 2: Optimize the Application and Image
If you don't want to endlessly increase limits, you need to reduce consumption.
- Memory profiling. Run the container in a debug mode.
  - For Java: `jcmd <PID> VM.native_memory summary` or `jmap -histo <PID>`.
  - For Go: `go tool pprof http://<pod-ip>:6060/debug/pprof/heap`.
  - For Python: `tracemalloc` or `memory-profiler`.
- JVM tuning. If the app is Java-based, explicitly set the maximum heap size via `-Xmx`, e.g., `-Xmx800m`. Ensure `-Xmx` + metaspace + thread stacks < `limits.memory`.
- Reduce image size. Use multi-stage builds and lightweight base images (`alpine`, `distroless`). A slimmer image ships fewer background utilities and libraries, reducing the container's baseline footprint.
- Optimize code and configuration. Reduce cache sizes (Redis, embedded caches), tune connection pools, and check for leaks (open file descriptors, unreleased objects).
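For the JVM-tuning point above, modern JVMs (8u191+, 11+) can size the heap from the cgroup limit itself via `-XX:MaxRAMPercentage`, which avoids hard-coding `-Xmx`. A sketch of a deployment fragment, with an illustrative percentage and container name:

```yaml
# Hypothetical container spec: let the JVM derive its heap from limits.memory
containers:
  - name: myapp-container
    image: myapp:latest
    env:
      - name: JAVA_TOOL_OPTIONS
        # heap = 75% of the cgroup limit, leaving headroom for
        # metaspace, thread stacks, and native allocations
        value: "-XX:MaxRAMPercentage=75.0"
    resources:
      limits:
        memory: "1Gi"
```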
Solution 3: Check and Adjust Node and cgroup Configuration
Sometimes the issue isn't the pod but the node configuration.
- Check node swap. By default, the kubelet refuses to start with swap enabled (`--fail-swap-on=true`). If swap is enabled, the system may "thrash" instead of killing processes, which is harder to diagnose. It's better to disable it: `sudo swapoff -a`.
- Check the cgroup version: `stat -fc %T /sys/fs/cgroup/`. If it prints `cgroup2fs`, the node uses cgroup v2; ensure Docker/containerd is correctly configured for it, as some older runtime versions account for memory incorrectly under cgroup v2.
- Check overall node load. The node might host too many pods with high limits, leaving physical memory simply insufficient. Use `kubectl top node`. Consider adding more nodes or configuring a `ResourceQuota`.
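If the node-level review shows a namespace over-committing memory, a `ResourceQuota` can cap its aggregate requests and limits; a minimal example (namespace and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-quota
  namespace: myapp-ns
spec:
  hard:
    requests.memory: "8Gi"   # sum of all pod requests in the namespace
    limits.memory: "12Gi"    # sum of all pod limits in the namespace
```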
Solution 4: Use the QoS Class to Influence the OOM Score (Survival During Node Memory Pressure)
This step won't fix a problem inside the container but can help critical pods survive if node memory runs out and the kernel starts killing processes.
Kubernetes does not let you set `oom_score_adj` directly in the pod spec; the kubelet derives it from the pod's QoS class. Guaranteed pods (every container has `requests` equal to `limits`) receive `oom_score_adj` of -997 and are the last user workloads the kernel OOM killer targets; BestEffort pods receive 1000; Burstable pods fall in between. To protect a critical pod, make it Guaranteed:
```yaml
spec:
  containers:
    - name: myapp
      # ...
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "500m"      # equal to requests for every resource
          memory: "1Gi"    # => Guaranteed QoS, oom_score_adj = -997
```
The lower a process's `oom_score_adj` value (range -1000 to 1000), the less likely it is to be killed first. The kubelet and system processes run with negative values.
⚠️ Important: If the pod still gets OOMKilled, it means it exceeded its own `limit`, not that it was a victim of a general node shortage. A lower OOM score will not help in this case.
Prevention
- Always set both `requests` and `limits`. Never leave `limits` unset (unlimited).
- Configure monitoring. Use Prometheus + `kube-state-metrics` to collect metrics like `kube_pod_container_resource_limits` and `container_memory_usage_bytes`. Set up alerts (Alertmanager) for when usage approaches the limit (e.g., >85%).
- Perform load testing. Before production, test the application with tools like `stress-ng` or `hey` inside the container to find real memory usage peaks.
- Use the Vertical Pod Autoscaler (VPA) in `Auto` or `Initial` mode. VPA can automatically analyze historical usage and recommend (or apply) optimal `requests` and `limits` values. Caution: VPA should not be used simultaneously with HPA on CPU/memory metrics without careful consideration.
- Regularly update images. New versions of applications and their dependencies often contain memory-leak fixes and optimizations.
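The VPA recommendation above could look like this in practice (assumes the VPA components are installed in the cluster; the object name is illustrative and the target matches the `myapp` Deployment from Solution 1):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Initial"  # apply recommendations only when pods are (re)created
```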