What the OOMKilled Error Means
OOMKilled (Out Of Memory Killed) is a container termination state in Kubernetes indicating that a process inside the container was forcibly killed by the Linux kernel because it exceeded the allocated memory limit (the cgroup memory limit). In the `kubectl get pods` output, the pod typically transitions to a `CrashLoopBackOff` or `Error` status, and in the events and state shown by `kubectl describe pod` you will see the last termination reason:

```
State:          Terminated
  Reason:       OOMKilled
```
This is a system-level error, not an application error. Kubernetes (via the container runtime, e.g., containerd) is simply enforcing the resource limit policy you set in the manifest.
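A quick way to confirm the termination reason from the command line (the pod name `myapp-xxxx` below is a placeholder):

```shell
# Show the last termination state of the container inside the pod
kubectl describe pod myapp-xxxx | grep -A 3 "Last State"

# Or extract the reason directly; an OOM-killed container reports "OOMKilled"
kubectl get pod myapp-xxxx \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```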
Common Causes
- Memory limit (`limits.memory`) set too low. The most frequent cause. The manifest specifies, for example, `512Mi`, while the application peaks at `700Mi`.
- Memory leak in the application. The application gradually consumes more RAM until it hits the limit.
- Mismatch between `requests` and `limits`. If `requests` is significantly lower than `limits`, the pod might be scheduled on a node with little free memory, leading to rapid limit exhaustion.
- Unaccounted processes inside the container. Background processes (e.g., `cron` jobs, sidecar containers) can consume additional memory.
- JVM/runtime configuration issues. For Java applications, an incorrectly set `-Xmx` can cause the JVM to attempt to reserve more memory than the container limit allows, resulting in immediate termination.
- Kernel memory or page cache usage. In some configurations, the cgroup accounts not just for user-space memory (RSS) but also for page cache. Intensive file operations can "eat" into the limit.
Solutions
Solution 1: Increase Memory Limit in the Manifest
This is the most direct and often fastest fix.
- Locate the manifest managing the pod (Deployment, StatefulSet, DaemonSet).
- In the `spec.template.spec.containers[].resources` section, increase both `limits.memory` and, importantly, `requests.memory`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: myapp-container
          image: myapp:latest
          resources:
            requests:
              memory: "512Mi"  # Increase this value
            limits:
              memory: "1Gi"    # Increase this value
```

- Apply the changes: `kubectl apply -f deployment.yaml`.
- Monitor the pod restart: `kubectl rollout status deployment/myapp`.
💡 Tip: Set `limits` to 1.5-2x higher than the peak usage you estimated via `kubectl top`. `requests` should be close to average usage for proper scheduling.
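To estimate that peak, `kubectl top` (backed by metrics-server) can be sampled while the application is under load; the label selector `app=myapp` below is an assumption about how your pods are labeled:

```shell
# Sample current memory usage of the app's pods (requires metrics-server)
kubectl top pod -l app=myapp --containers

# Repeat under load, or watch continuously:
watch -n 15 'kubectl top pod -l app=myapp'
```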
Solution 2: Optimize the Application and Image
If you don't want to endlessly increase limits, you need to reduce consumption.
- Memory profiling. Run the container in a debug mode.
  - For Java: `jcmd <PID> VM.native_memory summary` or `jmap -histo <PID>`.
  - For Go: `go tool pprof http://<pod-ip>:6060/debug/pprof/heap`.
  - For Python: `tracemalloc` or `memory-profiler`.
- JVM tuning. If the app is Java-based, explicitly set the maximum heap size via `-Xmx`, e.g., `-Xmx800m`. Ensure `-Xmx` + metaspace + thread stacks < `limits.memory`.
- Reduce image size. Use multi-stage builds and lightweight base images (`alpine`, `distroless`). A slimmer image ships fewer background utilities and libraries, reducing the container's baseline footprint.
- Optimize code and configuration. Reduce cache sizes (Redis, embedded caches), tune connection pools, and check for leaks (open file descriptors, unreleased objects).
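For the JVM-tuning point above, modern JVMs (8u191+, 11+) can size the heap from the cgroup limit itself via `-XX:MaxRAMPercentage`, which avoids hard-coding `-Xmx`. A sketch of a deployment fragment, with an illustrative percentage and container name:

```yaml
# Hypothetical container spec: let the JVM derive its heap from limits.memory
containers:
  - name: myapp-container
    image: myapp:latest
    env:
      - name: JAVA_TOOL_OPTIONS
        # heap = 75% of the cgroup limit, leaving headroom for
        # metaspace, thread stacks, and native allocations
        value: "-XX:MaxRAMPercentage=75.0"
    resources:
      limits:
        memory: "1Gi"
```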
Solution 3: Check and Adjust Node and cgroup Configuration
Sometimes the issue isn't the pod but the node configuration.
- Check node swap. By default, the kubelet refuses to start with swap enabled (`--fail-swap-on=true`). If swap is enabled, the system may "thrash" instead of killing processes, which is harder to diagnose. It's better to disable it: `sudo swapoff -a`.
- Check the cgroup version: `stat -fc %T /sys/fs/cgroup/`. If it prints `cgroup2fs`, the node uses cgroup v2; ensure Docker/containerd is correctly configured for it, as some older runtime versions account for memory incorrectly under cgroup v2.
- Check overall node load. The node might host too many pods with high limits, leaving physical memory simply insufficient. Use `kubectl top node`. Consider adding more nodes or configuring a `ResourceQuota`.
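If the node-level review shows a namespace over-committing memory, a `ResourceQuota` can cap its aggregate requests and limits; a minimal example (namespace and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-quota
  namespace: myapp-ns
spec:
  hard:
    requests.memory: "8Gi"   # sum of all pod requests in the namespace
    limits.memory: "12Gi"    # sum of all pod limits in the namespace
```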
Solution 4: Use the QoS Class to Influence the OOM Score (Survival During Node Memory Pressure)
This step won't fix a problem inside the container but can help critical pods survive if node memory runs out and the kernel starts killing processes.
Kubernetes does not let you set `oom_score_adj` directly in the pod spec; the kubelet derives it from the pod's QoS class. Guaranteed pods (every container has `requests` equal to `limits`) receive `oom_score_adj` of -997 and are the last user workloads the kernel OOM killer targets; BestEffort pods receive 1000; Burstable pods fall in between. To protect a critical pod, make it Guaranteed:
```yaml
spec:
  containers:
    - name: myapp
      # ...
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "500m"      # equal to requests for every resource
          memory: "1Gi"    # => Guaranteed QoS, oom_score_adj = -997
```
The lower a process's `oom_score_adj` value (range -1000 to 1000), the less likely it is to be killed first. The kubelet and system processes run with negative values.
⚠️ Important: If the pod still gets OOMKilled, it means it exceeded its own `limit`, not that it was a victim of a general node shortage. A lower OOM score will not help in this case.
Prevention
- Always set both `requests` and `limits`. Never leave `limits` unset (unlimited).
- Configure monitoring. Use Prometheus + `kube-state-metrics` to collect metrics like `kube_pod_container_resource_limits` and `container_memory_usage_bytes`. Set up alerts (Alertmanager) for when usage approaches the limit (e.g., >85%).
- Perform load testing. Before production, test the application with tools like `stress-ng` or `hey` inside the container to find real memory usage peaks.
- Use the Vertical Pod Autoscaler (VPA) in `Auto` or `Initial` mode. VPA can automatically analyze historical usage and recommend (or apply) optimal `requests` and `limits` values. Caution: VPA should not be used simultaneously with HPA on CPU/memory metrics without careful consideration.
- Regularly update images. New versions of applications and their dependencies often contain memory-leak fixes and optimizations.
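The VPA recommendation above could look like this in practice (assumes the VPA components are installed in the cluster; the object name is illustrative and the target matches the `myapp` Deployment from Solution 1):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Initial"  # apply recommendations only when pods are (re)created
```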