What is OOM Killer and Why It Appears
OOM Killer (Out-of-Memory Killer) is a Linux kernel mechanism that automatically terminates processes when the system has exhausted available RAM and swap space. Its goal is to free up memory so the kernel and critical system processes can continue operating, preventing a complete system crash.
Typically, OOM Killer activates when:
- Physical RAM and swap are 100% full.
- An application has a memory leak.
- Too many memory-intensive processes are running on the server.
- Memory limits in containers (Docker/Kubernetes) are misconfigured.
If you see "Killed process" messages in the kernel log, or an application suddenly exits with code 137 (128 + signal 9, SIGKILL), the OOM Killer is the likely culprit.
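The 137 exit code is simply 128 plus the signal number 9 (SIGKILL). A quick sketch that demonstrates the mapping, simulated here with a manual kill rather than a real OOM event:

```shell
# Simulate a SIGKILLed process: the parent observes exit status 128 + 9 = 137,
# exactly what an OOM-killed process reports.
sh -c 'kill -9 $$'
status=$?
echo "exit status: $status"   # prints "exit status: 137"
if [ "$status" -eq 137 ]; then
    echo "process died from SIGKILL - check dmesg for OOM messages"
fi
```

The same check works in service wrappers and CI scripts: any status of 137 is worth cross-referencing against the kernel log.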
How OOM Killer Works
The Linux kernel calculates an oom_score for each process based on:
- The proportion of memory consumed by the process (primary factor).
- Process privileges (root processes are less likely to be killed).
- The administrator-set adjustment, oom_score_adj (older kernels also factored in process lifetime, making long-running processes less likely to be chosen).
The process with the highest oom_score is selected for termination. However, this isn't always optimal: OOM Killer might kill an important service while leaving a background process with a leak.
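Both numbers are exposed per process under /proc. A minimal read-only sketch that inspects the current shell (any PID works the same way):

```shell
# oom_score is the kernel's computed badness; oom_score_adj is the admin bias.
score=$(cat /proc/self/oom_score)
adj=$(cat /proc/self/oom_score_adj)
echo "PID $$: oom_score=$score oom_score_adj=$adj"
```

A higher oom_score means the process is a more likely victim; the adjustment is covered in the resolution steps below.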
Diagnosing the Problem
Before taking any action, confirm that OOM Killer is the cause.
- Check kernel logs:
dmesg | grep -i kill
Example output:
[12345.678] Out of memory: Kill process 1234 (nginx) score 500 or sacrifice child
[12345.680] Killed process 1234 (nginx) total-vm:1234567kB, anon-rss:456789kB, file-rss:0kB
Here, the process nginx with PID 1234 was killed.
- Assess total memory:
free -h
Pay attention to the total, used, available, and Swap columns. If available is near zero and Swap is also full, the system is in a critical state.
- Find the memory-consuming process:
top -b -n 1 | head -20
Or use htop sorted by memory (press F6 → MEM%).
- Check each process's oom_score:
for pid in $(ps -e -o pid=); do echo "PID $pid: $(cat /proc/$pid/oom_score 2>/dev/null) (adj: $(cat /proc/$pid/oom_score_adj 2>/dev/null))"; done | sort -k3 -n -r | head -10
This shows the top 10 processes with the highest oom_score.
Problem Resolution Methods
Step 1: Configure oom_score_adj to Protect Key Processes
Each process can be assigned an oom_score_adj value from -1000 (fully protected) to +1000 (first to be killed). This is the fastest way to protect a process.
For a one-time setting (until reboot):
# Replace <PID> with the process ID (lowering the score requires root)
echo -1000 > /proc/<PID>/oom_score_adj
For a permanent setting via systemd (recommended), create a drop-in file:
# /etc/systemd/system/your-service.service.d/oom-protect.conf
[Service]
OOMScoreAdjust=-1000
Then reload and restart the service: systemctl daemon-reload && systemctl restart your-service. You can verify the setting with systemctl show your-service --property=OOMScoreAdjust.
Important: Do not set oom_score_adj=-1000 for all processes — this may prevent OOM Killer from freeing memory and cause the system to hang.
Step 2: Use cgroups to Limit Memory
cgroups (control groups) allow setting hard memory limits for process groups. This is the best approach for containers and isolated services.
Via systemd (modern distributions):
# Run a command with a 500 MB limit
systemd-run --scope -p MemoryMax=500M /path/to/command
# Or for an existing service, create a drop-in:
# /etc/systemd/system/your-service.service.d/limits.conf
[Service]
MemoryMax=1G
# Cap swap separately if swap is in use (cgroup v2 only)
MemorySwapMax=2G
Manually via cgroup v2:
# Create a cgroup
sudo mkdir /sys/fs/cgroup/mylimit
# Set a 1 GB limit
echo $((1*1024*1024*1024)) | sudo tee /sys/fs/cgroup/mylimit/memory.max
# Start a process in this group
# (a plain "sudo echo $$ > file" fails: the redirection runs unprivileged, so use tee)
echo $$ | sudo tee /sys/fs/cgroup/mylimit/cgroup.procs && /path/to/your/app
Step 3: Tune Kernel Parameters
Adjust OOM Killer behavior at the kernel level.
Option A: Disable memory overcommit (strict control):
In /etc/sysctl.conf, add:
vm.overcommit_memory = 2
# Allow committing only up to swap + 100% of RAM
vm.overcommit_ratio = 100
Apply: sudo sysctl -p. This prevents allocating non-existent memory but may cause fork: Cannot allocate memory errors in applications.
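Before switching modes, it is worth checking the current policy and the kernel's commit accounting. A read-only sketch:

```shell
# Current overcommit policy (0 = heuristic, 1 = always allow, 2 = strict)
echo "overcommit_memory = $(cat /proc/sys/vm/overcommit_memory)"
echo "overcommit_ratio  = $(cat /proc/sys/vm/overcommit_ratio)"
# CommitLimit is enforced only in mode 2; Committed_AS is what is promised now
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```

If Committed_AS is already close to CommitLimit, enabling strict mode will start failing allocations immediately.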
Option B: Panic mode instead of killing (for debugging):
vm.panic_on_oom = 2
On memory shortage, the system triggers a kernel panic; this is useful for collecting crash dumps but unsuitable for production.
Option C: Change OOM Killer aggressiveness (rarely used):
# Kill the task that triggered the failing allocation instead of scoring all processes
vm.oom_kill_allocating_task = 1
Step 4: Optimize the Application or Increase Resources
If the issue is caused by a memory leak:
- Use profilers: valgrind --leak-check=full, heaptrack, perf.
- For Java apps: tune -Xmx and -Xms in the JVM.
- For Python: check for leaks (e.g., via tracemalloc).
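When no profiler is installed, the kernel's own counters give a first signal of growth. This read-only sketch inspects the current shell, but substituting a suspect PID for "self" works the same way:

```shell
# VmPeak is the largest virtual size seen; VmRSS is resident memory right now.
# Sampling these values for a suspect PID over time makes a leak obvious.
grep -E 'VmPeak|VmRSS' /proc/self/status
```

A VmRSS that climbs steadily between samples while the workload stays constant is a strong hint of a leak.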
If the load is legitimate:
- Increase the server's RAM.
- Add a swap file (temporary fix, not a panacea):
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
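After running the swap commands, a read-only check confirms the area is active:

```shell
# Active swap areas (the header line prints even when none are enabled)
cat /proc/swaps
# Totals from the kernel's point of view, converted from kB to GB
awk '/SwapTotal|SwapFree/ {printf "%s %.1f GB\n", $1, $2/1048576}' /proc/meminfo
```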
Preventing OOM Killer
- Memory monitoring:
  - Use Prometheus + node_exporter or Zabbix.
  - Set alerts for RAM usage > 80%.
  - Quick check command: awk '/MemAvailable/ {printf "%.1f GB available\n", $2/1048576}' /proc/meminfo
- Log memory usage via cron, e.g. every 5 minutes:
  */5 * * * * /usr/bin/free -h >> /var/log/memory.log
- Regular process audits:
  - Look for processes with abnormally high oom_score.
  - Verify container limits are appropriate.
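The audit can be scripted directly from /proc. A minimal sketch that lists the five processes the kernel would consider first:

```shell
# For every PID, print "score pid name", then keep the five highest scores.
for d in /proc/[0-9]*; do
    pid=${d#/proc/}
    score=$(cat "$d/oom_score" 2>/dev/null) || continue  # process may have exited
    name=$(cat "$d/comm" 2>/dev/null)
    echo "$score $pid $name"
done | sort -rn | head -5
```

Running this periodically and diffing the output makes a slowly leaking process stand out before the OOM Killer has to act.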
Container-Specific Behavior (Docker/Kubernetes)
In containers, OOM Killer operates within cgroup isolation, but if a container exhausts its limit, the kernel kills processes inside it.
Docker:
# 512 MB RAM limit; --memory-swap sets the RAM+swap total, so this permits 512 MB of swap
docker run -d --memory=512m --memory-swap=1g your-image
# Check limits
docker stats
Kubernetes:
resources:
limits:
memory: "512Mi"
cpu: "500m"
requests:
memory: "256Mi"
Ensure requests and limits are set appropriately. When the memory limit is exceeded, the container is killed and the pod reports the OOMKilled status.
Common Configuration Mistakes
- Protecting all processes with oom_score_adj=-1000: this effectively disables OOM Killer entirely, which can lead to a complete system lockup during memory shortages.
- Setting cgroup limits higher than physical RAM: even with high limits, OOM Killer will still trigger at the host level.
- Ignoring swap: swap slows the system but can buy reaction time; completely disabling it (swapoff -a) makes OOM Killer trigger sooner.
- Misinterpreting logs: "Killed process" can also come from a manual kill -9. Always check dmesg and the process exit code (137 = 128 + 9, SIGKILL, often from OOM).
What's Next?
After applying measures, verify:
- Stability under load (test with stress-ng or real traffic).
- Absence of new OOM Killer entries in logs.
- Protected processes are functioning correctly (not consuming excessive memory at others' expense).
If the problem persists, consider architectural changes: sharding, caching, or using more efficient data processing algorithms.
Remember: OOM Killer is the system's last line of defense. The best strategy is to prevent it from triggering through monitoring and prudent resource planning.