What the Out of Memory (OOM) Error Means
The Out of Memory (OOM) error in Linux is not an error message in the Windows style but an action taken by the operating system kernel. When the system exhausts all available physical RAM and swap file/partition space, the kernel activates the OOM killer mechanism.
Its goal is to maintain overall system operability by forcefully terminating one or more processes that consume the largest amount of memory. A typical symptom: a process (e.g., java, python, mysqld, a Docker container) suddenly terminates with a message in the logs:
```
[12345.678] Out of memory: Kill process 1234 (some_process) score 500 or sacrifice child
[12345.679] Killed process 1234 (some_process) total-vm:1234567kB, anon-rss:987654kB, file-rss:0kB
```
The system may become unresponsive, and after the "culprit" is terminated, it will return to a normal state.
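To observe this behavior safely, you can reproduce it on a test machine inside an isolated cgroup. The following is a minimal sketch, not a recommended procedure: it assumes a systemd-based distribution, and the 100M limit and the `tail /dev/zero` memory hog are arbitrary illustrative choices.
```
# Run a deliberate memory hog confined to a 100 MB cgroup.
# tail buffers /dev/zero endlessly (it never finds a newline), so its memory
# grows until the cgroup limit is hit and the kernel's OOM killer terminates it.
sudo systemd-run --scope -p MemoryMax=100M tail /dev/zero
```
Afterwards, `dmesg` will show the corresponding "Killed process" entry, scoped to that cgroup rather than the whole system.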
Common Causes
The reasons for memory shortage usually fall into several categories:
- Memory leaks in applications. A program (e.g., written in Java, Python, C++) gradually allocates memory for objects but does not release it after use. Over time, consumption grows to critical levels.
- Misconfiguration of applications. An excessively large heap size for the JVM (`-Xmx`), caching too much data without limits, suboptimal web server settings (e.g., `worker_processes` + `worker_connections` in Nginx).
- Insufficient physical RAM. Running multiple demanding applications (virtual machines, databases, heavy IDEs) on a server with limited memory.
- Missing or insufficient swap space. Swap acts as a "safety cushion." If it's absent or too small, the first significant RAM shortage will trigger an OOM.
- An attack or malware. For example, a DDoS attack causing a flood of connections, or a script infinitely creating processes/objects in memory.
- Zombie processes or unreleased kernel resources. Although less common, some kernel structures (e.g., unreclaimed inodes or dentries) can accumulate.
Resolution Methods
Method 1: Diagnosis and Monitoring (Always the First Step)
Before changing anything, accurately identify the source of the problem.
- Check system logs for OOM entries.
```
# Search in systemd logs (journalctl)
journalctl -k | grep -i -E "killed process|out of memory"
# Or via dmesg
dmesg | grep -i oom
```
The output will contain the PID and process name (some_process) that was killed. This is your primary suspect.
- Assess current memory usage.
```
# Install the utility if missing (for Debian/Ubuntu)
sudo apt-get install htop
# Launch htop (press F6 to sort by MEM%)
htop
# Or use built-in commands
free -h                          # Shows the overall RAM+Swap picture
ps aux --sort=-%mem | head -10   # Top 10 processes by memory
```
- For Docker/Kubernetes containers:
```
# Show containers with their memory consumption
docker stats --no-stream
# Check container logs for OOM
docker logs <container_id> 2>&1 | grep -i oom
```
In Kubernetes, an OOM event will be visible in `kubectl describe pod <pod_name>`.
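For a quick check of whether a Kubernetes container was OOM-killed, a minimal sketch (it assumes `kubectl` access to the cluster; the pod name `my-app` is a placeholder):
```
# Look for "Reason: OOMKilled" in the last terminated state of the pod's containers
kubectl describe pod my-app | grep -A 5 "Last State"
# Or query the field directly for the first container
kubectl get pod my-app -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```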
Method 2: Immediate Actions for System Stabilization
If the system is already in crisis but still responsive:
- Manually terminate the memory "hog" (use the PID from logs or `htop`).
```
sudo kill -9 <PID>
```
Caution: `-9` (SIGKILL) is a blunt instrument. First try a regular `kill <PID>`.
- Clear the filesystem cache (may help if the issue is with caches). This is safe; the kernel will repopulate the caches as needed.
```
# Clear pagecache, dentries, and inodes
sudo sync                                   # Flush data to disk
echo 3 | sudo tee /proc/sys/vm/drop_caches
```
Do not run this command too frequently or on a production server without understanding the consequences.
- Temporarily increase swap space (if it's small or missing) to give the system "breathing room."
```
# Create a 2GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# To make it persistent after reboot, add to /etc/fstab:
# /swapfile none swap sw 0 0
```
Verify with `free -h`.
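A short verification sketch (it assumes the swap file from the previous step was created at `/swapfile`):
```
swapon --show   # the new /swapfile should be listed as an active swap area
free -h         # the Swap total should now include the extra 2G
```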
Method 3: Configuring OOM Killer Behavior (OOM Policy)
You can influence which process the OOM killer targets first.
- Use `oom_score_adj` for critical processes. Every process has an `oom_score` (0 to 1000), calculated mainly from its memory consumption. The higher the score, the more likely the process is to be killed. You can lower this score for important services.
```
# View the current oom_score for a process
cat /proc/<PID>/oom_score
# Set a low kill priority (e.g., -500) for PID 1234
echo -500 | sudo tee /proc/1234/oom_score_adj
# For a permanent setting, use a systemd unit file.
# In the service file /etc/systemd/system/<service>.service add:
# [Service]
# OOMScoreAdjust=-500
```
Important: Do not set `oom_score_adj = -1000` (complete kill immunity) for all processes. This can cause a complete system freeze.
- Adjust the memory overcommit policy (`vm.overcommit_memory`). By default (`0`), the kernel uses heuristics. Mode `1` (always overcommit) allows unlimited memory allocation but increases OOM risk. Mode `2` (don't overcommit) strictly checks whether enough memory plus swap is available; the commit limit is swap plus `overcommit_ratio` percent of RAM.
```
# Check the current value
cat /proc/sys/vm/overcommit_memory
# Temporarily set "no overcommit" mode (2)
sudo sysctl vm.overcommit_memory=2
sudo sysctl vm.overcommit_ratio=100   # percentage of physical RAM counted toward the commit limit
# For permanent settings, add to /etc/sysctl.conf:
# vm.overcommit_memory = 2
# vm.overcommit_ratio = 100
```
Caution: Mode `2` may cause applications to fail with `Cannot allocate memory` before the OOM killer triggers, which can sometimes make debugging easier.
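To see the accounting that mode `2` enforces, you can compare the kernel's commit limit with what is currently committed, using standard `/proc/meminfo` fields (a small sketch):
```
# CommitLimit = swap + overcommit_ratio% of RAM; allocations start failing
# once Committed_AS would exceed it (enforced only in mode 2)
grep -E "CommitLimit|Committed_AS" /proc/meminfo
```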
Method 4: Addressing the Root Cause
This is the most important and long-term step.
- For Java applications: Check and reduce the heap size (`-Xmx`, `-Xms`) in the JVM parameters. Use tools like `jstat`, `jmap`, and VisualVM to analyze heap usage and heap dumps (see the heap-dump sketch after this list).
```
# Example launch with the heap limited to 2 GB
java -Xmx2g -jar your_app.jar
```
- For web servers (Nginx/Apache): Optimize the number of worker processes/threads for the available memory. Ensure there is no unbounded caching.
- Search for memory leaks:
  - For C/C++: use `valgrind --leak-check=yes`.
  - For Python: `tracemalloc`, `objgraph`, `memory_profiler`.
  - For Node.js: `heapdump`, `clinic.js`.
  - General utilities: `smem -t -p` (shows PSS, the Proportional Set Size), `cat /proc/<PID>/smaps`.
- Hardware upgrade. If the load is legitimate and the application is optimized, you may simply lack sufficient physical RAM for the workload.
- For containers (Docker/K8s): Correctly set memory limits (`--memory`, `--memory-swap` in Docker; `resources.limits.memory` in K8s). Without limits, a container can "eat" all of the host's memory.
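As a concrete illustration of the container limits mentioned above, a minimal sketch (the image name `my-image` and the 512 MB values are placeholders):
```
# Limit the container to 512 MB; setting --memory-swap equal to --memory
# disables swap for the container, so it is OOM-killed instead of thrashing
docker run -d --name my-app --memory=512m --memory-swap=512m my-image:latest
# Confirm the limit is being respected
docker stats --no-stream my-app
```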
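And the heap-dump sketch referenced in the Java item above (the PID 1234 is a placeholder; both commands require a JDK, not just a JRE):
```
# Sample GC/heap utilization every 5000 ms to watch for steady growth
jstat -gcutil 1234 5000
# Capture a heap dump of live objects for offline analysis in VisualVM or MAT
jmap -dump:live,format=b,file=/tmp/heap.hprof 1234
```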
Prevention
- Implement monitoring. Set up alerts (in Zabbix, Prometheus/Grafana, Nagios) for key metrics (a quick ad-hoc shell check is sketched after this list):
  - `node_memory_MemAvailable_bytes` (available memory) < 10-15%
  - `node_memory_SwapTotal_bytes` and swap usage > 50%
  - `container_spec_memory_limit_bytes` (for containers)
- Regularly analyze logs. Add log parsing for "Killed process" messages to your monitoring.
- Configure limits properly. For all services (especially in Docker/K8s), define both `requests` (guaranteed minimum) and `limits` (hard ceiling).
- Test under load. Conduct load testing (e.g., with `stress-ng`, `memtester`) in a staging environment to see how the application behaves as memory consumption grows (see the sketch after this list).
- Keep software updated. Memory leaks are often fixed in updates. Regularly update your OS and critical applications.
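The ad-hoc check mentioned in the monitoring item, as a sketch (it reads standard `/proc/meminfo` fields; the 10-15% threshold is the one suggested above):
```
# Print the percentage of memory still available to applications
awk '/MemTotal/ {total=$2} /MemAvailable/ {avail=$2} END {printf "MemAvailable: %.1f%%\n", avail/total*100}' /proc/meminfo
```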
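And a load-testing sketch for the staging check above (it assumes `stress-ng` is installed; the worker count, percentage, and duration are arbitrary, and it should never be run against production):
```
# Two workers that repeatedly allocate and touch 75% of available memory for 60 seconds
stress-ng --vm 2 --vm-bytes 75% --timeout 60s --metrics-brief
# Watch memory usage from another terminal while the test runs
watch -n 1 free -h
```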
Conclusion
The Out of Memory error in Linux is a signal that the system has exhausted its resources. The OOM killer is a last line of defense, not a solution. Your task is to diagnose the memory "hog" via logs, stabilize the system (by terminating the process or adding swap), and then eliminate the root cause: a leak, misconfiguration, or physical RAM shortage. Prevention through monitoring and proper limit allocation is the best way to avoid unexpected service outages in the future.