What the Out of Memory (OOM) Error Means
The Out of Memory (OOM) error in Linux is not an error message in the Windows style but an action taken by the operating system kernel. When the system exhausts all available physical RAM and swap file/partition space, the kernel activates the OOM killer mechanism.
Its goal is to maintain overall system operability by forcefully terminating one or more processes that consume the largest amount of memory. A typical symptom: a process (e.g., java, python, mysqld, a Docker container) suddenly terminates with a message in the logs:
```
[12345.678] Out of memory: Kill process 1234 (some_process) score 500 or sacrifice child
[12345.679] Killed process 1234 (some_process) total-vm:1234567kB, anon-rss:987654kB, file-rss:0kB
```
The system may become unresponsive, and after the "culprit" is terminated, it will return to a normal state.
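To observe this behavior safely, you can reproduce it on a test machine inside an isolated cgroup. The following is a minimal sketch, not a recommended procedure: it assumes a systemd-based distribution, and the 100M limit and the `tail /dev/zero` memory hog are arbitrary illustrative choices.
```
# Run a deliberate memory hog confined to a 100 MB cgroup.
# tail buffers /dev/zero endlessly (it never finds a newline), so its memory
# grows until the cgroup limit is hit and the kernel's OOM killer terminates it.
sudo systemd-run --scope -p MemoryMax=100M tail /dev/zero
```
Afterwards, `dmesg` will show the corresponding "Killed process" entry, scoped to that cgroup rather than the whole system.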
Common Causes
The reasons for memory shortage usually fall into several categories:
- Memory leaks in applications. A program (e.g., written in Java, Python, C++) gradually allocates memory for objects but does not release it after use. Over time, consumption grows to critical levels.
- Misconfiguration of applications. An excessively large heap size for the JVM (`-Xmx`), caching too much data without limits, suboptimal web server settings (e.g., `worker_processes` + `worker_connections` in Nginx).
- Insufficient physical RAM. Running multiple demanding applications (virtual machines, databases, heavy IDEs) on a server with limited memory.
- Missing or insufficient swap space. Swap acts as a "safety cushion." If it's absent or too small, the first significant RAM shortage will trigger an OOM.
- An attack or malware. For example, a DDoS attack causing a flood of connections, or a script infinitely creating processes/objects in memory.
- Zombie processes or unreleased kernel resources. Although less common, some kernel structures (e.g., unreclaimed inodes or dentries) can accumulate.
Resolution Methods
Method 1: Diagnosis and Monitoring (Always the First Step)
Before changing anything, accurately identify the source of the problem.
- Check system logs for OOM entries.
```
# Search in systemd logs (journalctl)
journalctl -k | grep -i -E "killed process|out of memory"
# Or via dmesg
dmesg | grep -i oom
```
The output will contain the PID and process name (some_process) that was killed. This is your primary suspect.
- Assess current memory usage.
```
# Install the utility if missing (for Debian/Ubuntu)
sudo apt-get install htop
# Launch htop (press F6 to sort by MEM%)
htop
# Or use built-in commands
free -h                          # Shows the overall RAM+Swap picture
ps aux --sort=-%mem | head -10   # Top 10 processes by memory
```
- For Docker/Kubernetes containers:
```
# Show containers with their memory consumption
docker stats --no-stream
# Check container logs for OOM
docker logs <container_id> 2>&1 | grep -i oom
```
In Kubernetes, an OOM event will be visible in `kubectl describe pod <pod_name>`.
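For a quick check of whether a Kubernetes container was OOM-killed, a minimal sketch (it assumes `kubectl` access to the cluster; the pod name `my-app` is a placeholder):
```
# Look for "Reason: OOMKilled" in the last terminated state of the pod's containers
kubectl describe pod my-app | grep -A 5 "Last State"
# Or query the field directly for the first container
kubectl get pod my-app -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```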
Method 2: Immediate Actions for System Stabilization
If the system is already in crisis but still responsive:
- Manually terminate the memory "hog" (use the PID from logs or `htop`).
```
sudo kill -9 <PID>
```
Caution: `-9` (SIGKILL) is a blunt instrument. First try a regular `kill <PID>`.
- Clear the filesystem cache (may help if the issue is with caches). This is safe; the kernel will repopulate the caches as needed.
```
# Clear pagecache, dentries, and inodes
sudo sync                                   # Flush data to disk
echo 3 | sudo tee /proc/sys/vm/drop_caches
```
Do not run this command too frequently or on a production server without understanding the consequences.
- Temporarily increase swap space (if it's small or missing) to give the system "breathing room."
```
# Create a 2GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# To make it persistent after reboot, add to /etc/fstab:
# /swapfile none swap sw 0 0
```
Verify with `free -h`.
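A short verification sketch (it assumes the swap file from the previous step was created at `/swapfile`):
```
swapon --show   # the new /swapfile should be listed as an active swap area
free -h         # the Swap total should now include the extra 2G
```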
Method 3: Configuring OOM Killer Behavior (OOM Policy)
You can influence which process the OOM killer targets first.
- Use `oom_score_adj` for critical processes. Every process has an `oom_score` (0 to 1000), calculated mainly from its memory consumption. The higher the score, the more likely the process is to be killed. You can lower this score for important services.
```
# View the current oom_score for a process
cat /proc/<PID>/oom_score
# Set a low kill priority (e.g., -500) for PID 1234
echo -500 | sudo tee /proc/1234/oom_score_adj
# For a permanent setting, use a systemd unit file.
# In the service file /etc/systemd/system/<service>.service add:
# [Service]
# OOMScoreAdjust=-500
```
Important: Do not set `oom_score_adj = -1000` (complete kill immunity) for all processes. This can cause a complete system freeze.
- Adjust the memory overcommit policy (`vm.overcommit_memory`). By default (`0`), the kernel uses heuristics. Mode `1` (always overcommit) allows unlimited memory allocation but increases OOM risk. Mode `2` (don't overcommit) strictly checks whether enough memory plus swap is available; the commit limit is swap plus `overcommit_ratio` percent of RAM.
```
# Check the current value
cat /proc/sys/vm/overcommit_memory
# Temporarily set "no overcommit" mode (2)
sudo sysctl vm.overcommit_memory=2
sudo sysctl vm.overcommit_ratio=100   # percentage of physical RAM counted toward the commit limit
# For permanent settings, add to /etc/sysctl.conf:
# vm.overcommit_memory = 2
# vm.overcommit_ratio = 100
```
Caution: Mode `2` may cause applications to fail with `Cannot allocate memory` before the OOM killer triggers, which can sometimes make debugging easier.
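To see the accounting that mode `2` enforces, you can compare the kernel's commit limit with what is currently committed, using standard `/proc/meminfo` fields (a small sketch):
```
# CommitLimit = swap + overcommit_ratio% of RAM; allocations start failing
# once Committed_AS would exceed it (enforced only in mode 2)
grep -E "CommitLimit|Committed_AS" /proc/meminfo
```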
Method 4: Addressing the Root Cause
This is the most important and long-term step.
- For Java applications: Check and reduce the heap size (`-Xmx`, `-Xms`) in the JVM parameters. Use tools like `jstat`, `jmap`, and VisualVM to analyze heap usage and heap dumps (see the heap-dump sketch after this list).
```
# Example launch with the heap limited to 2 GB
java -Xmx2g -jar your_app.jar
```
- For web servers (Nginx/Apache): Optimize the number of worker processes/threads for the available memory. Ensure there is no unbounded caching.
- Search for memory leaks:
  - For C/C++: use `valgrind --leak-check=yes`.
  - For Python: `tracemalloc`, `objgraph`, `memory_profiler`.
  - For Node.js: `heapdump`, `clinic.js`.
  - General utilities: `smem -t -p` (shows PSS, the Proportional Set Size), `cat /proc/<PID>/smaps`.
- Hardware upgrade. If the load is legitimate and the application is optimized, you may simply lack sufficient physical RAM for the workload.
- For containers (Docker/K8s): Correctly set memory limits (`--memory`, `--memory-swap` in Docker; `resources.limits.memory` in K8s). Without limits, a container can "eat" all of the host's memory.
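As a concrete illustration of the container limits mentioned above, a minimal sketch (the image name `my-image` and the 512 MB values are placeholders):
```
# Limit the container to 512 MB; setting --memory-swap equal to --memory
# disables swap for the container, so it is OOM-killed instead of thrashing
docker run -d --name my-app --memory=512m --memory-swap=512m my-image:latest
# Confirm the limit is being respected
docker stats --no-stream my-app
```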
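And the heap-dump sketch referenced in the Java item above (the PID 1234 is a placeholder; both commands require a JDK, not just a JRE):
```
# Sample GC/heap utilization every 5000 ms to watch for steady growth
jstat -gcutil 1234 5000
# Capture a heap dump of live objects for offline analysis in VisualVM or MAT
jmap -dump:live,format=b,file=/tmp/heap.hprof 1234
```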
Prevention
- Implement monitoring. Set up alerts (in Zabbix, Prometheus/Grafana, Nagios) for key metrics (a quick ad-hoc shell check is sketched after this list):
  - `node_memory_MemAvailable_bytes` (available memory) < 10-15%
  - `node_memory_SwapTotal_bytes` and swap usage > 50%
  - `container_spec_memory_limit_bytes` (for containers)
- Regularly analyze logs. Add log parsing for "Killed process" messages to your monitoring.
- Configure limits properly. For all services (especially in Docker/K8s), define both `requests` (guaranteed minimum) and `limits` (hard ceiling).
- Test under load. Conduct load testing (e.g., with `stress-ng`, `memtester`) in a staging environment to see how the application behaves as memory consumption grows (see the sketch after this list).
- Keep software updated. Memory leaks are often fixed in updates. Regularly update your OS and critical applications.
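The ad-hoc check mentioned in the monitoring item, as a sketch (it reads standard `/proc/meminfo` fields; the 10-15% threshold is the one suggested above):
```
# Print the percentage of memory still available to applications
awk '/MemTotal/ {total=$2} /MemAvailable/ {avail=$2} END {printf "MemAvailable: %.1f%%\n", avail/total*100}' /proc/meminfo
```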
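And a load-testing sketch for the staging check above (it assumes `stress-ng` is installed; the worker count, percentage, and duration are arbitrary, and it should never be run against production):
```
# Two workers that repeatedly allocate and touch 75% of available memory for 60 seconds
stress-ng --vm 2 --vm-bytes 75% --timeout 60s --metrics-brief
# Watch memory usage from another terminal while the test runs
watch -n 1 free -h
```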
Conclusion
The Out of Memory error in Linux is a signal that the system has exhausted its resources. The OOM killer is a last line of defense, not a solution. Your task is to diagnose the memory "hog" via logs, stabilize the system (by terminating the process or adding swap), and then eliminate the root cause: a leak, misconfiguration, or physical RAM shortage. Prevention through monitoring and proper limit allocation is the best way to avoid unexpected service outages in the future.