| service | symptoms | tags |
|---|---|---|
| kernel | OOM kill, out of memory, high load average, kernel panic, segfault, soft lockup, CPU steal, system unresponsive, zombie processes, NMI watchdog | kernel, oom, memory, load, cpu, panic, dmesg, segfault, lockup, swap, zombie |
Symptoms
- `Out of memory: Kill process <pid>` in dmesg: OOM killer fired
- Load average far above CPU count: system overloaded or I/O blocked
- `kernel: BUG: soft lockup`: CPU stuck in kernel code
- `segfault at ...` in dmesg: process crashed due to invalid memory access
- `kernel panic`: unrecoverable kernel error (visible only on console or serial)
- Many zombie (`Z`) processes in `ps` output
- High `%steal` in `top`/`vmstat`: hypervisor CPU contention
Diagnostics
Recent kernel messages
dmesg -T | tail -100
dmesg -T | grep -iE 'error|warn|oom|kill|panic|oops|fault|hung|lockup'
journalctl -k -n 200
OOM events
dmesg -T | grep -i 'out of memory\|oom_kill\|killed process'
The log shows which process was killed, its RSS at time of kill, and available memory.
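To pull the victim and its memory footprint out of a long log, a filter along these lines can help. It is a minimal sketch: `oom_victims` is a hypothetical helper name, and it assumes the `Killed process <pid> (<name>) ... anon-rss:<n>kB` layout, which varies between kernel versions. Lines that do not match are skipped silently.

```shell
# Hypothetical helper: print "pid name anon_rss_kB" for each OOM kill line on stdin.
# Assumes the "Killed process <pid> (<name>) ... anon-rss:<n>kB" layout; other
# kernel versions may format this line differently.
oom_victims() {
  sed -nE 's/.*[Kk]illed process ([0-9]+) \(([^)]*)\).*anon-rss:([0-9]+)kB.*/\1 \2 \3/p'
}

# Usage:
#   dmesg -T | oom_victims
#   journalctl -k | oom_victims
```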
Memory usage
free -h
head -30 /proc/meminfo
vmstat -s
MemAvailable is the key metric. If it is near zero and swap is also exhausted, OOM kills are imminent.
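"Near zero" is easier to judge as a fraction of total memory. The sketch below computes MemAvailable as an integer percentage of MemTotal. The function name `mem_available_pct` is illustrative, and any alert threshold you compare it against (5% in the usage comment) is an assumption to tune per host, not a value from this runbook.

```shell
# Hypothetical helper: print MemAvailable as an integer percentage of MemTotal.
# Reads /proc/meminfo-formatted text from stdin; exits non-zero if MemTotal is missing.
mem_available_pct() {
  awk '$1 == "MemTotal:"     { total = $2 }
       $1 == "MemAvailable:" { avail = $2 }
       END {
         if (total > 0) {
           printf "%d\n", avail * 100 / total
         } else {
           exit 1
         }
       }'
}

# Usage (the 5% threshold is an assumed example value):
#   pct=$(mem_available_pct < /proc/meminfo)
#   [ "$pct" -lt 5 ] && echo "MemAvailable at ${pct}%: check swap next"
```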
Swap
swapon --show
cat /proc/swaps
vmstat 1 5
High si/so (swap-in/swap-out) in vmstat indicates active swapping and likely memory pressure.
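To turn the si/so columns into a single figure, the samples can be averaged. This sketch assumes the default procps `vmstat` layout, where si and so are the 7th and 8th columns. It skips the two header lines and the first sample, which reports averages since boot rather than current activity. `swap_activity` is a hypothetical name.

```shell
# Hypothetical helper: average si/so over vmstat samples read from stdin.
# Assumes the default procps layout (si = column 7, so = column 8).
# Skips the two header lines and the first sample (averages since boot).
swap_activity() {
  awk 'NR > 3 && $1 ~ /^[0-9]+$/ { si += $7; so += $8; n++ }
       END {
         if (n > 0) {
           printf "samples=%d si_avg=%d so_avg=%d\n", n, si / n, so / n
         } else {
           exit 1
         }
       }'
}

# Usage:
#   vmstat 1 5 | swap_activity
```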
Load average and CPU
uptime
top -b -n1 | head -30
mpstat -P ALL 1 3
Load average above 2× CPU count sustained over 15 minutes is concerning.
High %iowait indicates processes blocked on disk I/O, not CPU-bound load.
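The 2× CPU count rule of thumb above can be checked mechanically. The sketch below takes the 15-minute load and the CPU count as arguments so it can be run against saved values as well as a live host. `load_is_high` is a hypothetical name, and the default factor of 2 simply mirrors the guideline above.

```shell
# Hypothetical helper: succeed (exit 0) when load15 exceeds factor x CPU count.
# Arguments: <load15> <cpu_count> [factor]; factor defaults to 2.
load_is_high() {
  local load15=$1 cpus=$2 factor=${3:-2}
  awk -v l="$load15" -v c="$cpus" -v f="$factor" 'BEGIN { exit !(l > c * f) }'
}

# Usage on a live host (field 3 of /proc/loadavg is the 15-minute average):
#   load_is_high "$(cut -d' ' -f3 /proc/loadavg)" "$(nproc)" \
#     && echo "sustained overload: check %iowait with mpstat"
```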
Process memory usage
ps aux --sort=-%mem | head -20
ps aux --sort=-%cpu | head -20
Zombie processes
ps aux | awk '$8 ~ /^Z/'
Zombies cannot be killed; the parent must wait() for them or be killed itself.
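Since the fix lies with the parent, it helps to see which parent PIDs are accumulating zombies. The sketch below counts zombies per parent. It assumes header-less `ps` rows in the order STAT, PPID, PID, COMM, and treats any STAT value beginning with `Z` as a zombie. `zombie_parents` is a hypothetical name.

```shell
# Hypothetical helper: print "ppid zombie_count", highest count first.
# Expects header-less rows on stdin in the order: STAT PPID PID COMM.
zombie_parents() {
  awk '$1 ~ /^Z/ { count[$2]++ }
       END { for (p in count) printf "%s %d\n", p, count[p] }' |
    sort -k2,2nr -k1,1n
}

# Usage:
#   ps -eo stat= -o ppid= -o pid= -o comm= | zombie_parents
```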
I/O wait and disk health
iostat -x 1 3
dmesg -T | grep -iE 'i/o error|hard resetting link|ata.*error|blk_update_request'
Persistent I/O errors alongside high load suggest failing storage.
Remediation
Memory pressure / frequent OOM kills:
Identify the largest memory consumers from ps aux --sort=-%mem.
Consider increasing swap, adding RAM, tuning vm.overcommit_memory, or scaling the workload.
Do NOT just raise vm.overcommit_ratio without understanding the root consumer.
Adjust OOM killer scoring for critical services (temporary: the value applies only to the running process and is lost when it restarts or the host reboots):
echo -17 > /proc/<pid>/oom_adj # legacy
echo -1000 > /proc/<pid>/oom_score_adj # current kernels
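Swap exhausted: add a swapfile (active until reboot unless also added to /etc/fstab; if swapon rejects the fallocate-created file on your filesystem, create it with dd from /dev/zero instead):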
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
High I/O wait — find the I/O-heavy process:
iotop -a -o -b -n3
Zombie reaping, if the parent is stuck: kill the parent process. Its zombie children are then reparented to init (PID 1, or a designated subreaper), which reaps them. Verify afterwards that the zombies have disappeared.