---
service: disk
symptoms: no space left on device, disk full, inode exhaustion, df shows 100%, du large files, write failed, cannot create file, filesystem read-only, ext4 error
tags: disk, filesystem, storage, inodes, df, du, ext4, xfs, lvm, partition, full, space
---

## Symptoms

- `No space left on device`: byte space or inodes are exhausted
- `df -h` shows a filesystem at or near 100%
- `df -i` shows inode usage at 100%: the file count is exhausted even if byte space is free
- Filesystem remounted read-only: the kernel detected errors and remounted it to prevent further damage
- Services failing to write logs, create temp files, or open sockets

## Diagnostics

### Overall disk usage

```
df -h
df -i
```

`df -h` shows byte usage; `df -i` shows inode usage. Either can be exhausted independently of the other.
Note which filesystem is full (`/`, `/var`, `/tmp`, `/home`, etc.).
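
When many filesystems are mounted, a small filter can flag the ones that are nearly full by either measure. A minimal sketch, assuming GNU coreutils `df` (for `--output`); the function name `fs_over_threshold` and the 90% default are illustrative choices, not part of any standard tooling:

```shell
# Hypothetical helper (not a standard tool): reads the output of
#   df --output=pcent,ipcent,target
# on stdin and prints each mount point whose byte OR inode usage is at or
# above the threshold percentage (default 90). Assumes GNU coreutils df.
fs_over_threshold() {
  awk -v t="${1:-90}" '
    BEGIN { t = t + 0 }
    NR == 1 { next }    # skip the header line
    {
      bytes = $1; inodes = $2
      sub(/%$/, "", bytes); sub(/%$/, "", inodes)
      # "-" means the filesystem does not report inodes: treat it as 0
      if (bytes !~ /^[0-9]+$/) bytes = 0
      if (inodes !~ /^[0-9]+$/) inodes = 0
      # rejoin a mount point that awk split on spaces
      target = $3
      for (i = 4; i <= NF; i++) target = target " " $i
      if (bytes + 0 >= t || inodes + 0 >= t)
        printf "%s bytes=%s%% inodes=%s%%\n", target, bytes, inodes
    }'
}
```

Usage: `df --output=pcent,ipcent,target -x tmpfs -x devtmpfs | fs_over_threshold 90`. Checking both columns in one pass avoids the common miss where `df -h` looks healthy but inodes are gone.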

### Find the large directories

```
du -sh /* 2>/dev/null | sort -rh | head -20
du -sh /var/* 2>/dev/null | sort -rh | head -20
du -sh /var/log/* 2>/dev/null | sort -rh | head -20
```
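
The `/*` glob in the commands above also descends into other filesystems mounted under `/`, so a separate `/home` or `/data` mount can look like it is filling the root filesystem. A minimal sketch that stays on one filesystem with `du -x`, assuming GNU `du` and `sort`; `top_dirs` is a hypothetical name:

```shell
# Hypothetical helper: show the largest immediate subdirectories of DIR
# (default /), staying on DIR's filesystem (-x) so other mounts are not
# counted. The first line is usually DIR's own total. Assumes GNU du/sort.
top_dirs() {
  du -xh -d 1 "${1:-/}" 2>/dev/null | sort -rh | head -n "${2:-20}"
}
```

For example, `top_dirs /var 10`, then rerun it on the largest entry to drill down one level at a time.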

### Find large individual files

```
find / -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null | sort -k5 -rh | head -20
find /var/log -type f -size +50M 2>/dev/null
```

### Find deleted-but-open files holding space

```
lsof +L1 2>/dev/null | grep -v "^COMMAND"
```
Files deleted while a process still has them open do not free space until the process releases the file descriptor.
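
The `SIZE/OFF` column of `lsof +L1` shows how much each handle pins, but one deleted file can appear once per process or descriptor, so adding up that column can overcount. A minimal sketch that totals the space while counting each file once, keyed on device and inode; it assumes `lsof`'s `-F` field output (the `p`, `f`, `D`, `i`, `s` fields), and `deleted_open_bytes` is a hypothetical name:

```shell
# Hypothetical helper (not a standard tool): total bytes pinned by
# deleted-but-open files, counting each file (device + inode) once even when
# several processes or descriptors hold it. Reads lsof field output on stdin:
#   lsof -nP +L1 -F pfDis
# where each line starts with a field letter: p=PID, f=fd, D=device,
# i=inode, s=size.
deleted_open_bytes() {
  awk '
    function flush() {
      if (size != "" && !((dev, ino) in seen)) {
        seen[dev, ino] = 1
        total += size
      }
      dev = ""; ino = ""; size = ""
    }
    /^[pf]/ { flush() }    # a new process or file set starts
    /^D/ { dev = substr($0, 2) }
    /^i/ { ino = substr($0, 2) }
    /^s/ { size = substr($0, 2) }
    END { flush(); printf "%.0f\n", total }'
}
```

Usage: `lsof -nP +L1 -F pfDis 2>/dev/null | deleted_open_bytes | numfmt --to=iec`. If the total roughly matches the gap between `df` and `du`, deleted-but-open files explain the missing space.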

### Inode exhaustion: find directories with many small files

```
find / -xdev -printf '%h\n' 2>/dev/null | sort | uniq -c | sort -rn | head -20
```
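
The `find` pipeline above counts entries per individual directory. To see first which top-level tree holds the most inodes, GNU `du` (coreutils 8.22 and later) can count inodes instead of bytes. A minimal sketch; `top_inode_dirs` is a hypothetical name:

```shell
# Hypothetical helper: inode (file count) usage of DIR's immediate
# subdirectories, staying on one filesystem (-x). The first line is usually
# DIR's own total. Assumes GNU du with --inodes (coreutils >= 8.22).
top_inode_dirs() {
  du -x --inodes -d 1 "${1:-/}" 2>/dev/null | sort -rn | head -n "${2:-20}"
}
```

For example, `top_inode_dirs /var 10`, then repeat on the heaviest subdirectory until the directory full of small files turns up.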

### Filesystem errors (after a crash or read-only remount)

```
dmesg | grep -i 'ext4\|xfs\|btrfs\|error\|corrupt'
journalctl -k | grep -i 'filesystem\|disk\|i/o error'
```
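
To confirm which filesystems are actually mounted read-only right now, the mount options can be checked directly (`findmnt -O ro` gives a similar view). A minimal sketch; `ro_mounts` is a hypothetical name, and note that some entries, such as squashfs snap images, are read-only by design:

```shell
# Hypothetical helper (not a standard tool): print "mountpoint fstype" for
# every filesystem mounted read-only. Reads /proc/mounts format
# (device mountpoint fstype options dump pass); the file defaults to /proc/mounts.
ro_mounts() {
  # Wrap the options in commas so "ro" matches only as a whole option,
  # not inside values such as errors=remount-ro.
  awk 'index("," $4 ",", ",ro,") > 0 { print $2, $3 }' "${1:-/proc/mounts}"
}
```

A data filesystem (for example `/` or `/var`) showing up here, together with errors from `dmesg`, points to a kernel-triggered remount rather than a full disk.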

### LVM / partition layout

```
lsblk
pvs
vgs
lvs
```

## Remediation

**Large log files: truncate in place (do NOT `rm` while the file is in use):**
```
truncate -s 0 /var/log/<logfile>
```
Or configure log rotation in `/etc/logrotate.d/`.
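
A minimal logrotate policy sketch, assuming a hypothetical application log at `/var/log/myapp/*.log`. `copytruncate` copies the live file and then truncates it in place, so the writing process keeps a valid descriptor; lines written between the copy and the truncate can be lost. It is wrapped in a function only so the target path can be pointed at a scratch file first; `write_logrotate_policy` is a hypothetical name:

```shell
# Hypothetical helper: write a logrotate policy for LOG_GLOB into FILE.
# Directives used: daily, rotate, compress, delaycompress, missingok,
# notifempty, copytruncate (standard logrotate keywords).
write_logrotate_policy() {
  local file=$1 log_glob=$2
  cat > "$file" <<EOF
$log_glob {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
}
```

For example, `write_logrotate_policy /etc/logrotate.d/myapp '/var/log/myapp/*.log'`, then `logrotate -d /etc/logrotate.d/myapp` to dry-run it (debug mode makes no changes).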

**Old journal logs eating space:**
```
journalctl --disk-usage
journalctl --vacuum-size=500M
journalctl --vacuum-time=30d
```
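
Vacuuming is a one-off cleanup. To keep the journal capped, `SystemMaxUse=` can be set in a journald drop-in (journald reads `/etc/systemd/journald.conf.d/*.conf`). A minimal sketch; the 500M default mirrors the vacuum example above, and both `cap_journal_size` and the file name `90-disk-cap.conf` are hypothetical choices:

```shell
# Hypothetical helper: persist a journal size cap as a journald drop-in.
# CONF_DIR defaults to the standard drop-in directory; the file name
# 90-disk-cap.conf is an arbitrary choice. This only writes the file.
cap_journal_size() {
  local max_use=${1:-500M} conf_dir=${2:-/etc/systemd/journald.conf.d}
  mkdir -p "$conf_dir"
  printf '[Journal]\nSystemMaxUse=%s\n' "$max_use" > "$conf_dir/90-disk-cap.conf"
}
```

For example, `cap_journal_size 500M && systemctl restart systemd-journald` so the new limit takes effect.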

**Deleted-but-open files (restart the holding process to release space):**
Identify the PID from `lsof +L1`, find the unit it belongs to with `systemctl status <PID>`, then restart that unit:
```
systemctl restart <service>
```
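
When the process cannot be restarted right away, the space can usually be reclaimed on Linux by truncating the deleted file through the process's own descriptor under `/proc/<pid>/fd/`. A minimal sketch; `list_deleted_fds` is a hypothetical name, and truncating discards the file's contents, so only do it for data that is safe to lose (typically logs):

```shell
# Hypothetical helper: list /proc/<pid>/fd/<n> entries whose symlink target
# ends in "(deleted)", i.e. descriptors that still pin a deleted file.
# PROC_ROOT defaults to /proc (Linux only); errors for unreadable PIDs are hidden.
list_deleted_fds() {
  find "${1:-/proc}"/[0-9]*/fd -lname '*(deleted)' 2>/dev/null
}

# To release one of them without a restart (discards the file's contents):
#   : > /proc/<pid>/fd/<n>
```

Run `ls -l` on a listed entry to see which deleted path it points to before truncating it.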

**Inode exhaustion (remove many small files):**
Common culprits: PHP session files in `/var/lib/php/sessions/`, the apt package cache, and old files in temporary directories.
```
find /var/lib/php/sessions -type f -mtime +7 -delete
apt-get clean
find /tmp -type f -mtime +3 -delete
```
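
Deleting by age is irreversible, so it helps to see how many files match before adding `-delete`. A minimal sketch; `prune_old_files` and the `DRY_RUN` variable are hypothetical, and dry-run (count only) is the default:

```shell
# Hypothetical helper: count (dry run, the default) or delete regular files
# older than DAYS under DIR, staying on one filesystem. Set DRY_RUN=0 to delete.
prune_old_files() {
  local dir=$1 days=$2
  if [ "${DRY_RUN:-1}" = 1 ]; then
    find "$dir" -xdev -type f -mtime +"$days" | wc -l
  else
    find "$dir" -xdev -type f -mtime +"$days" -delete
  fi
}
```

For example, `prune_old_files /var/lib/php/sessions 7` to see the count, then `DRY_RUN=0 prune_old_files /var/lib/php/sessions 7` once it looks right.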

**Extend LVM volume (if free extents exist in the volume group):**
```
lvextend -l +100%FREE /dev/<vg>/<lv>
resize2fs /dev/<vg>/<lv> # ext4
xfs_growfs /mountpoint # xfs
```
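
Before growing, `vgs` shows whether the volume group has free space (the `VFree` column), and the filesystem type decides which grow tool applies. A minimal dry-run sketch that only prints the commands; `plan_lv_grow` is hypothetical, and `lvextend -r` (`--resizefs`) is an alternative that grows the filesystem in the same step:

```shell
# Hypothetical helper: print (not run) the commands to grow an LV and its
# filesystem. FSTYPE can come from: findmnt -no FSTYPE <mountpoint>
plan_lv_grow() {
  local lv_path=$1 fstype=$2 mountpoint=$3
  echo "lvextend -l +100%FREE $lv_path"
  case $fstype in
    ext3|ext4) echo "resize2fs $lv_path" ;;
    xfs)       echo "xfs_growfs $mountpoint" ;;   # xfs is grown via its mount point
    *)         echo "no grow rule for filesystem type: $fstype" >&2; return 1 ;;
  esac
}
```

For example, `plan_lv_grow /dev/vg0/data "$(findmnt -no FSTYPE /data)" /data`, review the output, then run the printed commands.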