feat: complete RAG runbook workflow and release docs
Some checks failed
CI / test (push) Failing after 15s

2026-05-06 04:48:41 +02:00
parent 450de24d28
commit 57f4c0efaa
26 changed files with 2510 additions and 137 deletions

runbooks/disk.md Normal file

@@ -0,0 +1,106 @@
---
service: disk
symptoms: no space left on device, disk full, inode exhaustion, df shows 100%, du large files, write failed, cannot create file, filesystem read-only, ext4 error
tags: disk, filesystem, storage, inodes, df, du, ext4, xfs, lvm, partition, full, space
---
## Symptoms
- `No space left on device` — disk or inode exhaustion
- `df -h` shows a filesystem at 100% (or near 100%)
- `df -i` shows inode usage at 100% — file count exhausted even if byte space is free
- Filesystem remounted read-only — kernel detected errors and protected itself
- Services failing to write logs, create temp files, or open sockets
## Diagnostics
### Overall disk usage
```
df -h
df -i
```
`df -h` shows byte space; `df -i` shows inode usage. Both can be independently exhausted.
Note which filesystem is full (`/`, `/var`, `/tmp`, `/home`, etc.).
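A minimal sketch of automating that check, assuming POSIX-format output (`df -P`) and mount points without spaces. The function name `full_filesystems` and the 90% default are illustrative, not part of this runbook's tooling.

```shell
# Print "mountpoint usage%" for every filesystem at or above a threshold.
# Reads `df -P` style text on stdin so it can be fed live or canned output.
full_filesystems() {
  threshold=${1:-90}
  awk -v t="$threshold" '
    NR > 1 {                 # skip the header row
      pct = $5               # column 5 is Capacity (or IUse% with -i)
      sub(/%/, "", pct)
      if (pct + 0 >= t) print $6, $5
    }'
}

# Usage on a live host:
#   df -P  | full_filesystems 90    # byte usage
#   df -Pi | full_filesystems 90    # inode usage (GNU df keeps the same columns)
```

Reading from stdin keeps the threshold logic separate from the `df` call, so the same function covers both the byte and inode views.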
### Find the large directories
```
du -sh /* 2>/dev/null | sort -rh | head -20
du -sh /var/* 2>/dev/null | sort -rh | head -20
du -sh /var/log/* 2>/dev/null | sort -rh | head -20
```
### Find large individual files
```
find / -xdev -type f -size +100M -exec du -h {} + 2>/dev/null | sort -rh | head -20
find /var/log -type f -size +50M 2>/dev/null
```
### Find deleted-but-open files holding space
```
lsof +L1 2>/dev/null | grep -v "^COMMAND"
```
Files deleted while a process still has them open do not free space until the process releases the file descriptor.
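The behavior can be reproduced safely on a Linux host with a throwaway temp file. This is an illustrative sketch: the function name `demo_deleted_open` and the use of descriptor 3 are assumptions, and it relies on `/proc` being mounted.

```shell
# Show that an unlinked file stays allocated while a descriptor is open.
demo_deleted_open() {
  tmp=$(mktemp) || return 1
  exec 3>"$tmp"              # open descriptor 3 on the temp file
  echo "still here" >&3
  rm -f "$tmp"               # remove the name; the inode is still referenced
  readlink /proc/self/fd/3   # readlink inherits fd 3; target ends in " (deleted)"
  exec 3>&-                  # closing the descriptor is what frees the space
}

# Usage:
#   demo_deleted_open
```

The ` (deleted)` suffix is the same marker that `lsof +L1` entries correspond to: the link count is zero but the data blocks remain in use.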
### Inode exhaustion — find directories with many small files
```
find / -xdev -printf '%h\n' 2>/dev/null | sort | uniq -c | sort -rn | head -20
```
### Filesystem errors (after a crash or read-only remount)
```
dmesg | grep -i 'ext4\|xfs\|btrfs\|error\|corrupt'
journalctl -k | grep -i 'filesystem\|disk\|i/o error'
```
### LVM / partition layout
```
lsblk
pvs
vgs
lvs
```
## Remediation
**Large log files — truncate safely (do NOT rm while in use):**
```
truncate -s 0 /var/log/<logfile>
```
Or configure log rotation in `/etc/logrotate.d/`.
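As a sketch of the log rotation option, the helper below writes a logrotate stanza to a file. The function name, the `myapp` paths, and the retention values (`daily`, `rotate 7`) are illustrative assumptions; `copytruncate` is chosen because it truncates the live file in place, matching the advice not to remove a log that is still open.

```shell
# Hypothetical helper: write a logrotate stanza for one log path to a file.
# $1 = log file path or glob, $2 = destination config file
write_logrotate_stanza() {
  cat > "$2" <<EOF
$1 {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}

# Example (assumed paths):
#   write_logrotate_stanza '/var/log/myapp/*.log' /etc/logrotate.d/myapp
#   logrotate -d /etc/logrotate.d/myapp   # -d is a dry run; nothing is rotated
```

Note that `copytruncate` can lose the few lines written between the copy and the truncate, so services that can reopen their log on a signal may be better served by a `postrotate` script.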
**Old journal logs eating space:**
```
journalctl --disk-usage
journalctl --vacuum-size=500M
journalctl --vacuum-time=30d
```
**Deleted-but-open files — restart the holding process to release space:**
Identify the PID from `lsof +L1`, then:
```
systemctl restart <service>
```
**Inode exhaustion — remove many small files:**
Common culprits: PHP session files in `/var/lib/php/sessions/`, old apt cache, tmp dirs.
```
find /var/lib/php/sessions -type f -mtime +7 -delete
apt-get clean
find /tmp -type f -mtime +3 -delete
```
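Because `-delete` is irreversible, it may be worth previewing the matches first. A minimal sketch, with `preview_old_files` as a hypothetical helper name:

```shell
# List the files that `find <dir> -type f -mtime +<days> -delete` would remove,
# without deleting anything.
# $1 = directory, $2 = age in days
preview_old_files() {
  find "$1" -type f -mtime +"$2" -print
}

# Usage:
#   preview_old_files /var/lib/php/sessions 7 | wc -l   # how many files
#   preview_old_files /tmp 3 | head                     # spot-check some paths
```

Once the list looks right, rerun the original command with `-delete` in place of `-print`.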
**Extend LVM volume (if free extents exist in the volume group):**
```
lvextend -l +100%FREE /dev/<vg>/<lv>
resize2fs /dev/<vg>/<lv>    # ext4: run this one
xfs_growfs <mountpoint>     # xfs: run this one instead
```
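Only one of the two grow commands applies, depending on the filesystem type. The sketch below picks the command from a type string such as the one reported by `findmnt -no FSTYPE <mountpoint>`. The function name `grow_cmd` and the example device and mount point are assumptions, and it only prints the command instead of running it.

```shell
# Print the filesystem grow command that matches a filesystem type.
# $1 = filesystem type, $2 = logical volume device, $3 = mount point
grow_cmd() {
  case "$1" in
    ext2|ext3|ext4) echo "resize2fs $2" ;;   # resize2fs takes the device
    xfs)            echo "xfs_growfs $3" ;;  # xfs_growfs takes the mount point
    *) echo "unsupported filesystem: $1" >&2; return 1 ;;
  esac
}

# Usage (assumed names):
#   vgs -o vg_name,vg_free                      # confirm free space in the VG first
#   grow_cmd "$(findmnt -no FSTYPE /data)" /dev/vg0/data /data
```

`lvextend` also accepts `-r` (`--resizefs`), which resizes the filesystem in the same step and may make the separate grow command unnecessary.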