feat: complete RAG runbook workflow and release docs
Some checks failed
CI / test (push) Failing after 15s
86
runbooks/apparmor.md
Normal file
@@ -0,0 +1,86 @@
---
service: apparmor
symptoms: permission denied despite correct unix permissions, apparmor deny logs, service blocked by profile, executable transition denied, path access denied, snap confinement issue, profile in complain mode
tags: apparmor, security, profile, aa-status, audit, confinement, complain, enforce, snap
---

## Symptoms

- Application gets `Permission denied` even though Unix permissions look correct
- Service starts in complain mode but fails in enforce mode
- Log shows AppArmor `DENIED` entries
- Binary works when profile is disabled but fails when confinement is enabled
- Snap or packaged app cannot access expected files or sockets

## Diagnostics

### Check AppArmor status and loaded profiles

```
aa-status
systemctl status apparmor
```

Confirm whether the profile is loaded and whether it is in enforce or complain mode.

### Check denial logs

```
journalctl -k | grep -i apparmor
journalctl -b | grep -i DENIED
dmesg | grep -i apparmor
```

AppArmor denials usually identify the profile, operation, and path that was blocked.

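As a rough sketch of how those three fields can be pulled out of the kernel log, the helper below tallies denials by profile, operation, and path. The function name `summarize_denials` is hypothetical, and the pattern assumes the usual field order of kernel audit lines (`operation="..."`, then `profile="..."`, then `name="..."`). Denials without a `name` field, such as capability denials, are skipped.

```shell
# Hypothetical helper: tally AppArmor denials read from stdin.
# Assumes the fields appear in the order operation, profile, name.
# Output columns: count, profile, operation, path.
summarize_denials() {
  grep 'apparmor="DENIED"' |
    sed -n 's/.*operation="\([^"]*\)".*profile="\([^"]*\)".*name="\([^"]*\)".*/\2 \1 \3/p' |
    sort | uniq -c | sort -rn
}
```

A typical call would be `journalctl -k | summarize_denials`; the most frequent denial appears first, which is usually the rule to look at first.
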
### Inspect the active profile

```
find /etc/apparmor.d -maxdepth 2 -type f | sort
cat /etc/apparmor.d/<profile>
```

Look for missing file path rules, capability rules, and `ix`/`px` execution transitions.

### Check complain vs enforce mode

```
aa-status | grep complain
```

If the issue only occurs in enforce mode, the profile is most likely too restrictive rather than the app being broken.

### Check profile parser and reload

```
apparmor_parser -r /etc/apparmor.d/<profile>
aa-status
```

Syntax or include errors can prevent an updated profile from loading.

## Remediation

**Profile too restrictive:**
Add the missing path, capability, or network rule to the profile, then reload AppArmor.

If the denial pattern is repetitive, use AppArmor tooling (for example `aa-logprof`) to review and refine the profile instead of disabling confinement globally.

**Need to observe without blocking:**
Temporarily switch the profile to complain mode:
```
aa-complain /etc/apparmor.d/<profile>
```

**Return to enforcement after fixing rules:**
```
aa-enforce /etc/apparmor.d/<profile>
```

**Profile reload after changes:**
```
apparmor_parser -r /etc/apparmor.d/<profile>
systemctl reload apparmor
```

Do not disable AppArmor globally when the issue is isolated to a single profile.

106
runbooks/disk.md
Normal file
@@ -0,0 +1,106 @@
---
service: disk
symptoms: no space left on device, disk full, inode exhaustion, df shows 100%, du large files, write failed, cannot create file, filesystem read-only, ext4 error
tags: disk, filesystem, storage, inodes, df, du, ext4, xfs, lvm, partition, full, space
---

## Symptoms

- `No space left on device`: disk or inode exhaustion
- `df -h` shows a filesystem at 100% (or near 100%)
- `df -i` shows inode usage at 100%: file count exhausted even if byte space is free
- Filesystem remounted read-only: kernel detected errors and protected itself
- Services failing to write logs, create temp files, or open sockets

## Diagnostics

### Overall disk usage

```
df -h
df -i
```

`df -h` shows byte space; `df -i` shows inode usage. Either one can be exhausted while the other still has room.
Note which filesystem is full (`/`, `/var`, `/tmp`, `/home`, etc.).

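Because byte space and inode usage have to be checked separately, a small filter that works on either report can help. The sketch below is illustrative only: `over_threshold` is a hypothetical name, it expects POSIX-format output (`df -P` for bytes, `df -Pi` for inodes, both GNU `df` options), and it assumes mount points contain no spaces.

```shell
# Hypothetical helper: print mount points at or above a usage threshold.
# $1 = threshold in percent; df -P or df -Pi output is read from stdin.
# Assumes mount points contain no spaces.
over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                 # "95%" -> "95"
    if (use + 0 >= limit + 0) print $6, $5
  }'
}
```

Running `df -P | over_threshold 90` and then `df -Pi | over_threshold 90` covers both kinds of exhaustion with the same threshold.
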
### Find the large directories

```
du -sh /* 2>/dev/null | sort -rh | head -20
du -sh /var/* 2>/dev/null | sort -rh | head -20
du -sh /var/log/* 2>/dev/null | sort -rh | head -20
```

### Find large individual files

```
# ls -l puts the file size in column 5, which is what sort -k5 sorts on
find / -xdev -type f -size +100M -exec ls -l {} + 2>/dev/null | sort -k5 -rn
find /var/log -type f -size +50M 2>/dev/null
```

### Find deleted-but-open files holding space

```
lsof +L1 2>/dev/null | grep -v "^COMMAND"
```

Files deleted while a process still has them open do not free space until the process releases the file descriptor.

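To see which process is pinning the most space, the sizes reported by `lsof +L1` can be summed per PID. This is a minimal sketch: `deleted_bytes_by_pid` is a hypothetical name, and it assumes the default `lsof` column order (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME). Rows whose SIZE/OFF column is an offset rather than a plain byte count are ignored, and a file held open through several descriptors is counted once per descriptor.

```shell
# Hypothetical helper: sum the size of deleted-but-open files per process.
# Reads `lsof +L1` output on stdin; column 7 is SIZE/OFF, column 2 is PID.
# Output columns: total bytes, PID, command (largest first).
deleted_bytes_by_pid() {
  awk 'NR > 1 && $7 ~ /^[0-9]+$/ { bytes[$2 " " $1] += $7 }
       END { for (key in bytes) print bytes[key], key }' | sort -rn
}
```

For example, `lsof +L1 2>/dev/null | deleted_bytes_by_pid | head -5` lists the five processes holding the most deleted data.
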
### Inode exhaustion: find directories with many small files

```
find / -xdev -printf '%h\n' 2>/dev/null | sort | uniq -c | sort -rn | head -20
```

### Filesystem errors (after a crash or read-only remount)

```
dmesg | grep -i 'ext4\|xfs\|btrfs\|error\|corrupt'
journalctl -k | grep -i 'filesystem\|disk\|io error'
```

### LVM / partition layout

```
lsblk
pvs
vgs
lvs
```

## Remediation

**Large log files (truncate safely; do NOT `rm` while in use):**
```
truncate -s 0 /var/log/<logfile>
```
Or configure log rotation in `/etc/logrotate.d/`.

**Old journal logs eating space:**
```
journalctl --disk-usage
journalctl --vacuum-size=500M
journalctl --vacuum-time=30d
```

**Deleted-but-open files (restart the holding process to release space):**
Identify the PID from `lsof +L1`, then:
```
systemctl restart <service>
```

**Inode exhaustion (remove many small files):**
Common culprits: PHP session files in `/var/lib/php/sessions/`, old apt cache, tmp dirs.
```
find /var/lib/php/sessions -type f -mtime +7 -delete
apt-get clean
find /tmp -type f -mtime +3 -delete
```

**Extend LVM volume (if free extents exist in the volume group):**
```
lvextend -l +100%FREE /dev/<vg>/<lv>
resize2fs /dev/<vg>/<lv>   # ext4
xfs_growfs <mountpoint>    # xfs
```

120
runbooks/docker.md
Normal file
@@ -0,0 +1,120 @@
---
service: docker
symptoms: cannot connect to docker daemon, docker daemon failed to start, docker socket permission denied, containers cannot resolve dns, docker network broken, daemon.json conflict, docker oom, unable to remove filesystem
tags: docker, dockerd, containerd, container, daemon, daemon.json, cgroup, dns, docker0, socket, compose
---

## Symptoms

- `Cannot connect to the Docker daemon. Is the docker daemon running on this host?`
- `permission denied` on `/var/run/docker.sock`
- `dockerd` fails to start after a `daemon.json` change
- Containers cannot resolve DNS or pull images
- Docker bridge/network disappears or container networking breaks after boot
- Container or daemon is killed by the kernel OOM killer
- `Error: Unable to remove filesystem` when removing a container

## Diagnostics

### Check daemon health and client target

```
docker info
systemctl is-active docker
systemctl status docker
ps -ef | grep dockerd
env | grep DOCKER_HOST
```

If `DOCKER_HOST` is set incorrectly, the CLI may be talking to the wrong daemon.

### Check daemon logs and startup failures

```
journalctl -u docker -n 200
journalctl -u containerd -n 100
cat /etc/docker/daemon.json
systemctl cat docker
```

Look for conflicts between `daemon.json` keys and systemd startup flags, especially a `hosts` key in `daemon.json` combined with a `-H` flag on the unit's `ExecStart` line.

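The duplicate `hosts` case can be checked mechanically. The following is a sketch under stated assumptions, not a complete validator: `hosts_conflict` is a hypothetical name, the unit text is expected in a file (for example saved from `systemctl cat docker`), and only the `hosts` key is examined, using a plain text match rather than a JSON parser.

```shell
# Hypothetical helper: succeed when `hosts` is configured in both places.
# $1 = path to daemon.json
# $2 = file containing the unit text, e.g. saved from `systemctl cat docker`
hosts_conflict() {
  grep -q '"hosts"' "$1" &&
    grep -Eq '^ExecStart=.*[[:space:]](-H|--host)[[:space:]=]' "$2"
}
```

One way to use it: `systemctl cat docker > /tmp/docker.unit`, then `hosts_conflict /etc/docker/daemon.json /tmp/docker.unit && echo "hosts set twice"`.
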
### Check socket permissions and group access

```
ls -la /var/run/docker.sock
id
getent group docker
ls -la ~/.docker/
```

If the user was added to the `docker` group recently, a new login shell may be required.

### Check kernel, cgroups, and memory pressure

```
uname -r
free -h
dmesg | grep -i -E 'docker|cgroup|oom|killed process'
```

Low memory, missing kernel features, or cgroup issues can stop containers or the daemon.

### Check Docker networking and DNS

```
docker network ls
ip addr show docker0
sysctl net.ipv4.ip_forward
cat /etc/resolv.conf
ps aux | grep dnsmasq
```

Loopback DNS resolvers in `/etc/resolv.conf` often break container DNS unless Docker is given explicit nameservers.

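Spotting a loopback resolver is easy to script. A minimal sketch, with `loopback_nameservers` as a hypothetical name; it prints every `nameserver` entry in the `127.0.0.0/8` range or equal to `::1`, and prints nothing when there are none.

```shell
# Hypothetical helper: print loopback nameservers from a resolv.conf file.
# $1 = file to inspect (defaults to /etc/resolv.conf)
loopback_nameservers() {
  awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2 }' \
    "${1:-/etc/resolv.conf}"
}
```

Any output from `loopback_nameservers` on the host is a hint that containers may need explicit DNS servers configured for Docker.
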
### Check storage and stuck mounts

```
df -h /var/lib/docker
docker system df
lsof /var/lib/docker
```

Bind-mounting `/var/lib/docker` into other containers can keep container filesystems busy and block removal.

## Remediation

**Daemon not running or client aimed at the wrong host:**
Unset an incorrect `DOCKER_HOST`, then start the daemon:
```
unset DOCKER_HOST
systemctl restart docker
```

**`daemon.json` conflicts with systemd flags:**
Remove duplicate settings or create a systemd override so `dockerd` is started without conflicting flags.

**Permission denied on Docker socket:**
Add the user to the `docker` group, then log in again (or run `newgrp docker` to pick up the group in the current shell):
```
sudo usermod -aG docker "$USER"
newgrp docker
```

If `~/.docker/` was created by `sudo`, fix ownership:
```
sudo chown -R "$USER":"$USER" "$HOME/.docker"
sudo chmod -R g+rwx "$HOME/.docker"
```

**Container DNS broken:**
Configure explicit DNS servers in `/etc/docker/daemon.json`, then restart Docker.

**Docker networking disappears after boot:**
Stop the host network manager from managing Docker interfaces and confirm `net.ipv4.ip_forward=1`.

**OOM kills:**
Treat this as host memory pressure first; reduce workload, add memory, or enforce container memory limits.

**Unable to remove filesystem:**
Find the process holding the path open with `lsof`, then stop that process or the container bind-mounting `/var/lib/docker`.

117
runbooks/kernel.md
Normal file
@@ -0,0 +1,117 @@
---
service: kernel
symptoms: OOM kill, out of memory, high load average, kernel panic, segfault, soft lockup, CPU steal, system unresponsive, zombie processes, NMI watchdog
tags: kernel, oom, memory, load, cpu, panic, dmesg, segfault, lockup, swap, zombie
---

## Symptoms

- `Out of memory: Kill process <pid>` in dmesg (`Killed process <pid>` on newer kernels): OOM killer fired
- Load average far above CPU count: system overloaded or I/O blocked
- `kernel: BUG: soft lockup`: CPU stuck in kernel code
- `segfault at ...` in dmesg: process crashed due to invalid memory access
- `kernel panic`: unrecoverable kernel error (usually visible only on console or serial)
- Many zombie (`Z`) processes in `ps` output
- High `%steal` in `top`/`vmstat`: hypervisor CPU contention

## Diagnostics

### Recent kernel messages

```
dmesg -T | tail -100
dmesg -T | grep -iE 'error|warn|oom|kill|panic|oops|fault|hung|lockup'
journalctl -k -n 200
```

### OOM events

```
dmesg -T | grep -i 'out of memory\|oom_kill\|killed process'
```

The log shows which process was killed, its RSS at time of kill, and available memory.

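For a quick list of victims, the PID, command name, and anonymous RSS can be extracted from the `Killed process` lines. This sketch assumes the message layout used by recent kernels (`Killed process <pid> (<name>) ... anon-rss:<n>kB`); `oom_victims` is a hypothetical name, and older kernels that log a different layout will simply produce no output.

```shell
# Hypothetical helper: extract "PID NAME ANON_RSS_KB" from OOM kill messages.
# Reads kernel log lines on stdin; lines that do not match are dropped.
oom_victims() {
  sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1 \2 \3/p'
}
```

For example, `dmesg -T | oom_victims` prints one line per kill, which makes repeated victims easy to spot.
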
### Memory usage

```
free -h
head -30 /proc/meminfo
vmstat -s
```

`MemAvailable` is the key metric. If it is near zero and swap is also exhausted, OOM kills are imminent.

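To turn `MemAvailable` into a single number that can be compared against an alert threshold, it can be expressed as a percentage of `MemTotal`. A minimal sketch; `mem_available_pct` is a hypothetical name and the result is truncated to a whole percent.

```shell
# Hypothetical helper: MemAvailable as a whole percentage of MemTotal.
# $1 = meminfo-format file (defaults to /proc/meminfo)
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total }' \
    "${1:-/proc/meminfo}"
}
```

A single-digit result from `mem_available_pct`, together with exhausted swap, matches the "OOM kills are imminent" situation described above.
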
### Swap

```
swapon --show
cat /proc/swaps
vmstat 1 5
```

High `si`/`so` (swap-in/swap-out) in `vmstat` indicates active swapping and likely memory pressure.

### Load average and CPU

```
uptime
top -b -n1 | head -30
mpstat -P ALL 1 3
```

Load average above 2× CPU count sustained over 15 minutes is concerning.
High `%iowait` indicates processes blocked on disk I/O, not CPU-bound load.

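The 2× rule of thumb can be written as a small predicate. This is only a sketch of that rule: `load_is_high` is a hypothetical name, and the factor of 2 comes straight from the guidance above rather than from any kernel definition.

```shell
# Hypothetical helper: succeed when the load average exceeds 2x the CPU count.
# $1 = 15-minute load average, $2 = number of CPUs
load_is_high() {
  awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load > 2 * cpus) }'
}
```

On Linux the inputs can be taken from `/proc/loadavg` (third field is the 15-minute average) and `nproc`, for example `load_is_high "$(cut -d' ' -f3 /proc/loadavg)" "$(nproc)" && echo "load is high"`.
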
### Process memory and CPU usage

```
ps aux --sort=-%mem | head -20
ps aux --sort=-%cpu | head -20
```

### Zombie processes

```
# STAT can carry suffixes such as Z+ or Zs, so match on the leading Z
ps aux | awk '$8 ~ /^Z/'
```

Zombies cannot be killed; the parent must `wait()` for them, or exit so that init (PID 1) adopts and reaps them.

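Since the fix always involves the parent, it helps to group zombies by parent PID. A minimal sketch with a hypothetical name, `zombie_parents`; it expects two columns per line (process state, then parent PID) and prints each parent with its zombie count.

```shell
# Hypothetical helper: count zombies per parent PID.
# Reads "STAT PPID" pairs on stdin, one process per line.
# Output columns: parent PID, number of zombie children.
zombie_parents() {
  awk '$1 ~ /^Z/ { count[$2]++ }
       END { for (ppid in count) print ppid, count[ppid] }' | sort -n
}
```

The input can be produced with `ps -e -o stat= -o ppid= | zombie_parents`; the parent PIDs it reports are the processes to inspect.
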
### I/O wait and disk health

```
iostat -x 1 3
dmesg -T | grep -iE 'i/o error|hard resetting link|ata.*error|blk_update_request'
```

Persistent I/O errors alongside high load suggest failing storage.

## Remediation

**Memory pressure / frequent OOM kills:**
Identify the largest memory consumers from `ps aux --sort=-%mem`.
Consider increasing swap, adding RAM, tuning `vm.overcommit_memory`, or scaling the workload.
Do NOT just raise `vm.overcommit_ratio` without understanding the root consumer.

**Adjust OOM killer scoring for critical services (temporary; the value is per process and is lost when the process restarts):**
```
echo -17 > /proc/<pid>/oom_adj           # legacy interface, deprecated
echo -1000 > /proc/<pid>/oom_score_adj   # current kernels
```

**Swap exhausted (add a swapfile):**
```
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
```

**High I/O wait (find the I/O-heavy process):**
```
iotop -a -o -b -n3
```

**Zombie reaping (if parent is stuck):**
Kill the parent process; its zombie children are then re-parented to init (PID 1), which reaps them. Verify that the zombies disappear.

99
runbooks/nginx.md
Normal file
@@ -0,0 +1,99 @@
---
service: nginx
symptoms: 502 Bad Gateway, 504 Gateway Timeout, upstream connection refused, nginx not starting, failed to bind socket, permission denied reading config, configuration test failed
tags: nginx, web, http, https, proxy, upstream, reverse-proxy, load-balancer
---

## Symptoms

- `502 Bad Gateway`: nginx reached the upstream but got an invalid response, or upstream is down
- `504 Gateway Timeout`: upstream took too long to respond
- `111: Connection refused` in nginx error log: upstream process is not running or not on the expected port
- `nginx.service: Start request repeated too quickly`: crash-loop; check error log
- `[emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)`: port conflict
- `[emerg] open() ... failed (13: Permission denied)`: file permission issue

## Diagnostics

### Service status

```
systemctl status nginx
```

### Config test

```
nginx -t
```

A config error is the most common reason for nginx failing to start or reload.

### Error log

```
journalctl -u nginx -n 100
tail -n 100 /var/log/nginx/error.log
```

For 502/504 errors look for: `connect() failed`, `upstream timed out`, `no live upstreams`.

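Counting how often each of those three messages occurs gives a quick picture of whether the upstream is down or merely slow. A minimal sketch; `upstream_errors` is a hypothetical name, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
# Hypothetical helper: count the upstream failure messages named above.
# Reads nginx error log lines on stdin; most frequent message first.
upstream_errors() {
  grep -Eo 'connect\(\) failed|upstream timed out|no live upstreams' |
    sort | uniq -c | sort -rn
}
```

For example, `tail -n 1000 /var/log/nginx/error.log | upstream_errors`. Mostly `connect() failed` points at an upstream that is not listening; mostly `upstream timed out` points at a slow one.
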
### Access log: recent requests

```
tail -n 50 /var/log/nginx/access.log
```

### Check upstream services

For `proxy_pass` targets, verify the upstream is running:
```
systemctl status <upstream-service>
ss -tlnp | grep <upstream-port>
```

Common upstreams: `gunicorn`, `uwsgi`, `node`, `puma`, `php-fpm`.

### Port binding conflicts

```
ss -tlnp | grep ':80\|:443'
```

### Config files

```
cat /etc/nginx/nginx.conf
ls /etc/nginx/sites-enabled/
cat /etc/nginx/sites-enabled/<vhost>
```

Check `proxy_pass`, `upstream` blocks, `proxy_connect_timeout`, `proxy_read_timeout`.

## Remediation

**Upstream service not running:**
Start the upstream service, then verify nginx resumes proxying.

**Config syntax error:**
Fix the error shown by `nginx -t`, then reload (or start nginx if it is not running):
```
systemctl reload nginx
```

**Port already in use:**
Find the conflicting process with `ss -tlnp | grep :80`, stop it, then restart nginx.

**Upstream timeouts (increase timeouts, but treat the slow upstream as the root cause):**
```nginx
proxy_connect_timeout 10s;
proxy_read_timeout 60s;
proxy_send_timeout 60s;
```

**Permission denied on log or socket file:**
```
ls -la /var/log/nginx/
ls -la /run/nginx.pid
# www-data is the Debian/Ubuntu default; use the account from the "user" directive in nginx.conf
chown -R www-data:www-data /var/log/nginx/
```

107
runbooks/postgres.md
Normal file
@@ -0,0 +1,107 @@
---
service: postgres
symptoms: connection refused port 5432, FATAL password authentication failed, replication lag, disk full, out of shared memory, too many connections, relation does not exist, could not connect to the primary
tags: postgres, postgresql, database, replication, pg, psql, disk, connections
---

## Symptoms

- `could not connect to server: Connection refused`: postgres not running or not on port 5432
- `FATAL: password authentication failed for user "<user>"`: wrong credentials or pg_hba mismatch
- `FATAL: sorry, too many clients already` or `FATAL: too many connections for role "<user>"`: connection limit reached
- `ERROR: could not resize shared memory segment` / `out of shared memory`: shared memory exhausted (undersized `/dev/shm`, full lock table, or `shared_buffers` too high for the system)
- `PANIC: could not write to file "pg_wal/..."`: disk full on WAL directory
- Replication lag growing: standby falling behind primary
- `FATAL: could not connect to the primary server`: standby cannot reach primary

## Diagnostics

### Service status

```
systemctl status postgresql
systemctl status postgresql@<version>-main
```

### PostgreSQL logs

```
journalctl -u postgresql -n 100
tail -n 100 /var/log/postgresql/postgresql-*.log
```

### Is postgres listening?

```
ss -tlnp | grep 5432
```

### Disk space (WAL and data directory are the critical paths)

```
df -h
du -sh /var/lib/postgresql/
du -sh /var/lib/postgresql/*/main/pg_wal/
```

A full disk on the pg_wal partition causes a PANIC and hard crash.

### Connection count

```sql
SELECT count(*), state FROM pg_stat_activity GROUP BY state;
SELECT setting FROM pg_settings WHERE name = 'max_connections';
```

### Replication lag (run on primary)

```sql
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       (sent_lsn - replay_lsn) AS lag_bytes
FROM pg_stat_replication;
```

### pg_hba.conf: authentication rules

```
cat /etc/postgresql/*/main/pg_hba.conf
```

Entries are matched top to bottom and the first match wins. A `reject` entry or a missing entry for the client IP causes authentication failure even with correct credentials.

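Because rule order decides the outcome, it helps to view only the active rules together with their position in the file. A minimal sketch; `hba_rules` is a hypothetical name, and it only strips comments and blank lines. It does not evaluate which rule a given client would match.

```shell
# Hypothetical helper: list active pg_hba.conf rules with their line numbers,
# in file order, with comment lines and blank lines removed.
# $1 = path to pg_hba.conf
hba_rules() {
  grep -nEv '^[[:space:]]*(#|$)' "$1"
}
```

Calling `hba_rules /etc/postgresql/<version>/main/pg_hba.conf` and reading from the top shows whether a broad `reject` or an older rule sits above the entry that was meant to match.
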
### Shared memory / kernel settings

```
cat /proc/sys/kernel/shmmax
grep shared_buffers /etc/postgresql/*/main/postgresql.conf
```

`shared_buffers` should rarely exceed ~40% of RAM (25% is the usual starting point). Kernel `shmmax` has to accommodate it only on PostgreSQL older than 9.3; newer versions allocate the main shared memory area with `mmap` by default.

## Remediation

**Postgres not running:**
```
systemctl start postgresql
```
Check logs immediately after start for the failure reason.

**Authentication failure (pg_hba mismatch):**
Add or update the correct entry in `pg_hba.conf`, then reload:
```
systemctl reload postgresql
```

**Too many connections (increase limit; requires restart):**
In `postgresql.conf`:
```
max_connections = 200
```
Or deploy a connection pooler (`pgbouncer`).

**Disk full on WAL:**
Free space by removing old base backups or already-archived WAL segments from the archive location, not from `/var/lib/postgresql/*/main/pg_wal/` itself.
Do NOT delete pg_wal files directly; use `pg_archivecleanup` on the archive directory or let archiving catch up.

**Replication lag (standby too far behind):**
Check network bandwidth and I/O on standby. If the primary keeps dropping the standby connection because status replies arrive too slowly (see `wal_receiver_status_interval`), increase `wal_sender_timeout` temporarily.

112
runbooks/selinux.md
Normal file
@@ -0,0 +1,112 @@
---
service: selinux
symptoms: permission denied despite correct unix permissions, service blocked by selinux, avc denied, file context mismatch, port binding denied, boolean missing, domain transition failure
tags: selinux, avc, enforcing, security, policy, restorecon, audit, sealert, semanage
---

## Symptoms

- Service gets `Permission denied` even though file ownership and mode look correct
- Process cannot bind to a port or open a file after a config change
- AVC denials appear in audit logs
- App works when SELinux is permissive but fails in enforcing mode
- Newly created files under custom paths are inaccessible to a confined service

## Diagnostics

### Confirm SELinux mode and policy

```
getenforce
sestatus
cat /etc/selinux/config
```

If SELinux is `Permissive`, denials are logged but not enforced.

### Check AVC denials

```
auditctl -s
ausearch -m AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR -ts recent
journalctl -t setroubleshoot -n 50
dmesg | grep -i -e type=1300 -e type=1400
```

AVC denials are the primary source of truth for SELinux policy failures.

If AVCs are missing but SELinux still appears involved, temporarily disable `dontaudit` rules to expose hidden denials:
```
semodule -DB
```
Re-enable them after reproducing the issue:
```
semodule -B
```

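Raw AVC records are dense, so a condensed view of who was denied what can speed up triage. The sketch below is an illustration, not a replacement for `sealert`: `summarize_avc` is a hypothetical name, and it assumes the common raw record layout in which `comm=`, `scontext=`, `tcontext=`, and `tclass=` appear in that order after the `{ permission }` block.

```shell
# Hypothetical helper: condense raw AVC denial records read from stdin.
# Output columns: count, command, permission, source context,
# target context, object class.
summarize_avc() {
  sed -n 's/.*denied  *{ \([^}]*\) }.*comm="\([^"]*\)".*scontext=\([^ ]*\).*tcontext=\([^ ]*\).*tclass=\([^ ]*\).*/\2 \1 \3 \4 \5/p' |
    sort | uniq -c | sort -rn
}
```

For example, `ausearch -m AVC -ts recent | summarize_avc`. The target context and class usually indicate whether the fix is a file label, a port label, or a boolean.
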
### Inspect file contexts

```
ls -lZ /path/to/file
ps -eZ | grep <service>
matchpathcon -V /path/to/file
```

A service can have correct Unix permissions and still fail if the SELinux context is wrong.

### Check port labeling and booleans

```
semanage port -l | grep <port>
getsebool -a | grep <service-or-feature>
semanage boolean -l | grep <service-or-feature>
```

Custom ports often require explicit SELinux port labels.

### Check for relabeling needs

```
restorecon -nRv /path
matchpathcon /path/to/file
sealert -l "*"
```

`restorecon -n` shows what would change without modifying labels.

`sealert` is often the fastest way to turn a raw AVC into a concrete fix, but treat `audit2allow` suggestions as a last resort, not a first response.

## Remediation

**Wrong file context:**
Restore the default context:
```
restorecon -Rv /path
```

**Custom application path needs persistent labeling:**
```
semanage fcontext -a -t <type> '/custom/path(/.*)?'
restorecon -Rv /custom/path
```

**Custom port binding denied:**
Add the port label required by the service type:
```
semanage port -a -t <port_type> -p tcp <port>
```

**Boolean disabled:**
Enable the needed boolean persistently:
```
setsebool -P <boolean_name> on
```

**Still unsure whether SELinux is the blocker:**
Temporarily switch to permissive mode and reproduce the issue:
```
setenforce 0
```
If the problem still occurs, SELinux is not the root cause. Switch back with `setenforce 1` once the test is done.

Do not disable SELinux or generate custom policy modules as a first response. Fix labels, booleans, or port mappings first.

100
runbooks/ssh.md
Normal file
@@ -0,0 +1,100 @@
---
service: ssh
symptoms: connection refused, authentication failed, host key mismatch, permission denied, timeout connecting, no route to host
tags: ssh, sshd, openssh, authentication, network, connectivity
---

## Symptoms

- `ssh: connect to host <hostname> port 22: Connection refused`
- `Permission denied (publickey)`: key not accepted or wrong user
- `WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!`: host key mismatch
- `Connection timed out`: firewall blocking or host unreachable
- `No route to host`: routing issue or host is down

## Diagnostics

### Is sshd running?

```
systemctl status sshd
systemctl status ssh
```

The unit is named `sshd` on RHEL-family systems and `ssh` on Debian/Ubuntu. A stopped or failed sshd is the most common cause of "connection refused".

### Check sshd configuration

```
sshd -t
cat /etc/ssh/sshd_config
```

Look for: `PasswordAuthentication`, `PubkeyAuthentication yes`, `AuthorizedKeysFile`.

### Check authorised keys

```
ls -la ~/.ssh/
cat ~/.ssh/authorized_keys
```

Permissions must be: `~/.ssh` → `700`, `authorized_keys` → `600`.
Wrong permissions cause silent auth failure even with the correct key.

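The two modes can be verified with a short check. This sketch compares against exactly the values recommended above; `check_ssh_perms` is a hypothetical name, and `stat -c` is the GNU coreutils form (BSD and macOS use a different flag), so treat it as Linux-only.

```shell
# Hypothetical helper: succeed only when the modes match the recommended values.
# $1 = home directory of the account being checked
# Uses GNU stat (-c '%a' prints the octal mode).
check_ssh_perms() {
  [ "$(stat -c '%a' "$1/.ssh")" = "700" ] &&
    [ "$(stat -c '%a' "$1/.ssh/authorized_keys")" = "600" ]
}
```

For example, `check_ssh_perms "$HOME" || echo "fix ~/.ssh permissions"` on the server side, run as the affected user.
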
### Check sshd logs

```
journalctl -u sshd -n 100
journalctl -u ssh -n 100
grep sshd /var/log/auth.log | tail -50
```

Look for: `Invalid user`, `Failed publickey`, `Connection reset by peer`, `Too many authentication failures`.

### Check listening port

```
ss -tlnp | grep sshd
netstat -tlnp | grep :22
```

If sshd is running but not listening on the expected port, check `Port` in `/etc/ssh/sshd_config`.

### Firewall rules

```
iptables -L INPUT -n -v
nft list ruleset
ufw status verbose
```

A DROP rule on port 22 causes silent timeouts, not "connection refused".

## Remediation

**sshd not running:**
```
systemctl enable --now sshd   # the unit is "ssh" on Debian/Ubuntu
```

**Wrong permissions on authorized_keys:**
```
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chown -R "$USER":"$USER" ~/.ssh
```

**sshd config error:**
Fix the error reported by `sshd -t`, then:
```
systemctl restart sshd
```

**Host key mismatch (expected after reinstall/reprovisioning):**
Remove the old key from the client:
```
ssh-keygen -R <hostname>
```
Only do this if you are certain the host was intentionally reprovisioned.
If the key change is unexpected, treat it as a potential man-in-the-middle (MITM) attack and investigate before connecting.

115
runbooks/sssd.md
Normal file
115
runbooks/sssd.md
Normal file
@@ -0,0 +1,115 @@
|
||||
---
|
||||
service: sssd
|
||||
symptoms: login denied, user not found, id command hangs, sudo rules missing, ldap auth failure, kerberos failure, cache stale, offline authentication not working
|
||||
tags: sssd, ldap, kerberos, ad, identity, auth, pam, nss, sudo
|
||||
---
|
||||
|
||||
## Symptoms
|
||||
|
||||
- `id <user>` hangs or returns no such user for a domain account
|
||||
- SSH or console login fails for directory-backed users
|
||||
- Group membership is missing or incomplete
|
||||
- `sudo` rules from LDAP/AD do not appear
|
||||
- Authentication works intermittently or only after cache flush
|
||||
- Offline authentication fails when the directory is unreachable
|
||||
|
||||
## Diagnostics
|
||||
|
||||
### Check service health
|
||||
|
||||
```
|
||||
systemctl status sssd
|
||||
sssctl domain-list
|
||||
sssctl config-check
|
||||
cat /etc/nsswitch.conf
|
||||
```
|
||||
|
||||
A running daemon with a valid config and `sss` present in `nsswitch.conf` are the first prerequisites.
|
||||
|
||||
### Check identity resolution
|
||||
|
||||
```
|
||||
id <user>
|
||||
getent passwd <user>
|
||||
getent group <group>
|
||||
```
|
||||
|
||||
If NSS lookups fail, the issue is often in SSSD configuration, connectivity, or cache.
|
||||
|
||||
### Check SSSD logs
|
||||
|
||||
```
|
||||
journalctl -u sssd -n 100
|
||||
ls -la /var/log/sssd/
|
||||
tail -n 100 /var/log/sssd/*.log
|
||||
sssctl logs-fetch
|
||||
```
|
||||
|
||||
Look for: backend offline, LDAP bind failures, Kerberos errors, TLS problems, and access provider denials.
|
||||
|
||||
If the issue is unclear, raise `debug_level=6` in the relevant `[nss]`, `[pam]`, and `[domain/<name>]` sections. Raising debug only in `[sssd]` is not enough for most real failures.
|
||||
|
||||
### Check domain reachability
|
||||
|
||||
```
|
||||
sssctl domain-status <domain>
|
||||
ping <ldap-or-ad-host>
|
||||
dig -t SRV _ldap._tcp.<domain>
|
||||
cat /etc/resolv.conf
|
||||
```
|
||||
|
||||
If the identity provider is unreachable, SSSD may serve cached data only or fail entirely.
|
||||
|
||||
### Check Kerberos and LDAP configuration
|
||||
|
||||
```
|
||||
cat /etc/sssd/sssd.conf
|
||||
cat /etc/krb5.conf
|
||||
kinit <user>
|
||||
klist
|
||||
ldapsearch -ZZ -x -H ldap://<server> -b <base-dn>
|
||||
```
|
||||
|
||||
Look for wrong realm names, bad server addresses, TLS settings, and access filters.
|
||||
|
||||
For AD or IPA providers, Kerberos and DNS are often the real dependency chain: broken SRV lookup, keytab issues, or a slow KDC will surface as SSSD failures.
|
||||
|
||||
### Check cache and permissions
|
||||
|
||||
```
|
||||
ls -la /var/lib/sss/db/
|
||||
sssctl cache-status
|
||||
sssctl cache-expire -E
|
||||
```
|
||||
|
||||
`/etc/sssd/sssd.conf` must usually be mode `600` or SSSD will refuse to start.
|
||||
|
||||
Do not wipe cache files blindly on an offline system that depends on cached logins.
|
||||
|
||||
## Remediation
|
||||
|
||||
**Config syntax or permission issue:**
|
||||
Fix `sssd.conf`, set secure permissions, then restart:
|
||||
```
|
||||
chmod 600 /etc/sssd/sssd.conf
|
||||
systemctl restart sssd
|
||||
```
|
||||
|
||||
**Stale cache:**
|
||||
Clear cache carefully, then repopulate with a fresh lookup:
|
||||
```
|
||||
sss_cache -E
|
||||
id <user>
|
||||
```
|
||||
|
||||
**Kerberos failure:**
|
||||
Validate time sync, realm, keytab credentials, and KDC reachability before changing LDAP settings.
|
||||
|
||||
**Backend offline or `sdap_async_sys_connect request failed`:**
|
||||
Treat as DNS/network first. Validate SRV records and TLS handshake before increasing `ldap_network_timeout` or `ldap_search_timeout`.
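One way to walk from SRV records to a per-server check is to turn the `dig +short` output into `host:port` pairs. This is a sketch under the assumption that each SRV answer line has the usual four fields (priority, weight, port, target); `srv_to_hostport` is a hypothetical helper name.

```shell
# Convert "dig +short -t SRV" answer lines (priority weight port target)
# into "host:port" pairs, dropping the trailing dot on the target.
srv_to_hostport() {
    awk 'NF == 4 { sub(/\.$/, "", $4); print $4 ":" $3 }'
}

# Example pipeline (not run here):
#   dig +short -t SRV _ldap._tcp.<domain> | srv_to_hostport
# Each resulting host:port can then be probed for TLS, for instance with
#   openssl s_client -connect <host>:<port> -starttls ldap
# which assumes an OpenSSL build that supports the ldap STARTTLS mode.
```

If the pipeline prints nothing, the SRV lookup itself is the problem and timeout tuning will not help.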
|
||||
|
||||
**Access denied despite successful lookup:**
|
||||
Check `access_provider`, LDAP filters, HBAC rules, or AD group-based access restrictions.
|
||||
|
||||
**No `pam_sss` messages at all:**
|
||||
The PAM stack is likely misconfigured. Fix the PAM/authselect profile before changing SSSD itself.
|
||||
89
runbooks/wayland.md
Normal file
@@ -0,0 +1,89 @@
|
||||
---
|
||||
service: wayland
|
||||
symptoms: wayland session fails, gdm falls back to xorg, black screen on login, fractional scaling broken, screen sharing broken, remote desktop broken, wlroots crash, compositor crash
|
||||
tags: wayland, compositor, gnome, kde, mutter, wlroots, pipewire, xwayland, graphics
|
||||
---
|
||||
|
||||
## Symptoms
|
||||
|
||||
- User selects a Wayland session but is returned to login
|
||||
- GDM or another display manager falls back to Xorg
|
||||
- Screen sharing, remote desktop, or clipboard integration is broken
|
||||
- Apps requiring XWayland fail while native Wayland apps work
|
||||
- Fractional scaling or multi-monitor layout behaves incorrectly
|
||||
- Wayland compositor crashes after login
|
||||
|
||||
## Diagnostics
|
||||
|
||||
### Confirm the active session type
|
||||
|
||||
```
|
||||
echo $XDG_SESSION_TYPE
|
||||
loginctl show-session "$XDG_SESSION_ID" -p Type
|
||||
echo $WAYLAND_DISPLAY
|
||||
```
|
||||
|
||||
If the session type is `x11`, you are not debugging an active Wayland session.
|
||||
|
||||
### Check display manager and compositor logs
|
||||
|
||||
```
|
||||
systemctl status gdm
|
||||
journalctl -b | grep -iE 'wayland|mutter|kwin|wlroots|xwayland'
|
||||
journalctl -b | grep -i 'renderer for'
|
||||
```
|
||||
|
||||
Look for compositor crashes, GPU driver incompatibilities, and forced Xorg fallback messages.
|
||||
|
||||
### Check XWayland and PipeWire components
|
||||
|
||||
```
|
||||
which Xwayland
|
||||
systemctl --user status pipewire
|
||||
systemctl --user status xdg-desktop-portal
|
||||
systemctl --user status xdg-desktop-portal-gnome
|
||||
systemctl --user status xdg-desktop-portal-kde
|
||||
xlsclients -l
|
||||
```
|
||||
|
||||
Broken screen sharing is often a PipeWire or portal issue, not a compositor issue.
|
||||
|
||||
`xlsclients -l` helps identify apps that are actually running under XWayland rather than native Wayland.
|
||||
|
||||
### Check GPU compatibility
|
||||
|
||||
```
|
||||
lspci -k | grep -A3 -E 'VGA|3D|Display'
|
||||
lsmod | grep -E 'nvidia|nouveau|amdgpu|i915'
|
||||
```
|
||||
|
||||
Wayland support quality depends heavily on the GPU driver stack.
|
||||
|
||||
### Check environment and session overrides
|
||||
|
||||
```
|
||||
env | grep -E 'WAYLAND|XDG|GDK_BACKEND|QT_QPA_PLATFORM'
|
||||
cat /etc/gdm/custom.conf    # /etc/gdm3/custom.conf on Debian/Ubuntu
|
||||
wayland-info
|
||||
```
|
||||
|
||||
Environment overrides can force apps onto X11 or disable Wayland entirely.
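The two toolkit overrides most likely to push apps onto X11 are `GDK_BACKEND` (GTK) and `QT_QPA_PLATFORM` (Qt). A small sketch that flags them is shown below; `report_backend_overrides` is a hypothetical helper, and it only checks whether an X11 backend is listed first.

```shell
# Report toolkit environment overrides that force X11 instead of Wayland.
# Checks only the first backend in each variable; illustrative helper.
report_backend_overrides() {
    found=0
    case ${GDK_BACKEND:-} in
        x11*) echo "GDK_BACKEND=$GDK_BACKEND forces GTK apps onto X11"; found=1 ;;
    esac
    case ${QT_QPA_PLATFORM:-} in
        xcb*) echo "QT_QPA_PLATFORM=$QT_QPA_PLATFORM forces Qt apps onto X11"; found=1 ;;
    esac
    if [ "$found" -eq 0 ]; then
        echo "no X11 backend overrides set"
    fi
    return 0
}
```

Run it in the same shell or session where the affected app is launched, since overrides are often set per user or per launcher.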
|
||||
|
||||
For NVIDIA systems, confirm the compositor is using a supported buffer path (GBM on current drivers is the expected default).
|
||||
|
||||
## Remediation
|
||||
|
||||
**Wayland disabled in display manager config:**
|
||||
Look for `WaylandEnable=false` (normally under `[daemon]` in the GDM `custom.conf`) or a similar setting in other display managers, and remove the override if it is unintended.
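A quick scripted check for an active (uncommented) override might look like the sketch below. The helper names are hypothetical, and the two paths are the common GDM locations rather than an exhaustive list.

```shell
# Return 0 if the given file contains an uncommented WaylandEnable=false.
wayland_disabled_in() {
    grep -Eq '^[[:space:]]*WaylandEnable[[:space:]]*=[[:space:]]*false' "$1"
}

# Check the usual GDM config locations (assumed paths; adjust per distro).
check_gdm_configs() {
    for conf in /etc/gdm/custom.conf /etc/gdm3/custom.conf; do
        [ -f "$conf" ] || continue
        if wayland_disabled_in "$conf"; then
            echo "Wayland disabled in $conf"
        else
            echo "no WaylandEnable=false override in $conf"
        fi
    done
}
```

Commented-out lines such as `#WaylandEnable=false` are deliberately ignored, since they have no effect.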
|
||||
|
||||
**Fallback to Xorg on unsupported GPU stack:**
|
||||
Upgrade or change the graphics driver; Wayland stability is often limited by the driver, not the compositor.
|
||||
|
||||
**Screen sharing broken:**
|
||||
Fix PipeWire and `xdg-desktop-portal` services before changing compositor settings.
|
||||
|
||||
**XWayland-only app failures:**
|
||||
Treat them separately from native Wayland issues; confirm `Xwayland` is installed and launching.
|
||||
|
||||
**Remote desktop, VM, or game input grabbing is broken:**
|
||||
This is often a Wayland protocol/compositor support limitation, not a generic keyboard bug. Check compositor support for pointer constraints, relative pointer, and keyboard shortcut inhibit protocols.
|
||||
106
runbooks/x2go.md
Normal file
@@ -0,0 +1,106 @@
|
||||
---
|
||||
service: x2go
|
||||
symptoms: x2go session fails to start, x2go black screen, x2go disconnects immediately, no desktop in session, authentication failure, x2go agent not starting, sound forwarding broken
|
||||
tags: x2go, nx, remote-desktop, x2goserver, x2goclient, session, desktop, xauth
|
||||
---
|
||||
|
||||
## Symptoms
|
||||
|
||||
- X2Go login succeeds but the session immediately disconnects
|
||||
- Black screen after login
|
||||
- Session is created but no desktop appears
|
||||
- `x2goruncommand error` or `X2Go Agent got stuck in state`
|
||||
- Sound, clipboard, or drive sharing fails while login itself works
|
||||
- Authentication works over SSH but X2Go session startup fails
|
||||
|
||||
## Diagnostics
|
||||
|
||||
### Check X2Go services and packages
|
||||
|
||||
```
|
||||
systemctl status x2goserver
|
||||
systemctl status sshd
|
||||
rpm -qa | grep x2go
|
||||
apt list --installed | grep x2go
|
||||
which x2golistsessions
|
||||
```
|
||||
|
||||
X2Go depends on working SSH plus installed `x2goserver` and `x2goserver-xsession` components.
|
||||
|
||||
### Check X2Go logs
|
||||
|
||||
```
|
||||
journalctl -u x2goserver -n 100
|
||||
journalctl -u sshd -n 100
|
||||
ls -la ~/.x2go/
|
||||
find ~/.x2go -maxdepth 2 -type f -print
|
||||
x2golistsessions
|
||||
```
|
||||
|
||||
Look for session startup failures, agent crashes, and auth helper errors.
|
||||
|
||||
### Check desktop environment startup command
|
||||
|
||||
```
|
||||
cat /etc/x2go/Xsession
|
||||
cat ~/.xsession
|
||||
cat ~/.Xclients
|
||||
```
|
||||
|
||||
A missing or broken desktop session command is a common cause of black screens.
|
||||
|
||||
### Check X11 and xauth availability
|
||||
|
||||
```
|
||||
which xauth
|
||||
xauth -V
|
||||
ls -la ~/.Xauthority
|
||||
which sshfs
|
||||
```
|
||||
|
||||
X2Go requires a working X11 session setup. Missing `xauth` or a bad `.Xauthority` often breaks startup.
|
||||
|
||||
Filesystem and folder-sharing features may also depend on `sshfs` being installed.
|
||||
|
||||
### Check session limits and stale sessions
|
||||
|
||||
```
|
||||
x2golistsessions
|
||||
x2gocleansessions
|
||||
ulimit -a
|
||||
loginctl list-sessions
|
||||
```
|
||||
|
||||
Stale sessions or per-user process limits can prevent a new desktop from starting.
|
||||
|
||||
### Check desktop dependencies
|
||||
|
||||
```
|
||||
which startxfce4
|
||||
which mate-session
|
||||
which startplasma-x11
|
||||
env | grep -E 'DESKTOP|XDG'
|
||||
```
|
||||
|
||||
If the selected desktop command does not exist, X2Go may connect and then terminate immediately.
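The individual `which` checks can be folded into one helper that reports the first desktop launcher actually present. This is a minimal sketch; `first_available` is a hypothetical name, and the candidate list in the example is just the commands already mentioned in this runbook.

```shell
# Print the first command from the argument list that exists in PATH.
# Returns 1 (with a message on stderr) if none of them is installed.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    echo "none of: $*" >&2
    return 1
}

# Example (not run here):
#   first_available startxfce4 mate-session startplasma-x11
```

If the command configured in the X2Go client is not the one printed, the session is likely pointing at a desktop that is not installed on the server.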
|
||||
|
||||
## Remediation
|
||||
|
||||
**Missing or broken desktop startup command:**
|
||||
Set the session to a known-good desktop such as XFCE and verify the binary exists.
|
||||
|
||||
**Corrupt Xauthority or stale X2Go session files:**
|
||||
Remove stale session state and regenerate auth files:
|
||||
```
|
||||
rm -f ~/.Xauthority
|
||||
rm -rf ~/.x2go/C-*
|
||||
```
|
||||
|
||||
**Missing `xauth` or X11 helpers:**
|
||||
Install the missing X11 packages, then retry the session.
|
||||
|
||||
**Required server packages missing:**
|
||||
Install `x2goserver` and `x2goserver-xsession` first, then retry before debugging desktop startup.
|
||||
|
||||
**SSH works but X2Go session fails:**
|
||||
Treat it as a desktop startup or X11 auth problem, not an SSH transport problem.
|
||||
94
runbooks/xorg.md
Normal file
@@ -0,0 +1,94 @@
|
||||
---
|
||||
service: xorg
|
||||
symptoms: xorg black screen, display manager loop, no screens found, failed to start X server, GPU driver error, xrandr missing outputs, login screen not appearing
|
||||
tags: xorg, x11, display, gpu, drm, xrandr, gdm, sddm, lightdm
|
||||
---
|
||||
|
||||
## Symptoms
|
||||
|
||||
- Black screen after graphical boot
|
||||
- Display manager loops back to login
|
||||
- `no screens found` in Xorg log
|
||||
- External monitors are missing or not detected
|
||||
- X server fails after a driver update
|
||||
- `startx` exits immediately with display or device errors
|
||||
|
||||
## Diagnostics
|
||||
|
||||
### Check display manager and Xorg service path
|
||||
|
||||
```
|
||||
systemctl status display-manager
|
||||
systemctl status gdm
|
||||
systemctl status sddm
|
||||
systemctl status lightdm
|
||||
```
|
||||
|
||||
If the display manager is failing, inspect its logs before focusing on Xorg itself.
|
||||
|
||||
### Check Xorg logs
|
||||
|
||||
```
|
||||
find /var/log -name 'Xorg.*.log*'
|
||||
grep -E '\(EE\)|\(WW\)' /var/log/Xorg.0.log
|
||||
journalctl -b | grep -iE 'xorg|gdm|sddm|lightdm'
|
||||
ls -la ~/.local/share/xorg/
|
||||
```
|
||||
|
||||
Look for: `no screens found`, GPU module load failures, and permission/device access errors.
|
||||
|
||||
On rootless Xorg, logs are often under `~/.local/share/xorg/Xorg.0.log` instead of `/var/log/`.
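Because the log can live in either place, a helper that tries several paths and prints only real error lines can save a step. The sketch below is illustrative: `xorg_errors` is a hypothetical name, and the pattern assumes the usual log layout where an optional `[timestamp]` prefix precedes the `(EE)` marker, which also skips the marker legend near the top of the file.

```shell
# Print (EE) lines from the first readable log among the given paths.
# Skips the "Markers:" legend, which mentions (EE) mid-line.
xorg_errors() {
    for log in "$@"; do
        if [ -r "$log" ]; then
            echo "== $log"
            grep -E '^(\[[^]]*\] )?\(EE\)' "$log"
            return 0
        fi
    done
    echo "no readable Xorg log found" >&2
    return 1
}

# Example (not run here):
#   xorg_errors ~/.local/share/xorg/Xorg.0.log /var/log/Xorg.0.log
```

Listing the per-user path first means a stale `/var/log/Xorg.0.log` from an older root-run server is not mistaken for the current session.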
|
||||
|
||||
### Check DRM and GPU driver state
|
||||
|
||||
```
|
||||
lspci -k | grep -A3 -E 'VGA|3D|Display'
|
||||
lsmod | grep -E 'nouveau|nvidia|amdgpu|i915'
|
||||
dmesg | grep -iE 'drm|gpu|nvidia|amdgpu|i915'
|
||||
```
|
||||
|
||||
Driver mismatches after kernel updates are a common cause of X startup failures.
|
||||
|
||||
### Check monitor detection and permissions
|
||||
|
||||
```
|
||||
loginctl session-status
|
||||
xrandr --query
|
||||
ls -la /dev/dri/
|
||||
ps -o user= -C Xorg
|
||||
```
|
||||
|
||||
If `/dev/dri/*` permissions or seat assignment are wrong, X may fail to access the GPU.
|
||||
|
||||
### Check X configuration files
|
||||
|
||||
```
|
||||
find /etc/X11 -maxdepth 3 -type f
|
||||
cat /etc/X11/xorg.conf
|
||||
cat /etc/X11/xorg.conf.d/*.conf
|
||||
ls -la ~/.xinitrc ~/.xserverrc
|
||||
```
|
||||
|
||||
Custom `Device`, `Monitor`, or `Screen` sections often break auto-detection.
|
||||
|
||||
An empty or broken `.xinitrc` can produce a black screen even when the X server itself started correctly.
|
||||
|
||||
## Remediation
|
||||
|
||||
**Bad static Xorg config:**
|
||||
Move custom config aside and let auto-detection work unless the hardware truly needs manual config.
|
||||
|
||||
**Driver mismatch after update:**
|
||||
Reinstall the GPU driver package matching the running kernel and reboot or restart the display manager.
|
||||
|
||||
**`no screens found`:**
|
||||
Check whether the correct DRM module loaded and whether the display manager is running on the expected seat.
|
||||
|
||||
**Display manager loop:**
|
||||
Correlate Xorg errors with PAM/auth logs; some loops are session startup failures, not graphics failures.
|
||||
|
||||
**Framebuffer mode failure:**
|
||||
If X falls back to `fbdev` and errors with framebuffer/bus ID messages, remove the generic `fbdev` driver package (for example `xserver-xorg-video-fbdev` on Debian/Ubuntu or `xf86-video-fbdev` on Arch) and let Xorg use the proper modesetting or vendor driver.
|
||||
|
||||
**`SocketCreateListener() failed`:**
|
||||
Check for stale sockets in `/tmp/.X11-unix`, especially after previous root-run Xorg sessions.
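Each socket `/tmp/.X11-unix/X<N>` is normally paired with a lock file `/tmp/.X<N>-lock` holding the server's PID, so a lock whose PID is gone is a good hint that the socket is stale too. The sketch below assumes Linux (`/proc`) and the usual lock-file layout; `stale_x_locks` is a hypothetical helper name.

```shell
# List X lock files whose recorded PID is no longer running.
# Assumes Linux /proc and lock files that contain only a padded PID.
stale_x_locks() {
    dir=${1:-/tmp}
    for lock in "$dir"/.X*-lock; do
        [ -f "$lock" ] || continue
        pid=$(tr -d ' \n' < "$lock")
        case $pid in
            ''|*[!0-9]*) echo "unparseable: $lock"; continue ;;
        esac
        if [ ! -d "/proc/$pid" ]; then
            echo "stale: $lock (pid $pid not running)"
        fi
    done
}
```

Review the output before deleting anything; removing the lock and socket of a live server will break that session.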
|
||||