feat: complete RAG runbook workflow and release docs
Some checks failed
CI / test (push) Failing after 15s
120
runbooks/docker.md
Normal file
@@ -0,0 +1,120 @@
---
service: docker
symptoms: cannot connect to docker daemon, docker daemon failed to start, docker socket permission denied, containers cannot resolve dns, docker network broken, daemon.json conflict, docker oom, unable to remove filesystem
tags: docker, dockerd, containerd, container, daemon, daemon.json, cgroup, dns, docker0, socket, compose
---

## Symptoms

- `Cannot connect to the Docker daemon. Is the docker daemon running on this host?`
- `permission denied` on `/var/run/docker.sock`
- `dockerd` fails to start after a `daemon.json` change
- Containers cannot resolve DNS or pull images
- Docker bridge/network disappears or container networking breaks after boot
- Container or daemon is killed by the kernel OOM killer
- `Error: Unable to remove filesystem` when removing a container

## Diagnostics

### Check daemon health and client target

```
docker info
systemctl is-active docker
systemctl status docker
ps -ef | grep dockerd
env | grep DOCKER_HOST
```

If `DOCKER_HOST` is set, the CLI talks to that endpoint instead of the local socket. A stale or incorrect value can make a healthy local daemon look unreachable.
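
To make the `DOCKER_HOST` check quicker to read, the value can be classified by its URL scheme. This is a minimal sketch: `classify_docker_host` is a hypothetical helper, not part of the Docker CLI, and it only looks at the variable's text without contacting any daemon.

```shell
# classify_docker_host VALUE
# Hypothetical helper: describe where a DOCKER_HOST value points.
classify_docker_host() {
  case "$1" in
    "")       echo "unset: CLI uses its default endpoint" ;;
    unix://*) echo "local socket: ${1#unix://}" ;;
    tcp://*)  echo "TCP endpoint: ${1#tcp://}" ;;
    ssh://*)  echo "SSH endpoint: ${1#ssh://}" ;;
    *)        echo "other value: $1" ;;
  esac
}

# Report on the current shell's setting.
classify_docker_host "${DOCKER_HOST:-}"
```

Anything other than "unset" or the expected local socket path is worth confirming before restarting the daemon.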

### Check daemon logs and startup failures

```
journalctl -u docker -n 200
journalctl -u containerd -n 100
cat /etc/docker/daemon.json
systemctl cat docker
```

Look for options set both in `daemon.json` and as flags on the unit's `ExecStart` line. `dockerd` refuses to start when an option is defined in both places; a duplicate `hosts` setting is the most common case.
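
The duplicate `hosts` check can be scripted as a rough text match. The sketch below assumes the unit text has been saved to a file (for example from `systemctl cat docker`); `has_hosts_conflict` is a hypothetical helper name, and the match is purely textual, so it does not parse JSON or resolve systemd drop-ins.

```shell
# has_hosts_conflict DAEMON_JSON UNIT_FILE
# Hypothetical helper: succeed (exit 0) when DAEMON_JSON contains a "hosts"
# key AND an ExecStart line in UNIT_FILE passes -H or --host.
has_hosts_conflict() {
  json_file=$1
  unit_file=$2
  grep -q '"hosts"[[:space:]]*:' "$json_file" 2>/dev/null || return 1
  grep -E '^ExecStart=.*[[:space:]](-H|--host)([[:space:]=]|$)' "$unit_file" >/dev/null 2>&1
}

# Example run against the live unit, only where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
  unit_dump=$(mktemp)
  systemctl cat docker >"$unit_dump" 2>/dev/null || true
  if has_hosts_conflict /etc/docker/daemon.json "$unit_dump"; then
    echo "hosts is set in daemon.json and on the ExecStart line"
  fi
  rm -f "$unit_dump"
fi
```

A unit whose drop-in already resets `ExecStart` can still show the packaged `-H` line in `systemctl cat` output, so treat a match as a prompt to read the unit rather than as proof of a conflict.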

### Check socket permissions and group access

```
ls -la /var/run/docker.sock
id
getent group docker
ls -la ~/.docker/
```

Group membership is read at login, so a user recently added to the `docker` group needs a new login session before the change takes effect.
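
One way to tell "not in the group" apart from "in the group but not yet in this session" is to compare the groups of the running shell with the groups recorded for the user. In this sketch `in_list` and `group_state` are hypothetical helper names; `id -nG` reports the current process's groups, while `id -nG USER` looks the user up in the group database.

```shell
# in_list WORD LIST: succeed when WORD appears in the space-separated LIST.
in_list() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# group_state GROUP SESSION_GROUPS DATABASE_GROUPS
# Prints: active, relogin-needed, or not-a-member.
group_state() {
  if in_list "$1" "$2"; then
    echo "active"
  elif in_list "$1" "$3"; then
    echo "relogin-needed"
  else
    echo "not-a-member"
  fi
}

# Compare this shell's groups with the database entry for the same user.
group_state docker "$(id -nG)" "$(id -nG "$(id -un)")"
```

A result of `relogin-needed` means the group change is recorded but the current session predates it.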

### Check kernel, cgroups, and memory pressure

```
uname -r
free -h
dmesg | grep -i -E 'docker|cgroup|oom|killed process'
```

Low memory, missing kernel features, or cgroup issues can stop containers or the daemon.
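
The `dmesg` filter above can be narrowed to just the OOM victims. This sketch assumes the kernel log contains lines of the form `Killed process <pid> (<name>)`, which is the usual wording but can vary between kernel versions; `oom_victims` is a hypothetical helper name.

```shell
# oom_victims: read kernel log text on stdin and print "PID NAME" for each
# line that reports a killed process.
oom_victims() {
  sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Reading the kernel log may require root; errors are silenced here.
dmesg 2>/dev/null | oom_victims
```

If `dockerd` or `containerd` shows up in the output, the daemon itself was killed; any other name points at a workload process, inside a container or on the host.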

### Check Docker networking and DNS

```
docker network ls
ip addr show docker0
sysctl net.ipv4.ip_forward
cat /etc/resolv.conf
ps aux | grep dnsmasq
```

A loopback resolver in `/etc/resolv.conf` (for example `127.0.0.1` or `127.0.0.53`) is unreachable from inside a container, where loopback refers to the container itself, so container DNS often breaks unless Docker is given explicit nameservers.
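
The loopback check on `/etc/resolv.conf` can be reduced to a one-line filter. This is a sketch only; `loopback_nameservers` is a hypothetical helper name, and it treats `127.x.x.x` and `::1` as loopback.

```shell
# loopback_nameservers FILE
# Print every nameserver entry in FILE that points at a loopback address.
loopback_nameservers() {
  awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2 }' "$1"
}

# Check the host's resolver configuration when it is readable.
if [ -r /etc/resolv.conf ]; then
  loopback_nameservers /etc/resolv.conf
fi
```

Any output means the host relies on a local stub or caching resolver, which is the case the paragraph above warns about.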

### Check storage and stuck mounts

```
df -h /var/lib/docker
docker system df
lsof /var/lib/docker
```

Bind-mounting `/var/lib/docker` into other containers can keep container filesystems busy and block removal.

## Remediation

**Daemon not running or client aimed at the wrong host:**
Unset an incorrect `DOCKER_HOST`, then restart the daemon:
```
unset DOCKER_HOST
sudo systemctl restart docker
```

**`daemon.json` conflicts with systemd flags:**
Remove the duplicated setting from either `daemon.json` or the unit's `ExecStart` line. To change `ExecStart`, use a systemd drop-in override rather than editing the packaged unit, so that `dockerd` starts without the conflicting flags.
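
A systemd override for this case is typically a drop-in that clears the packaged `ExecStart` and sets a new one without the conflicting flags. The sketch below writes such a file into a directory passed by the caller; `write_dockerd_override`, the file name `override.conf`, and the default binary path `/usr/bin/dockerd` are assumptions to adapt to the host.

```shell
# write_dockerd_override DROPIN_DIR [DOCKERD_PATH]
# Hypothetical helper: write a drop-in that resets ExecStart for dockerd.
write_dockerd_override() {
  dropin_dir=$1
  dockerd_bin=${2:-/usr/bin/dockerd}   # assumed install path
  mkdir -p "$dropin_dir" || return 1
  # The empty ExecStart= clears the packaged command line;
  # the second ExecStart= defines the replacement.
  printf '[Service]\nExecStart=\nExecStart=%s\n' "$dockerd_bin" \
    >"$dropin_dir/override.conf"
}

# Dry run into a temporary directory to preview the generated file.
demo_dir=$(mktemp -d)
write_dockerd_override "$demo_dir"
cat "$demo_dir/override.conf"
rm -rf "$demo_dir"
```

On a real host the target directory would be `/etc/systemd/system/docker.service.d`, written as root and followed by `systemctl daemon-reload` and `systemctl restart docker`. Carry over any other flags from the packaged `ExecStart` that are still needed.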

**Permission denied on Docker socket:**
Add the user to the `docker` group, then log out and back in. `newgrp docker` applies the new group to the current shell only:
```
sudo usermod -aG docker "$USER"
newgrp docker
```

If `~/.docker/` was created by `sudo`, fix ownership:
```
sudo chown -R "$USER":"$USER" "$HOME/.docker"
sudo chmod -R g+rwx "$HOME/.docker"
```

**Container DNS broken:**
Set explicit nameservers with the `dns` key in `/etc/docker/daemon.json`, then restart Docker.
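
For reference, explicit nameservers go under the `dns` key of `daemon.json` as a list of addresses. The sketch below only prints such a fragment so it can be merged by hand into an existing file; `dns_fragment` is a hypothetical helper, and the two addresses are placeholders, not recommendations.

```shell
# dns_fragment IP...
# Print a daemon.json fragment whose "dns" list holds the given addresses.
dns_fragment() {
  list=""
  for ip in "$@"; do
    list="${list:+$list, }\"$ip\""
  done
  printf '{ "dns": [%s] }\n' "$list"
}

# Placeholder addresses: an internal resolver and a public fallback.
dns_fragment 10.0.0.2 8.8.8.8
```

Printing instead of writing avoids overwriting other keys already present in `/etc/docker/daemon.json`.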

**Docker networking disappears after boot:**
Configure the host's network manager (for example NetworkManager or systemd-networkd) to leave Docker interfaces such as `docker0` unmanaged, and confirm that `net.ipv4.ip_forward` is `1`.
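
The forwarding half of this check can be scripted by reading the kernel's setting directly. In this sketch `forwarding_enabled` is a hypothetical helper; the optional file argument exists only so the logic can be exercised against a test file instead of `/proc`.

```shell
# forwarding_enabled [FILE]
# Succeed when FILE (default: the kernel's ip_forward setting) contains 1.
forwarding_enabled() {
  [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}" 2>/dev/null)" = "1" ]
}

if forwarding_enabled; then
  echo "IPv4 forwarding is enabled"
else
  echo "IPv4 forwarding is disabled or could not be read"
fi
```

If forwarding is off, `sudo sysctl -w net.ipv4.ip_forward=1` turns it on until the next reboot; a file under `/etc/sysctl.d/` is the usual way to make it persistent.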

**OOM kills:**
Treat this as host memory pressure first: reduce the workload, add memory, or enforce per-container memory limits (for example with `docker run --memory`).

**Unable to remove filesystem:**
Find the process holding the path open with `lsof`, then stop that process or the container that bind-mounts `/var/lib/docker`.