tai/runbooks/docker.md
zphinx 57f4c0efaa
feat: complete RAG runbook workflow and release docs
2026-05-06 04:48:41 +02:00

---
service: docker
symptoms: cannot connect to docker daemon, docker daemon failed to start, docker socket permission denied, containers cannot resolve dns, docker network broken, daemon.json conflict, docker oom, unable to remove filesystem
tags: docker, dockerd, containerd, container, daemon, daemon.json, cgroup, dns, docker0, socket, compose
---
## Symptoms
- `Cannot connect to the Docker daemon. Is the docker daemon running on this host?`
- `permission denied` on `/var/run/docker.sock`
- `dockerd` fails to start after a `daemon.json` change
- Containers cannot resolve DNS or pull images
- Docker bridge/network disappears or container networking breaks after boot
- Container or daemon is killed by the kernel OOM killer
- `Error: Unable to remove filesystem` when removing a container
## Diagnostics
### Check daemon health and client target
```
docker info
systemctl is-active docker
systemctl status docker
ps -ef | grep dockerd
env | grep DOCKER_HOST
```
If `DOCKER_HOST` is set incorrectly, the CLI may be talking to the wrong daemon.
### Check daemon logs and startup failures
```
journalctl -u docker -n 200
journalctl -u containerd -n 100
cat /etc/docker/daemon.json
systemctl cat docker
```
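Look for conflicts between `daemon.json` keys and systemd startup flags, especially a `hosts` key in `daemon.json` combined with a `-H` flag on the unit's `ExecStart` line. `dockerd` refuses to start when the same option is set in both places.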
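The `hosts` check can be scripted. The following is a minimal sketch, not a complete validator: `check_hosts_conflict` is a hypothetical helper name, and it assumes the unit text (for example the output of `systemctl cat docker`) has been saved to a file.

```shell
# Sketch: flag a likely "hosts" conflict between daemon.json and the systemd unit.
# Both arguments are file paths, so the check can be pointed at copies.
check_hosts_conflict() {
  daemon_json="$1"   # e.g. /etc/docker/daemon.json
  unit_file="$2"     # e.g. a file holding the output of `systemctl cat docker`
  # Conflict = daemon.json mentions "hosts" AND ExecStart passes -H or --host.
  if grep -q '"hosts"' "$daemon_json" 2>/dev/null &&
     grep -Eq '^ExecStart=.*[[:space:]](-H|--host)([[:space:]=]|$)' "$unit_file" 2>/dev/null; then
    echo "conflict"
  else
    echo "ok"
  fi
}
```

Possible usage: `systemctl cat docker > /tmp/docker.unit && check_hosts_conflict /etc/docker/daemon.json /tmp/docker.unit`. The match is textual only, so a result of `ok` does not rule out conflicts on other keys.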
### Check socket permissions and group access
```
ls -la /var/run/docker.sock
id
getent group docker
ls -la ~/.docker/
```
If the user was added to the `docker` group recently, a new login shell may be required.
### Check kernel, cgroups, and memory pressure
```
uname -r
free -h
dmesg | grep -i -E 'docker|cgroup|oom|killed process'
```
Low memory, missing kernel features, or cgroup issues can stop containers or the daemon.
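To see which processes the OOM killer chose, the kernel log can be filtered down to PID and process name. This is a rough sketch: `oom_victims` is a hypothetical helper, and it assumes the common `Killed process <pid> (<name>)` kernel message format, which may differ between kernel versions.

```shell
# Sketch: read kernel log lines on stdin and print "<pid> <name>" for each OOM kill.
# Assumes messages of the form "... Killed process 1234 (name) ...".
oom_victims() {
  sed -n 's/.*[Kk]illed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}
```

Possible usage: `sudo dmesg | oom_victims` or `journalctl -k | oom_victims`. If `dockerd` or `containerd` shows up in the output, the daemon itself was killed rather than a single container.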
### Check Docker networking and DNS
```
docker network ls
ip addr show docker0
sysctl net.ipv4.ip_forward
cat /etc/resolv.conf
ps aux | grep dnsmasq
```
Loopback DNS resolvers in `/etc/resolv.conf` often break container DNS unless Docker is given explicit nameservers.
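A quick way to spot loopback resolvers is to filter the `nameserver` lines. The sketch below uses a hypothetical helper, `loopback_nameservers`, and treats any `127.x.x.x` address or `::1` as loopback.

```shell
# Sketch: print nameserver entries that point at a loopback address.
# Takes an optional path; defaults to /etc/resolv.conf.
loopback_nameservers() {
  awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2 }' \
    "${1:-/etc/resolv.conf}"
}
```

If this prints anything (for example `127.0.0.53` on hosts running a local stub resolver), containers are unlikely to reach that resolver directly.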
### Check storage and stuck mounts
```
df -h /var/lib/docker
docker system df
lsof /var/lib/docker
```
Bind-mounting `/var/lib/docker` into other containers can keep container filesystems busy and block removal.
## Remediation
**Daemon not running or client aimed at the wrong host:**
Unset an incorrect `DOCKER_HOST`, then restart the daemon. If the variable is exported from a shell profile, remove it there as well:
```
unset DOCKER_HOST
sudo systemctl restart docker
```
**`daemon.json` conflicts with systemd flags:**
Remove duplicate settings or create a systemd override so `dockerd` is started without conflicting flags.
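One way to create such an override is a systemd drop-in that clears the packaged `ExecStart` and starts `dockerd` without flags, leaving all options to `daemon.json`. This is a sketch under assumptions: `write_dockerd_override` and the file name `override.conf` are illustrative, and `/usr/bin/dockerd` should be checked against the path shown by `systemctl cat docker`.

```shell
# Sketch: write a drop-in that resets ExecStart so dockerd starts without flags.
# The target directory is an argument; the default is the usual drop-in location.
write_dockerd_override() {
  dropin_dir="${1:-/etc/systemd/system/docker.service.d}"
  mkdir -p "$dropin_dir"
  # The empty "ExecStart=" clears the packaged command before redefining it.
  # /usr/bin/dockerd is an assumed path.
  cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd
EOF
}
```

Run as root, a possible sequence is `write_dockerd_override && systemctl daemon-reload && systemctl restart docker`. This drops every flag from the packaged command line, so any flag that is still needed has to be added back to the drop-in or expressed in `daemon.json`.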
**Permission denied on Docker socket:**
Add the user to the `docker` group, then log out and back in, or run `newgrp docker` to pick up the group in the current shell:
```
sudo usermod -aG docker "$USER"
newgrp docker
```
If `~/.docker/` was created by `sudo`, fix ownership:
```
sudo chown "$USER":"$USER" "$HOME/.docker" -R
sudo chmod g+rwx "$HOME/.docker" -R
```
**Container DNS broken:**
Configure explicit DNS servers in `/etc/docker/daemon.json`, then restart Docker.
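The `dns` key in `daemon.json` takes a list of nameserver addresses. The sketch below only prints a fragment for review. `render_dns_config` is a hypothetical helper, and the addresses in the usage line are placeholders to replace with resolvers reachable from the host.

```shell
# Sketch: print a daemon.json fragment with a "dns" list built from the arguments.
# Writes to stdout only; it does not touch /etc/docker/daemon.json.
render_dns_config() {
  list=""
  for ns in "$@"; do
    # Append each address as a quoted JSON string, comma-separated.
    list="${list:+$list, }\"$ns\""
  done
  printf '{\n  "dns": [%s]\n}\n' "$list"
}
```

For example, `render_dns_config 10.0.0.2 8.8.8.8` prints a fragment with both addresses. Merge the `dns` key into the existing `daemon.json` rather than overwriting the file, then restart Docker.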
**Docker networking disappears after boot:**
Stop the host network manager from managing Docker interfaces and confirm `net.ipv4.ip_forward=1`.
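The forwarding setting can be confirmed by reading the kernel's procfs entry. This is a minimal sketch with a hypothetical helper, `check_ip_forward`, which takes the path as an optional argument.

```shell
# Sketch: report whether IPv4 forwarding is on by reading the procfs value.
# Takes an optional path; defaults to /proc/sys/net/ipv4/ip_forward.
check_ip_forward() {
  value="$(cat "${1:-/proc/sys/net/ipv4/ip_forward}" 2>/dev/null)"
  if [ "$value" = "1" ]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}
```

If it reports `disabled`, `sudo sysctl -w net.ipv4.ip_forward=1` enables forwarding until the next reboot. To make it persistent, add `net.ipv4.ip_forward=1` to a file under `/etc/sysctl.d/`.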
**OOM kills:**
Treat this as host memory pressure first: reduce the workload, add memory, or enforce container memory limits (for example with `docker run --memory`).
**Unable to remove filesystem:**
Find the process holding the path open with `lsof`, then stop that process or the container bind-mounting `/var/lib/docker`.