---
service: docker
symptoms: cannot connect to docker daemon, docker daemon failed to start, docker socket permission denied, containers cannot resolve dns, docker network broken, daemon.json conflict, docker oom, unable to remove filesystem
tags: docker, dockerd, containerd, container, daemon, daemon.json, cgroup, dns, docker0, socket, compose
---
## Symptoms

- `Cannot connect to the Docker daemon. Is the docker daemon running on this host?`
- `permission denied` on `/var/run/docker.sock`
- `dockerd` fails to start after a `daemon.json` change
- Containers cannot resolve DNS or pull images
- Docker bridge/network disappears or container networking breaks after boot
- Container or daemon is killed by the kernel OOM killer
- `Error: Unable to remove filesystem` when removing a container
## Diagnostics

### Check daemon health and client target

```
docker info
systemctl is-active docker
systemctl status docker
ps -ef | grep dockerd
env | grep DOCKER_HOST
```

If `DOCKER_HOST` is set, the CLI connects to that endpoint instead of the local socket, so a stale or mistyped value can make a healthy local daemon look unreachable.
### Check daemon logs and startup failures

```
journalctl -u docker -n 200
journalctl -u containerd -n 100
cat /etc/docker/daemon.json
systemctl cat docker
```

Look for conflicts between `daemon.json` keys and systemd startup flags, especially duplicate `hosts` settings.
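The duplicate-`hosts` case can be spotted mechanically. The sketch below defines a hypothetical helper, `hosts_conflict` (not a Docker command), that reports a conflict when `daemon.json` contains a `hosts` key and the unit's `ExecStart` line also passes `-H` or `--host`. It is a plain text match rather than a JSON parser, so treat a hit as a prompt to look closer, not as proof.

```shell
# Hypothetical helper, not part of Docker.
# $1: path to daemon.json
# $2: file containing the unit text, e.g. saved from `systemctl cat docker`
# Returns 0 when both sources appear to configure a listening host.
hosts_conflict() {
  grep -q '"hosts"' "$1" 2>/dev/null || return 1
  grep -Eq '^ExecStart=.*[[:space:]](-H|--host)([[:space:]=]|$)' "$2" 2>/dev/null
}

# Usage:
#   systemctl cat docker > /tmp/docker.unit
#   hosts_conflict /etc/docker/daemon.json /tmp/docker.unit && echo "hosts set twice"
```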
### Check socket permissions and group access

```
ls -la /var/run/docker.sock
id
getent group docker
ls -la ~/.docker/
```

Group membership is read at login, so if the user was added to the `docker` group recently, `id` will not list `docker` until the user starts a new login session.
### Check kernel, cgroups, and memory pressure

```
uname -r
free -h
dmesg | grep -i -E 'docker|cgroup|oom|killed process'
```

Low memory, missing kernel features, or cgroup issues can stop containers or the daemon.
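To see at a glance which processes the OOM killer terminated, the kernel log can be reduced to PID and name pairs. This is a minimal sketch: `oom_victims` is a hypothetical helper, and it assumes the usual `Killed process PID (NAME)` wording in the kernel message, which may differ between kernel versions.

```shell
# Hypothetical helper: reduce kernel log lines on stdin to "PID NAME" pairs,
# one per process the OOM killer terminated.
oom_victims() {
  sed -n 's/.*[Kk]illed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Usage (reading the kernel log may require root):
#   dmesg | oom_victims
```

If `dockerd` or `containerd` shows up in the output, the daemon itself was a victim rather than a single container.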
### Check Docker networking and DNS

```
docker network ls
ip addr show docker0
sysctl net.ipv4.ip_forward
cat /etc/resolv.conf
ps aux | grep dnsmasq
```

Loopback DNS resolvers in `/etc/resolv.conf` (for example `127.0.0.1` from a local dnsmasq, or `127.0.0.53` from systemd-resolved) often break container DNS unless Docker is given explicit nameservers, because a loopback address inside a container refers to the container itself, not the host.
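The loopback check can be scripted. The sketch below is a hypothetical helper (`loopback_resolvers` is an illustrative name) that prints any `nameserver` entries pointing at `127.0.0.0/8` or `::1`; empty output means no loopback resolver is configured in that file.

```shell
# Hypothetical helper: print loopback nameservers from a resolv.conf-style file.
# $1: file to inspect (defaults to /etc/resolv.conf)
loopback_resolvers() {
  awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2 }' \
    "${1:-/etc/resolv.conf}"
}

# Usage:
#   loopback_resolvers /etc/resolv.conf
```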
### Check storage and stuck mounts

```
df -h /var/lib/docker
docker system df
lsof /var/lib/docker
```

Bind-mounting `/var/lib/docker` into other containers can keep container filesystems busy and block removal.
## Remediation

**Daemon not running or client aimed at the wrong host:**
Unset an incorrect `DOCKER_HOST`, then restart the daemon:
```
unset DOCKER_HOST
sudo systemctl restart docker
```

**`daemon.json` conflicts with systemd flags:**
Remove duplicate settings or create a systemd override so `dockerd` is started without conflicting flags.
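The override approach can be sketched as a drop-in that clears the packaged `ExecStart` and starts `dockerd` with no command-line flags, leaving `daemon.json` as the only source of settings. The helper name `write_dockerd_override`, the file name `override.conf`, and the binary path `/usr/bin/dockerd` are assumptions here; confirm the path with `systemctl cat docker` before using it.

```shell
# Hypothetical helper: write a systemd drop-in into the directory given as $1.
# The empty ExecStart= line resets the command inherited from the packaged unit.
# Assumes dockerd is installed at /usr/bin/dockerd.
write_dockerd_override() {
  mkdir -p "$1" || return 1
  printf '%s\n' '[Service]' 'ExecStart=' 'ExecStart=/usr/bin/dockerd' \
    > "$1/override.conf"
}

# Typical use, as root:
#   write_dockerd_override /etc/systemd/system/docker.service.d
#   systemctl daemon-reload
#   systemctl restart docker
```

Any flags the packaged unit passed that are still needed should be moved into `daemon.json` rather than dropped silently.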
**Permission denied on Docker socket:**
Add the user to the `docker` group, then log out and back in, or run `newgrp docker` to pick up the new group in the current shell:
```
sudo usermod -aG docker "$USER"
newgrp docker
```

If `~/.docker/` was created by `sudo`, fix ownership:
```
sudo chown "$USER":"$USER" "$HOME/.docker" -R
sudo chmod g+rwx "$HOME/.docker" -R
```

**Container DNS broken:**
Configure explicit DNS servers in `/etc/docker/daemon.json`, then restart Docker.
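As an illustration of the `dns` setting, the sketch below writes a candidate fragment to a scratch file for review instead of overwriting the live configuration. The resolver addresses and the path `/tmp/daemon-dns.json` are placeholders; use nameservers that are reachable from your containers.

```shell
# Placeholder resolvers; substitute addresses reachable from your containers.
# Written to a scratch file for review, not directly to /etc/docker/daemon.json.
cat > /tmp/daemon-dns.json <<'EOF'
{
  "dns": ["8.8.8.8", "1.1.1.1"]
}
EOF

# After merging the "dns" key into /etc/docker/daemon.json (as root):
#   systemctl restart docker
```

Merging by hand avoids clobbering other keys that may already be present in `daemon.json`.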
**Docker networking disappears after boot:**
Stop the host network manager from managing Docker interfaces and confirm `net.ipv4.ip_forward=1`.

**OOM kills:**
Treat this as host memory pressure first; reduce workload, add memory, or enforce container memory limits.

**Unable to remove filesystem:**
Find the process holding the path open with `lsof`, then stop that process or the container bind-mounting `/var/lib/docker`.