--- service: postgres symptoms: connection refused port 5432, FATAL password authentication failed, replication lag, disk full, out of shared memory, too many connections, relation does not exist, could not connect to the primary tags: postgres, postgresql, database, replication, pg, psql, disk, connections --- ## Symptoms - `could not connect to server: Connection refused` — postgres not running or not on port 5432 - `FATAL: password authentication failed for user ""` — wrong credentials or pg_hba mismatch - `FATAL: too many connections` — connection pool exhausted - `ERROR: could not resize shared memory segment` / `out of shared memory` — shared_buffers too high for system - `PANIC: could not write to file "pg_wal/..."` — disk full on WAL directory - Replication lag growing — standby falling behind primary - `FATAL: could not connect to the primary server` — standby cannot reach primary ## Diagnostics ### Service status ``` systemctl status postgresql systemctl status postgresql@-main ``` ### PostgreSQL logs ``` journalctl -u postgresql -n 100 tail -n 100 /var/log/postgresql/postgresql-*.log ``` ### Is postgres listening? ``` ss -tlnp | grep 5432 ``` ### Disk space (WAL and data directory are the critical paths) ``` df -h du -sh /var/lib/postgresql/ du -sh /var/lib/postgresql/*/main/pg_wal/ ``` A full disk on the pg_wal partition causes a PANIC and hard crash. ### Connection count ```sql SELECT count(*), state FROM pg_stat_activity GROUP BY state; SELECT setting FROM pg_settings WHERE name = 'max_connections'; ``` ### Replication lag (run on primary) ```sql SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, (sent_lsn - replay_lsn) AS lag_bytes FROM pg_stat_replication; ``` ### pg_hba.conf — authentication rules ``` cat /etc/postgresql/*/main/pg_hba.conf ``` Entries are matched top-to-bottom. `reject` or missing entry for the client IP causes auth failure even with correct credentials. ### Shared memory / kernel settings ``` cat /proc/sys/kernel/shmmax cat /etc/postgresql/*/main/postgresql.conf | grep shared_buffers ``` `shared_buffers` must not exceed ~40% of RAM; kernel `shmmax` must accommodate it. ## Remediation **Postgres not running:** ``` systemctl start postgresql ``` Check logs immediately after start for the failure reason. **Authentication failure (pg_hba mismatch):** Add or update the correct entry in `pg_hba.conf`, then reload: ``` systemctl reload postgresql ``` **Too many connections — increase limit (requires restart):** In `postgresql.conf`: ``` max_connections = 200 ``` Or deploy a connection pooler (`pgbouncer`). **Disk full on WAL:** Identify and remove old base backups or archived WAL segments under `/var/lib/postgresql/*/main/pg_wal/`. Do NOT delete pg_wal files directly — use `pg_archivecleanup` or let archiving catch up. **Replication lag — standby too far behind:** Check network bandwidth and I/O on standby. If `wal_receiver_status_interval` lag is large, increase `wal_sender_timeout` temporarily.