| service | symptoms | tags |
|---|---|---|
| postgres | connection refused port 5432, FATAL password authentication failed, replication lag, disk full, out of shared memory, too many connections, relation does not exist, could not connect to the primary | postgres, postgresql, database, replication, pg, psql, disk, connections |
Symptoms
- could not connect to server: Connection refused - postgres not running or not on port 5432
- FATAL: password authentication failed for user "<user>" - wrong credentials or pg_hba mismatch
- FATAL: too many connections - connection pool exhausted
- ERROR: could not resize shared memory segment / out of shared memory - shared_buffers too high for system
- PANIC: could not write to file "pg_wal/..." - disk full on WAL directory
- Replication lag growing - standby falling behind primary
- FATAL: could not connect to the primary server - standby cannot reach primary
Diagnostics
Service status
systemctl status postgresql
systemctl status postgresql@<version>-main
PostgreSQL logs
journalctl -u postgresql -n 100
tail -n 100 /var/log/postgresql/postgresql-*.log
Is postgres listening?
ss -tlnp | grep 5432
Disk space (WAL and data directory are the critical paths)
df -h
du -sh /var/lib/postgresql/
du -sh /var/lib/postgresql/*/main/pg_wal/
A full disk on the pg_wal partition causes a PANIC and hard crash.
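The disk check above can be turned into a small threshold test. This is a minimal sketch: the function names and the 85% default limit are assumptions for illustration, not part of any PostgreSQL tooling.

```shell
#!/bin/sh
# Print the used-space percentage (digits only) of the filesystem holding $1.
disk_used_pct() {
    # df -P prints one line per filesystem; column 5 is the capacity in percent.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when the filesystem holding $1 is at or above $2 percent.
# The default of 85 is an assumed threshold; tune it for your environment.
warn_if_full() {
    pct=$(disk_used_pct "$1")
    limit=${2:-85}
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $1 is ${pct}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $1 is ${pct}% full"
}
```

For example, `warn_if_full /var/lib/postgresql 85` checks the filesystem that holds the data directory; pass the pg_wal path instead if WAL lives on its own partition.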
Connection count
SELECT count(*), state FROM pg_stat_activity GROUP BY state;
SELECT setting FROM pg_settings WHERE name = 'max_connections';
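To see how close the server is to its limit, the two queries above can be combined into a single percentage. The sketch below assumes `psql` can connect with default settings (for example when run as the postgres OS user); `conn_usage_pct` and `pg_conn_usage` are hypothetical helper names.

```shell
#!/bin/sh
# Integer percentage of max_connections in use: conn_usage_pct <current> <max>
conn_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Fetch both numbers from the server and print the usage percentage.
# Assumes psql connects without extra options. count(*) also includes
# background processes on recent versions, so it slightly overestimates.
pg_conn_usage() {
    current=$(psql -At -c "SELECT count(*) FROM pg_stat_activity")
    max=$(psql -At -c "SHOW max_connections")
    conn_usage_pct "$current" "$max"
}
```

Calling `pg_conn_usage` on the database host prints a single integer; a value that stays near 100 suggests a pooler or a higher limit is needed.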
Replication lag (run on primary)
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
(sent_lsn - replay_lsn) AS lag_bytes
FROM pg_stat_replication;
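The lag_bytes column is the difference between two LSNs. An LSN is printed as two hexadecimal numbers separated by a slash, the upper and lower 32 bits of a 64-bit byte position, so the same arithmetic can be reproduced by hand when only raw LSN values are available (for example from a log line). The helper names below are illustrative, not PostgreSQL functions.

```shell
#!/bin/sh
# Convert an LSN such as 16/B374D848 to an absolute byte position.
lsn_to_bytes() {
    hi=${1%%/*}   # hex digits before the slash (upper 32 bits)
    lo=${1##*/}   # hex digits after the slash (lower 32 bits)
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Byte distance between two LSNs: lsn_lag_bytes <sent_lsn> <replay_lsn>
lsn_lag_bytes() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

For instance, `lsn_lag_bytes 1/0 0/FFFFFFFF` prints 1, because the two positions are one byte apart.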
pg_hba.conf — authentication rules
cat /etc/postgresql/*/main/pg_hba.conf
Entries are matched top to bottom and the first matching entry wins. A reject rule or a missing entry for the client IP causes authentication failure even with correct credentials.
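Because rule order matters, it helps to view only the active rules with their line numbers rather than the whole commented file. A minimal sketch follows; `hba_active_rules` is a hypothetical helper and it does not follow include directives.

```shell
#!/bin/sh
# Print the non-comment, non-blank lines of a pg_hba.conf-style file,
# prefixed with their line numbers, in the order the server evaluates them.
hba_active_rules() {
    grep -nEv '^[[:space:]]*(#|$)' "$1"
}
```

Running `hba_active_rules /etc/postgresql/<version>/main/pg_hba.conf` and reading from the top shows which rule a given client address would hit first.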
Shared memory / kernel settings
cat /proc/sys/kernel/shmmax
grep shared_buffers /etc/postgresql/*/main/postgresql.conf
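Keep shared_buffers at or below ~40% of RAM (25% is the usual starting point). On PostgreSQL 9.2 and older, kernel shmmax must be large enough to hold it; from 9.3 on, the main shared memory area is allocated with mmap, so shmmax is rarely the limit.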
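The ~40% ceiling is simple arithmetic on total RAM. The sketch below computes it in kB on Linux; the function names are assumptions for illustration.

```shell
#!/bin/sh
# 40% of a memory size given in kB, using integer arithmetic.
shared_buffers_ceiling_kb() {
    echo $(( $1 * 40 / 100 ))
}

# Total RAM in kB as reported by the kernel (Linux only).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}
```

`shared_buffers_ceiling_kb "$(mem_total_kb)"` prints the ceiling for the current host, which can be compared with the configured shared_buffers value.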
Remediation
Postgres not running:
systemctl start postgresql
Check logs immediately after start for the failure reason.
Authentication failure (pg_hba mismatch):
Add or update the correct entry in pg_hba.conf, then reload:
systemctl reload postgresql
Too many connections — increase limit (requires restart):
In postgresql.conf:
max_connections = 200
Or deploy a connection pooler (pgbouncer).
Disk full on WAL:
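Identify and remove old base backups or already-archived WAL segments from the backup and archive locations to free space. Also check whether a failing archive_command or an inactive replication slot is holding WAL back in /var/lib/postgresql/*/main/pg_wal/.
Do NOT delete pg_wal files directly; use pg_archivecleanup on the archive directory or let archiving catch up.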
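Before cleaning an archive, a dry run shows what pg_archivecleanup would remove. The sketch below only builds the command line; it assumes the archive directory contains at least one `.backup` history file marking a base backup, and the function names are hypothetical.

```shell
#!/bin/sh
# Name of the newest *.backup history file in a WAL archive directory.
# Globs expand in sorted order and WAL file names sort chronologically,
# so the last match is the newest.
newest_backup_marker() {
    newest=""
    for f in "$1"/*.backup; do
        [ -e "$f" ] || continue
        newest=${f##*/}
    done
    printf '%s\n' "$newest"
}

# Print (do not run) a dry-run cleanup command that keeps everything
# from the newest base backup onward. -n makes pg_archivecleanup list
# the files it would remove instead of deleting them.
archive_cleanup_cmd() {
    marker=$(newest_backup_marker "$1")
    [ -n "$marker" ] || { echo "no .backup file in $1" >&2; return 1; }
    echo "pg_archivecleanup -n $1 $marker"
}
```

Review the printed command and its dry-run output before running it again without `-n`.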
Replication lag — standby too far behind:
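Check network bandwidth and I/O on the standby. If the primary keeps dropping the replication connection because the standby's status reports (sent every wal_receiver_status_interval) arrive too slowly, increase wal_sender_timeout temporarily.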