tai/runbooks/sssd.md
zphinx 57f4c0efaa
feat: complete RAG runbook workflow and release docs
2026-05-06 04:48:41 +02:00

service: sssd
symptoms: login denied, user not found, id command hangs, sudo rules missing, ldap auth failure, kerberos failure, cache stale, offline authentication not working
tags: sssd, ldap, kerberos, ad, identity, auth, pam, nss, sudo

Symptoms

  • id <user> hangs or returns no such user for a domain account
  • SSH or console login fails for directory-backed users
  • Group membership is missing or incomplete
  • sudo rules from LDAP/AD do not appear
  • Authentication works intermittently or only after cache flush
  • Offline authentication fails when the directory is unreachable

Diagnostics

Check service health

systemctl status sssd
sssctl domain-list
sssctl config-check
cat /etc/nsswitch.conf

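The first prerequisites are a running daemon, a config that passes validation, and sss listed as a source in nsswitch.conf for the databases SSSD should serve (typically passwd and group, plus sudoers if sudo rules come from the directory).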
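The nsswitch.conf prerequisite can be checked mechanically. The following is a minimal sketch; the function name nss_has_sss is hypothetical, and it assumes the usual "database: source source ..." layout with whitespace after the colon.

```shell
# Succeed if "sss" is listed as a source for one NSS database.
# Usage: nss_has_sss <nsswitch-file> <database>
# (nss_has_sss is an illustrative helper, not part of SSSD.)
nss_has_sss() {
    awk -v db="$2" '
        # Match lines whose first field is "<database>:" and scan the sources.
        $1 == db ":" {
            for (i = 2; i <= NF; i++) if ($i == "sss") found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

For instance, `nss_has_sss /etc/nsswitch.conf sudoers || echo "sudoers: sss missing"` flags a host where sudo rules can never come from SSSD, whatever the daemon's state.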

Check identity resolution

id <user>
getent passwd <user>
getent group <group>

If NSS lookups fail, the issue is often in SSSD configuration, connectivity, or cache.

Check SSSD logs

journalctl -u sssd -n 100
ls -la /var/log/sssd/
tail -n 100 /var/log/sssd/*.log
sssctl logs-fetch

Look for: backend offline, LDAP bind failures, Kerberos errors, TLS problems, and access provider denials.
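To narrow a large log down to the categories listed above, a keyword filter is usually enough for a first pass. This is a rough sketch: the helper name sssd_log_filter and its keyword list are assumptions, not a catalogue of actual SSSD message strings, so expect both false positives and misses.

```shell
# Print only the log lines that mention a common failure category.
# Reads log text on stdin.
# The keyword list is an assumption and should be tuned to the messages
# actually seen on the affected host.
sssd_log_filter() {
    grep -Ei 'offline|bind failed|krb5|kerberos|tls|ssl|access denied|permission denied'
}
```

Typical use would be `tail -n 100 /var/log/sssd/*.log | sssd_log_filter`.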

If the cause is still unclear, set debug_level = 6 in the relevant [nss], [pam], and [domain/<name>] sections and restart SSSD so the change takes effect. Raising it only in [sssd] affects the monitor process alone, which is not enough for most real failures.
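Before reproducing a failure, it helps to confirm which sections actually carry a debug level. The sketch below lists each section of an sssd.conf-style file with its debug_level; the function name debug_levels is hypothetical, and the parser assumes plain "[section]" headers and "key = value" lines.

```shell
# Print "<section> <debug_level>" for every section of an sssd.conf-style
# file, using "unset" where the section defines no debug_level.
# Usage: debug_levels <config-file>
debug_levels() {
    awk '
        # Section header such as [nss] or [domain/example.com].
        /^[[:space:]]*\[.*\][[:space:]]*$/ {
            if (section != "") print section, level
            section = $0
            gsub(/^[[:space:]]*\[|\][[:space:]]*$/, "", section)
            level = "unset"
            next
        }
        # debug_level assignment inside the current section.
        /^[[:space:]]*debug_level[[:space:]]*=/ {
            level = $0
            sub(/^[^=]*=[[:space:]]*/, "", level)
            sub(/[[:space:]]+$/, "", level)
        }
        END { if (section != "") print section, level }
    ' "$1"
}
```

Run as root against /etc/sssd/sssd.conf; any of nss, pam, or the affected domain showing "unset" means the logs for that component will stay sparse.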

Check domain reachability

sssctl domain-status <domain>
ping <ldap-or-ad-host>
dig -t SRV _ldap._tcp.<domain>
cat /etc/resolv.conf

If the identity provider is unreachable, SSSD may serve cached data only or fail entirely.
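Note that ping only proves ICMP reachability, not that the LDAP port answers. One way to go further is to turn the SRV answer into host/port pairs and probe each one. The helper below is a sketch (srv_targets is a hypothetical name) that assumes the four-field "priority weight port target" lines printed by dig +short.

```shell
# Convert `dig +short -t SRV` output into "host port" pairs.
# Each input line is expected to look like: <priority> <weight> <port> <target>.
srv_targets() {
    awk 'NF == 4 {
        host = $4
        sub(/\.$/, "", host)   # drop the trailing dot of the FQDN
        print host, $3
    }'
}
```

For example, `dig +short -t SRV _ldap._tcp.<domain> | srv_targets` yields one line per server, which can then be fed to a TCP probe such as nc -z where that option is available.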

Check Kerberos and LDAP configuration

cat /etc/sssd/sssd.conf
cat /etc/krb5.conf
kinit <user>
klist
ldapsearch -ZZ -x -H ldap://<server> -b <base-dn>

Look for wrong realm names, bad server addresses, TLS settings, and access filters.

For AD or IPA providers, Kerberos and DNS are often the real dependency chain: broken SRV lookup, keytab issues, or a slow KDC will surface as SSSD failures.

Check cache and permissions

ls -la /var/lib/sss/db/
sssctl user-show <user>
sssctl cache-expire -E

/etc/sssd/sssd.conf must not be readable by group or others (typically root-owned, mode 600); otherwise SSSD will usually refuse to start.
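The mode requirement is easy to verify in a script. This is a minimal sketch with a hypothetical helper name, conf_mode_ok; it checks the permission bits only and deliberately leaves ownership aside, since the expected owner can differ between SSSD releases.

```shell
# Succeed only if the file is a regular file with mode 600 (rw-------).
# Usage: conf_mode_ok <file>
# Parses `ls -ld` rather than `stat` because stat flags differ between
# GNU and BSD userlands.
conf_mode_ok() {
    perms=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
    [ "$perms" = "-rw-------" ]
}
```

Example: `conf_mode_ok /etc/sssd/sssd.conf || echo "sssd.conf permissions too open or file missing"`.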

Do not wipe cache files blindly on an offline system that depends on cached logins.

Remediation

Config syntax or permission issue: Fix sssd.conf, set secure permissions, then restart:

chmod 600 /etc/sssd/sssd.conf
systemctl restart sssd

Stale cache: Invalidate the cached entries (sss_cache marks them expired rather than deleting the cache files), then repopulate with a fresh lookup:

sss_cache -E
id <user>

Kerberos failure: Validate time sync, realm, keytab credentials, and KDC reachability before changing LDAP settings.
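For the time-sync part, the relevant question is whether client and KDC clocks are within the permitted skew. The sketch below compares two epoch timestamps; skew_ok is a hypothetical helper, the 300-second default reflects the standard MIT Kerberos clockskew setting, and how the remote timestamp is obtained (chrony, an NTP query, the KDC host itself) is left open.

```shell
# Succeed if two epoch timestamps differ by at most max_skew seconds.
# Usage: skew_ok <local-epoch> <remote-epoch> [max_skew]
# 300 is the default MIT Kerberos clockskew; adjust if krb5.conf overrides it.
skew_ok() {
    max=${3:-300}
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))   # absolute value
    fi
    [ "$diff" -le "$max" ]
}
```

Example: `skew_ok "$(date +%s)" "$remote_epoch" || echo "clock skew exceeds Kerberos tolerance"`, where remote_epoch is a placeholder for a timestamp taken from the KDC side.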

Backend offline or sdap_async_sys_connect request failed: Treat as DNS/network first. Validate SRV records and TLS handshake before increasing ldap_network_timeout or ldap_search_timeout.

Access denied despite successful lookup: Check access_provider, LDAP filters, HBAC rules, or AD group-based access restrictions.

No pam_sss messages at all: The PAM stack is likely misconfigured. Fix the PAM/authselect profile before changing SSSD itself.
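A quick way to confirm the PAM side is to count active references to pam_sss.so in the relevant stack file. This is a sketch with a hypothetical helper name, pam_sss_count; which file to inspect is distribution-specific (for example /etc/pam.d/system-auth and password-auth on RHEL-family systems, /etc/pam.d/common-auth on Debian-family systems).

```shell
# Count non-comment lines in a PAM config file that reference pam_sss.so.
# Usage: pam_sss_count <pam-file>
pam_sss_count() {
    grep -v '^[[:space:]]*#' "$1" | grep -c 'pam_sss\.so'
}
```

A result of 0 for the file that governs the failing service means SSSD is never consulted for that login path, so no amount of sssd.conf tuning will help.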