tai/runbooks/sssd.md
zphinx 57f4c0efaa
feat: complete RAG runbook workflow and release docs
2026-05-06 04:48:41 +02:00


---
service: sssd
symptoms: login denied, user not found, id command hangs, sudo rules missing, ldap auth failure, kerberos failure, cache stale, offline authentication not working
tags: sssd, ldap, kerberos, ad, identity, auth, pam, nss, sudo
---
## Symptoms
- `id <user>` hangs or returns no such user for a domain account
- SSH or console login fails for directory-backed users
- Group membership is missing or incomplete
- `sudo` rules from LDAP/AD do not appear
- Authentication works intermittently or only after a cache flush
- Offline authentication fails when the directory is unreachable
## Diagnostics
### Check service health
```
systemctl status sssd
sssctl domain-list
sssctl config-check
cat /etc/nsswitch.conf
```
The first prerequisites are a running daemon, a configuration that passes `sssctl config-check`, and `sss` listed for the `passwd` and `group` databases in `nsswitch.conf`.
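A script can check the `nsswitch.conf` prerequisite by confirming that both the `passwd` and `group` lines list `sss`. The function name `nss_uses_sss` is hypothetical. It only reads the file and does not query SSSD.

```sh
# Hypothetical helper: succeed only if both the passwd and group databases
# in an nsswitch.conf-style file list the "sss" source.
nss_uses_sss() {
  nss_file="${1:-/etc/nsswitch.conf}"
  for db in passwd group; do
    # Match "<db>:" at line start, then "sss" as a separate word.
    if ! grep -Eq "^[[:space:]]*${db}:.*[[:space:]]sss([[:space:]]|\$)" "$nss_file"; then
      echo "missing sss for ${db} in ${nss_file}" >&2
      return 1
    fi
  done
}
```

Usage: `nss_uses_sss && echo "nsswitch OK"`. The function does not validate ordering or `[NOTFOUND=...]` actions; it only confirms that `sss` is present.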
### Check identity resolution
```
id <user>
getent passwd <user>
getent group <group>
```
If these lookups fail or hang for domain accounts, the cause is usually SSSD configuration, connectivity to the identity provider, or the cache.
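Because a hanging `id` is one of the listed symptoms, it can help to wrap the lookup in `timeout` so that a hang is reported rather than blocking the shell. `lookup_user` and the 10-second default are illustrative assumptions. The exit codes come from `getent` (2 means the key was not found) and GNU `timeout` (124 means the time limit was reached).

```sh
# Hypothetical helper: classify an NSS user lookup as found, not-found,
# timeout, or error instead of letting a hung lookup block the shell.
lookup_user() {
  lu_user="$1"
  lu_limit="${2:-10}"   # seconds; illustrative default
  timeout "$lu_limit" getent passwd "$lu_user" >/dev/null
  lu_rc=$?
  case "$lu_rc" in
    0)   echo "found" ;;
    2)   echo "not-found" ;;   # getent: key not found in the database
    124) echo "timeout" ;;     # GNU timeout: command exceeded the limit
    *)   echo "error ($lu_rc)" ;;
  esac
}
```

Usage: `lookup_user alice 5`. `getent passwd` also consults local files in `nsswitch.conf` order, so a success for a local account says nothing about SSSD.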
### Check SSSD logs
```
journalctl -u sssd -n 100
ls -la /var/log/sssd/
tail -n 100 /var/log/sssd/*.log
sssctl logs-fetch <archive-file>
```
Look for: backend offline, LDAP bind failures, Kerberos errors, TLS problems, and access provider denials.
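If the cause is still unclear, set `debug_level = 6` in the relevant `[nss]`, `[pam]`, and `[domain/<name>]` sections and restart `sssd`. Their output goes to `sssd_nss.log`, `sssd_pam.log`, and `sssd_<name>.log` under `/var/log/sssd/`. Raising debug only in `[sssd]` is not enough for most real failures.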
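One way to raise the level without editing the main file is a drop-in. This assumes your SSSD version supports configuration snippets in `/etc/sssd/conf.d/`. The helper name, the file name `90-debug.conf`, and the domain `example.com` are placeholders. The `[domain/...]` name must match the section in `sssd.conf` exactly.

```sh
# Hypothetical helper: write a conf.d snippet that raises debug_level to 6
# for the NSS responder, the PAM responder, and one domain backend.
write_debug_snippet() {
  ds_domain="$1"                                  # must match [domain/<name>] in sssd.conf
  ds_dest="${2:-/etc/sssd/conf.d/90-debug.conf}"  # placeholder file name
  cat > "$ds_dest" <<EOF
[nss]
debug_level = 6

[pam]
debug_level = 6

[domain/${ds_domain}]
debug_level = 6
EOF
  # Snippets are expected to have the same owner and mode as sssd.conf.
  chmod 600 "$ds_dest"
}
```

Usage: `write_debug_snippet example.com && systemctl restart sssd`. Level 6 is verbose, so remove the snippet and restart again once the failure is captured.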
### Check domain reachability
```
sssctl domain-status <domain>
ping <ldap-or-ad-host>
dig -t SRV _ldap._tcp.<domain>
cat /etc/resolv.conf
```
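If the identity provider is unreachable, the SSSD backend switches to offline mode. In that mode it answers lookups from cache and can authenticate only users with cached credentials (`cache_credentials = True` in the domain section). Otherwise, lookups and logins fail. A failed `ping` is not conclusive, because ICMP is often filtered even when the LDAP and Kerberos ports are open.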
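Whether cached logins work while offline depends on `cache_credentials` in the domain section, which defaults to false. The sketch below reads that value from an `sssd.conf`-style file. `domain_option` is a hypothetical helper. It assumes simple `key = value` lines and ignores any `conf.d` snippets, and reading the real file normally requires root.

```sh
# Hypothetical helper: print the value of OPTION inside [domain/DOMAIN]
# from an sssd.conf-style INI file. Prints nothing if the option is unset.
domain_option() {
  awk -v sect="[domain/$1]" -v key="$2" '
    /^[ \t]*\[/ {
      line = $0; gsub(/[ \t]/, "", line)   # normalise the section header
      in_sect = (line == sect)
      next
    }
    in_sect {
      split($0, kv, "=")
      k = kv[1]; gsub(/[ \t]/, "", k)
      if (k == key) {
        v = substr($0, index($0, "=") + 1)
        sub(/^[ \t]+/, "", v); sub(/[ \t]+$/, "", v)
        print v
        exit
      }
    }' "${3:-/etc/sssd/sssd.conf}"
}
```

Usage: `domain_option example.com cache_credentials`. Empty output or `False` means offline authentication cannot work for that domain.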
### Check Kerberos and LDAP configuration
```
cat /etc/sssd/sssd.conf
cat /etc/krb5.conf
kinit <user>
klist
ldapsearch -ZZ -x -H ldap://<server> -b <base-dn>
```
Look for wrong realm names, bad server addresses, TLS settings, and access filters.
For AD or IPA providers, Kerberos and DNS are often the real dependency chain: broken SRV lookup, keytab issues, or a slow KDC will surface as SSSD failures.
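To spot a realm mismatch quickly, compare `default_realm` in `krb5.conf` with the upper-cased DNS domain. Upper-casing the domain is the usual AD convention rather than a rule, so treat a mismatch as a hint, not proof. `default_realm_of` and `check_realm` are hypothetical helpers.

```sh
# Hypothetical helper: print default_realm from the [libdefaults] section
# of a krb5.conf-style file.
default_realm_of() {
  awk -F= '
    /^[ \t]*\[/ { in_lib = ($0 ~ /^[ \t]*\[libdefaults\]/); next }
    in_lib {
      k = $1; gsub(/[ \t]/, "", k)
      if (k == "default_realm") { v = $2; gsub(/[ \t]/, "", v); print v; exit }
    }' "${1:-/etc/krb5.conf}"
}

# Hypothetical check: fail if the realm is not the upper-cased DNS domain.
check_realm() {
  cr_expected=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  cr_actual=$(default_realm_of "$2")
  if [ "$cr_actual" != "$cr_expected" ]; then
    echo "default_realm is '${cr_actual}', expected '${cr_expected}'" >&2
    return 1
  fi
}
```

Usage: `check_realm example.com /etc/krb5.conf`.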
### Check cache and permissions
```
ls -la /var/lib/sss/db/
sssctl user-show <user>
sssctl cache-expire -E
```
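`/etc/sssd/sssd.conf` normally has to be owned by root with mode `600`; otherwise SSSD refuses to start.
`sss_cache -E` and `sssctl cache-expire -E` only mark entries as expired. Do not delete files under `/var/lib/sss/db/` on a system that depends on cached logins while offline, because that also removes the cached credentials.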
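Before restarting, you can check the permission requirement without changing anything by reading the file mode and owner with `stat -c`, which GNU coreutils and BusyBox both support. `check_sssd_conf_perms` is hypothetical. The owner argument is optional, because the expected owner can differ on releases where SSSD runs as an unprivileged user.

```sh
# Hypothetical helper: fail if the file is not mode 600, or (when an owner
# is given) not owned by that user.
check_sssd_conf_perms() {
  cp_file="${1:-/etc/sssd/sssd.conf}"
  cp_owner="$2"                                # optional, e.g. root
  cp_mode=$(stat -c '%a' "$cp_file") || return 1
  if [ "$cp_mode" != "600" ]; then
    echo "${cp_file}: mode ${cp_mode}, expected 600" >&2
    return 1
  fi
  if [ -n "$cp_owner" ] && [ "$(stat -c '%U' "$cp_file")" != "$cp_owner" ]; then
    echo "${cp_file}: not owned by ${cp_owner}" >&2
    return 1
  fi
}
```

Usage: `check_sssd_conf_perms /etc/sssd/sssd.conf root`.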
## Remediation
**Config syntax or permission issue:**
Fix `sssd.conf`, set secure permissions, then restart:
```
chown root:root /etc/sssd/sssd.conf
chmod 600 /etc/sssd/sssd.conf
sssctl config-check
systemctl restart sssd
```
**Stale cache:**
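Invalidate the cache, which marks entries as expired rather than deleting them, then repopulate it with a fresh lookup. To limit the impact, `sss_cache -u <user>` invalidates a single user.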
```
sss_cache -E
id <user>
```
**Kerberos failure:**
Validate time sync, realm, keytab credentials, and KDC reachability before changing LDAP settings.
**Backend offline or `sdap_async_sys_connect request failed`:**
Treat as DNS/network first. Validate SRV records and TLS handshake before increasing `ldap_network_timeout` or `ldap_search_timeout`.
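To see which hosts and ports the SRV records point to, sort the answer by priority and weight. This sketch reads `dig +short -t SRV` output from standard input, one `priority weight port target` record per line. `srv_targets` is a hypothetical helper. It does not perform the weighted random selection that RFC 2782 clients use within a priority; it only orders the candidates.

```sh
# Hypothetical helper: turn "priority weight port target." lines from
# `dig +short -t SRV` into "host:port", lowest priority first and, within
# a priority, highest weight first.
srv_targets() {
  sort -k1,1n -k2,2nr | awk 'NF == 4 { sub(/\.$/, "", $4); print $4 ":" $3 }'
}
```

Usage: `dig +short -t SRV _ldap._tcp.example.com | srv_targets`. You can then test the TLS handshake against each target, for example with `openssl s_client -connect <host>:636` where LDAPS is enabled.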
**Access denied despite successful lookup:**
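Check `access_provider`, LDAP access filters, HBAC rules (IPA), or AD group-based access restrictions. `sssctl user-checks <user>` runs a lookup and a PAM account check through SSSD, which helps separate identity problems from access denials.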
**No `pam_sss` messages at all:**
The PAM stack is likely misconfigured. Fix the PAM/authselect profile before changing SSSD itself.
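To confirm whether `pam_sss` is referenced at all, search the PAM configuration for uncommented references. Which files matter (`system-auth`, `password-auth`, `common-auth`, `sshd`) varies by distribution. `pam_sss_refs` is a hypothetical helper.

```sh
# Hypothetical helper: list PAM files that reference pam_sss.so on a
# non-comment line; return non-zero if none do.
pam_sss_refs() {
  pr_dir="${1:-/etc/pam.d}"
  pr_hits=$(grep -lE '^[[:space:]]*[^#[:space:]].*pam_sss\.so' "$pr_dir"/* 2>/dev/null)
  [ -n "$pr_hits" ] && printf '%s\n' "$pr_hits"
}
```

Usage: `pam_sss_refs || authselect current`. On authselect-based systems, this shows which profile is active when no file references `pam_sss`.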