feat: complete RAG runbook workflow and release docs
Some checks failed
CI / test (push) Failing after 15s
115
runbooks/sssd.md
Normal file
@@ -0,0 +1,115 @@
---
service: sssd
symptoms: login denied, user not found, id command hangs, sudo rules missing, ldap auth failure, kerberos failure, cache stale, offline authentication not working
tags: sssd, ldap, kerberos, ad, identity, auth, pam, nss, sudo
---

## Symptoms

- `id <user>` hangs or returns no such user for a domain account
- SSH or console login fails for directory-backed users
- Group membership is missing or incomplete
- `sudo` rules from LDAP/AD do not appear
- Authentication works intermittently or only after cache flush
- Offline authentication fails when the directory is unreachable

## Diagnostics

### Check service health

```
systemctl status sssd
sssctl domain-list
sssctl config-check
cat /etc/nsswitch.conf
```

The first prerequisites are a running daemon, a valid config, and `sss` listed in `nsswitch.conf`.
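The `nsswitch.conf` prerequisite can be checked mechanically. The sketch below uses a hypothetical helper (`nss_has_sss` is not part of SSSD) that reports whether the `sss` source is listed for a given database. It assumes the usual `database: source ...` line format.

```shell
# Hypothetical helper: succeed if the given NSS database lists the "sss" source.
# Usage: nss_has_sss <database> [file]
nss_has_sss() {
    db="$1"
    file="${2:-/etc/nsswitch.conf}"
    # Match lines such as "passwd: files sss"; commented-out lines are ignored
    # because the pattern is anchored at the start of the line.
    grep -Eq "^[[:space:]]*${db}:(.*[[:space:]])?sss([[:space:]]|\$)" "$file" 2>/dev/null
}

# Report the two databases SSSD most commonly serves.
for db in passwd group; do
    if nss_has_sss "$db"; then
        echo "$db: sss present"
    else
        echo "$db: sss missing"
    fi
done
```

If `sudo` rules come from the directory, the same check can be run for the `sudoers` database.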

### Check identity resolution

```
id <user>
getent passwd <user>
getent group <group>
```

If NSS lookups fail, the issue is often in SSSD configuration, connectivity, or cache.
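Because a hanging lookup and a "no such user" answer point to different causes, it can help to tell them apart explicitly. This is a minimal sketch, assuming GNU `timeout` and glibc `getent` are available; the helper name `lookup_status` and the 10-second default are illustrative choices, not SSSD settings.

```shell
# Hypothetical helper: classify an NSS passwd lookup.
# Usage: lookup_status <user> [timeout-seconds]
lookup_status() {
    user="$1"
    limit="${2:-10}"
    # timeout(1) exits with 124 when it had to kill the command;
    # getent exits with 2 when the key is not found in any source.
    timeout "$limit" getent passwd "$user" >/dev/null 2>&1
    case $? in
        0)   echo "resolved" ;;
        2)   echo "not found" ;;
        124) echo "timed out" ;;
        *)   echo "error" ;;
    esac
}
```

A "timed out" result usually points at connectivity or a stuck backend, while "not found" points at configuration, filters, or cache.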

### Check SSSD logs

```
journalctl -u sssd -n 100
ls -la /var/log/sssd/
tail -n 100 /var/log/sssd/*.log
sssctl logs-fetch <archive.tar.gz>
```

Look for: backend offline, LDAP bind failures, Kerberos errors, TLS problems, and access provider denials.

If the issue is unclear, raise `debug_level=6` in the relevant `[nss]`, `[pam]`, and `[domain/<name>]` sections. Raising debug only in `[sssd]` is not enough for most real failures.
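To see at a glance which sections already set `debug_level`, something like the following sketch may help. `show_debug_levels` is a hypothetical helper, and it assumes a plain INI layout with one `debug_level = N` line per section.

```shell
# Hypothetical helper: print each section of an sssd.conf-style file
# together with its debug_level, or "unset" when none is given.
# Usage: show_debug_levels [file]
show_debug_levels() {
    awk '
        function emit() {
            if (section != "") print section, (level == "" ? "unset" : level)
        }
        # A new [section] header: report the previous section, then reset.
        /^[ \t]*\[/ {
            emit()
            section = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", section)
            level = ""
            next
        }
        # Remember the value of debug_level inside the current section.
        /^[ \t]*debug_level[ \t]*=/ {
            level = $0
            sub(/^[^=]*=[ \t]*/, "", level)
            sub(/[ \t]+$/, "", level)
        }
        END { emit() }
    ' "${1:-/etc/sssd/sssd.conf}"
}
```

Reading `/etc/sssd/sssd.conf` normally requires root. Any `[nss]`, `[pam]`, or `[domain/<name>]` line reported as `unset` is a candidate for raising the level.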

### Check domain reachability

```
sssctl domain-status <domain>
ping <ldap-or-ad-host>
dig -t SRV _ldap._tcp.<domain>
cat /etc/resolv.conf
```

If the identity provider is unreachable, SSSD may serve cached data only or fail entirely.
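`ping` only shows that a host answers ICMP, so it can be worth probing the advertised LDAP ports directly. This is a rough sketch, assuming `dig` and an `nc` that supports `-z` and `-w` are installed; both helper names are hypothetical.

```shell
# Hypothetical helper: turn `dig +short -t SRV` output
# ("priority weight port target.") into "host port" lines.
srv_targets() {
    awk 'NF == 4 { sub(/\.$/, "", $4); print $4, $3 }'
}

# Hypothetical helper: probe every LDAP server advertised for a domain.
# Usage: check_srv <domain>
check_srv() {
    dig +short -t SRV "_ldap._tcp.$1" | srv_targets |
    while read -r host port; do
        # nc -z only tests that the TCP port accepts connections.
        if nc -z -w 3 "$host" "$port" 2>/dev/null; then
            echo "$host:$port reachable"
        else
            echo "$host:$port UNREACHABLE"
        fi
    done
}
```

No output at all from `check_srv <domain>` means the SRV lookup returned nothing, which points at DNS, not at the directory servers.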

### Check Kerberos and LDAP configuration

```
cat /etc/sssd/sssd.conf
cat /etc/krb5.conf
kinit <user>
klist
ldapsearch -ZZ -x -H ldap://<server> -b <base-dn>
```

Look for wrong realm names, bad server addresses, TLS settings, and access filters.

For AD or IPA providers, Kerberos and DNS are often the real dependency chain: broken SRV lookup, keytab issues, or a slow KDC will surface as SSSD failures.

### Check cache and permissions

```
ls -la /var/lib/sss/db/
sssctl user-show <user>
sssctl cache-expire -E
```

`/etc/sssd/sssd.conf` must usually be owned by root with mode `600`, or SSSD will refuse to start.

Do not wipe cache files blindly on an offline system that depends on cached logins.
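The permission requirement can be verified before restarting the daemon. This is a minimal sketch, assuming GNU `stat`; `conf_mode_ok` is a hypothetical helper that checks only the mode and prints the owner for manual review.

```shell
# Hypothetical helper: succeed only if the file has mode 600.
# Usage: conf_mode_ok [file]
conf_mode_ok() {
    file="${1:-/etc/sssd/sssd.conf}"
    # GNU stat: %a is the octal mode, %U the owner, %G the group.
    mode=$(stat -c '%a' "$file") || return 2
    if [ "$mode" = "600" ]; then
        return 0
    fi
    echo "$file has mode $mode (owner $(stat -c '%U:%G' "$file")), expected 600" >&2
    return 1
}
```

Some distributions or newer SSSD releases may accept a different owner or mode, so treat a failure here as a prompt to check the packaged defaults, not as proof of a fault.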

## Remediation

**Config syntax or permission issue:**
Fix `sssd.conf`, set secure permissions, then restart:
```
chmod 600 /etc/sssd/sssd.conf
systemctl restart sssd
```

**Stale cache:**
Invalidate the cached entries (`sss_cache -E` marks them expired instead of deleting them), then repopulate with a fresh lookup:
```
sss_cache -E
id <user>
```

**Kerberos failure:**
Validate time sync, realm, keytab credentials, and KDC reachability before changing LDAP settings.
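For the time-sync part, Kerberos rejects requests when client and KDC clocks differ by more than the permitted skew (300 seconds by default in MIT Kerberos). The sketch below compares two epoch timestamps. `skew_ok` is a hypothetical helper, and how the reference time is obtained is left as an assumption.

```shell
# Hypothetical helper: succeed if two epoch timestamps differ by no more than
# the allowed skew (300 seconds is the MIT Kerberos default for clockskew).
# Usage: skew_ok <local-epoch> <reference-epoch> [max-skew-seconds]
skew_ok() {
    max="${3:-300}"
    diff=$(( $1 - $2 ))
    # Take the absolute value; the direction of the drift does not matter.
    [ "$diff" -lt 0 ] && diff=$(( -diff ))
    [ "$diff" -le "$max" ]
}
```

A call might look like `skew_ok "$(date +%s)" "$reference_epoch"`, where `reference_epoch` comes from a trusted time source. For the keytab, `klist -k` lists the principals it contains.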

**Backend offline or `sdap_async_sys_connect request failed`:**
Treat as DNS/network first. Validate SRV records and TLS handshake before increasing `ldap_network_timeout` or `ldap_search_timeout`.

**Access denied despite successful lookup:**
Check `access_provider`, LDAP filters, HBAC rules, or AD group-based access restrictions.

**No `pam_sss` messages at all:**
The PAM stack is likely misconfigured. Fix the PAM/authselect profile before changing SSSD itself.
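A quick way to confirm that the PAM stack references SSSD at all is to list the service files with an active `pam_sss.so` line. This is a small sketch; `pam_sss_files` is a hypothetical helper, and it assumes the conventional `/etc/pam.d` layout.

```shell
# Hypothetical helper: list PAM service files with an active pam_sss.so line.
# Usage: pam_sss_files [directory]
pam_sss_files() {
    dir="${1:-/etc/pam.d}"
    # -l prints only file names; the pattern skips lines commented out with "#".
    grep -lE '^[^#]*pam_sss\.so' "$dir"/* 2>/dev/null
}
```

Empty output suggests that no service ever calls `pam_sss`. On systems managed by authselect, `authselect current` shows the active profile.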