I tend to agree, nscd is a huge pain. Unfortunately, nscd does speed
things up when it works.

I think we found the problem: the ldap servers were moved behind a
firewall that times out connections after one hour. We reset the
timeout for ldap to 10 hours. So far, the server has let me log in
after 6 hours. The network people assure me that as long as there is
traffic, the connection will stay open. I think I would like to dump
the firewall along with nscd.

I have not looked at the source code for nss_ldap, but I would guess
that keepalive is not set on the socket. Nscd and/or nss_ldap also have
problems reconnecting after a network failure. Fedora Core 3 did not
have any trouble, suggesting that these problems were fixed in later
versions of nss_ldap and glibc.

The server is on its way to processing several thousand email messages
a day for quite a few users. I doubt it will be possible without nscd.

I would prefer to upgrade to Enterprise 4, but we have an iSCSI-based
SAN from Left Hand Networks. Last word was that iSCSI is not supported
in Enterprise 4. I hope they will get that fixed soon.

Matt

On Thu, 2005-08-11 at 16:26 -0500, Chris St. Pierre wrote:
> This may be off-base, but is there any reason you can't just kill
> nscd, remove it from your init scripts, and never speak the name of
> the grotesque beast again? That's probably one of my least favorite
> programs *ever* (right up there with Microsoft Bob and netinfo) and is
> one of the first things I get rid of. Do you need it?
>
> Chris St. Pierre
> Unix Systems Administrator
> Nebraska Wesleyan University
>
> On Wed, 10 Aug 2005, Matthew B. Brookover wrote:
>
> >I have a host running RedHat Enterprise 3 AS Release 5 academic version.
> >
> >NSCD seems to hang after a cache entry has timed out. If you bounce
> >nscd, all of the hung processes will continue like there was no problem.
> >
> >Nscd is configured to time out entries after 1 hour.
> >To recreate the hang, bounce nscd to get it working, log in to the
> >host, wait for 1 hour, then try an ls -l or any other command that
> >will call getpw*. There are times when it does not hang, but most of
> >the time nscd is hung.
> >
> >We are using openldap 2.2.26, Kerberos 1.4.1, and sasl 2.1.21 on
> >dedicated ldap and kerberos servers.
> >
> >Other clients running Fedora Core 3 work fine.
> >
> >The client running redhat enterprise 3 AS release 5 is using the
> >versions of sasl, nss-ldap, nscd, etc that came with the release:
> >cyrus-sasl-2.1.15-10
> >pam_krb5-1.75-1
> >krb5-devel-1.2.7-47
> >cyrus-sasl-gssapi-2.1.15-10
> >openldap-clients-2.0.27-17
> >openldap-devel-2.0.27-17
> >nss_ldap-207-15
> >krb5-workstation-1.2.7-47
> >krb5-libs-1.2.7-47
> >nscd-2.3.2-95.33
> >
> >nss-ldap and nscd log these errors in /var/log/messages:
> >Aug 8 10:36:04 imagine nscd: nss_ldap: reconnecting to LDAP server...
> >Aug 8 10:36:04 imagine nscd: nss_ldap: reconnected to LDAP server after 1 attempt(s)
> >
> >Kerberos, GSSAPI, SASL, etc all work correctly.
> >
> >When nscd is hung, any program that calls getpwuid, getpwnam or getpwent
> >will hang. I presume other functions that would cause a lookup through
> >nscd and nss_ldap will also hang.
> >
> >The server running RHEL 3.5 was originally installed with 3.4 and then
> >upgraded to 3.5. After the upgrade, Kerberos, ldap, etc were
> >configured. This may be a problem that is new to 3.5. I did not test
> >ldap, kerberos, sasl, etc under 3.4.
> >
> >When nscd is hung, you can log in as root and run an ldapsearch. The
> >results are returned correctly.
> >I followed these steps to test the ldap and kerberos servers:
> >1) rebooted the RHEL 3 release 5 ldap/kerberos client
> >2) logged in as myself
> >3) logged off
> >4) waited an hour for nscd's cache to time out
> >5) logged in as myself (the login hung before printing the password
> >prompt). I waited several minutes to make sure that it was not going to
> >continue
> >6) logged in as root on another terminal.
> >7) ran an ldap search for my user and ran kinit (both worked)
> >8) ran 'service nscd restart'
> >9) went back to the first terminal, entered my password and was able to
> >log in.
> >10) waited 1 hour
> >11) ran an ls -l, ls -l then hung. CTRL-c will unhang ls or other
> >process that does not catch the signal.
> >
> >There are times when nscd or nss-ldap will unhang on their own. Any
> >process calling getpw* will continue.
> >
> >/etc/nsswitch.conf is set with:
> >passwd: files ldap
> >shadow: files ldap
> >group: files ldap
> >
> >#hosts: db files nisplus nis dns
> >hosts: files dns
> >
> ># Example - obey only what nisplus tells us...
> >#services: nisplus [NOTFOUND=return] files
> >#networks: nisplus [NOTFOUND=return] files
> >#protocols: nisplus [NOTFOUND=return] files
> >#rpc: nisplus [NOTFOUND=return] files
> >#ethers: nisplus [NOTFOUND=return] files
> >#netmasks: nisplus [NOTFOUND=return] files
> >
> >bootparams: nisplus [NOTFOUND=return] files
> >
> >ethers: files
> >netmasks: files
> >networks: files
> >protocols: files
> >rpc: files
> >services: files
> >
> >netgroup: files
> >
> >publickey: nisplus
> >
> >automount: files
> >aliases: files nisplus
> >
> >---------------------------------------------
> >
> >The server is a Gateway 9515 with 2 3GHZ Xeon processors and 4GB RAM.
> >It will be serving email and other services very soon. Fortunately, it
> >is not in production yet.
> >
> >Any ideas?
> >
> >thank you.
> >
> >Matt Brookover
> >mbrookov@xxxxxxxxx
> >303-273-3436
> >
> >
> >--
> >redhat-list mailing list
> >unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
> >https://www.redhat.com/mailman/listinfo/redhat-list
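
P.S. For anyone curious what the keepalive guess earlier in this message would look like in practice, here is a minimal sketch in Python. It only illustrates the socket options involved; it is not what nss_ldap or the OpenLDAP client library actually does, and I have not checked their source. The function name and the timing values are made up for the example. Note that on Linux the default idle time before the first keepalive probe is 7200 seconds, which is longer than a one-hour firewall timeout, so turning on SO_KEEPALIVE alone would probably not have been enough without also shortening the idle time.

```python
import socket


def make_keepalive_socket(idle_s=600, interval_s=60, probes=5):
    """Return an unconnected TCP socket with keepalive probes enabled.

    The name and the default values are hypothetical, chosen only so the
    first probe goes out well before a one-hour idle timeout.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Ask the kernel to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning is platform-specific, so only set what is available.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

    return sock


# Usage (not run here; the host name is a placeholder):
#   sock = make_keepalive_socket()
#   sock.connect(("ldap.example.edu", 389))
```

With probes going out every few minutes, the firewall should see traffic on the connection even when no lookups are happening, which is the condition the network people said keeps it open.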