On 02/14/2012 06:37 PM, Iain Morgan wrote:
Hello,

On a fairly frequent basis, one of my 389 DS servers hangs after certain CMP operations. Once this happens, the server cannot be shut down gracefully. This has been going on for several weeks, and I have not yet found a solution.

My setup consists of two systems running RHEL 6.2 with 389 DS 1.2.9.16. Multimaster replication is enabled between the two servers, but the client systems (currently just two test systems) preferentially use the same server, ServerA. The second server, ServerB, is the one which is experiencing the problem.

We are using class-of-service entries to set the values for the shadowMax, shadowMin, and shadowWarning attributes. We are also conditionally setting a pwdPolicySubentry attribute for some entries in the same manner.

If I execute an ldapcompare command, such as the following:

    # ldapcompare uid=imorgan,ou=People,dc=example,dc=com \
        pwdpolicysubentry:"cn=Special Policy,ou=Policies,dc=example,dc=com"

the command will occasionally hang. Most of the time, the command succeeds and indicates that the attribute is not defined for that entry. However, once or twice a day it will simply hang. The access log shows that the CMP request was received, but no result is logged.

After this occurs, the server will not shut down gracefully. The init script fails to shut down the server and I end up having to send a SIGKILL to ns-slapd.
When you get the hang, can you attach to the process with gdb?

    ps -ef|grep ns-slapd
    gdb /usr/sbin/ns-slapd pid-of-ns-slapd
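Once attached, the useful output is normally a stack trace of every thread, since that shows where the CMP worker is blocked. A minimal sketch of capturing it non-interactively follows. The function name `dump_slapd_stacks` and the default output path are illustrative assumptions, not part of 389 DS, and the traces will be far more readable if the debuginfo packages for the server are installed.

```shell
# Hypothetical helper: write stack traces for all ns-slapd threads to a file.
# Usage: dump_slapd_stacks [pid] [output-file]
# Assumes gdb is installed; if no pid is given, assumes a single ns-slapd instance.
dump_slapd_stacks() {
    pid="${1:-$(pidof ns-slapd)}"
    if [ -z "$pid" ]; then
        echo "ns-slapd is not running" >&2
        return 1
    fi
    out="${2:-/tmp/ns-slapd-stacks.$pid.txt}"
    # -batch runs the -ex commands without prompting, then detaches and exits.
    gdb -batch -ex "thread apply all bt full" /usr/sbin/ns-slapd "$pid" > "$out" 2>&1
    # Print the file name so the caller knows where the traces went.
    echo "$out"
}
```

Running it while the server is hung, and again a few seconds later, makes it easier to tell a thread that is genuinely stuck from one that is merely busy.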
The error log does not report any issues. CMP operations against other attributes, such as loginShell, do not seem to exhibit this problem. Also, the problem does not occur on ServerA; only on ServerB. Once the CMP operation has hung, comparisons against other attributes, even shadowMax, continue to work.

As noted above, most of the time the CMP operation returns normally. However, if I reinitialize ServerB from ServerA, the problem occurs with the first CMP operation against ServerB.

Both servers have the same set of RPMs, and the dse.ldif files on the two systems do not have any significant differences.

Has anyone seen a similar issue? Any suggestions on how to debug or fix this? A somewhat simplified and redacted version of the class-of-service configuration is listed below.

Thanks
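The symptom described above, a CMP request in the access log with no result logged, can be searched for mechanically to see how often the hang has occurred. The following is only a sketch: the function name `find_unanswered_cmp` is made up for illustration, and it assumes the usual access log layout in which each line carries `conn=N op=M` followed by the operation keyword (`CMP` or `RESULT`).

```shell
# Hypothetical helper: print CMP request lines that have no matching RESULT line.
# Usage: find_unanswered_cmp /path/to/access-log
find_unanswered_cmp() {
    awk '
        {
            # Pick up the first conn= and op= fields to identify the operation.
            conn = ""; op = ""
            for (i = 1; i <= NF; i++) {
                if (conn == "" && $i ~ /^conn=/) conn = $i
                else if (op == "" && $i ~ /^op=/) op = $i
            }
            if (conn == "" || op == "") next
            key = conn " " op
        }
        / CMP /    { pending[key] = $0 }       # request seen, result outstanding
        / RESULT / { delete pending[key] }     # result seen, operation completed
        END        { for (k in pending) print pending[k] }
    ' "$1"
}
```

An operation that is still in flight when the log is read will also be reported, so a hit is only meaningful if its timestamp is well in the past.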
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users