Edward Z. Yang wrote:
> Excerpts from Rich Megginson's message of Thu Oct 14 15:35:38 -0400 2010:
>
>> We have tested server to server SASL/GSSAPI with replication on RHEL5,
>> but we have not seen this happen. Do you have more than one replication
>> agreement?
>
> Yes; we're doing full multimaster, so every master has a replication agreement
> with every other master.
>
>> Would it be possible for you to provide a stacktrace
>> obtained with thread apply all bt in gdb?
>
> Sure. See:
>
> http://web.mit.edu/~ezyang/Public/wedged-ldap.txt
>
> Edward

Thanks. Looks like this stack trace is from a 389-ds-base-1.2.5 server:

Thread 36 (Thread 0x7f29ff5fe910 (LWP 24382)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:220
#1  0x0000003facc22ff9 in ?? () from /lib64/libnspr4.so
#2  0x0000003facc23bdc in PR_WaitCondVar () from /lib64/libnspr4.so
#3  0x00007f2a1898ecfc in protocol_sleep (prp=0x2723a50, duration=300000) at ldap/servers/plugins/replication/repl5_inc_protocol.c:1309
#4  0x00007f2a1898fedc in repl5_inc_run (prp=0x2723a50) at ldap/servers/plugins/replication/repl5_inc_protocol.c:796
#5  0x00007f2a18994119 in prot_thread_main (arg=<value optimized out>) at ldap/servers/plugins/replication/repl5_protocol.c:313
#6  0x0000003facc29773 in ?? () from /lib64/libnspr4.so
#7  0x000000300b80685a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#8  0x000000300acde22d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9  0x0000000000000000 in ?? ()

Frame #4 places the protocol_sleep() call at line 796 of
repl5_inc_protocol.c. That matches the 1.2.5 source:

http://git.fedorahosted.org/git/?p=389/ds.git;a=blob;f=ldap/servers/plugins/replication/repl5_inc_protocol.c;h=4e733dec208e3426d13c2ed2b4239300d955e232;hb=389-ds-base-1.2.5

    795     wait_change_timer_set = 1;
    796     protocol_sleep(prp, MAX_WAIT_BETWEEN_SESSIONS);
    797     }

But not the 1.2.6 source, where line 796 is not a protocol_sleep() call:

http://git.fedorahosted.org/git/?p=389/ds.git;a=blob;f=ldap/servers/plugins/replication/repl5_inc_protocol.c;h=6475eb89ba168b30a8cb38cd5a78f8dc1d8b4796;hb=389-ds-base-1.2.6

    795     else
    796     {
    797     if (wait_change_timer_set)

Although I can't say for sure whether the bug you are encountering exists
in 1.2.6, it's much easier for us to support the latest version. Can you
try to reproduce with 1.2.6?

If you would rather use 1.2.6.1, it has been pushed to Fedora/EPEL Stable
and should be available from the mirrors within the next 48 hours. If you
don't want to wait, you can install from Fedora updates-testing or EPEL
epel-testing.
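
P.S. In case it helps with checking other threads in that trace, here is a rough Python sketch of the comparison done by hand above: it pulls the function, file, and line number out of each gdb frame that carries source information, so they can be looked up in a given tag's source. The names FRAME_RE and source_frames are made up for this illustration and are not part of 389-ds or gdb, and the pattern is only an assumption based on the frame layout shown in this trace.

```python
import re

# Assumed frame layout, taken from the trace in this message:
#   "#4  0x... in repl5_inc_run (prp=0x2723a50) at path/file.c:796"
# Frames without an "at file:line" suffix (e.g. "from /lib64/...") are skipped.
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+"                 # frame number
    r"(?:0x[0-9a-fA-F]+\s+in\s+)?"       # optional return address
    r"(?P<func>\S+)\s+\(.*\)\s+"         # function name and argument list
    r"at\s+(?P<file>\S+):(?P<line>\d+)$" # source file and line
)


def source_frames(backtrace):
    """Return (frame number, function, file, line) for frames with source info."""
    frames = []
    for raw in backtrace.splitlines():
        match = FRAME_RE.match(raw.strip())
        if match:
            frames.append(
                (
                    int(match["num"]),
                    match["func"],
                    match["file"],
                    int(match["line"]),
                )
            )
    return frames


if __name__ == "__main__":
    sample = """\
#2  0x0000003facc23bdc in PR_WaitCondVar () from /lib64/libnspr4.so
#3  0x00007f2a1898ecfc in protocol_sleep (prp=0x2723a50, duration=300000) at ldap/servers/plugins/replication/repl5_inc_protocol.c:1309
#4  0x00007f2a1898fedc in repl5_inc_run (prp=0x2723a50) at ldap/servers/plugins/replication/repl5_inc_protocol.c:796
"""
    # Print only the replication plugin frames, which are the ones to
    # compare against the tagged source.
    for num, func, path, line in source_frames(sample):
        if "plugins/replication" in path:
            print(f"frame #{num}: {func} at {path}:{line}")
```

Each file and line it reports can then be checked against the gitweb blob for the tag in question, the same way line 796 was checked above.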