On Fri, 3 Aug 2007, Anand Avati wrote:

> Nathan,
> we were checking this issue in our labs. the issue seems to be that one
> extra connect() is tried on the dead server, and the 'freezing' just happens
> to be the block till the first connect() times out. So, after a minute or
> so, once the first connect() times out, things proceed smooth. We are
> looking into why the first connect() is blocking.

Did some more testing with this. I shut down the server, and after a bit it
crashed.

(gdb) bt
#0  dict_destroy (this=0x622420) at dict.c:244
#1  0x00002aaaab93002b in server_reply_proc (data=<value optimized out>) at server-protocol.c:255
#2  0x0000003f610062f7 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003f604ce86d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()
(gdb)

After about 5 more minutes the 2nd box crashed:

(gdb) bt
#0  0x0000003f60430065 in raise () from /lib64/libc.so.6
#1  0x0000003f60431b00 in abort () from /lib64/libc.so.6
#2  0x0000003f6046825b in __libc_message () from /lib64/libc.so.6
#3  0x0000003f6046f504 in _int_free () from /lib64/libc.so.6
#4  0x0000003f60472b2c in free () from /lib64/libc.so.6
#5  0x00002aaaaaace140 in dict_destroy (this=0x2aaab00012b0) at dict.c:250
#6  0x00002aaaab93002b in server_reply_proc (data=<value optimized out>) at server-protocol.c:255
#7  0x0000003f610062f7 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003f604ce86d in clone () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()
(gdb)
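
On the blocking connect() described in the quoted message: one common way to keep a dead server from stalling the caller for a minute is to put the socket in non-blocking mode and bound the wait with poll(). Below is a minimal sketch of that general technique using only standard POSIX calls. It is not taken from the GlusterFS transport code, and the helper name connect_with_timeout and its parameters are made up for illustration.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of GlusterFS): connect to ip:port over
 * IPv4/TCP, giving up after timeout_ms milliseconds.
 * Returns a connected socket fd (left in non-blocking mode) on success,
 * or -1 on error or timeout. */
int connect_with_timeout(const char *ip, uint16_t port, int timeout_ms)
{
    struct sockaddr_in addr;
    struct pollfd pfd;
    socklen_t len;
    int fd, flags, rc, err = 0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                      /* not a valid dotted-quad address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Non-blocking mode makes connect() return immediately. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return fd;                      /* connected at once */
    if (errno != EINPROGRESS)
        goto fail;                      /* immediate failure, e.g. refused */

    /* Wait until the socket is writable or the timeout expires.
     * A signal (EINTR) simply restarts the wait with the full timeout. */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        goto fail;                      /* 0 = timed out, <0 = poll error */

    /* Writable does not imply connected: read the real outcome. */
    len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}
```

With something like this, a call such as connect_with_timeout("10.0.0.5", 6996, 2000) (address and port are placeholders) would return -1 after at most about two seconds if the peer is unreachable, rather than blocking until the kernel's own connect timeout fires.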