Re: glusterfsd crash with glusterfs--mainline--2.5--patch-410

Nathan Allen Stratton <nathan@xxxxxxxxxxxx> · Fri, 3 Aug 2007 11:52:35 -0400 (EDT)

On Fri, 3 Aug 2007, Anand Avati wrote:

> Nathan,
>  we were checking this issue in our labs. the issue seems to be that one
> extra connect() is tried on the dead server, and the 'freezing' just happens
> to be the block till the first connect() times out. So, after a minute or
> so, once the first connect() times out, things proceed smooth. We are
> looking into why the first connect() is blocking.
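
For reference, the usual way to keep a connect() to a dead host from
stalling the caller is a non-blocking connect bounded by a poll() timeout.
Below is a minimal sketch, assuming IPv4 and a POSIX system. The name
connect_with_timeout and its parameters are hypothetical and are not taken
from the GlusterFS tree, so treat it as an illustration of the technique
rather than the actual fix.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not a GlusterFS function).
 * Returns a connected socket fd, or -1 on error or timeout.
 * The returned socket is left in non-blocking mode. */
int connect_with_timeout(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Switch to non-blocking mode so connect() returns immediately. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return fd;                      /* connected right away */
    if (errno != EINPROGRESS) {
        close(fd);
        return -1;                      /* immediate failure */
    }

    /* Wait until the socket is writable or the timeout expires.
     * Note: a retry after EINTR restarts the full timeout. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0) {
        close(fd);
        return -1;                      /* timed out or poll() failed */
    }

    /* Writable does not mean connected: check the pending socket error. */
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With something like this, a call such as connect_with_timeout("10.0.0.2",
6996, 3000) (address and port made up for the example) would give up after
about three seconds instead of waiting for the kernel's TCP connect timeout.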

I did some more testing with this: I shut down the server, and after a bit
it crashed.

(gdb) bt
#0  dict_destroy (this=0x622420) at dict.c:244
#1  0x00002aaaab93002b in server_reply_proc (data=<value optimized out>) at server-protocol.c:255
#2  0x0000003f610062f7 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003f604ce86d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()
(gdb)

After about 5 more minutes, the second box crashed:

(gdb) bt
#0  0x0000003f60430065 in raise () from /lib64/libc.so.6
#1  0x0000003f60431b00 in abort () from /lib64/libc.so.6
#2  0x0000003f6046825b in __libc_message () from /lib64/libc.so.6
#3  0x0000003f6046f504 in _int_free () from /lib64/libc.so.6
#4  0x0000003f60472b2c in free () from /lib64/libc.so.6
#5  0x00002aaaaaace140 in dict_destroy (this=0x2aaab00012b0) at dict.c:250
#6  0x00002aaaab93002b in server_reply_proc (data=<value optimized out>) at server-protocol.c:255
#7  0x0000003f610062f7 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003f604ce86d in clone () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()
(gdb)