Majied,

If you have a core file, run:

gdb /usr/sbin/glusterfsd -c core.???

where core.??? is the name of the core file, then issue a bt and post the resulting backtrace. That will show the devs where the crash is happening.

Harris

----- Original Message -----
From: "Majied Najjar" <majied.najjar@xxxxxxxxxxxxxxx>
To: "Harris Landgarten" <harrisl@xxxxxxxxxxxxx>
Cc: gluster-devel@xxxxxxxxxx
Sent: Thursday, June 28, 2007 6:00:37 PM (GMT-0500) America/New_York
Subject: Re: Re: client cannot maintain mount of unified AFR

Here is the backtrace without io-threads; same behavior:

2007-06-28 17:52:39 C [common-utils.c:205:gf_print_trace] debug-backtrace: Got signal (6), printing backtrace
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7fd454e]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: [0xffffe420]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(abort+0x109) [0xb7e8ffb9]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7ec3d3a]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7ecb5cf]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__libc_free+0x82) [0xb7ecb672]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(dict_destroy+0x59) [0xb7fce279]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/protocol/server.so [0xb764529c]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/protocol/server.so(notify+0x2fb) [0xb764051b]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(transport_notify+0x37) [0xb7fd59f7]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xd9) [0xb7fd6489]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(poll_iteration+0x1d) [0xb7fd5acd]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: [glusterfsd] [0x804927e]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8) [0xb7e7aea8]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: [glusterfsd] [0x8048cb1]
2007-06-28 17:52:39 C [common-utils.c:205:gf_print_trace] debug-backtrace: Got signal (6), printing backtrace
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7f0254e]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: [0xffffe420]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(abort+0x109) [0xb7dbdfb9]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7df1d3a]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7df95cf]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__libc_free+0x82) [0xb7df9672]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(dict_destroy+0x75) [0xb7efc295]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/protocol/server.so [0xb757329c]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/protocol/server.so(notify+0x2fb) [0xb756e51b]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(transport_notify+0x37) [0xb7f039f7]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xd9) [0xb7f04489]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(poll_iteration+0x1d) [0xb7f03acd]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: [glusterfsd] [0x804927e]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8) [0xb7da8ea8]
2007-06-28 17:52:39 C [common-utils.c:207:gf_print_trace] debug-backtrace: [glusterfsd] [0x8048cb1]
2007-06-28 17:52:40 C [common-utils.c:205:gf_print_trace] debug-backtrace: Got signal (6), printing backtrace
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7edc54e]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: [0xffffe420]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(abort+0x109) [0xb7d97fb9]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7dcbd3a]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7dd35cf]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__libc_free+0x82) [0xb7dd3672]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(dict_destroy+0x75) [0xb7ed6295]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/protocol/server.so [0xb754d29c]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/protocol/server.so(notify+0x2fb) [0xb754851b]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(transport_notify+0x37) [0xb7edd9f7]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xd9) [0xb7ede489]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(poll_iteration+0x1d) [0xb7eddacd]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: [glusterfsd] [0x804927e]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8) [0xb7d82ea8]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: [glusterfsd] [0x8048cb1]
2007-06-28 17:52:40 C [common-utils.c:205:gf_print_trace] debug-backtrace: Got signal (6), printing backtrace
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7f6a54e]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: [0xffffe420]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(abort+0x109) [0xb7e25fb9]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7e59d3a]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7e615cf]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__libc_free+0x82) [0xb7e61672]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(dict_destroy+0x75) [0xb7f64295]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/protocol/server.so [0xb75db29c]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/protocol/server.so(notify+0x2fb) [0xb75d651b]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(transport_notify+0x37) [0xb7f6b9f7]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xd9) [0xb7f6c489]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(poll_iteration+0x1d) [0xb7f6bacd]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: [glusterfsd] [0x804927e]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8) [0xb7e10ea8]
2007-06-28 17:52:40 C [common-utils.c:207:gf_print_trace] debug-backtrace: [glusterfsd] [0x8048cb1]

Majied

On Thu, 28 Jun 2007 16:46:02 -0400 (EDT)
Harris Landgarten <harrisl@xxxxxxxxxxxxx> wrote:

> Majied,
>
> This looks like the same bug in io-threads that I have already reported. Try turning off io-threads and see if your problem goes away.
>
> Harris
>
> ----- Original Message -----
> From: "Majied Najjar" <majied.najjar@xxxxxxxxxxxxxxx>
> To: gluster-devel@xxxxxxxxxx
> Sent: Thursday, June 28, 2007 3:07:20 PM (GMT-0500) America/New_York
> Subject: Re: Re: client cannot maintain mount of unified AFR
>
> Here is the debug-backtrace from glusterfsd right during/after it dies:
>
> 2007-06-28 14:59:57 E [protocol.c:262:gf_block_unserialize_transport] libglusterfs/protocol: full_read of header failed: peer (127.0.0.1)
> 2007-06-28 14:59:57 C [tcp.c:81:tcp_disconnect] server: connection disconnected
> 2007-06-28 14:59:57 E [protocol.c:262:gf_block_unserialize_transport] libglusterfs/protocol: full_read of header failed: peer (127.0.0.1)
> 2007-06-28 14:59:57 C [tcp.c:81:tcp_disconnect] server: connection disconnected
> 2007-06-28 14:59:57 E [protocol.c:262:gf_block_unserialize_transport] libglusterfs/protocol: full_read of header failed: peer (127.0.0.1)
> 2007-06-28 14:59:57 C [tcp.c:81:tcp_disconnect] server: connection disconnected
> 2007-06-28 14:59:57 E [protocol.c:262:gf_block_unserialize_transport] libglusterfs/protocol: full_read of header failed: peer (127.0.0.1)
> 2007-06-28 14:59:57 C [tcp.c:81:tcp_disconnect] server: connection disconnected
> 2007-06-28 14:59:57 E [protocol.c:262:gf_block_unserialize_transport] libglusterfs/protocol: full_read of header failed: peer (127.0.0.1)
> 2007-06-28 14:59:57 C [tcp.c:81:tcp_disconnect] server: connection disconnected
> 2007-06-28 15:00:47 C [common-utils.c:205:gf_print_trace] debug-backtrace: Got signal (11), printing backtrace
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7f2c54e]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: [0xffffe420]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(dict_destroy+0x4e) [0xb7f2626e]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(dict_unref+0x4e) [0xb7f2630e]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0 [0xb7f302d0]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(call_resume+0x67) [0xb7f30417]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/performance/io-threads.so [0xb75a01ef]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libpthread.so.0 [0xb7ef4240]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__clone+0x5e) [0xb7e893de]
> 2007-06-28 15:00:47 C [common-utils.c:205:gf_print_trace] debug-backtrace: Got signal (11), printing backtrace
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7fbf54e]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: [0xffffe420]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(dict_destroy+0x4e) [0xb7fb926e]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(dict_unref+0x4e) [0xb7fb930e]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0 [0xb7fc32d0]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(call_resume+0x67) [0xb7fc3417]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/performance/io-threads.so [0xb76331ef]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libpthread.so.0 [0xb7f87240]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__clone+0x5e) [0xb7f1c3de]
> 2007-06-28 15:00:47 C [common-utils.c:205:gf_print_trace] debug-backtrace: Got signal (6), printing backtrace
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7eda54e]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: [0xffffe420]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(abort+0x109) [0xb7d95fb9]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7dc9d3a]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7dcfd8c]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7dd139f]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__libc_free+0x82) [0xb7dd1672]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(call_resume+0x40) [0xb7ede3f0]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/performance/io-threads.so [0xb754e1ef]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libpthread.so.0 [0xb7ea2240]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__clone+0x5e) [0xb7e373de]
> 2007-06-28 15:00:47 C [common-utils.c:205:gf_print_trace] debug-backtrace: Got signal (6), printing backtrace
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7ef054e]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: [0xffffe420]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(abort+0x109) [0xb7dabfb9]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7ddfd3a]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7de5d8c]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6 [0xb7de739f]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__libc_free+0x82) [0xb7de7672]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/libglusterfs.so.0(call_resume+0x40) [0xb7ef43f0]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /usr/lib/glusterfs/1.3.0-pre5/xlator/performance/io-threads.so [0xb75641ef]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libpthread.so.0 [0xb7eb8240]
> 2007-06-28 15:00:47 C [common-utils.c:207:gf_print_trace] debug-backtrace: /lib/tls/i686/cmov/libc.so.6(__clone+0x5e) [0xb7e4d3de]
>
> Majied
>
> On Thu, 28 Jun 2007 11:32:53 -0400
> Majied Najjar <majied.najjar@xxxxxxxxxxxxxxx> wrote:
>
> > Yes. After I sent this message, I realized that I neglected to upgrade the client. However, after I upgraded the client and updated the config to include the namespace info, the servers kept crashing. Since this was a production machine, I had to downgrade as my "maintenance window" was over. :-)
> >
> > In an effort to get more data, I set up another instance on a testing server and got somewhat similar results. I have placed my core file from the server crash at http://majied.net/core.txt and my client/server config at http://majied.net/client-server.txt .
> >
> > Let me know if you need more information.
> >
> > Thanks,
> > Majied
> >
> > On Thu, 28 Jun 2007 18:43:34 +0530
> > "Anand Avati" <avati@xxxxxxxxxxxxx> wrote:
> >
> > > You would get a 'connection refused' if the server is not running. Can you
> > > please check if glusterfsd was running at that moment? Also, please get the
> > > logs of the glusterfsd which was not running (and, if possible, the core
> > > dump's backtrace).
> > >
> > > Also, have you upgraded all the servers and clients?
> > >
> > > thanks,
> > > avati
> > >
> > > 2007/6/27, Majied Najjar <majied.najjar@xxxxxxxxxxxxxxx>:
> > > >
> > > > Also,
> > > >
> > > > This happened when the first client in the client config
> > > > rebooted. Normally, the second client in the afr group would have picked up
> > > > the slack, but instead I was getting connection refused from the client. I
> > > > am assuming this is a locking issue?
> > > >
> > > > Majied Najjar
> > > >
> > >
> > > --
> > > Anand V. Avati
> > >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxx
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
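
For anyone comparing the traces quoted in this thread, here is a minimal sketch of how the gf_print_trace log lines could be grouped into one frame list per crash. It assumes only the log format shown in the messages above. The function name split_backtraces and the three regular expressions are hypothetical helpers written for illustration; they are not part of GlusterFS.

```python
import re
from typing import Dict, List

# Assumed line shape, taken from the traces quoted in this thread:
#   <date> <time> C [common-utils.c:NNN:gf_print_trace] debug-backtrace: <payload>
TRACE_RE = re.compile(r"gf_print_trace\] debug-backtrace: (.*)$")
SIGNAL_RE = re.compile(r"Got signal \((\d+)\)")
FRAME_RE = re.compile(r"\((\w+)\+0x[0-9a-f]+\)")


def split_backtraces(log_text: str) -> List[Dict]:
    """Group gf_print_trace lines into one record per 'Got signal' header."""
    traces: List[Dict] = []
    for raw in log_text.splitlines():
        # Tolerate mail quote markers ("> ") in front of a log line.
        line = raw.lstrip("> ").rstrip()
        match = TRACE_RE.search(line)
        if not match:
            continue  # not a backtrace line (e.g. tcp_disconnect messages)
        payload = match.group(1)
        signal = SIGNAL_RE.search(payload)
        if signal:
            # A "Got signal (N)" line starts a new backtrace.
            traces.append({"signal": int(signal.group(1)), "frames": []})
        elif traces:
            symbol = FRAME_RE.search(payload)
            # Frames logged without a symbol name are recorded as "??".
            traces[-1]["frames"].append(symbol.group(1) if symbol else "??")
    return traces
```

Each returned record holds the signal number and the symbol names in the order the log printed them. Frames that the log prints without a symbol, such as the bare [0xffffe420] entry, come back as "??", so a gdb bt on the core file is still needed to resolve them.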