Could you please raise a bug for this? I saw this some time back as well, and will work on it with priority if you do.

Pranith

----- Original Message -----
> From: "Song" <gluster@xxxxxxx>
> To: "Song" <gluster@xxxxxxx>, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> Cc: "John Mark Walker" <johnmark@xxxxxxxxxxx>, gluster-users@xxxxxxxxxxx
> Sent: Monday, December 2, 2013 2:49:26 PM
> Subject: RE: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
>
> Pranith,
>
> Another kind of client crash happened; the gdb information is below for your
> reference:
>
> Core was generated by `/usr/sbin/glusterfs --log-level=INFO --volfile-id=gfs6
> --volfile-server=bj-nx-c'.
> Program terminated with signal 11, Segmentation fault.
> #0  afr_frame_return (frame=<value optimized out>) at afr-common.c:983
> 983             call_count = --local->call_count;
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.47.el6.x86_64 libgcc-4.4.6-3.el6.x86_64
> openssl-1.0.0-20.el6.x86_64 zlib-1.2.3-27.el6.x86_64
> (gdb) where
> #0  afr_frame_return (frame=<value optimized out>) at afr-common.c:983
> #1  0x00007f8aa1c1ebbc in afr_sh_entry_impunge_parent_setattr_cbk
>     (setattr_frame=0x7f8aa525b248, cookie=<value optimized out>, this=0x1a82e00,
>     op_ret=<value optimized out>, op_errno=<value optimized out>,
>     preop=<value optimized out>, postop=0x0, xdata=0x0) at afr-self-heal-entry.c:970
> #2  0x00007f8aa1e5fecb in client3_1_setattr (frame=0x7f8aa54ec634,
>     this=<value optimized out>, data=<value optimized out>) at client3_1-fops.c:5801
> #3  0x00007f8aa1e58b41 in client_setattr (frame=0x7f8aa54ec634, this=<value
>     optimized out>, loc=<value optimized out>, stbuf=<value optimized out>,
>     valid=<value optimized out>, xdata=<value optimized out>) at client.c:1915
> #4  0x00007f8aa1c1f080 in afr_sh_entry_impunge_setattr
>     (impunge_frame=0x7f8aa5454e10, this=<value optimized out>) at afr-self-heal-entry.c:1017
> #5  0x00007f8aa1c1f5c0 in afr_sh_entry_impunge_xattrop_cbk
>     (impunge_frame=0x7f8aa5454e10, cookie=0x1, this=0x1a82e00, op_ret=<value
>     optimized out>, op_errno=22, xattr=<value optimized out>, xdata=0x0)
>     at afr-self-heal-entry.c:1067
> #6  0x00007f8aa1e6b34e in client3_1_xattrop_cbk (req=<value optimized out>,
>     iov=<value optimized out>, count=<value optimized out>,
>     myframe=0x7f8aa54ad5b8) at client3_1-fops.c:1715
> #7  0x00000037eba0f4e5 in rpc_clnt_handle_reply (clnt=0x1eaccd0,
>     pollin=0x2fba390) at rpc-clnt.c:786
> #8  0x00000037eba0fce0 in rpc_clnt_notify (trans=<value optimized out>,
>     mydata=0x1eacd00, event=<value optimized out>, data=<value optimized out>)
>     at rpc-clnt.c:905
> #9  0x00000037eba0aeb8 in rpc_transport_notify (this=<value optimized out>,
>     event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
> #10 0x00007f8aa2cb5764 in socket_event_poll_in (this=0x1ebc730) at socket.c:1677
> #11 0x00007f8aa2cb5847 in socket_event_handler (fd=<value optimized out>,
>     idx=127, data=0x1ebc730, poll_in=1, poll_out=0, poll_err=<value optimized out>)
>     at socket.c:1792
> #12 0x00000037eb63e464 in event_dispatch_epoll_handler (event_pool=0x19eddf0)
>     at event.c:785
> #13 event_dispatch_epoll (event_pool=0x19eddf0) at event.c:847
> #14 0x000000000040736a in main (argc=<value optimized out>,
>     argv=0x7fff26cdcd78) at glusterfsd.c:1689
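The segfault in frame #0 fires on "call_count = --local->call_count;", which indicates that frame->local was stale (already freed, or never attached on this error-unwind path; note postop=0x0 in frame #1) by the time the setattr callback returned through afr_frame_return. Below is a minimal, self-contained C sketch of that failure pattern; the struct names are simplified stand-ins, not the actual GlusterFS types.

#include <stdlib.h>

/* Simplified, hypothetical stand-ins for GlusterFS's call_frame_t and
 * afr_local_t; the real types live in libglusterfs and the afr xlator. */
struct local { int call_count; };
struct frame { struct local *local; };

/* Mirrors the shape of afr_frame_return(): decrement the count of
 * outstanding calls kept in frame->local. If local has already been
 * released (or was never attached on this unwind path), the decrement
 * dereferences invalid memory -- the SIGSEGV at afr-common.c:983. */
static int frame_return(struct frame *frame)
{
    struct local *local = frame->local;
    return --local->call_count;   /* crashes when local is stale */
}

int main(void)
{
    struct frame f;
    f.local = calloc(1, sizeof(*f.local));
    f.local->call_count = 2;

    frame_return(&f);   /* fine: 2 -> 1 */

    free(f.local);      /* a cleanup path releases local early... */
    f.local = NULL;     /* ...and nothing guards later callbacks */

    frame_return(&f);   /* NULL dereference: segfault, as in frame #0 */
    return 0;
}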
> -----Original Message-----
> From: Song [mailto:gluster@xxxxxxx]
> Sent: Monday, October 28, 2013 11:25 AM
> To: 'Pranith Kumar Karampuri'
> Cc: 'John Mark Walker'; 'gluster-users@xxxxxxxxxxx'
> Subject: RE: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
>
> Pranith,
>
> Another similar client crash happened. Following are the glusterfs log and
> gdb output for your reference.
>
> pending frames:
> frame : type(1) op(STATFS)
> frame : type(1) op(STATFS)
>
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 6
> time of crash: 2013-10-28 00:41:53
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.3.1
> /lib64/libc.so.6[0x3a0c432900]
> /lib64/libc.so.6(gsignal+0x35)[0x3a0c432885]
> /lib64/libc.so.6(abort+0x175)[0x3a0c434065]
> /lib64/libc.so.6[0x3a0c46f7a7]
> /lib64/libc.so.6[0x3a0c4750c6]
> /usr/lib/libglusterfs.so.0(gf_timer_call_cancel+0xb0)[0x328b42a180]
> /usr/lib/glusterfs/3.3.1/xlator/protocol/client.so(client_ping_cbk+0x6d)[0x7f3514afe54d]
> /usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x328b80f4e5]
> /usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x328b80fce0]
> /usr/lib/libgfrpc.so.0(rpc_transport_notify+0x28)[0x328b80aeb8]
> /usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f351593a764]
> /usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_handler+0xc7)[0x7f351593a847]
> /usr/lib/libglusterfs.so.0[0x328b43e464]
> /usr/sbin/glusterfs(main+0x58a)[0x40736a]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3a0c41ecdd]
> /usr/sbin/glusterfs[0x4042d9]
> ---------
>
> (gdb) where
> #0  0x0000003a0c432885 in raise () from /lib64/libc.so.6
> #1  0x0000003a0c434065 in abort () from /lib64/libc.so.6
> #2  0x0000003a0c46f7a7 in __libc_message () from /lib64/libc.so.6
> #3  0x0000003a0c4750c6 in malloc_printerr () from /lib64/libc.so.6
> #4  0x000000328b42a180 in gf_timer_call_cancel (ctx=<value optimized out>,
>     event=0x7f34f0001730) at timer.c:122
> #5  0x00007f3514afe54d in client_ping_cbk (req=<value optimized out>,
>     iov=<value optimized out>, count=<value optimized out>,
>     myframe=0x7f3517f0751c) at client-handshake.c:285
> #6  0x000000328b80f4e5 in rpc_clnt_handle_reply (clnt=0x1890aa0,
>     pollin=0x1e7acb0) at rpc-clnt.c:786
> #7  0x000000328b80fce0 in rpc_clnt_notify (trans=<value optimized out>,
>     mydata=0x1890ad0, event=<value optimized out>, data=<value optimized out>)
>     at rpc-clnt.c:905
> #8  0x000000328b80aeb8 in rpc_transport_notify (this=<value optimized out>,
>     event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
> #9  0x00007f351593a764 in socket_event_poll_in (this=0x18a0500) at socket.c:1677
> #10 0x00007f351593a847 in socket_event_handler (fd=<value optimized out>,
>     idx=41, data=0x18a0500, poll_in=1, poll_out=0, poll_err=<value optimized out>)
>     at socket.c:1792
> #11 0x000000328b43e464 in event_dispatch_epoll_handler (event_pool=0x930df0)
>     at event.c:785
> #12 event_dispatch_epoll (event_pool=0x930df0) at event.c:847
> #13 0x000000000040736a in main (argc=<value optimized out>,
>     argv=0x7fff829eac78) at glusterfsd.c:1689
>
> -----Original Message-----
> From: Pranith Kumar Karampuri [mailto:pkarampu@xxxxxxxxxx]
> Sent: Friday, October 25, 2013 2:00 PM
> To: Song
> Cc: John Mark Walker; gluster-users@xxxxxxxxxxx
> Subject: Re: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
>
> Thanks for this information. Let us see if we can re-create the issue in our
> environment. If that does not help, we shall do a detailed analysis of the
> code to figure this out.
>
> Pranith
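A note on the gf_timer_call_cancel trace above: an abort inside malloc_printerr means glibc's heap checker detected corruption while the ping-timer event was being cancelled, which is consistent with the same timer event being freed twice. A minimal sketch of that double-cancel hypothesis, with simplified types rather than the real libglusterfs timer code:

#include <stdlib.h>

/* Simplified, hypothetical stand-in for the timer event handled by
 * gf_timer_call_cancel(); the real gf_timer_t is in libglusterfs. */
struct timer_event {
    void (*callback)(void *data);
    void *data;
};

/* Sketch of the suspected pattern: cancelling (and thereby freeing)
 * the same ping-timer event twice. glibc's heap checker catches the
 * second free and calls abort() through malloc_printerr -- the
 * signal 6 seen in the trace above. */
static void timer_cancel(struct timer_event *event)
{
    /* the real code also unlinks the event from the timer list */
    free(event);
}

int main(void)
{
    struct timer_event *ping_timer = calloc(1, sizeof(*ping_timer));

    timer_cancel(ping_timer);   /* first cancel: legitimate */
    timer_cancel(ping_timer);   /* second cancel of a stale pointer:
                                   "double free or corruption", abort() */
    return 0;
}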
> ----- Original Message -----
> > From: "Song" <gluster@xxxxxxx>
> > To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> > Cc: "John Mark Walker" <johnmark@xxxxxxxxxxx>, gluster-users@xxxxxxxxxxx
> > Sent: Wednesday, October 23, 2013 2:53:03 PM
> > Subject: RE: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
> >
> > Pranith,
> >
> > Thanks for your detailed answer.
> >
> > Our workload includes CREATE/WRITE/READ/STAT/ACCESS, as well as
> > chmod(filepath, 0), but I don't know which kind of workload led to
> > the crash.
> > We have analyzed the related code, such as dict, lookup of cluster/afr,
> > and lookup of protocol/client, and found no information useful for
> > locating the issue.
> >
> > Song.
> >
> > -----Original Message-----
> > From: Pranith Kumar Karampuri [mailto:pkarampu@xxxxxxxxxx]
> > Sent: Tuesday, October 22, 2013 5:25 PM
> > To: Song
> > Cc: John Mark Walker; gluster-users@xxxxxxxxxxx
> > Subject: Re: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
> >
> > Song,
> > The information printed in that function gf_print_trace has been useful
> > in the sense that we know it happens when there is a double 'memput' of
> > one of the data structures as part of 'lookup'. The problem is that this
> > issue seems to happen only in some peculiar case, which unfortunately
> > you are hitting every day on 1-2 clients. That is why I was trying to
> > figure out what the workload is.
> >
> > Let me explain what I mean by 'workload'.
> > For example:
> > Websites that do some kind of image manipulation generally CREATE
> > temporary files, do some transformations (i.e. READs/WRITEs), and then
> > RENAME them to the actual files.
> > So here the workload is CREATE/READ/WRITE/RENAME intensive.
> >
> > To give you one more example:
> > VM image hosting (at least with the KVM images that I generally test)
> > pretty much does WRITEs, READs, and STATs on each VM image, so it is
> > WRITE/STAT/READ intensive.
> >
> > I would really like to know what kind of workload happens on your
> > setup, to figure out what the peculiar thing is that may lead to this
> > crash.
> >
> > Pranith.
> >
> > ----- Original Message -----
> > > From: "Song" <gluster@xxxxxxx>
> > > To: "Song" <gluster@xxxxxxx>, "John Mark Walker" <johnmark@xxxxxxxxxxx>,
> > > "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> > > Cc: gluster-users@xxxxxxxxxxx
> > > Sent: Tuesday, October 22, 2013 1:56:48 PM
> > > Subject: RE: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
> > >
> > > To locate this issue, is it possible to print more useful
> > > information in the backtrace?
> > > When the client crashed, trace information was printed; this is coded
> > > in the function gf_print_trace, in common-utils.c.
> > > I hope some helpful debug information can be appended in this
> > > function, so that the next time a client crashes, the data can help
> > > us analyze the problem.
> > >
> > > Could you suggest what code would be useful?
> > > Thanks!
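One low-risk way to get more context out of a crash handler in the style of gf_print_trace is to record small state tags (for example, the type of the last fop issued) in sig_atomic_t variables on the normal path, and print them from the handler using only async-signal-safe calls. The sketch below is purely illustrative; the variable and handler names are hypothetical, not the actual common-utils.c code.

#include <signal.h>
#include <unistd.h>

/* Inside a signal handler, only async-signal-safe functions (write,
 * _exit, ...) may be used; no printf, no malloc. */
#define WRITE_LIT(s) (void) write (STDERR_FILENO, s, sizeof (s) - 1)

/* hypothetical tag, updated by the fop path before winding a call */
static volatile sig_atomic_t last_fop_type = 0;

static void crash_handler(int sig)
{
    WRITE_LIT("signal received, last fop type: ");
    /* render the small integer without sprintf (not signal-safe) */
    char digit = (char)('0' + (last_fop_type % 10));
    (void) write (STDERR_FILENO, &digit, 1);
    WRITE_LIT("\n");
    _exit(128 + sig);
}

int main(void)
{
    signal(SIGSEGV, crash_handler);
    last_fop_type = 1;   /* e.g. set by the LOOKUP path before winding */
    raise(SIGSEGV);      /* simulate the crash */
    return 0;
}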
> > > -----Original Message-----
> > > From: gluster-users-bounces@xxxxxxxxxxx
> > > [mailto:gluster-users-bounces@xxxxxxxxxxx] On Behalf Of Song
> > > Sent: Friday, September 06, 2013 10:17 AM
> > > To: 'John Mark Walker'; 'Pranith Kumar Karampuri'
> > > Cc: gluster-users@xxxxxxxxxxx
> > > Subject: Re: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
> > >
> > > It's a pity that I don't know how to re-create the issue, yet there are
> > > 1-2 crashed clients out of 120 clients in total every day.
> > >
> > > Below is the gdb result:
> > >
> > > (gdb) where
> > > #0  0x0000003267432885 in raise () from /lib64/libc.so.6
> > > #1  0x0000003267434065 in abort () from /lib64/libc.so.6
> > > #2  0x000000326746f7a7 in __libc_message () from /lib64/libc.so.6
> > > #3  0x00000032674750c6 in malloc_printerr () from /lib64/libc.so.6
> > > #4  0x00007fc4f2847684 in mem_put (ptr=0x7fc4b0a4c03c) at mem-pool.c:559
> > > #5  0x00007fc4f281cc9b in dict_destroy (this=0x7fc4f12cc5cc) at dict.c:397
> > > #6  0x00007fc4ede24c30 in afr_local_cleanup (local=0x7fc4ce68ac20,
> > >     this=<value optimized out>) at afr-common.c:848
> > > #7  0x00007fc4ede2c0f1 in afr_lookup_done (frame=0x18d5ae4, cookie=0x0,
> > >     this=<value optimized out>, op_ret=<value optimized out>,
> > >     op_errno=<value optimized out>, inode=0x18d5b20, buf=0x7fffcb83ec50,
> > >     xattr=0x7fc4f12e1818, postparent=0x7fffcb83ebe0) at afr-common.c:1881
> > > #8  afr_lookup_cbk (frame=0x18d5ae4, cookie=0x0, this=<value optimized out>,
> > >     op_ret=<value optimized out>, op_errno=<value optimized out>,
> > >     inode=0x18d5b20, buf=0x7fffcb83ec50, xattr=0x7fc4f12e1818,
> > >     postparent=0x7fffcb83ebe0) at afr-common.c:2044
> > > #9  0x00007fc4ee066550 in client3_1_lookup_cbk (req=<value optimized out>,
> > >     iov=<value optimized out>, count=<value optimized out>,
> > >     myframe=0x7fc4f16f390c) at client3_1-fops.c:2636
> > > #10 0x00007fc4f25ff4e5 in rpc_clnt_handle_reply (clnt=0x3b5c600,
> > >     pollin=0x6ba00f0) at rpc-clnt.c:786
> > > #11 0x00007fc4f25ffce0 in rpc_clnt_notify (trans=<value optimized out>,
> > >     mydata=0x3b5c630, event=<value optimized out>, data=<value optimized out>)
> > >     at rpc-clnt.c:905
> > > #12 0x00007fc4f25faeb8 in rpc_transport_notify (this=<value optimized out>,
> > >     event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
> > > #13 0x00007fc4eeeb0764 in socket_event_poll_in (this=0x3b6c060) at socket.c:1677
> > > #14 0x00007fc4eeeb0847 in socket_event_handler (fd=<value optimized out>,
> > >     idx=265, data=0x3b6c060, poll_in=1, poll_out=0, poll_err=<value optimized out>)
> > >     at socket.c:1792
> > > #15 0x00007fc4f2846464 in event_dispatch_epoll_handler (event_pool=0x177cdf0)
> > >     at event.c:785
> > > #16 event_dispatch_epoll (event_pool=0x177cdf0) at event.c:847
> > > #17 0x000000000040736a in main (argc=<value optimized out>,
> > >     argv=0x7fffcb83efc8) at glusterfsd.c:1689
> > >
> > > -----Original Message-----
> > > From: jowalker@xxxxxxxxxx [mailto:jowalker@xxxxxxxxxx] On Behalf Of
> > > John Mark Walker
> > > Sent: Thursday, September 05, 2013 1:06 PM
> > > To: Pranith Kumar Karampuri
> > > Cc: Song; gluster-devel@xxxxxxxxxx
> > > Subject: Re: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
> > >
> > > Posting to gluster-users.
> > >
> > > ----- Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> > > > Song,
> > > >     Seems like the issue is happening because of a double 'memput'.
> > > > Could you let us know the steps to re-create the issue, or the load
> > > > that may lead to this?
> > > >
> > > > Pranith
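For readers following along: a double 'memput' typically corrupts the pool's bookkeeping silently, so the process only aborts later inside glibc, as in the traces above. Below is a minimal sketch, under the assumption that an in-use flag is added to the object header, of how the second put can be turned into an immediate, diagnosable failure; this is a simplified model, not the real mem-pool.c.

#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical model of a mem-pool object header; the real
 * mem_get()/mem_put() live in libglusterfs/src/mem-pool.c. */
struct pool_obj {
    int in_use;         /* 1 while handed out, 0 once returned */
    char payload[64];   /* caller-visible memory */
};

static struct pool_obj *pool_get(void)
{
    struct pool_obj *obj = calloc(1, sizeof(*obj));
    obj->in_use = 1;
    return obj;
}

static void pool_put(struct pool_obj *obj)
{
    /* double-put detection: fail loudly at the second put instead of
     * corrupting the allocator's state */
    assert(obj->in_use && "double mem_put detected");
    obj->in_use = 0;    /* returned to the pool's free list, not free()d */
}

int main(void)
{
    struct pool_obj *local = pool_get();
    pool_put(local);    /* first put: fine */
    pool_put(local);    /* second put: the assertion fires here */
    return 0;
}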
> > > > ----- Original Message -----
> > > > > From: "Song" <gluster@xxxxxxx>
> > > > > To: gluster-devel@xxxxxxxxxx
> > > > > Sent: Thursday, September 5, 2013 8:05:57 AM
> > > > > Subject: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
> > > > >
> > > > > I installed GlusterFS 3.3.1 on my 24 servers, created a DHT+AFR
> > > > > volume, and mounted it with the native client.
> > > > >
> > > > > Recently, some glusterfs clients have crashed; the log is below.
> > > > >
> > > > > The OS is 64-bit CentOS 6.2, kernel version:
> > > > > 2.6.32-220.23.1.el6.x86_64 #1 SMP Fri Jun 28 00:56:49 CST 2013
> > > > > x86_64 x86_64 x86_64 GNU/Linux
> > > > >
> > > > > pending frames:
> > > > > frame : type(1) op(LOOKUP)
> > > > > frame : type(1) op(LOOKUP)
> > > > > frame : type(1) op(LOOKUP)
> > > > >
> > > > > patchset: git://git.gluster.com/glusterfs.git
> > > > > signal received: 6
> > > > > time of crash: 2013-09-05 00:37:40
> > > > > configuration details:
> > > > > argp 1
> > > > > backtrace 1
> > > > > dlfcn 1
> > > > > fdatasync 1
> > > > > libpthread 1
> > > > > llistxattr 1
> > > > > setfsid 1
> > > > > spinlock 1
> > > > > epoll.h 1
> > > > > xattr.h 1
> > > > > st_atim.tv_nsec 1
> > > > > package-string: glusterfs 3.3.1
> > > > > /lib64/libc.so.6[0x3ac0232900]
> > > > > /lib64/libc.so.6(gsignal+0x35)[0x3ac0232885]
> > > > > /lib64/libc.so.6(abort+0x175)[0x3ac0234065]
> > > > > /lib64/libc.so.6[0x3ac026f7a7]
> > > > > /lib64/libc.so.6[0x3ac02750c6]
> > > > > /usr/lib/libglusterfs.so.0(mem_put+0x64)[0x7f3f99c2c684]
> > > > > /usr/lib/glusterfs/3.3.1/xlator/cluster/replicate.so(afr_local_cleanup+0x60)[0x7f3f95209c30]
> > > > > /usr/lib/glusterfs/3.3.1/xlator/cluster/replicate.so(afr_lookup_cbk+0x5a1)[0x7f3f952110f1]
> > > > > /usr/lib/glusterfs/3.3.1/xlator/protocol/client.so(client3_1_lookup_cbk+0x6b0)[0x7f3f9544b550]
> > > > > /usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7f3f999e44e5]
> > > > > /usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f3f999e4ce0]
> > > > > /usr/lib/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7f3f999dfeb8]
> > > > > /usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f3f96295764]
> > > > > /usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_handler+0xc7)[0x7f3f96295847]
> > > > > /usr/lib/libglusterfs.so.0(+0x3e464)[0x7f3f99c2b464]
> > > > > /usr/sbin/glusterfs(main+0x58a)[0x40736a]
> > > > > /lib64/libc.so.6(__libc_start_main+0xfd)[0x3ac021ecdd]
> > > > > /usr/sbin/glusterfs[0x4042d9]
> > > > > ---------
> > > > >
> > > > > Best regards.
> > > > > Willard Song
> > > > >
> > > > > _______________________________________________
> > > > > Gluster-devel mailing list
> > > > > Gluster-devel@xxxxxxxxxx
> > > > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > > >
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel@xxxxxxxxxx
> > > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users@xxxxxxxxxxx
> > > http://supercolony.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users