Thanks for this information. Let us see if we can re-create the issue in our
environment. If that does not help, we shall do a detailed analysis of the
code to figure this out.

Pranith

----- Original Message -----
> From: "Song" <gluster at 163.com>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: "John Mark Walker" <johnmark at gluster.org>, gluster-users at gluster.org
> Sent: Wednesday, October 23, 2013 2:53:03 PM
> Subject: RE: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
>
> Pranith,
>
> Thanks for your detailed answer.
>
> Our workload includes CREATE/WRITE/READ/STAT/ACCESS, as well as
> chmod(filepath, 0), though I don't know which kind of workload leads to
> the crash.
> We have analyzed the related code, such as dict, lookup in cluster/afr and
> lookup in protocol/client, but found no useful information to help locate
> the issue.
>
> Song.
>
> -----Original Message-----
> From: Pranith Kumar Karampuri [mailto:pkarampu at redhat.com]
> Sent: Tuesday, October 22, 2013 5:25 PM
> To: Song
> Cc: John Mark Walker; gluster-users at gluster.org
> Subject: Re: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
>
> Song,
> The information printed in that function gf_print_trace has been useful
> in the sense that we know it happens when there is a double 'memput' of
> one of the data structures as part of 'lookup'. The problem is that this
> issue seems to happen only in some peculiar case, which unfortunately you
> are hitting every day on 1-2 clients. That is why I was trying to figure
> out what the workload is.
>
> Let me explain what I mean by 'workload'.
> For example: websites which do some kind of image manipulation generally
> CREATE temporary files, do some transformations, i.e. READs/WRITEs, and
> then RENAME them to the actual files.
> So here the workload is CREATE/READ/WRITE/RENAME intensive.
>
> To give you one more example: VM image hosting (at least with the KVM
> images that I generally test) pretty much does WRITEs, READs and STATs on
> each VM image, so it is WRITE/STAT/READ intensive.
>
> I would really like to know what kind of workload happens on your setup,
> to figure out what is the peculiar thing that may lead to this crash.
>
> Pranith.
>
> ----- Original Message -----
> > From: "Song" <gluster at 163.com>
> > To: "Song" <gluster at 163.com>, "John Mark Walker" <johnmark at gluster.org>,
> > "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> > Cc: gluster-users at gluster.org
> > Sent: Tuesday, October 22, 2013 1:56:48 PM
> > Subject: RE: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
> >
> > To help locate this issue, is it possible to print more useful
> > information in the backtrace?
> > When the client crashed, trace information was printed; this is coded in
> > the function gf_print_trace in common-utils.c.
> > I hope some helpful debug information can be appended in this function,
> > so that when a client crashes next time the data can help us analyze the
> > problem.
> >
> > Could you suggest what code would be useful?
> > Thanks!
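One possibility, sketched below purely for illustration: libglusterfs already
provides gf_log_callingfn(), which logs the caller's stack into the client
log. A hypothetical guard around the release of the afr local's dict (the
wrapper name and the refcount check are assumptions, not existing GlusterFS
code) would then record which call path performed the second release, instead
of the process only aborting inside glibc:

/* Hypothetical instrumentation sketch, not existing GlusterFS code.
 * gf_log_callingfn() (logging.h) prints the calling stack into the log;
 * dict_unref() and dict_t's refcount member are the regular dict API.
 * A second (double) release would then show up in the client log with
 * its call path instead of ending in a glibc abort. */
#include "logging.h"
#include "dict.h"

static void
afr_debug_dict_unref (dict_t *dict)
{
        if (!dict)
                return;

        if (dict->refcount <= 0) {
                /* Already released: log the offending call path. */
                gf_log_callingfn ("afr", GF_LOG_CRITICAL,
                                  "dict %p released again (refcount=%d)",
                                  dict, dict->refcount);
                return;
        }

        dict_unref (dict);
}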
> >
> > -----Original Message-----
> > From: gluster-users-bounces at gluster.org
> > [mailto:gluster-users-bounces at gluster.org] On Behalf Of Song
> > Sent: Friday, September 06, 2013 10:17 AM
> > To: 'John Mark Walker'; 'Pranith Kumar Karampuri'
> > Cc: gluster-users at gluster.org
> > Subject: Re: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
> >
> > Unfortunately I don't know how to re-create the issue, yet every day
> > there are 1-2 crashed clients out of 120 clients in total.
> >
> > Below is the gdb result:
> >
> > (gdb) where
> > #0  0x0000003267432885 in raise () from /lib64/libc.so.6
> > #1  0x0000003267434065 in abort () from /lib64/libc.so.6
> > #2  0x000000326746f7a7 in __libc_message () from /lib64/libc.so.6
> > #3  0x00000032674750c6 in malloc_printerr () from /lib64/libc.so.6
> > #4  0x00007fc4f2847684 in mem_put (ptr=0x7fc4b0a4c03c) at mem-pool.c:559
> > #5  0x00007fc4f281cc9b in dict_destroy (this=0x7fc4f12cc5cc) at dict.c:397
> > #6  0x00007fc4ede24c30 in afr_local_cleanup (local=0x7fc4ce68ac20,
> >     this=<value optimized out>) at afr-common.c:848
> > #7  0x00007fc4ede2c0f1 in afr_lookup_done (frame=0x18d5ae4, cookie=0x0,
> >     this=<value optimized out>, op_ret=<value optimized out>,
> >     op_errno=<value optimized out>, inode=0x18d5b20, buf=0x7fffcb83ec50,
> >     xattr=0x7fc4f12e1818, postparent=0x7fffcb83ebe0) at afr-common.c:1881
> > #8  afr_lookup_cbk (frame=0x18d5ae4, cookie=0x0,
> >     this=<value optimized out>, op_ret=<value optimized out>,
> >     op_errno=<value optimized out>, inode=0x18d5b20, buf=0x7fffcb83ec50,
> >     xattr=0x7fc4f12e1818, postparent=0x7fffcb83ebe0) at afr-common.c:2044
> > #9  0x00007fc4ee066550 in client3_1_lookup_cbk (req=<value optimized out>,
> >     iov=<value optimized out>, count=<value optimized out>,
> >     myframe=0x7fc4f16f390c) at client3_1-fops.c:2636
> > #10 0x00007fc4f25ff4e5 in rpc_clnt_handle_reply (clnt=0x3b5c600,
> >     pollin=0x6ba00f0) at rpc-clnt.c:786
> > #11 0x00007fc4f25ffce0 in rpc_clnt_notify (trans=<value optimized out>,
> >     mydata=0x3b5c630, event=<value optimized out>,
> >     data=<value optimized out>) at rpc-clnt.c:905
> > #12 0x00007fc4f25faeb8 in rpc_transport_notify (this=<value optimized out>,
> >     event=<value optimized out>, data=<value optimized out>)
> >     at rpc-transport.c:489
> > #13 0x00007fc4eeeb0764 in socket_event_poll_in (this=0x3b6c060)
> >     at socket.c:1677
> > #14 0x00007fc4eeeb0847 in socket_event_handler (fd=<value optimized out>,
> >     idx=265, data=0x3b6c060, poll_in=1, poll_out=0,
> >     poll_err=<value optimized out>) at socket.c:1792
> > #15 0x00007fc4f2846464 in event_dispatch_epoll_handler
> >     (event_pool=0x177cdf0) at event.c:785
> > #16 event_dispatch_epoll (event_pool=0x177cdf0) at event.c:847
> > #17 0x000000000040736a in main (argc=<value optimized out>,
> >     argv=0x7fffcb83efc8) at glusterfsd.c:1689
> >
> >
> > -----Original Message-----
> > From: jowalker at redhat.com [mailto:jowalker at redhat.com] On Behalf Of
> > John Mark Walker
> > Sent: Thursday, September 05, 2013 1:06 PM
> > To: Pranith Kumar Karampuri
> > Cc: Song; gluster-devel at nongnu.org
> > Subject: Re: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
> >
> > Posting to gluster-users.
> >
> >
> > ----- Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
> > > Song,
> > > Seems like the issue is happening because of a double 'memput'. Could
> > > you let us know the steps to re-create the issue? Or the load that may
> > > lead to this?
> > >
> > > Pranith
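For context, the double 'memput' suspected above means that the same mem-pool
object is released twice. A minimal sketch of that failure mode, assuming only
the standard GlusterFS mem-pool calls mem_get0()/mem_put() from mem-pool.h
(the function and pool below are illustrative, not code from this thread):

/* Illustration only: the suspected failure mode.  Releasing the same
 * mem-pool object twice corrupts the pool/heap accounting, and glibc
 * later aborts the process with SIGABRT -- the "signal received: 6"
 * seen in the client log. */
#include "mem-pool.h"

static void
double_mem_put_example (struct mem_pool *pool)
{
        void *obj = mem_get0 (pool);   /* take one object from the pool  */

        mem_put (obj);                 /* first release: fine            */
        mem_put (obj);                 /* second release: the bug, here  */
                                       /* hit via afr_local_cleanup ->   */
                                       /* dict_destroy -> mem_put        */
}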
> > >
> > > ----- Original Message -----
> > > > From: "Song" <gluster at 163.com>
> > > > To: gluster-devel at nongnu.org
> > > > Sent: Thursday, September 5, 2013 8:05:57 AM
> > > > Subject: [Gluster-devel] GlusterFS 3.3.1 client crash (signal received: 6)
> > > >
> > > > I installed GlusterFS 3.3.1 on my 24 servers, created a DHT+AFR
> > > > volume and mounted it with the native client.
> > > >
> > > > Recently, some glusterfs clients have crashed; the log is below.
> > > >
> > > > The OS is 64-bit CentOS 6.2, kernel version:
> > > > 2.6.32-220.23.1.el6.x86_64 #1 SMP Fri Jun 28 00:56:49 CST 2013
> > > > x86_64 x86_64 x86_64 GNU/Linux
> > > >
> > > > pending frames:
> > > > frame : type(1) op(LOOKUP)
> > > > frame : type(1) op(LOOKUP)
> > > > frame : type(1) op(LOOKUP)
> > > >
> > > > patchset: git://git.gluster.com/glusterfs.git
> > > > signal received: 6
> > > > time of crash: 2013-09-05 00:37:40
> > > > configuration details:
> > > > argp 1
> > > > backtrace 1
> > > > dlfcn 1
> > > > fdatasync 1
> > > > libpthread 1
> > > > llistxattr 1
> > > > setfsid 1
> > > > spinlock 1
> > > > epoll.h 1
> > > > xattr.h 1
> > > > st_atim.tv_nsec 1
> > > > package-string: glusterfs 3.3.1
> > > > /lib64/libc.so.6[0x3ac0232900]
> > > > /lib64/libc.so.6(gsignal+0x35)[0x3ac0232885]
> > > > /lib64/libc.so.6(abort+0x175)[0x3ac0234065]
> > > > /lib64/libc.so.6[0x3ac026f7a7]
> > > > /lib64/libc.so.6[0x3ac02750c6]
> > > > /usr/lib/libglusterfs.so.0(mem_put+0x64)[0x7f3f99c2c684]
> > > > /usr/lib/glusterfs/3.3.1/xlator/cluster/replicate.so(afr_local_cleanup+0x60)[0x7f3f95209c30]
> > > > /usr/lib/glusterfs/3.3.1/xlator/cluster/replicate.so(afr_lookup_cbk+0x5a1)[0x7f3f952110f1]
> > > > /usr/lib/glusterfs/3.3.1/xlator/protocol/client.so(client3_1_lookup_cbk+0x6b0)[0x7f3f9544b550]
> > > > /usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7f3f999e44e5]
> > > > /usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f3f999e4ce0]
> > > > /usr/lib/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7f3f999dfeb8]
> > > > /usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f3f96295764]
> > > > /usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_event_handler+0xc7)[0x7f3f96295847]
> > > > /usr/lib/libglusterfs.so.0(+0x3e464)[0x7f3f99c2b464]
> > > > /usr/sbin/glusterfs(main+0x58a)[0x40736a]
> > > > /lib64/libc.so.6(__libc_start_main+0xfd)[0x3ac021ecdd]
> > > > /usr/sbin/glusterfs[0x4042d9]
> > > > ---------
> > > >
> > > > Best regards.
> > > >
> > > > Willard Song
> > > >
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel at nongnu.org
> > > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> > >
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel at nongnu.org
> > > https://lists.nongnu.org/mailman/listinfo/gluster-devel
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users