Re: Core by test case : georep-tarssh-hybrid.t


 



Appended comments inline.

----- Original Message -----
> From: "Susant Palai" <spalai@xxxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: gluster-devel@xxxxxxxxxxx
> Sent: Friday, April 24, 2015 7:17:33 PM
> Subject: Re:  Core by test case : georep-tarssh-hybrid.t
> 
> Hi,
>   Here is a speculation :
> 
>   With the introduction of multi-threaded epoll we are processing multiple
>   responses at the same time. The crash happened in _gf_free, which
>   originated from dht_getxattr_cbk (as seen in the backtrace). In the
>   current state we don't have a frame lock inside dht_getxattr_cbk. Hence,
>   this path is prone to races.
> 
> Here is a code-snippet from dht_getxattr_cbk.
> ===============================================
>         this_call_cnt = dht_frame_return (frame); 
            Note: the above line needs to be moved after the "out" section; otherwise, since dht_frame_return() itself takes frame->lock, we will end up in a deadlock once the critical section is placed under frame->lock.
> ..
> ..
> ..
> ..
> 
> 
>         if (!local->xattr) {
>                 local->xattr = dict_copy_with_ref (xattr, NULL);
>         } else {
>                 dht_aggregate_xattr (local->xattr, xattr);
>         }
> out:
>         if (is_last_call (this_call_cnt)) {
>                 DHT_STACK_UNWIND (getxattr, frame, local->op_ret, op_errno,
>                                   local->xattr, NULL);
>         }
>         return 0;
> 
> ===============================================
> Here I am depicting the responses from two cbks in a two-subvol cluster:
> 
>   time 1:  Thread 1 (CBK1):  this_call_cnt = 1  (2 - 1)
>   time 2:  Thread 2 (CBK2):  this_call_cnt = 0  (1 - 1)
>   time 3:  Thread 1 (CBK1):  enters dict_copy_with_ref
>   time 4:  Thread 2 (CBK2):  dht_aggregate_xattr
>   time 5:  Thread 2 (CBK2):  DHT_STACK_UNWIND
>                              [leading to dict_unref and destroy]
>   time 6:  Thread 1 (CBK1):  still busy inside dict_copy_with_ref; tries to
>                              unref the dict, leading to a free of memory
>                              already freed in the other thread. Hence, a
>                              double free.
> 
> Will compose a patch which puts the critical section under frame->lock.
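
            A minimal sketch of what that could look like (not the actual
            patch; the error handling around op_ret/op_errno and the local
            variable setup are assumed from the snippet above):

            ===============================================
                    dht_local_t *local         = frame->local;
                    int          this_call_cnt = 0;

                    /* Serialize the dict manipulation: only one callback at a
                     * time may copy into or aggregate onto local->xattr. */
                    LOCK (&frame->lock);
                    {
                            if (!local->xattr) {
                                    local->xattr = dict_copy_with_ref (xattr, NULL);
                            } else {
                                    dht_aggregate_xattr (local->xattr, xattr);
                            }
                    }
                    UNLOCK (&frame->lock);

                    /* Take the call count only after leaving the critical
                     * section; dht_frame_return() itself takes frame->lock,
                     * so calling it while holding the lock would deadlock. */
                    this_call_cnt = dht_frame_return (frame);

                    if (is_last_call (this_call_cnt)) {
                            DHT_STACK_UNWIND (getxattr, frame, local->op_ret,
                                              op_errno, local->xattr, NULL);
                    }
            ===============================================

            With this ordering the unwinding thread cannot destroy local->xattr
            while another thread is still copying into it, since the last
            dht_frame_return() happens only after that thread has left the
            critical section.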
> 
> 
> Regards,
> Susant
> 
> ----- Original Message -----
> > From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> > To: "Venky Shankar" <vshankar@xxxxxxxxxx>, "Pranith Kumar Karampuri"
> > <pkarampu@xxxxxxxxxx>
> > Cc: gluster-devel@xxxxxxxxxxx
> > Sent: Friday, April 24, 2015 11:04:09 AM
> > Subject: Re:  Core by test case : georep-tarssh-hybrid.t
> > 
> > I apologize, I thought it was the same issue that we assumed. I just
> > looked into the stack trace and it is a different issue. This crash
> > happened during the stime getxattr.
> > 
> > Pranith,
> > You were working on min stime for ec, do you know about this?
> > 
> > The trace looks like this.
> > 
> > 1.el6.x86_64 libgcc-4.4.7-11.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64
> > openssl-1.0.1e-30.el6.8.x86_64 zlib-1.2.3-29.el6.x86_64
> > (gdb) bt
> > #0  0x00007f4d89c41380 in pthread_spin_lock () from /lib64/libpthread.so.0
> > #1  0x00007f4d8a714438 in __gf_free (free_ptr=0x7f4d70023550) at
> > /home/jenkins/root/workspace/smoke/libglusterfs/src/mem-pool.c:303
> > #2  0x00007f4d8a6ca1fb in data_destroy (data=0x7f4d87f27488) at
> > /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:148
> > #3  0x00007f4d8a6caf46 in data_unref (this=0x7f4d87f27488) at
> > /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:549
> > #4  0x00007f4d8a6cde55 in dict_get_bin (this=0x7f4d88108be8,
> >     key=0x7f4d78131230
> >     "trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime",
> >     bin=0x7f4d7de276d8)
> >     at /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:2231
> > #5  0x00007f4d7cfa0d19 in gf_get_min_stime (this=0x7f4d7800d690,
> > dst=0x7f4d88108be8,
> >     key=0x7f4d78131230
> >     "trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime",
> >     value=0x7f4d87f271b0)
> >     at
> >     /home/jenkins/root/workspace/smoke/xlators/cluster/afr/src/../../../../xlators/lib/src/libxlator.c:330
> > #6  0x00007f4d7cd16419 in dht_aggregate (this=0x7f4d88108d8c,
> >     key=0x7f4d78131230
> >     "trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime",
> >     value=0x7f4d87f271b0, data=0x7f4d88108be8)
> >     at
> >     /home/jenkins/root/workspace/smoke/xlators/cluster/dht/src/dht-common.c:116
> > #7  0x00007f4d8a6cc3b1 in dict_foreach_match (dict=0x7f4d88108d8c,
> > match=0x7f4d8a6cc244 <dict_match_everything>, match_data=0x0,
> >     action=0x7f4d7cd16330 <dht_aggregate>, action_data=0x7f4d88108be8) at
> >     /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:1182
> > #8  0x00007f4d8a6cc2a4 in dict_foreach (dict=0x7f4d88108d8c,
> > fn=0x7f4d7cd16330 <dht_aggregate>, data=0x7f4d88108be8)
> >     at /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:1141
> > #9  0x00007f4d7cd165ae in dht_aggregate_xattr (dst=0x7f4d88108be8,
> > src=0x7f4d88108d8c) at
> > /home/jenkins/root/workspace/smoke/xlators/cluster/dht/src/dht-common.c:153
> > #10 0x00007f4d7cd2415e in dht_getxattr_cbk (frame=0x7f4d8870d118,
> > cookie=0x7f4d8870d1c4, this=0x7f4d7800d690, op_ret=0, op_errno=0,
> > xattr=0x7f4d88108d8c, xdata=0x0)
> >     at
> >     /home/jenkins/root/workspace/smoke/xlators/cluster/dht/src/dht-common.c:2710
> > #11 0x00007f4d7cf81293 in afr_getxattr_cbk (frame=0x7f4d8870d1c4,
> > cookie=0x0,
> > this=0x7f4d7800b560, op_ret=0, op_errno=0, dict=0x7f4d88108d8c, xdata=0x0)
> >     at
> >     /home/jenkins/root/workspace/smoke/xlators/cluster/afr/src/afr-inode-read.c:500
> > #12 0x00007f4d7d1fd829 in client3_3_getxattr_cbk (req=0x7f4d75e59504,
> > iov=0x7f4d75e59544, count=1, myframe=0x7f4d8870d270)
> >     at
> >     /home/jenkins/root/workspace/smoke/xlators/protocol/client/src/client-rpc-fops.c:1093
> > #13 0x00007f4d8a4a0d1c in rpc_clnt_handle_reply (clnt=0x7f4d7811a100,
> > pollin=0x7f4d7812c660) at
> > /home/jenkins/root/workspace/smoke/rpc/rpc-lib/src/rpc-clnt.c:766
> > #14 0x00007f4d8a4a113c in rpc_clnt_notify (trans=0x7f4d78129d70,
> > mydata=0x7f4d7811a130, event=RPC_TRANSPORT_MSG_RECEIVED,
> > data=0x7f4d7812c660)
> >     at /home/jenkins/root/workspace/smoke/rpc/rpc-lib/src/rpc-clnt.c:894
> > #15 0x00007f4d8a49d66c in rpc_transport_notify (this=0x7f4d78129d70,
> > event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f4d7812c660)
> >     at
> >     /home/jenkins/root/workspace/smoke/rpc/rpc-lib/src/rpc-transport.c:543
> > #16 0x00007f4d7f44e311 in socket_event_poll_in (this=0x7f4d78129d70) at
> > /home/jenkins/root/workspace/smoke/rpc/rpc-transport/socket/src/socket.c:2290
> > #17 0x00007f4d7f44e7cc in socket_event_handler (fd=15, idx=4,
> > data=0x7f4d78129d70, poll_in=1, poll_out=0, poll_err=0)
> >     at
> >     /home/jenkins/root/workspace/smoke/rpc/rpc-transport/socket/src/socket.c:2403
> > #18 0x00007f4d8a747e2d in event_dispatch_epoll_handler
> > (event_pool=0x1cc9ba0,
> > event=0x7f4d7de27e70)
> >     at
> >     /home/jenkins/root/workspace/smoke/libglusterfs/src/event-epoll.c:572
> > #19 0x00007f4d8a748186 in event_dispatch_epoll_worker (data=0x1d11cc0) at
> > /home/jenkins/root/workspace/smoke/libglusterfs/src/event-epoll.c:674
> > #20 0x00007f4d89c3c9d1 in start_thread () from /lib64/libpthread.so.0
> > #21 0x00007f4d895a68fd in clone () from /lib64/libc.so.6
> > 
> > 
> > Thanks and Regards,
> > Kotresh H R
> > 
> > ----- Original Message -----
> > > From: "Venky Shankar" <vshankar@xxxxxxxxxx>
> > > To: gluster-devel@xxxxxxxxxxx
> > > Sent: Friday, April 24, 2015 10:53:45 AM
> > > Subject: Re:  Core by test case : georep-tarssh-hybrid.t
> > > 
> > > 
> > > On 04/24/2015 10:22 AM, Kotresh Hiremath Ravishankar wrote:
> > > > Hi Atin,
> > > >
> > > > It is not spurious; there is an issue with the 'this' pointer, I think.
> > > > All changelog consumers such as bitrot and geo-rep would see this. Since
> > > > it's a race, it happened to occur with gsyncd.
> > > 
> > > Correct. Jeff mentioned this a while ago. I'll help out Kotresh in
> > > fixing this issue. In the meantime, is it possible to disable the
> > > geo-replication regression test cases until this gets fixed?
> > > 
> > > >   
> > > > No, the patch http://review.gluster.org/#/c/10340/ will not
> > > > take care of it. It just improves the time taken for geo-rep
> > > > regression.
> > > >
> > > > I am looking into it.
> > > >
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > > ----- Original Message -----
> > > >> From: "Atin Mukherjee" <amukherj@xxxxxxxxxx>
> > > >> To: "kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>, "Aravinda
> > > >> Vishwanathapura Krishna Murthy"
> > > >> <avishwan@xxxxxxxxxx>
> > > >> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > >> Sent: Friday, April 24, 2015 9:35:00 AM
> > > >> Subject: Core by test case : georep-tarssh-hybrid.t
> > > >>
> > > >> [1] has a core file generated by tests/geo-rep/georep-tarssh-hybrid.t.
> > > >> Is it something alarming, or would http://review.gluster.org/#/c/10340/
> > > >> take care of it?
> > > >>
> > > >> [1]
> > > >> http://build.gluster.org/job/rackspace-regression-2GB-triggered/7345/consoleFull
> > > >> --
> > > >> ~Atin
> > > >>
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel@xxxxxxxxxxx
> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > 
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel@xxxxxxxxxxx
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > 
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> > 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel



