Appended comments inline.

----- Original Message -----
> From: "Susant Palai" <spalai@xxxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: gluster-devel@xxxxxxxxxxx
> Sent: Friday, April 24, 2015 7:17:33 PM
> Subject: Re: Core by test case : georep-tarssh-hybrid.t
>
> Hi,
> Here is a speculation:
>
> With the introduction of multi-threaded epoll we are processing multiple
> responses at the same time. The crash happened in _gf_free, originating
> from dht_getxattr_cbk (as seen in the backtrace). Currently we do not
> take a frame lock inside dht_getxattr_cbk, so this path is prone to
> races.
>
> Here is a code snippet from dht_getxattr_cbk:
> ===============================================
> this_call_cnt = dht_frame_return (frame);

Need to move the above line after the "out" section; otherwise we will end
up in a deadlock.

> ...
>
>         if (!local->xattr) {
>                 local->xattr = dict_copy_with_ref (xattr, NULL);
>         } else {
>                 dht_aggregate_xattr (local->xattr, xattr);
>         }
> out:
>         if (is_last_call (this_call_cnt)) {
>                 DHT_STACK_UNWIND (getxattr, frame, local->op_ret, op_errno,
>                                   local->xattr, NULL);
>         }
>         return 0;
> ===============================================
>
> Here I am depicting the responses from two cbks in a two-subvol cluster:
>
> Thread 1 (CBK1)                        Thread 2 (CBK2)
> ====================                   =====================
> time 1: this_call_cnt = 1 (2 - 1)
>                                        time 2: this_call_cnt = 0 (1 - 1)
> time 3: enters dict_copy_with_ref
>                                        time 4: dht_aggregate_xattr
>                                        time 5: DHT_STACK_UNWIND
>                                                [leading to dict_unref
>                                                and destroy]
> time 6: still busy in dict_copy_with_ref;
>         tries to unref the dict, leading
>         to a free of memory already freed
>         by the other thread. Hence, a
>         double free.
>
> Will compose a patch that puts the critical section under frame->lock.
>
> Regards,
> Susant
>
> ----- Original Message -----
> > From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> > To: "Venky Shankar" <vshankar@xxxxxxxxxx>, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> > Cc: gluster-devel@xxxxxxxxxxx
> > Sent: Friday, April 24, 2015 11:04:09 AM
> > Subject: Re: Core by test case : georep-tarssh-hybrid.t
> >
> > I apologize, I thought it was the same issue we had assumed. I just
> > looked into the stack trace and it is a different issue. This crash
> > happened during an stime getxattr.
> >
> > Pranith,
> > You were working on min stime for ec; do you know about this?
> >
> > The trace looks like this:
> >
> > (gdb) bt
> > #0  0x00007f4d89c41380 in pthread_spin_lock () from /lib64/libpthread.so.0
> > #1  0x00007f4d8a714438 in __gf_free (free_ptr=0x7f4d70023550)
> >     at /home/jenkins/root/workspace/smoke/libglusterfs/src/mem-pool.c:303
> > #2  0x00007f4d8a6ca1fb in data_destroy (data=0x7f4d87f27488)
> >     at /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:148
> > #3  0x00007f4d8a6caf46 in data_unref (this=0x7f4d87f27488)
> >     at /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:549
> > #4  0x00007f4d8a6cde55 in dict_get_bin (this=0x7f4d88108be8,
> >     key=0x7f4d78131230 "trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime",
> >     bin=0x7f4d7de276d8)
> >     at /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:2231
> > #5  0x00007f4d7cfa0d19 in gf_get_min_stime (this=0x7f4d7800d690, dst=0x7f4d88108be8,
> >     key=0x7f4d78131230 "trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime",
> >     value=0x7f4d87f271b0)
> >     at /home/jenkins/root/workspace/smoke/xlators/cluster/afr/src/../../../../xlators/lib/src/libxlator.c:330
> > #6  0x00007f4d7cd16419 in dht_aggregate (this=0x7f4d88108d8c,
> >     key=0x7f4d78131230 "trusted.glusterfs.2e9a9aed-0389-4ead-ad39-8196f875cd56.6fe2b66c-0f08-40c2-8a5b-93ce6daf8d32.stime",
> >     value=0x7f4d87f271b0, data=0x7f4d88108be8)
> >     at /home/jenkins/root/workspace/smoke/xlators/cluster/dht/src/dht-common.c:116
> > #7  0x00007f4d8a6cc3b1 in dict_foreach_match (dict=0x7f4d88108d8c,
> >     match=0x7f4d8a6cc244 <dict_match_everything>, match_data=0x0,
> >     action=0x7f4d7cd16330 <dht_aggregate>, action_data=0x7f4d88108be8)
> >     at /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:1182
> > #8  0x00007f4d8a6cc2a4 in dict_foreach (dict=0x7f4d88108d8c,
> >     fn=0x7f4d7cd16330 <dht_aggregate>, data=0x7f4d88108be8)
> >     at /home/jenkins/root/workspace/smoke/libglusterfs/src/dict.c:1141
> > #9  0x00007f4d7cd165ae in dht_aggregate_xattr (dst=0x7f4d88108be8, src=0x7f4d88108d8c)
> >     at /home/jenkins/root/workspace/smoke/xlators/cluster/dht/src/dht-common.c:153
> > #10 0x00007f4d7cd2415e in dht_getxattr_cbk (frame=0x7f4d8870d118,
> >     cookie=0x7f4d8870d1c4, this=0x7f4d7800d690, op_ret=0, op_errno=0,
> >     xattr=0x7f4d88108d8c, xdata=0x0)
> >     at /home/jenkins/root/workspace/smoke/xlators/cluster/dht/src/dht-common.c:2710
> > #11 0x00007f4d7cf81293 in afr_getxattr_cbk (frame=0x7f4d8870d1c4, cookie=0x0,
> >     this=0x7f4d7800b560, op_ret=0, op_errno=0, dict=0x7f4d88108d8c, xdata=0x0)
> >     at /home/jenkins/root/workspace/smoke/xlators/cluster/afr/src/afr-inode-read.c:500
> > #12 0x00007f4d7d1fd829 in client3_3_getxattr_cbk (req=0x7f4d75e59504,
> >     iov=0x7f4d75e59544, count=1, myframe=0x7f4d8870d270)
> >     at /home/jenkins/root/workspace/smoke/xlators/protocol/client/src/client-rpc-fops.c:1093
> > #13 0x00007f4d8a4a0d1c in rpc_clnt_handle_reply (clnt=0x7f4d7811a100, pollin=0x7f4d7812c660)
> >     at /home/jenkins/root/workspace/smoke/rpc/rpc-lib/src/rpc-clnt.c:766
> > #14 0x00007f4d8a4a113c in rpc_clnt_notify (trans=0x7f4d78129d70,
> >     mydata=0x7f4d7811a130, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f4d7812c660)
> >     at /home/jenkins/root/workspace/smoke/rpc/rpc-lib/src/rpc-clnt.c:894
> > #15 0x00007f4d8a49d66c in rpc_transport_notify (this=0x7f4d78129d70,
> >     event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f4d7812c660)
> >     at /home/jenkins/root/workspace/smoke/rpc/rpc-lib/src/rpc-transport.c:543
> > #16 0x00007f4d7f44e311 in socket_event_poll_in (this=0x7f4d78129d70)
> >     at /home/jenkins/root/workspace/smoke/rpc/rpc-transport/socket/src/socket.c:2290
> > #17 0x00007f4d7f44e7cc in socket_event_handler (fd=15, idx=4,
> >     data=0x7f4d78129d70, poll_in=1, poll_out=0, poll_err=0)
> >     at /home/jenkins/root/workspace/smoke/rpc/rpc-transport/socket/src/socket.c:2403
> > #18 0x00007f4d8a747e2d in event_dispatch_epoll_handler (event_pool=0x1cc9ba0, event=0x7f4d7de27e70)
> >     at /home/jenkins/root/workspace/smoke/libglusterfs/src/event-epoll.c:572
> > #19 0x00007f4d8a748186 in event_dispatch_epoll_worker (data=0x1d11cc0)
> >     at /home/jenkins/root/workspace/smoke/libglusterfs/src/event-epoll.c:674
> > #20 0x00007f4d89c3c9d1 in start_thread () from /lib64/libpthread.so.0
> > #21 0x00007f4d895a68fd in clone () from /lib64/libc.so.6
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> > > From: "Venky Shankar" <vshankar@xxxxxxxxxx>
> > > To: gluster-devel@xxxxxxxxxxx
> > > Sent: Friday, April 24, 2015 10:53:45 AM
> > > Subject: Re: Core by test case : georep-tarssh-hybrid.t
> > >
> > > On 04/24/2015 10:22 AM, Kotresh Hiremath Ravishankar wrote:
> > > > Hi Atin,
> > > >
> > > > It is not spurious; there is an issue with the "this" pointer, I think.
> > > > All changelog consumers such as bitrot and geo-rep would see this.
> > > > Since it is a race, it occurred with gsyncd.
> > >
> > > Correct. Jeff mentioned this a while ago. I'll help Kotresh fix this
> > > issue. In the meantime, is it possible to disable the geo-replication
> > > regression test cases until this gets fixed?
> > > >
> > > > No, the patch http://review.gluster.org/#/c/10340/ will not take
> > > > care of it. It just improves the time taken for the geo-rep
> > > > regression.
> > > >
> > > > I am looking into it.
> > > >
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > > ----- Original Message -----
> > > >> From: "Atin Mukherjee" <amukherj@xxxxxxxxxx>
> > > >> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>, "Aravinda Vishwanathapura Krishna Murthy" <avishwan@xxxxxxxxxx>
> > > >> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > >> Sent: Friday, April 24, 2015 9:35:00 AM
> > > >> Subject: Core by test case : georep-tarssh-hybrid.t
> > > >>
> > > >> [1] has a core file generated by tests/geo-rep/georep-tarssh-hybrid.t.
> > > >> Is it something alarming, or would http://review.gluster.org/#/c/10340/
> > > >> take care of it?
> > > >>
> > > >> [1] http://build.gluster.org/job/rackspace-regression-2GB-triggered/7345/consoleFull
> > > >> --
> > > >> ~Atin
> > > >>
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel@xxxxxxxxxxx
> > > > http://www.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel