Hi, Yes, with this patch we need not set conn->trans to NULL in rpc_clnt_disable Thanks and Regards, Kotresh H R ----- Original Message ----- > From: "Soumya Koduri" <skoduri@xxxxxxxxxx> > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>, "Raghavendra G" <raghavendra@xxxxxxxxxxx> > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx> > Sent: Thursday, March 3, 2016 5:06:00 PM > Subject: Re: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t > > > > On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote: > > [Replying on top of my own reply] > > > > Hi, > > > > I have submitted the below patch [1] to avoid the issue of > > 'rpc_clnt_submit' > > getting reconnected. But it won't take care of memory leak problem you were > > trying to fix. That we have to carefully go through all cases and fix it. > > Please have a look at it. > > > Looks good. IIUC, with this patch, we need not set conn->trans to NULL > in 'rpc_clnt_disable()'. Right? If yes, then it takes care of memleak as > the transport object shall then get freed as part of > 'rpc_clnt_trigger_destroy'. > > > > http://review.gluster.org/#/c/13592/ > > > > Thanks and Regards, > > Kotresh H R > > > > ----- Original Message ----- > >> From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx> > >> To: "Soumya Koduri" <skoduri@xxxxxxxxxx> > >> Cc: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Gluster Devel" > >> <gluster-devel@xxxxxxxxxxx> > >> Sent: Thursday, March 3, 2016 3:39:11 PM > >> Subject: Re: Cores generated with > >> ./tests/geo-rep/georep-basic-dr-tarssh.t > >> > >> Hi Soumya, > >> > >> I tested the lastes patch [2] on master where your previous patch [1] in > >> merged. > >> I see crashes at different places. > >> > >> 1. If there are code paths that are holding rpc object without taking ref > >> on > >> it, all those > >> code path will crash on invoking rpc submit on that object as rpc > >> object > >> would have freed > >> by last unref on DISCONNECT event. I see this kind of use-case in > >> chagnelog rpc code. > >> Need to check on other users of rpc. > Agree. We should fix all such code-paths. Since this seem to be an > intricate fix, shall we take these patches only in master branch and not > in 3.7 release for now till we fix all such paths as we encounter? > > >> > >> 2. And also we need to take care of reconnect timers that are being set > >> and > >> are re-tried to > >> connect back on expiration. In those cases also, we might crash as rpc > >> object would have freed. > Your patch addresses this..right? > > Thanks, > Soumya > > >> > >> > >> [1] http://review.gluster.org/#/c/13507/ > >> [2] http://review.gluster.org/#/c/13587/ > >> > >> Thanks and Regards, > >> Kotresh H R > >> > >> ----- Original Message ----- > >>> From: "Soumya Koduri" <skoduri@xxxxxxxxxx> > >>> To: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Kotresh Hiremath > >>> Ravishankar" <khiremat@xxxxxxxxxx> > >>> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx> > >>> Sent: Thursday, March 3, 2016 12:24:00 PM > >>> Subject: Re: Cores generated with > >>> ./tests/geo-rep/georep-basic-dr-tarssh.t > >>> > >>> Thanks a lot Kotresh. > >>> > >>> On 03/03/2016 08:47 AM, Raghavendra G wrote: > >>>> Hi Soumya, > >>>> > >>>> Can you send a fix to this regression on upstream master too? This patch > >>>> is merged there. > >>>> > >>> I have submitted below patch. > >>> http://review.gluster.org/#/c/13587/ > >>> > >>> Kindly review the same. > >>> > >>> Thanks, > >>> Soumya > >>> > >>>> regards, > >>>> Raghavendra > >>>> > >>>> On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar > >>>> <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>> wrote: > >>>> > >>>> Hi Soumya, > >>>> > >>>> I analysed the issue and found out that crash has happened because > >>>> of the patch [1]. > >>>> > >>>> The patch doesn't set transport object to NULL in > >>>> 'rpc_clnt_disable' > >>>> but instead does it on > >>>> 'rpc_clnt_trigger_destroy'. So if there are pending rpc invocations > >>>> on the rpc object that > >>>> is disabled (those instances are possible as happening now in > >>>> changelog), it will trigger a > >>>> CONNECT notify again with 'mydata' that is freed causing a crash. > >>>> This happens because > >>>> 'rpc_clnt_submit' reconnects if rpc is not connected. > >>>> > >>>> rpc_clnt_submit (...) { > >>>> ... > >>>> if (conn->connected == 0) { > >>>> ret = rpc_transport_connect (conn->trans, > >>>> > >>>> conn->config.remote_port); > >>>> } > >>>> ... > >>>> } > >>>> > >>>> Without your patch, conn->trans was set NULL and hence CONNECT > >>>> fails > >>>> not resulting with > >>>> CONNECT notify call. And also the cleanup happens in failure path. > >>>> > >>>> So the memory leak can happen, if there is no try for rpc > >>>> invocation > >>>> after DISCONNECT. > >>>> It will be cleaned up otherwise. > >>>> > >>>> > >>>> [1] http://review.gluster.org/#/c/13507/ > >>>> > >>>> Thanks and Regards, > >>>> Kotresh H R > >>>> > >>>> ----- Original Message ----- > >>>> > From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx > >>>> <mailto:khiremat@xxxxxxxxxx>> > >>>> > To: "Soumya Koduri" <skoduri@xxxxxxxxxx > >>>> > <mailto:skoduri@xxxxxxxxxx>> > >>>> > Cc: avishwan@xxxxxxxxxx <mailto:avishwan@xxxxxxxxxx>, "Gluster > >>>> Devel" <gluster-devel@xxxxxxxxxxx > >>>> <mailto:gluster-devel@xxxxxxxxxxx>> > >>>> > Sent: Monday, February 29, 2016 4:15:22 PM > >>>> > Subject: Re: Cores generated with > >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t > >>>> > > >>>> > Hi Soumya, > >>>> > > >>>> > I just tested that it is reproducible only with your patch both > >>>> in master and > >>>> > 3.76 branch. > >>>> > The geo-rep test cases are marked bad in master. So it's not hit > >>>> in master. > >>>> > rpc is introduced > >>>> > in changelog xlator to communicate to applications via > >>>> libgfchangelog. > >>>> > Venky/Me will check > >>>> > why is the crash happening and will update. > >>>> > > >>>> > > >>>> > Thanks and Regards, > >>>> > Kotresh H R > >>>> > > >>>> > ----- Original Message ----- > >>>> > > From: "Soumya Koduri" <skoduri@xxxxxxxxxx > >>>> <mailto:skoduri@xxxxxxxxxx>> > >>>> > > To: avishwan@xxxxxxxxxx <mailto:avishwan@xxxxxxxxxx>, > >>>> > > "kotresh" > >>>> <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>> > >>>> > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx > >>>> <mailto:gluster-devel@xxxxxxxxxxx>> > >>>> > > Sent: Monday, February 29, 2016 2:10:51 PM > >>>> > > Subject: Cores generated with > >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t > >>>> > > > >>>> > > Hi Aravinda/Kotresh, > >>>> > > > >>>> > > With [1], I consistently see cores generated with the test > >>>> > > './tests/geo-rep/georep-basic-dr-tarssh.t' in release-3.7 > >>>> branch. From > >>>> > > the cores, looks like we are trying to dereference a freed > >>>> > > changelog_rpc_clnt_t(crpc) object in changelog_rpc_notify(). > >>>> Strangely > >>>> > > this was not reported in master branch. > >>>> > > > >>>> > > I tried debugging but couldn't find any possible suspects. I > >>>> request you > >>>> > > to take a look and let me know if [1] caused any regression. > >>>> > > > >>>> > > Thanks, > >>>> > > Soumya > >>>> > > > >>>> > > [1] http://review.gluster.org/#/c/13507/ > >>>> > > > >>>> > > >>>> _______________________________________________ > >>>> Gluster-devel mailing list > >>>> Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx> > >>>> http://www.gluster.org/mailman/listinfo/gluster-devel > >>>> > >>>> > >>>> > >>>> > >>>> -- > >>>> Raghavendra G > >>> > >> _______________________________________________ > >> Gluster-devel mailing list > >> Gluster-devel@xxxxxxxxxxx > >> http://www.gluster.org/mailman/listinfo/gluster-devel > >> > _______________________________________________ Gluster-devel mailing list Gluster-devel@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-devel