Thanks for the reminder. Will take that up. ----- Original Message ----- > From: "Soumya Koduri" <skoduri@xxxxxxxxxx> > To: "Venky Shankar" <vshankar@xxxxxxxxxx>, "Raghavendra G" <raghavendra@xxxxxxxxxxx> > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx> > Sent: Tuesday, June 28, 2016 3:24:13 PM > Subject: Re: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t > > Hi Raghavendra/Venky, > > Gentle reminder. Oleksander (post-factum) confirmed that their > production setup has been running with this patch included since > sometime and had no issues. Shall we consider merging this patch if > there are no review comments? > > Thanks, > Soumya > > On 03/09/2016 09:16 PM, Kotresh Hiremath Ravishankar wrote: > > Hi All, > > > > The following patch is sent to address changelog rpc mem-leak issues. > > The fix is intricate and needs lot of testing before taking in. Please > > review the same. > > > > http://review.gluster.org/#/c/13658/1 > > > > Thanks and Regards, > > Kotresh H R > > > > > > ----- Original Message ----- > >> From: "Venky Shankar" <vshankar@xxxxxxxxxx> > >> To: "Raghavendra G" <raghavendra@xxxxxxxxxxx> > >> Cc: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>, "Gluster Devel" > >> <gluster-devel@xxxxxxxxxxx> > >> Sent: Monday, March 7, 2016 3:52:13 PM > >> Subject: Re: Cores generated with > >> ./tests/geo-rep/georep-basic-dr-tarssh.t > >> > >> On Fri, Mar 04, 2016 at 02:02:32PM +0530, Raghavendra G wrote: > >>> On Thu, Mar 3, 2016 at 6:26 PM, Kotresh Hiremath Ravishankar < > >>> khiremat@xxxxxxxxxx> wrote: > >>> > >>>> Hi, > >>>> > >>>> Yes, with this patch we need not set conn->trans to NULL in > >>>> rpc_clnt_disable > >>>> > >>> > >>> While [1] fixes the crash, things can be improved in the way how > >>> changelog > >>> is using rpc. > >>> > >>> 1. In the current code, there is an rpc_clnt object leak during > >>> disconnect > >>> event. > >> > >> Just for the record (as I couldn't find this information while building up > >> rpc infrastructure in changelog): > >> > >> Unref rpc-clnt object after calling rpc_clnt_disable() in notify() upon > >> RPC_CLNT_DISCONNECT. Free up 'mydata' in notify() upon RPC_CLNT_DESTROY. > >> > >>> 2. Also, freed "mydata" of changelog is still associated with rpc_clnt > >>> object (corollary of 1), though change log might not get any events with > >>> "mydata" (as connection is dead). > >>> > >>> I've discussed with Kotresh about changes needed, offline. So, following > >>> are the action items. > >>> 1. Soumya's patch [2] is valid and is needed for 3.7 branch too. > >>> 2. [2] can be accepted. However, someone might want to re-use an rpc > >>> object > >>> after disabling it, like introducing a new api rpc_clnt_enable_again > >>> (though no of such use-cases is very less). But [2] doesn't allow it. The > >>> point is as long as rpc-clnt object is alive, transport object is alive > >>> (though disconnected) and we can re-use it. So, I would prefer not to > >>> accept it. > >>> 3. Kotresh will work on new changes to make sure changelog makes correct > >>> use of rpc-clnt. > >>> > >>> [1] http://review.gluster.org/#/c/13592 > >>> [2] http://review.gluster.org/#/c/1359 > >>> > >>> regards, > >>> Raghavendra. > >>> > >>> > >>>> Thanks and Regards, > >>>> Kotresh H R > >>>> > >>>> ----- Original Message ----- > >>>>> From: "Soumya Koduri" <skoduri@xxxxxxxxxx> > >>>>> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>, "Raghavendra > >>>> G" <raghavendra@xxxxxxxxxxx> > >>>>> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx> > >>>>> Sent: Thursday, March 3, 2016 5:06:00 PM > >>>>> Subject: Re: Cores generated with > >>>> ./tests/geo-rep/georep-basic-dr-tarssh.t > >>>>> > >>>>> > >>>>> > >>>>> On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote: > >>>>>> [Replying on top of my own reply] > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> I have submitted the below patch [1] to avoid the issue of > >>>>>> 'rpc_clnt_submit' > >>>>>> getting reconnected. But it won't take care of memory leak problem > >>>>>> you > >>>> were > >>>>>> trying to fix. That we have to carefully go through all cases and fix > >>>> it. > >>>>>> Please have a look at it. > >>>>>> > >>>>> Looks good. IIUC, with this patch, we need not set conn->trans to NULL > >>>>> in 'rpc_clnt_disable()'. Right? If yes, then it takes care of memleak > >>>>> as > >>>>> the transport object shall then get freed as part of > >>>>> 'rpc_clnt_trigger_destroy'. > >>>>> > >>>>> > >>>>>> http://review.gluster.org/#/c/13592/ > >>>>>> > >>>>>> Thanks and Regards, > >>>>>> Kotresh H R > >>>>>> > >>>>>> ----- Original Message ----- > >>>>>>> From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx> > >>>>>>> To: "Soumya Koduri" <skoduri@xxxxxxxxxx> > >>>>>>> Cc: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Gluster Devel" > >>>>>>> <gluster-devel@xxxxxxxxxxx> > >>>>>>> Sent: Thursday, March 3, 2016 3:39:11 PM > >>>>>>> Subject: Re: Cores generated with > >>>>>>> ./tests/geo-rep/georep-basic-dr-tarssh.t > >>>>>>> > >>>>>>> Hi Soumya, > >>>>>>> > >>>>>>> I tested the lastes patch [2] on master where your previous patch > >>>>>>> [1] > >>>> in > >>>>>>> merged. > >>>>>>> I see crashes at different places. > >>>>>>> > >>>>>>> 1. If there are code paths that are holding rpc object without > >>>>>>> taking > >>>> ref > >>>>>>> on > >>>>>>> it, all those > >>>>>>> code path will crash on invoking rpc submit on that object as > >>>>>>> rpc > >>>>>>> object > >>>>>>> would have freed > >>>>>>> by last unref on DISCONNECT event. I see this kind of use-case > >>>>>>> in > >>>>>>> chagnelog rpc code. > >>>>>>> Need to check on other users of rpc. > >>>>> Agree. We should fix all such code-paths. Since this seem to be an > >>>>> intricate fix, shall we take these patches only in master branch and > >>>>> not > >>>>> in 3.7 release for now till we fix all such paths as we encounter? > >>>>> > >>>>>>> > >>>>>>> 2. And also we need to take care of reconnect timers that are being > >>>> set > >>>>>>> and > >>>>>>> are re-tried to > >>>>>>> connect back on expiration. In those cases also, we might crash > >>>> as rpc > >>>>>>> object would have freed. > >>>>> Your patch addresses this..right? > >>>>> > >>>>> Thanks, > >>>>> Soumya > >>>>> > >>>>>>> > >>>>>>> > >>>>>>> [1] http://review.gluster.org/#/c/13507/ > >>>>>>> [2] http://review.gluster.org/#/c/13587/ > >>>>>>> > >>>>>>> Thanks and Regards, > >>>>>>> Kotresh H R > >>>>>>> > >>>>>>> ----- Original Message ----- > >>>>>>>> From: "Soumya Koduri" <skoduri@xxxxxxxxxx> > >>>>>>>> To: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Kotresh Hiremath > >>>>>>>> Ravishankar" <khiremat@xxxxxxxxxx> > >>>>>>>> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx> > >>>>>>>> Sent: Thursday, March 3, 2016 12:24:00 PM > >>>>>>>> Subject: Re: Cores generated with > >>>>>>>> ./tests/geo-rep/georep-basic-dr-tarssh.t > >>>>>>>> > >>>>>>>> Thanks a lot Kotresh. > >>>>>>>> > >>>>>>>> On 03/03/2016 08:47 AM, Raghavendra G wrote: > >>>>>>>>> Hi Soumya, > >>>>>>>>> > >>>>>>>>> Can you send a fix to this regression on upstream master too? This > >>>> patch > >>>>>>>>> is merged there. > >>>>>>>>> > >>>>>>>> I have submitted below patch. > >>>>>>>> http://review.gluster.org/#/c/13587/ > >>>>>>>> > >>>>>>>> Kindly review the same. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Soumya > >>>>>>>> > >>>>>>>>> regards, > >>>>>>>>> Raghavendra > >>>>>>>>> > >>>>>>>>> On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar > >>>>>>>>> <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>> wrote: > >>>>>>>>> > >>>>>>>>> Hi Soumya, > >>>>>>>>> > >>>>>>>>> I analysed the issue and found out that crash has happened > >>>> because > >>>>>>>>> of the patch [1]. > >>>>>>>>> > >>>>>>>>> The patch doesn't set transport object to NULL in > >>>>>>>>> 'rpc_clnt_disable' > >>>>>>>>> but instead does it on > >>>>>>>>> 'rpc_clnt_trigger_destroy'. So if there are pending rpc > >>>> invocations > >>>>>>>>> on the rpc object that > >>>>>>>>> is disabled (those instances are possible as happening now in > >>>>>>>>> changelog), it will trigger a > >>>>>>>>> CONNECT notify again with 'mydata' that is freed causing a > >>>> crash. > >>>>>>>>> This happens because > >>>>>>>>> 'rpc_clnt_submit' reconnects if rpc is not connected. > >>>>>>>>> > >>>>>>>>> rpc_clnt_submit (...) { > >>>>>>>>> ... > >>>>>>>>> if (conn->connected == 0) { > >>>>>>>>> ret = rpc_transport_connect > >>>> (conn->trans, > >>>>>>>>> > >>>>>>>>> conn->config.remote_port); > >>>>>>>>> } > >>>>>>>>> ... > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> Without your patch, conn->trans was set NULL and hence > >>>>>>>>> CONNECT > >>>>>>>>> fails > >>>>>>>>> not resulting with > >>>>>>>>> CONNECT notify call. And also the cleanup happens in failure > >>>> path. > >>>>>>>>> > >>>>>>>>> So the memory leak can happen, if there is no try for rpc > >>>>>>>>> invocation > >>>>>>>>> after DISCONNECT. > >>>>>>>>> It will be cleaned up otherwise. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> [1] http://review.gluster.org/#/c/13507/ > >>>>>>>>> > >>>>>>>>> Thanks and Regards, > >>>>>>>>> Kotresh H R > >>>>>>>>> > >>>>>>>>> ----- Original Message ----- > >>>>>>>>> > From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx > >>>>>>>>> <mailto:khiremat@xxxxxxxxxx>> > >>>>>>>>> > To: "Soumya Koduri" <skoduri@xxxxxxxxxx > >>>>>>>>> > <mailto:skoduri@xxxxxxxxxx>> > >>>>>>>>> > Cc: avishwan@xxxxxxxxxx <mailto:avishwan@xxxxxxxxxx>, > >>>> "Gluster > >>>>>>>>> Devel" <gluster-devel@xxxxxxxxxxx > >>>>>>>>> <mailto:gluster-devel@xxxxxxxxxxx>> > >>>>>>>>> > Sent: Monday, February 29, 2016 4:15:22 PM > >>>>>>>>> > Subject: Re: Cores generated with > >>>>>>>>> ./tests/geo-rep/georep-basic-dr-tarssh.t > >>>>>>>>> > > >>>>>>>>> > Hi Soumya, > >>>>>>>>> > > >>>>>>>>> > I just tested that it is reproducible only with your patch > >>>> both > >>>>>>>>> in master and > >>>>>>>>> > 3.76 branch. > >>>>>>>>> > The geo-rep test cases are marked bad in master. So it's > >>>> not hit > >>>>>>>>> in master. > >>>>>>>>> > rpc is introduced > >>>>>>>>> > in changelog xlator to communicate to applications via > >>>>>>>>> libgfchangelog. > >>>>>>>>> > Venky/Me will check > >>>>>>>>> > why is the crash happening and will update. > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > Thanks and Regards, > >>>>>>>>> > Kotresh H R > >>>>>>>>> > > >>>>>>>>> > ----- Original Message ----- > >>>>>>>>> > > From: "Soumya Koduri" <skoduri@xxxxxxxxxx > >>>>>>>>> <mailto:skoduri@xxxxxxxxxx>> > >>>>>>>>> > > To: avishwan@xxxxxxxxxx <mailto:avishwan@xxxxxxxxxx>, > >>>>>>>>> > > "kotresh" > >>>>>>>>> <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>> > >>>>>>>>> > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx > >>>>>>>>> <mailto:gluster-devel@xxxxxxxxxxx>> > >>>>>>>>> > > Sent: Monday, February 29, 2016 2:10:51 PM > >>>>>>>>> > > Subject: Cores generated with > >>>>>>>>> ./tests/geo-rep/georep-basic-dr-tarssh.t > >>>>>>>>> > > > >>>>>>>>> > > Hi Aravinda/Kotresh, > >>>>>>>>> > > > >>>>>>>>> > > With [1], I consistently see cores generated with the > >>>>>>>>> > > test > >>>>>>>>> > > './tests/geo-rep/georep-basic-dr-tarssh.t' in > >>>>>>>>> > > release-3.7 > >>>>>>>>> branch. From > >>>>>>>>> > > the cores, looks like we are trying to dereference a > >>>>>>>>> > > freed > >>>>>>>>> > > changelog_rpc_clnt_t(crpc) object in > >>>> changelog_rpc_notify(). > >>>>>>>>> Strangely > >>>>>>>>> > > this was not reported in master branch. > >>>>>>>>> > > > >>>>>>>>> > > I tried debugging but couldn't find any possible > >>>> suspects. I > >>>>>>>>> request you > >>>>>>>>> > > to take a look and let me know if [1] caused any > >>>> regression. > >>>>>>>>> > > > >>>>>>>>> > > Thanks, > >>>>>>>>> > > Soumya > >>>>>>>>> > > > >>>>>>>>> > > [1] http://review.gluster.org/#/c/13507/ > >>>>>>>>> > > > >>>>>>>>> > > >>>>>>>>> _______________________________________________ > >>>>>>>>> Gluster-devel mailing list > >>>>>>>>> Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx> > >>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> Raghavendra G > >>>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Gluster-devel mailing list > >>>>>>> Gluster-devel@xxxxxxxxxxx > >>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel > >>>>>>> > >>>>> > >>>> _______________________________________________ > >>>> Gluster-devel mailing list > >>>> Gluster-devel@xxxxxxxxxxx > >>>> http://www.gluster.org/mailman/listinfo/gluster-devel > >>>> > >>> > >>> > >>> > >>> -- > >>> Raghavendra G > >> > >>> _______________________________________________ > >>> Gluster-devel mailing list > >>> Gluster-devel@xxxxxxxxxxx > >>> http://www.gluster.org/mailman/listinfo/gluster-devel > >> > >> > _______________________________________________ > Gluster-devel mailing list > Gluster-devel@xxxxxxxxxxx > http://www.gluster.org/mailman/listinfo/gluster-devel > _______________________________________________ Gluster-devel mailing list Gluster-devel@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-devel