Re: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Yes, with this patch we need not set conn->trans to NULL in rpc_clnt_disable

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Soumya Koduri" <skoduri@xxxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>, "Raghavendra G" <raghavendra@xxxxxxxxxxx>
> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Thursday, March 3, 2016 5:06:00 PM
> Subject: Re:  Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
> 
> 
> 
> On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:
> > [Replying on top of my own reply]
> >
> > Hi,
> >
> > I have submitted the below patch [1] to avoid the issue of
> > 'rpc_clnt_submit'
> > getting reconnected. But it won't take care of memory leak problem you were
> > trying to fix. That we have to carefully go through all cases and fix it.
> > Please have a look at it.
> >
> Looks good. IIUC, with this patch, we need not set conn->trans to NULL
> in 'rpc_clnt_disable()'. Right? If yes, then it takes care of memleak as
> the transport object shall then get freed as part of
> 'rpc_clnt_trigger_destroy'.
> 
> 
> > http://review.gluster.org/#/c/13592/
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> >> From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> >> To: "Soumya Koduri" <skoduri@xxxxxxxxxx>
> >> Cc: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Gluster Devel"
> >> <gluster-devel@xxxxxxxxxxx>
> >> Sent: Thursday, March 3, 2016 3:39:11 PM
> >> Subject: Re:  Cores generated with
> >> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>
> >> Hi Soumya,
> >>
> >> I tested the lastes patch [2] on master where your previous patch [1] in
> >> merged.
> >> I see crashes at different places.
> >>
> >> 1. If there are code paths that are holding rpc object without taking ref
> >> on
> >> it, all those
> >>     code path will crash on invoking rpc submit on that object as rpc
> >>     object
> >>     would have freed
> >>     by last unref on DISCONNECT event. I see this kind of use-case in
> >>     chagnelog rpc code.
> >>     Need to check on other users of rpc.
> Agree. We should fix all such code-paths. Since this seem to be an
> intricate fix, shall we take these patches only in master branch and not
> in 3.7 release for now till we fix all such paths as we encounter?
> 
> >>
> >> 2. And also we need to take care of reconnect timers that are being set
> >> and
> >> are re-tried to
> >>     connect back on expiration. In those cases also, we might crash as rpc
> >>     object would have freed.
> Your patch addresses this..right?
> 
> Thanks,
> Soumya
> 
> >>
> >>
> >> [1] http://review.gluster.org/#/c/13507/
> >> [2] http://review.gluster.org/#/c/13587/
> >>
> >> Thanks and Regards,
> >> Kotresh H R
> >>
> >> ----- Original Message -----
> >>> From: "Soumya Koduri" <skoduri@xxxxxxxxxx>
> >>> To: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Kotresh Hiremath
> >>> Ravishankar" <khiremat@xxxxxxxxxx>
> >>> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> >>> Sent: Thursday, March 3, 2016 12:24:00 PM
> >>> Subject: Re:  Cores generated with
> >>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>>
> >>> Thanks a lot Kotresh.
> >>>
> >>> On 03/03/2016 08:47 AM, Raghavendra G wrote:
> >>>> Hi Soumya,
> >>>>
> >>>> Can you send a fix to this regression on upstream master too? This patch
> >>>> is merged there.
> >>>>
> >>> I have submitted below patch.
> >>> 	http://review.gluster.org/#/c/13587/
> >>>
> >>> Kindly review the same.
> >>>
> >>> Thanks,
> >>> Soumya
> >>>
> >>>> regards,
> >>>> Raghavendra
> >>>>
> >>>> On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar
> >>>> <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>> wrote:
> >>>>
> >>>>      Hi Soumya,
> >>>>
> >>>>      I analysed the issue and found out that crash has happened because
> >>>>      of the patch [1].
> >>>>
> >>>>      The patch doesn't set transport object to NULL in
> >>>>      'rpc_clnt_disable'
> >>>>      but instead does it on
> >>>>      'rpc_clnt_trigger_destroy'. So if there are pending rpc invocations
> >>>>      on the rpc object that
> >>>>      is disabled (those instances are possible as happening now in
> >>>>      changelog), it will trigger a
> >>>>      CONNECT notify again with 'mydata' that is freed causing a crash.
> >>>>      This happens because
> >>>>      'rpc_clnt_submit' reconnects if rpc is not connected.
> >>>>
> >>>>        rpc_clnt_submit (...) {
> >>>>          ...
> >>>>                       if (conn->connected == 0) {
> >>>>                               ret = rpc_transport_connect (conn->trans,
> >>>>
> >>>>        conn->config.remote_port);
> >>>>                       }
> >>>>          ...
> >>>>        }
> >>>>
> >>>>      Without your patch, conn->trans was set NULL and hence CONNECT
> >>>>      fails
> >>>>      not resulting with
> >>>>      CONNECT notify call. And also the cleanup happens in failure path.
> >>>>
> >>>>      So the memory leak can happen, if there is no try for rpc
> >>>>      invocation
> >>>>      after DISCONNECT.
> >>>>      It will be cleaned up otherwise.
> >>>>
> >>>>
> >>>>      [1] http://review.gluster.org/#/c/13507/
> >>>>
> >>>>      Thanks and Regards,
> >>>>      Kotresh H R
> >>>>
> >>>>      ----- Original Message -----
> >>>>       > From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx
> >>>>      <mailto:khiremat@xxxxxxxxxx>>
> >>>>       > To: "Soumya Koduri" <skoduri@xxxxxxxxxx
> >>>>       > <mailto:skoduri@xxxxxxxxxx>>
> >>>>       > Cc: avishwan@xxxxxxxxxx <mailto:avishwan@xxxxxxxxxx>, "Gluster
> >>>>      Devel" <gluster-devel@xxxxxxxxxxx
> >>>>      <mailto:gluster-devel@xxxxxxxxxxx>>
> >>>>       > Sent: Monday, February 29, 2016 4:15:22 PM
> >>>>       > Subject: Re: Cores generated with
> >>>>      ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>>>       >
> >>>>       > Hi Soumya,
> >>>>       >
> >>>>       > I just tested that it is reproducible only with your patch both
> >>>>      in master and
> >>>>       > 3.76 branch.
> >>>>       > The geo-rep test cases are marked bad in master. So it's not hit
> >>>>      in master.
> >>>>       > rpc is introduced
> >>>>       > in changelog xlator to communicate to applications via
> >>>>      libgfchangelog.
> >>>>       > Venky/Me will check
> >>>>       > why is the crash happening and will update.
> >>>>       >
> >>>>       >
> >>>>       > Thanks and Regards,
> >>>>       > Kotresh H R
> >>>>       >
> >>>>       > ----- Original Message -----
> >>>>       > > From: "Soumya Koduri" <skoduri@xxxxxxxxxx
> >>>>      <mailto:skoduri@xxxxxxxxxx>>
> >>>>       > > To: avishwan@xxxxxxxxxx <mailto:avishwan@xxxxxxxxxx>,
> >>>>       > > "kotresh"
> >>>>      <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>>
> >>>>       > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx
> >>>>      <mailto:gluster-devel@xxxxxxxxxxx>>
> >>>>       > > Sent: Monday, February 29, 2016 2:10:51 PM
> >>>>       > > Subject: Cores generated with
> >>>>      ./tests/geo-rep/georep-basic-dr-tarssh.t
> >>>>       > >
> >>>>       > > Hi Aravinda/Kotresh,
> >>>>       > >
> >>>>       > > With [1], I consistently see cores generated with the test
> >>>>       > > './tests/geo-rep/georep-basic-dr-tarssh.t' in release-3.7
> >>>>      branch. From
> >>>>       > > the cores, looks like we are trying to dereference a freed
> >>>>       > > changelog_rpc_clnt_t(crpc) object in changelog_rpc_notify().
> >>>>      Strangely
> >>>>       > > this was not reported in master branch.
> >>>>       > >
> >>>>       > > I tried debugging but couldn't find any possible suspects. I
> >>>>      request you
> >>>>       > > to take a look and let me know if [1] caused any regression.
> >>>>       > >
> >>>>       > > Thanks,
> >>>>       > > Soumya
> >>>>       > >
> >>>>       > > [1] http://review.gluster.org/#/c/13507/
> >>>>       > >
> >>>>       >
> >>>>      _______________________________________________
> >>>>      Gluster-devel mailing list
> >>>>      Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx>
> >>>>      http://www.gluster.org/mailman/listinfo/gluster-devel
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Raghavendra G
> >>>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel@xxxxxxxxxxx
> >> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>
> 
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel



[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux