Re: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 04, 2016 at 02:02:32PM +0530, Raghavendra G wrote:
> On Thu, Mar 3, 2016 at 6:26 PM, Kotresh Hiremath Ravishankar <
> khiremat@xxxxxxxxxx> wrote:
> 
> > Hi,
> >
> > Yes, with this patch we need not set conn->trans to NULL in
> > rpc_clnt_disable
> >
> 
> While [1] fixes the crash, things can be improved in the way how changelog
> is using rpc.
> 
> 1. In the current code, there is an rpc_clnt object leak during disconnect
> event.

Just for the record (as I couldn't find this information while building up
rpc infrastructure in changelog):

Unref rpc-clnt object after calling rpc_clnt_disable() in notify() upon
RPC_CLNT_DISCONNECT. Free up 'mydata' in notify() upon RPC_CLNT_DESTROY.

> 2. Also, freed "mydata" of changelog is still associated with rpc_clnt
> object (corollary of 1), though change log might not get any events with
> "mydata" (as connection is dead).
> 
> I've discussed with Kotresh about changes needed, offline. So, following
> are the action items.
> 1. Soumya's patch [2] is valid and is needed for 3.7 branch too.
> 2. [2] can be accepted. However, someone might want to re-use an rpc object
> after disabling it, like introducing a new api rpc_clnt_enable_again
> (though no of such use-cases is very less). But [2] doesn't allow it. The
> point is as long as rpc-clnt object is alive, transport object is alive
> (though disconnected) and we can re-use it. So, I would prefer not to
> accept it.
> 3. Kotresh will work on new changes to make sure changelog makes correct
> use of rpc-clnt.
> 
> [1] http://review.gluster.org/#/c/13592
> [2] http://review.gluster.org/#/c/1359
> 
> regards,
> Raghavendra.
> 
> 
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> > > From: "Soumya Koduri" <skoduri@xxxxxxxxxx>
> > > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>, "Raghavendra
> > G" <raghavendra@xxxxxxxxxxx>
> > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > Sent: Thursday, March 3, 2016 5:06:00 PM
> > > Subject: Re:  Cores generated with
> > ./tests/geo-rep/georep-basic-dr-tarssh.t
> > >
> > >
> > >
> > > On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:
> > > > [Replying on top of my own reply]
> > > >
> > > > Hi,
> > > >
> > > > I have submitted the below patch [1] to avoid the issue of
> > > > 'rpc_clnt_submit'
> > > > getting reconnected. But it won't take care of memory leak problem you
> > were
> > > > trying to fix. That we have to carefully go through all cases and fix
> > it.
> > > > Please have a look at it.
> > > >
> > > Looks good. IIUC, with this patch, we need not set conn->trans to NULL
> > > in 'rpc_clnt_disable()'. Right? If yes, then it takes care of memleak as
> > > the transport object shall then get freed as part of
> > > 'rpc_clnt_trigger_destroy'.
> > >
> > >
> > > > http://review.gluster.org/#/c/13592/
> > > >
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > > ----- Original Message -----
> > > >> From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> > > >> To: "Soumya Koduri" <skoduri@xxxxxxxxxx>
> > > >> Cc: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Gluster Devel"
> > > >> <gluster-devel@xxxxxxxxxxx>
> > > >> Sent: Thursday, March 3, 2016 3:39:11 PM
> > > >> Subject: Re:  Cores generated with
> > > >> ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > >>
> > > >> Hi Soumya,
> > > >>
> > > >> I tested the lastes patch [2] on master where your previous patch [1]
> > in
> > > >> merged.
> > > >> I see crashes at different places.
> > > >>
> > > >> 1. If there are code paths that are holding rpc object without taking
> > ref
> > > >> on
> > > >> it, all those
> > > >>     code path will crash on invoking rpc submit on that object as rpc
> > > >>     object
> > > >>     would have freed
> > > >>     by last unref on DISCONNECT event. I see this kind of use-case in
> > > >>     chagnelog rpc code.
> > > >>     Need to check on other users of rpc.
> > > Agree. We should fix all such code-paths. Since this seem to be an
> > > intricate fix, shall we take these patches only in master branch and not
> > > in 3.7 release for now till we fix all such paths as we encounter?
> > >
> > > >>
> > > >> 2. And also we need to take care of reconnect timers that are being
> > set
> > > >> and
> > > >> are re-tried to
> > > >>     connect back on expiration. In those cases also, we might crash
> > as rpc
> > > >>     object would have freed.
> > > Your patch addresses this..right?
> > >
> > > Thanks,
> > > Soumya
> > >
> > > >>
> > > >>
> > > >> [1] http://review.gluster.org/#/c/13507/
> > > >> [2] http://review.gluster.org/#/c/13587/
> > > >>
> > > >> Thanks and Regards,
> > > >> Kotresh H R
> > > >>
> > > >> ----- Original Message -----
> > > >>> From: "Soumya Koduri" <skoduri@xxxxxxxxxx>
> > > >>> To: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Kotresh Hiremath
> > > >>> Ravishankar" <khiremat@xxxxxxxxxx>
> > > >>> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > > >>> Sent: Thursday, March 3, 2016 12:24:00 PM
> > > >>> Subject: Re:  Cores generated with
> > > >>> ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > >>>
> > > >>> Thanks a lot Kotresh.
> > > >>>
> > > >>> On 03/03/2016 08:47 AM, Raghavendra G wrote:
> > > >>>> Hi Soumya,
> > > >>>>
> > > >>>> Can you send a fix to this regression on upstream master too? This
> > patch
> > > >>>> is merged there.
> > > >>>>
> > > >>> I have submitted below patch.
> > > >>>   http://review.gluster.org/#/c/13587/
> > > >>>
> > > >>> Kindly review the same.
> > > >>>
> > > >>> Thanks,
> > > >>> Soumya
> > > >>>
> > > >>>> regards,
> > > >>>> Raghavendra
> > > >>>>
> > > >>>> On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar
> > > >>>> <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>> wrote:
> > > >>>>
> > > >>>>      Hi Soumya,
> > > >>>>
> > > >>>>      I analysed the issue and found out that crash has happened
> > because
> > > >>>>      of the patch [1].
> > > >>>>
> > > >>>>      The patch doesn't set transport object to NULL in
> > > >>>>      'rpc_clnt_disable'
> > > >>>>      but instead does it on
> > > >>>>      'rpc_clnt_trigger_destroy'. So if there are pending rpc
> > invocations
> > > >>>>      on the rpc object that
> > > >>>>      is disabled (those instances are possible as happening now in
> > > >>>>      changelog), it will trigger a
> > > >>>>      CONNECT notify again with 'mydata' that is freed causing a
> > crash.
> > > >>>>      This happens because
> > > >>>>      'rpc_clnt_submit' reconnects if rpc is not connected.
> > > >>>>
> > > >>>>        rpc_clnt_submit (...) {
> > > >>>>          ...
> > > >>>>                       if (conn->connected == 0) {
> > > >>>>                               ret = rpc_transport_connect
> > (conn->trans,
> > > >>>>
> > > >>>>        conn->config.remote_port);
> > > >>>>                       }
> > > >>>>          ...
> > > >>>>        }
> > > >>>>
> > > >>>>      Without your patch, conn->trans was set NULL and hence CONNECT
> > > >>>>      fails
> > > >>>>      not resulting with
> > > >>>>      CONNECT notify call. And also the cleanup happens in failure
> > path.
> > > >>>>
> > > >>>>      So the memory leak can happen, if there is no try for rpc
> > > >>>>      invocation
> > > >>>>      after DISCONNECT.
> > > >>>>      It will be cleaned up otherwise.
> > > >>>>
> > > >>>>
> > > >>>>      [1] http://review.gluster.org/#/c/13507/
> > > >>>>
> > > >>>>      Thanks and Regards,
> > > >>>>      Kotresh H R
> > > >>>>
> > > >>>>      ----- Original Message -----
> > > >>>>       > From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx
> > > >>>>      <mailto:khiremat@xxxxxxxxxx>>
> > > >>>>       > To: "Soumya Koduri" <skoduri@xxxxxxxxxx
> > > >>>>       > <mailto:skoduri@xxxxxxxxxx>>
> > > >>>>       > Cc: avishwan@xxxxxxxxxx <mailto:avishwan@xxxxxxxxxx>,
> > "Gluster
> > > >>>>      Devel" <gluster-devel@xxxxxxxxxxx
> > > >>>>      <mailto:gluster-devel@xxxxxxxxxxx>>
> > > >>>>       > Sent: Monday, February 29, 2016 4:15:22 PM
> > > >>>>       > Subject: Re: Cores generated with
> > > >>>>      ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > >>>>       >
> > > >>>>       > Hi Soumya,
> > > >>>>       >
> > > >>>>       > I just tested that it is reproducible only with your patch
> > both
> > > >>>>      in master and
> > > >>>>       > 3.76 branch.
> > > >>>>       > The geo-rep test cases are marked bad in master. So it's
> > not hit
> > > >>>>      in master.
> > > >>>>       > rpc is introduced
> > > >>>>       > in changelog xlator to communicate to applications via
> > > >>>>      libgfchangelog.
> > > >>>>       > Venky/Me will check
> > > >>>>       > why is the crash happening and will update.
> > > >>>>       >
> > > >>>>       >
> > > >>>>       > Thanks and Regards,
> > > >>>>       > Kotresh H R
> > > >>>>       >
> > > >>>>       > ----- Original Message -----
> > > >>>>       > > From: "Soumya Koduri" <skoduri@xxxxxxxxxx
> > > >>>>      <mailto:skoduri@xxxxxxxxxx>>
> > > >>>>       > > To: avishwan@xxxxxxxxxx <mailto:avishwan@xxxxxxxxxx>,
> > > >>>>       > > "kotresh"
> > > >>>>      <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>>
> > > >>>>       > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx
> > > >>>>      <mailto:gluster-devel@xxxxxxxxxxx>>
> > > >>>>       > > Sent: Monday, February 29, 2016 2:10:51 PM
> > > >>>>       > > Subject: Cores generated with
> > > >>>>      ./tests/geo-rep/georep-basic-dr-tarssh.t
> > > >>>>       > >
> > > >>>>       > > Hi Aravinda/Kotresh,
> > > >>>>       > >
> > > >>>>       > > With [1], I consistently see cores generated with the test
> > > >>>>       > > './tests/geo-rep/georep-basic-dr-tarssh.t' in release-3.7
> > > >>>>      branch. From
> > > >>>>       > > the cores, looks like we are trying to dereference a freed
> > > >>>>       > > changelog_rpc_clnt_t(crpc) object in
> > changelog_rpc_notify().
> > > >>>>      Strangely
> > > >>>>       > > this was not reported in master branch.
> > > >>>>       > >
> > > >>>>       > > I tried debugging but couldn't find any possible
> > suspects. I
> > > >>>>      request you
> > > >>>>       > > to take a look and let me know if [1] caused any
> > regression.
> > > >>>>       > >
> > > >>>>       > > Thanks,
> > > >>>>       > > Soumya
> > > >>>>       > >
> > > >>>>       > > [1] http://review.gluster.org/#/c/13507/
> > > >>>>       > >
> > > >>>>       >
> > > >>>>      _______________________________________________
> > > >>>>      Gluster-devel mailing list
> > > >>>>      Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx>
> > > >>>>      http://www.gluster.org/mailman/listinfo/gluster-devel
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Raghavendra G
> > > >>>
> > > >> _______________________________________________
> > > >> Gluster-devel mailing list
> > > >> Gluster-devel@xxxxxxxxxxx
> > > >> http://www.gluster.org/mailman/listinfo/gluster-devel
> > > >>
> > >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> >
> 
> 
> 
> -- 
> Raghavendra G

> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel



[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux