Re: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Raghavendra,

On 03/04/2016 03:09 PM, Raghavendra G wrote:


On Fri, Mar 4, 2016 at 2:02 PM, Raghavendra G <raghavendra@xxxxxxxxxxx
<mailto:raghavendra@xxxxxxxxxxx>> wrote:



    On Thu, Mar 3, 2016 at 6:26 PM, Kotresh Hiremath Ravishankar
    <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>> wrote:

        Hi,

        Yes, with this patch we need not set conn->trans to NULL in
        rpc_clnt_disable


    While [1] fixes the crash, things can be improved in the way how
    changelog is using rpc.

    1. In the current code, there is an rpc_clnt object leak during
    disconnect event.
    2. Also, freed "mydata" of changelog is still associated with
    rpc_clnt object (corollary of 1), though change log might not get
    any events with "mydata" (as connection is dead).

    I've discussed with Kotresh about changes needed, offline. So,
    following are the action items.
    1. Soumya's patch [2] is valid and is needed for 3.7 branch too.
    2. [2] can be accepted. However, someone might want to re-use an rpc
    object after disabling it, like introducing a new api
    rpc_clnt_enable_again (though no of such use-cases is very less).
    But [2] doesn't allow it. The point is as long as rpc-clnt object is
    alive, transport object is alive (though disconnected) and we can
    re-use it. So, I would prefer not to accept it.


[2] will be accepted now.

The link mentioned seems to be invalid. Just to be clear, we have 3 patches in question here -

[1] Original patch (merged in master but not in release-3.7) http://review.gluster.org/13456

There are two patches proposed to fix the regression
[2] http://review.gluster.org/#/c/13587/
[3] http://review.gluster.org/#/c/13592/

Since the patch by Kotresh [3] completely fixes the regression, we have decided to abandon [2]. In addition, since these fixes look very intricate, till we are sure that the code is stable, we thought it may be better not to back-port these patches to stable release branch.

Please confirm if you are implying to revive [2] and back-port all these 3 patches to release-3.7 branch?

Thanks,
Soumya

    3. Kotresh will work on new changes to make sure changelog makes
    correct use of rpc-clnt.

    [1] http://review.gluster.org/#/c/13592
    [2] http://review.gluster.org/#/c/1359

    regards,
    Raghavendra.


        Thanks and Regards,
        Kotresh H R

        ----- Original Message -----
        > From: "Soumya Koduri" <skoduri@xxxxxxxxxx <mailto:skoduri@xxxxxxxxxx>>
        > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>>, "Raghavendra
        G" <raghavendra@xxxxxxxxxxx <mailto:raghavendra@xxxxxxxxxxx>>
        > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx <mailto:gluster-devel@xxxxxxxxxxx>>
         > Sent: Thursday, March 3, 2016 5:06:00 PM
         > Subject: Re:  Cores generated with
        ./tests/geo-rep/georep-basic-dr-tarssh.t
         >
         >
         >
         > On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:
         > > [Replying on top of my own reply]
         > >
         > > Hi,
         > >
         > > I have submitted the below patch [1] to avoid the issue of
         > > 'rpc_clnt_submit'
         > > getting reconnected. But it won't take care of memory leak
        problem you were
         > > trying to fix. That we have to carefully go through all
        cases and fix it.
         > > Please have a look at it.
         > >
         > Looks good. IIUC, with this patch, we need not set
        conn->trans to NULL
         > in 'rpc_clnt_disable()'. Right? If yes, then it takes care of
        memleak as
         > the transport object shall then get freed as part of
         > 'rpc_clnt_trigger_destroy'.
         >
         >
         > > http://review.gluster.org/#/c/13592/
         > >
         > > Thanks and Regards,
         > > Kotresh H R
         > >
         > > ----- Original Message -----
         > >> From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx
        <mailto:khiremat@xxxxxxxxxx>>
         > >> To: "Soumya Koduri" <skoduri@xxxxxxxxxx
        <mailto:skoduri@xxxxxxxxxx>>
         > >> Cc: "Raghavendra G" <raghavendra@xxxxxxxxxxx
        <mailto:raghavendra@xxxxxxxxxxx>>, "Gluster Devel"
         > >> <gluster-devel@xxxxxxxxxxx <mailto:gluster-devel@xxxxxxxxxxx>>
         > >> Sent: Thursday, March 3, 2016 3:39:11 PM
         > >> Subject: Re:  Cores generated with
         > >> ./tests/geo-rep/georep-basic-dr-tarssh.t
         > >>
         > >> Hi Soumya,
         > >>
         > >> I tested the lastes patch [2] on master where your
        previous patch [1] in
         > >> merged.
         > >> I see crashes at different places.
         > >>
         > >> 1. If there are code paths that are holding rpc object
        without taking ref
         > >> on
         > >> it, all those
         > >>     code path will crash on invoking rpc submit on that
        object as rpc
         > >>     object
         > >>     would have freed
         > >>     by last unref on DISCONNECT event. I see this kind of
        use-case in
         > >>     chagnelog rpc code.
         > >>     Need to check on other users of rpc.
         > Agree. We should fix all such code-paths. Since this seem to
        be an
         > intricate fix, shall we take these patches only in master
        branch and not
         > in 3.7 release for now till we fix all such paths as we
        encounter?
         >
         > >>
         > >> 2. And also we need to take care of reconnect timers that
        are being set
         > >> and
         > >> are re-tried to
         > >>     connect back on expiration. In those cases also, we
        might crash as rpc
         > >>     object would have freed.
         > Your patch addresses this..right?
         >
         > Thanks,
         > Soumya
         >
         > >>
         > >>
         > >> [1] http://review.gluster.org/#/c/13507/
         > >> [2] http://review.gluster.org/#/c/13587/
         > >>
         > >> Thanks and Regards,
         > >> Kotresh H R
         > >>
         > >> ----- Original Message -----
         > >>> From: "Soumya Koduri" <skoduri@xxxxxxxxxx
        <mailto:skoduri@xxxxxxxxxx>>
         > >>> To: "Raghavendra G" <raghavendra@xxxxxxxxxxx
        <mailto:raghavendra@xxxxxxxxxxx>>, "Kotresh Hiremath
         > >>> Ravishankar" <khiremat@xxxxxxxxxx
        <mailto:khiremat@xxxxxxxxxx>>
         > >>> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx
        <mailto:gluster-devel@xxxxxxxxxxx>>
         > >>> Sent: Thursday, March 3, 2016 12:24:00 PM
         > >>> Subject: Re:  Cores generated with
         > >>> ./tests/geo-rep/georep-basic-dr-tarssh.t
         > >>>
         > >>> Thanks a lot Kotresh.
         > >>>
         > >>> On 03/03/2016 08:47 AM, Raghavendra G wrote:
         > >>>> Hi Soumya,
         > >>>>
         > >>>> Can you send a fix to this regression on upstream master
        too? This patch
         > >>>> is merged there.
         > >>>>
         > >>> I have submitted below patch.
         > >>> http://review.gluster.org/#/c/13587/
         > >>>
         > >>> Kindly review the same.
         > >>>
         > >>> Thanks,
         > >>> Soumya
         > >>>
         > >>>> regards,
         > >>>> Raghavendra
         > >>>>
         > >>>> On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath
        Ravishankar
         > >>>> <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>
        <mailto:khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>>> wrote:
         > >>>>
         > >>>>      Hi Soumya,
         > >>>>
         > >>>>      I analysed the issue and found out that crash has
        happened because
         > >>>>      of the patch [1].
         > >>>>
         > >>>>      The patch doesn't set transport object to NULL in
         > >>>>      'rpc_clnt_disable'
         > >>>>      but instead does it on
         > >>>>      'rpc_clnt_trigger_destroy'. So if there are pending
        rpc invocations
         > >>>>      on the rpc object that
         > >>>>      is disabled (those instances are possible as
        happening now in
         > >>>>      changelog), it will trigger a
         > >>>>      CONNECT notify again with 'mydata' that is freed
        causing a crash.
         > >>>>      This happens because
         > >>>>      'rpc_clnt_submit' reconnects if rpc is not connected.
         > >>>>
         > >>>>        rpc_clnt_submit (...) {
         > >>>>          ...
         > >>>>                       if (conn->connected == 0) {
         > >>>>                               ret =
        rpc_transport_connect (conn->trans,
         > >>>>
         > >>>>        conn->config.remote_port);
         > >>>>                       }
         > >>>>          ...
         > >>>>        }
         > >>>>
         > >>>>      Without your patch, conn->trans was set NULL and
        hence CONNECT
         > >>>>      fails
         > >>>>      not resulting with
         > >>>>      CONNECT notify call. And also the cleanup happens
        in failure path.
         > >>>>
         > >>>>      So the memory leak can happen, if there is no try
        for rpc
         > >>>>      invocation
         > >>>>      after DISCONNECT.
         > >>>>      It will be cleaned up otherwise.
         > >>>>
         > >>>>
         > >>>>      [1] http://review.gluster.org/#/c/13507/
         > >>>>
         > >>>>      Thanks and Regards,
         > >>>>      Kotresh H R
         > >>>>
         > >>>>      ----- Original Message -----
         > >>>>       > From: "Kotresh Hiremath Ravishankar"
        <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>
         > >>>>      <mailto:khiremat@xxxxxxxxxx
        <mailto:khiremat@xxxxxxxxxx>>>
         > >>>>       > To: "Soumya Koduri" <skoduri@xxxxxxxxxx
        <mailto:skoduri@xxxxxxxxxx>
         > >>>>       > <mailto:skoduri@xxxxxxxxxx
        <mailto:skoduri@xxxxxxxxxx>>>
         > >>>>       > Cc: avishwan@xxxxxxxxxx
        <mailto:avishwan@xxxxxxxxxx> <mailto:avishwan@xxxxxxxxxx
        <mailto:avishwan@xxxxxxxxxx>>, "Gluster
         > >>>>      Devel" <gluster-devel@xxxxxxxxxxx
        <mailto:gluster-devel@xxxxxxxxxxx>
         > >>>>      <mailto:gluster-devel@xxxxxxxxxxx
        <mailto:gluster-devel@xxxxxxxxxxx>>>
         > >>>>       > Sent: Monday, February 29, 2016 4:15:22 PM
         > >>>>       > Subject: Re: Cores generated with
         > >>>>      ./tests/geo-rep/georep-basic-dr-tarssh.t
         > >>>>       >
         > >>>>       > Hi Soumya,
         > >>>>       >
         > >>>>       > I just tested that it is reproducible only with
        your patch both
         > >>>>      in master and
         > >>>>       > 3.76 branch.
         > >>>>       > The geo-rep test cases are marked bad in master.
        So it's not hit
         > >>>>      in master.
         > >>>>       > rpc is introduced
         > >>>>       > in changelog xlator to communicate to
        applications via
         > >>>>      libgfchangelog.
         > >>>>       > Venky/Me will check
         > >>>>       > why is the crash happening and will update.
         > >>>>       >
         > >>>>       >
         > >>>>       > Thanks and Regards,
         > >>>>       > Kotresh H R
         > >>>>       >
         > >>>>       > ----- Original Message -----
         > >>>>       > > From: "Soumya Koduri" <skoduri@xxxxxxxxxx
        <mailto:skoduri@xxxxxxxxxx>
         > >>>>      <mailto:skoduri@xxxxxxxxxx
        <mailto:skoduri@xxxxxxxxxx>>>
         > >>>>       > > To: avishwan@xxxxxxxxxx
        <mailto:avishwan@xxxxxxxxxx> <mailto:avishwan@xxxxxxxxxx
        <mailto:avishwan@xxxxxxxxxx>>,
         > >>>>       > > "kotresh"
         > >>>>      <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>
        <mailto:khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>>>
         > >>>>       > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx
        <mailto:gluster-devel@xxxxxxxxxxx>
         > >>>>      <mailto:gluster-devel@xxxxxxxxxxx
        <mailto:gluster-devel@xxxxxxxxxxx>>>
         > >>>>       > > Sent: Monday, February 29, 2016 2:10:51 PM
         > >>>>       > > Subject: Cores generated with
         > >>>>      ./tests/geo-rep/georep-basic-dr-tarssh.t
         > >>>>       > >
         > >>>>       > > Hi Aravinda/Kotresh,
         > >>>>       > >
         > >>>>       > > With [1], I consistently see cores generated
        with the test
         > >>>>       > > './tests/geo-rep/georep-basic-dr-tarssh.t' in
        release-3.7
         > >>>>      branch. From
         > >>>>       > > the cores, looks like we are trying to
        dereference a freed
         > >>>>       > > changelog_rpc_clnt_t(crpc) object in
        changelog_rpc_notify().
         > >>>>      Strangely
         > >>>>       > > this was not reported in master branch.
         > >>>>       > >
         > >>>>       > > I tried debugging but couldn't find any
        possible suspects. I
         > >>>>      request you
         > >>>>       > > to take a look and let me know if [1] caused
        any regression.
         > >>>>       > >
         > >>>>       > > Thanks,
         > >>>>       > > Soumya
         > >>>>       > >
         > >>>>       > > [1] http://review.gluster.org/#/c/13507/
         > >>>>       > >
         > >>>>       >
         > >>>>      _______________________________________________
         > >>>>      Gluster-devel mailing list
         > >>>> Gluster-devel@xxxxxxxxxxx
        <mailto:Gluster-devel@xxxxxxxxxxx>
        <mailto:Gluster-devel@xxxxxxxxxxx
        <mailto:Gluster-devel@xxxxxxxxxxx>>
         > >>>> http://www.gluster.org/mailman/listinfo/gluster-devel
         > >>>>
         > >>>>
         > >>>>
         > >>>>
         > >>>> --
         > >>>> Raghavendra G
         > >>>
         > >> _______________________________________________
         > >> Gluster-devel mailing list
         > >> Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx>
         > >> http://www.gluster.org/mailman/listinfo/gluster-devel
         > >>
         >
        _______________________________________________
        Gluster-devel mailing list
        Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx>
        http://www.gluster.org/mailman/listinfo/gluster-devel




    --
    Raghavendra G




--
Raghavendra G
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel



[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux