Re: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

On 03/03/2016 04:58 PM, Kotresh Hiremath Ravishankar wrote:
[Replying on top of my own reply]

Hi,

I have submitted the below patch [1] to avoid the issue of 'rpc_clnt_submit'
triggering a reconnect. But it won't take care of the memory leak problem you
were trying to fix. For that, we have to carefully go through all the cases
and fix them. Please have a look at it.

Looks good. IIUC, with this patch, we need not set conn->trans to NULL in 'rpc_clnt_disable()'. Right? If yes, then it takes care of the memory leak, as the transport object will then get freed as part of 'rpc_clnt_trigger_destroy'.
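
To make that concrete, here is a minimal sketch of the disable vs. destroy
split being discussed (simplified stand-in types, not the actual rpc-clnt
code; the real code would release the transport via rpc_transport_unref()):

    #include <stdlib.h>

    struct sketch_conn {
            int   connected;
            int   disabled;
            void *trans;            /* stand-in for rpc_transport_t * */
    };

    /* disable only marks the connection unusable ... */
    static void
    sketch_rpc_clnt_disable (struct sketch_conn *conn)
    {
            conn->disabled  = 1;
            conn->connected = 0;
            /* conn->trans is deliberately NOT set to NULL here */
    }

    /* ... and the transport is released on the destroy path reached by
     * the last unref, so it does not leak */
    static void
    sketch_rpc_clnt_trigger_destroy (struct sketch_conn *conn)
    {
            if (conn->trans) {
                    free (conn->trans);   /* rpc_transport_unref() in the real code */
                    conn->trans = NULL;
            }
    }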


http://review.gluster.org/#/c/13592/

Thanks and Regards,
Kotresh H R

----- Original Message -----
From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
To: "Soumya Koduri" <skoduri@xxxxxxxxxx>
Cc: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Thursday, March 3, 2016 3:39:11 PM
Subject: Re: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

Hi Soumya,

I tested the latest patch [2] on master, where your previous patch [1] is
merged. I see crashes at different places.

1. If there are code paths that hold the rpc object without taking a ref on
   it, all those code paths will crash on invoking an rpc submit on that
   object, as the rpc object would have been freed by the last unref on the
   DISCONNECT event. I see this kind of use-case in the changelog rpc code.
   Need to check the other users of rpc.

Agree. We should fix all such code paths. Since this seems to be an intricate
fix, shall we take these patches only in the master branch and not in the 3.7
release for now, till we fix all such paths as we encounter them?
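
A rough sketch of the ref-before-use discipline point (1) calls for in such
callers (everything below is a simplified stand-in; the real code would use
rpc_clnt_ref()/rpc_clnt_unref() around the actual rpc_clnt_submit() call):

    #include <stdlib.h>

    struct sketch_rpc {
            int refcount;
            int connected;
    };

    static struct sketch_rpc *
    sketch_rpc_ref (struct sketch_rpc *rpc)
    {
            rpc->refcount++;
            return rpc;
    }

    static void
    sketch_rpc_unref (struct sketch_rpc *rpc)
    {
            if (--rpc->refcount == 0)
                    free (rpc);     /* last unref, e.g. on DISCONNECT */
    }

    static int
    sketch_submit (struct sketch_rpc *rpc)
    {
            /* placeholder for the actual submit call */
            return rpc->connected ? 0 : -1;
    }

    /* Hold a ref for the duration of the call so that a DISCONNECT's
     * last unref cannot free the object underneath the caller. */
    static int
    caller_submit_safely (struct sketch_rpc *rpc)
    {
            int ret = -1;

            if (!rpc)
                    return ret;

            sketch_rpc_ref (rpc);
            ret = sketch_submit (rpc);
            sketch_rpc_unref (rpc);

            return ret;
    }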


2. We also need to take care of the reconnect timers that are set and retry
   the connection on expiration. In those cases too, we might crash, as the
   rpc object would have been freed.

Your patch addresses this, right?
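
For point (2), a minimal sketch of the same idea on the timer side (stand-in
types only; the real code arms and cancels its reconnect timer through the
gf_timer machinery):

    #include <stdlib.h>

    struct sketch_timer {
            int armed;
    };

    struct sketch_clnt {
            struct sketch_timer *reconnect_timer;
    };

    static void
    sketch_timer_cancel (struct sketch_timer *timer)
    {
            timer->armed = 0;
            free (timer);
    }

    /* Destroy path: cancel any pending reconnect before freeing the rpc
     * object, otherwise the timer callback fires later and dereferences
     * freed memory. */
    static void
    sketch_clnt_destroy (struct sketch_clnt *clnt)
    {
            if (clnt->reconnect_timer) {
                    sketch_timer_cancel (clnt->reconnect_timer);
                    clnt->reconnect_timer = NULL;
            }
            free (clnt);
    }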

Thanks,
Soumya



[1] http://review.gluster.org/#/c/13507/
[2] http://review.gluster.org/#/c/13587/

Thanks and Regards,
Kotresh H R

----- Original Message -----
From: "Soumya Koduri" <skoduri@xxxxxxxxxx>
To: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Kotresh Hiremath
Ravishankar" <khiremat@xxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Thursday, March 3, 2016 12:24:00 PM
Subject: Re: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

Thanks a lot Kotresh.

On 03/03/2016 08:47 AM, Raghavendra G wrote:
Hi Soumya,

Can you send a fix to this regression on upstream master too? This patch
is merged there.

I have submitted the below patch.
	http://review.gluster.org/#/c/13587/

Kindly review the same.

Thanks,
Soumya

regards,
Raghavendra

On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar
<khiremat@xxxxxxxxxx> wrote:

     Hi Soumya,

     I analysed the issue and found out that the crash happened because
     of patch [1].

     The patch doesn't set the transport object to NULL in
     'rpc_clnt_disable' but instead does it in 'rpc_clnt_trigger_destroy'.
     So if there are pending rpc invocations on the rpc object that has
     been disabled (such instances are possible, as is happening now in
     changelog), it will trigger a CONNECT notify again with 'mydata'
     that is already freed, causing a crash. This happens because
     'rpc_clnt_submit' reconnects if the rpc is not connected.

       rpc_clnt_submit (...) {
           ...
           if (conn->connected == 0) {
                   ret = rpc_transport_connect (conn->trans,
                                                conn->config.remote_port);
           }
           ...
       }

     Without your patch, conn->trans was set to NULL, so the connect
     fails and no CONNECT notify is issued; the cleanup also happens in
     that failure path.

     So the memory leak can happen only if there is no attempt at an rpc
     invocation after the DISCONNECT; otherwise it gets cleaned up.
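
For illustration only, a simplified stand-in (not the real rpc-clnt code) of
the pre-patch behaviour described above:

    #include <stddef.h>

    struct sketch_conn {
            int   connected;
            void *trans;    /* NULL after the old rpc_clnt_disable() */
    };

    static int
    sketch_submit_after_disable (struct sketch_conn *conn)
    {
            if (conn->connected == 0) {
                    /* connect fails, no CONNECT notify reaches the freed
                     * 'mydata', and cleanup runs in this failure path */
                    if (conn->trans == NULL)
                            return -1;
                    /* otherwise the real code calls rpc_transport_connect() */
            }
            return 0;
    }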


     [1] http://review.gluster.org/#/c/13507/

     Thanks and Regards,
     Kotresh H R

     ----- Original Message -----
      > From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
      > To: "Soumya Koduri" <skoduri@xxxxxxxxxx>
      > Cc: avishwan@xxxxxxxxxx, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
      > Sent: Monday, February 29, 2016 4:15:22 PM
      > Subject: Re: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
      >
      > Hi Soumya,
      >
      > I just tested that it is reproducible only with your patch, both in
      > master and the 3.7 branch.
      > The geo-rep test cases are marked bad in master, so it's not hit in
      > master. rpc is introduced in the changelog xlator to communicate
      > with applications via libgfchangelog.
      > Venky/me will check why the crash is happening and update.
      >
      >
      > Thanks and Regards,
      > Kotresh H R
      >
      > ----- Original Message -----
      > > From: "Soumya Koduri" <skoduri@xxxxxxxxxx>
      > > To: avishwan@xxxxxxxxxx, "kotresh" <khiremat@xxxxxxxxxx>
      > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
      > > Sent: Monday, February 29, 2016 2:10:51 PM
      > > Subject: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
      > >
      > > Hi Aravinda/Kotresh,
      > >
      > > With [1], I consistently see cores generated with the test
      > > './tests/geo-rep/georep-basic-dr-tarssh.t' in the release-3.7
      > > branch. From the cores, it looks like we are trying to dereference
      > > a freed changelog_rpc_clnt_t (crpc) object in
      > > changelog_rpc_notify(). Strangely, this was not reported in the
      > > master branch.
      > >
      > > I tried debugging but couldn't find any possible suspects. I
      > > request you to take a look and let me know if [1] caused any
      > > regression.
      > >
      > > Thanks,
      > > Soumya
      > >
      > > [1] http://review.gluster.org/#/c/13507/
      > >
      >




--
Raghavendra G

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
