Re: Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[Replying on top of my own reply]

Hi,

I have submitted the below patch [1] to avoid the issue of 'rpc_clnt_submit'
getting reconnected. But it won't take care of memory leak problem you were
trying to fix. That we have to carefully go through all cases and fix it.
Please have a look at it.

http://review.gluster.org/#/c/13592/

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> To: "Soumya Koduri" <skoduri@xxxxxxxxxx>
> Cc: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Thursday, March 3, 2016 3:39:11 PM
> Subject: Re:  Cores generated with ./tests/geo-rep/georep-basic-dr-tarssh.t
> 
> Hi Soumya,
> 
> I tested the lastes patch [2] on master where your previous patch [1] in
> merged.
> I see crashes at different places.
> 
> 1. If there are code paths that are holding rpc object without taking ref on
> it, all those
>    code path will crash on invoking rpc submit on that object as rpc object
>    would have freed
>    by last unref on DISCONNECT event. I see this kind of use-case in
>    chagnelog rpc code.
>    Need to check on other users of rpc.
>    
> 2. And also we need to take care of reconnect timers that are being set and
> are re-tried to
>    connect back on expiration. In those cases also, we might crash as rpc
>    object would have freed.
>    
> 
> [1] http://review.gluster.org/#/c/13507/
> [2] http://review.gluster.org/#/c/13587/
> 
> Thanks and Regards,
> Kotresh H R
> 
> ----- Original Message -----
> > From: "Soumya Koduri" <skoduri@xxxxxxxxxx>
> > To: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, "Kotresh Hiremath
> > Ravishankar" <khiremat@xxxxxxxxxx>
> > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > Sent: Thursday, March 3, 2016 12:24:00 PM
> > Subject: Re:  Cores generated with
> > ./tests/geo-rep/georep-basic-dr-tarssh.t
> > 
> > Thanks a lot Kotresh.
> > 
> > On 03/03/2016 08:47 AM, Raghavendra G wrote:
> > > Hi Soumya,
> > >
> > > Can you send a fix to this regression on upstream master too? This patch
> > > is merged there.
> > >
> > I have submitted below patch.
> > 	http://review.gluster.org/#/c/13587/
> > 
> > Kindly review the same.
> > 
> > Thanks,
> > Soumya
> > 
> > > regards,
> > > Raghavendra
> > >
> > > On Tue, Mar 1, 2016 at 10:34 PM, Kotresh Hiremath Ravishankar
> > > <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>> wrote:
> > >
> > >     Hi Soumya,
> > >
> > >     I analysed the issue and found out that crash has happened because
> > >     of the patch [1].
> > >
> > >     The patch doesn't set transport object to NULL in 'rpc_clnt_disable'
> > >     but instead does it on
> > >     'rpc_clnt_trigger_destroy'. So if there are pending rpc invocations
> > >     on the rpc object that
> > >     is disabled (those instances are possible as happening now in
> > >     changelog), it will trigger a
> > >     CONNECT notify again with 'mydata' that is freed causing a crash.
> > >     This happens because
> > >     'rpc_clnt_submit' reconnects if rpc is not connected.
> > >
> > >       rpc_clnt_submit (...) {
> > >         ...
> > >                      if (conn->connected == 0) {
> > >                              ret = rpc_transport_connect (conn->trans,
> > >
> > >       conn->config.remote_port);
> > >                      }
> > >         ...
> > >       }
> > >
> > >     Without your patch, conn->trans was set NULL and hence CONNECT fails
> > >     not resulting with
> > >     CONNECT notify call. And also the cleanup happens in failure path.
> > >
> > >     So the memory leak can happen, if there is no try for rpc invocation
> > >     after DISCONNECT.
> > >     It will be cleaned up otherwise.
> > >
> > >
> > >     [1] http://review.gluster.org/#/c/13507/
> > >
> > >     Thanks and Regards,
> > >     Kotresh H R
> > >
> > >     ----- Original Message -----
> > >      > From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx
> > >     <mailto:khiremat@xxxxxxxxxx>>
> > >      > To: "Soumya Koduri" <skoduri@xxxxxxxxxx
> > >      > <mailto:skoduri@xxxxxxxxxx>>
> > >      > Cc: avishwan@xxxxxxxxxx <mailto:avishwan@xxxxxxxxxx>, "Gluster
> > >     Devel" <gluster-devel@xxxxxxxxxxx <mailto:gluster-devel@xxxxxxxxxxx>>
> > >      > Sent: Monday, February 29, 2016 4:15:22 PM
> > >      > Subject: Re: Cores generated with
> > >     ./tests/geo-rep/georep-basic-dr-tarssh.t
> > >      >
> > >      > Hi Soumya,
> > >      >
> > >      > I just tested that it is reproducible only with your patch both
> > >     in master and
> > >      > 3.76 branch.
> > >      > The geo-rep test cases are marked bad in master. So it's not hit
> > >     in master.
> > >      > rpc is introduced
> > >      > in changelog xlator to communicate to applications via
> > >     libgfchangelog.
> > >      > Venky/Me will check
> > >      > why is the crash happening and will update.
> > >      >
> > >      >
> > >      > Thanks and Regards,
> > >      > Kotresh H R
> > >      >
> > >      > ----- Original Message -----
> > >      > > From: "Soumya Koduri" <skoduri@xxxxxxxxxx
> > >     <mailto:skoduri@xxxxxxxxxx>>
> > >      > > To: avishwan@xxxxxxxxxx <mailto:avishwan@xxxxxxxxxx>, "kotresh"
> > >     <khiremat@xxxxxxxxxx <mailto:khiremat@xxxxxxxxxx>>
> > >      > > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx
> > >     <mailto:gluster-devel@xxxxxxxxxxx>>
> > >      > > Sent: Monday, February 29, 2016 2:10:51 PM
> > >      > > Subject: Cores generated with
> > >     ./tests/geo-rep/georep-basic-dr-tarssh.t
> > >      > >
> > >      > > Hi Aravinda/Kotresh,
> > >      > >
> > >      > > With [1], I consistently see cores generated with the test
> > >      > > './tests/geo-rep/georep-basic-dr-tarssh.t' in release-3.7
> > >     branch. From
> > >      > > the cores, looks like we are trying to dereference a freed
> > >      > > changelog_rpc_clnt_t(crpc) object in changelog_rpc_notify().
> > >     Strangely
> > >      > > this was not reported in master branch.
> > >      > >
> > >      > > I tried debugging but couldn't find any possible suspects. I
> > >     request you
> > >      > > to take a look and let me know if [1] caused any regression.
> > >      > >
> > >      > > Thanks,
> > >      > > Soumya
> > >      > >
> > >      > > [1] http://review.gluster.org/#/c/13507/
> > >      > >
> > >      >
> > >     _______________________________________________
> > >     Gluster-devel mailing list
> > >     Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx>
> > >     http://www.gluster.org/mailman/listinfo/gluster-devel
> > >
> > >
> > >
> > >
> > > --
> > > Raghavendra G
> > 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel



[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux