Re: Re: Timeout settings and self-healing ? (WAS: HA failover test unsuccessful (inaccessible mountpoint))

"Krishna Srinivas" <krishna@xxxxxxxxxxxxx> · Mon, 21 Apr 2008 17:20:53 +0530

Guido,

Can you share the setup details and conf files?
You can use http://glusterfs.pastebin.com to paste the conf files.

Thanks
Krishna

On Fri, Apr 4, 2008 at 2:40 PM, Anand Avati <avati@xxxxxxxxxxxxx> wrote:
> Daniel/Guido,
>   can you paste the relevant logs, from the time the cable was unplugged
>  until the end of the experiment?
>
>  avati
>
>  2008/4/3, Daniel Maher <dma+gluster@xxxxxxxxx>:
>
>  > On Thu, 3 Apr 2008 14:55:48 +0530 "Anand Avati" <avati@xxxxxxxxxxxxx>
>  > wrote:
>  >
>  > > Daniel,
>  > >  maybe it is just taking a long time to detect the connection failure.
>  > > Can you try with 'option transport-timeout 20' (sets response timeout
>  > > to 20 seconds) in all your protocol/client volumes and see if you still
>  > > face the 'hang'?
>  >
>  > My simple test case is as follows:
>  > 1. Unplug one of the nodes (dfsD)
>  > 2. Attempt to ls -l /opt/ (which contains gfs-mount/, the mountpoint)
>  >
>  > I set the timeout option on every protocol/client instance in both the
>  > client and server configs.  I tested timeout settings of 10 and 20
>  > seconds (just to see).  In both cases, the 'hang' releases after a while
>  > (approx. 30 seconds), but the results are odd. For example:
>  >
>  > # ls -l
>  >    (hang ~ 30 seconds)
>  > ls: cannot access gfs-mount: Transport endpoint is not connected
>  > total 0
>  > d????????? ? ? ? ?                ? gfs-mount
>  >
>  > # ls -l
>  >    (immediate)
>  > ls: cannot access gfs-mount: Transport endpoint is not connected
>  > total 0
>  > d????????? ? ? ? ?                ? gfs-mount
>  >
>  >    (user wait ~ 5 seconds)
>  >
>  > # ls -l
>  > total 8
>  > drwxr-xr-x 2 root root 4096 2008-04-03 09:43 gfs-mount
>  >
>  > It would appear that the "recovery" time, regardless of whether the
>  > timeout is set to 10 or 20, is around 35 to 40 seconds, though at the
>  > very least it did recover.  Is there any reasonable way to bring this
>  > period of time down?
>  >
>  > Thank you all so much for your feedback on this topic!
>  >
>  >
>
>
> _______________________________________________
>  Gluster-devel mailing list
>  Gluster-devel@xxxxxxxxxx
>  http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
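
To put numbers on the quoted test case (unplug a node, then ls -l the
directory containing the mountpoint), a small probe script can record how
long each stat of the mountpoint takes and which error it returns. This is
only a rough sketch, not part of GlusterFS: the function names probe() and
watch() are made up for illustration, and /opt/gfs-mount is the mountpoint
path taken from the test above. "Transport endpoint is not connected" is the
message for ENOTCONN, so that is the error name to look for in the output.

```python
import errno
import os
import time


def probe(path):
    """Stat `path` once; return (seconds_elapsed, errno_name or None)."""
    start = time.monotonic()
    try:
        os.stat(path)
        err = None
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. ENOTCONN.
        err = errno.errorcode.get(exc.errno, str(exc.errno))
    return time.monotonic() - start, err


def watch(path, attempts=10, interval=1.0):
    """Probe `path` repeatedly, printing and returning each result."""
    results = []
    for i in range(attempts):
        elapsed, err = probe(path)
        results.append((elapsed, err))
        status = "ok" if err is None else err
        print("attempt %d: %.1fs %s" % (i + 1, elapsed, status))
        if i < attempts - 1:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # Hypothetical usage: start this, then unplug the node.
    watch("/opt/gfs-mount", attempts=60, interval=1.0)
```

Running it with different transport-timeout values and comparing the
per-attempt times against the log timestamps should show whether the
observed delay follows the configured timeout or something else.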