Re: Timeout settings and self-healing ? (WAS: HA failover test unsuccessful (inaccessible mountpoint))

After reading this message, I tried the same test, with exactly the same results.
When I unplug one of the server nodes, the clients lose the connection completely. When I plug the server back in, the connection recovers.

My setup:

2 storage nodes with AFR/Unify, 2 clients using fuse/glusterfs
All servers and clients use the same version:
glusterfs 1.3.8 built on Mar 11 2008 10:23:37
Repository revision: glusterfs--mainline--2.5--patch-701

I've tried the timeout settings on the servers and clients, but the mount stays unavailable.



Daniel Maher wrote:
On Thu, 3 Apr 2008 14:55:48 +0530 "Anand Avati" <avati@xxxxxxxxxxxxx>
wrote:

Daniel,
 maybe it is just taking long to detect connection failure. Can you
try with 'option transport-timeout 20' (sets response timeout to 20
seconds) in all your protocol/client and see if you still face the
'hang' ?
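
For readers following along, here is a minimal sketch of where the suggested option would sit in a client volume definition. Only 'option transport-timeout 20' comes from the message above; the volume name, the remote host address and the remote subvolume name are hypothetical placeholders, and the layout assumes the volume-spec syntax used by the 1.3 series.

```
# Hypothetical protocol/client volume; the name, address and
# subvolume below are placeholders, not taken from the thread.
volume client-dfsD
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.0.2.10
  option remote-subvolume brick
  # Response timeout in seconds, as suggested above.
  option transport-timeout 20
end-volume
```

The same option line would be repeated in every protocol/client volume, since the suggestion is to set it on all of them.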

My simple test case is as follows :
1. Unplug one of the nodes (dfsD)
2. Attempt to ls -l the /opt/ (in which gfs-mount/ - the mountpoint -
is contained)

I set the timeout option along with every client instance in both the
client and server configs.  I tested timeout settings of 10 and 20
seconds (just to see).  In both cases, the 'hang' releases after a while
(approx 30 seconds), but the results are odd. For example :

# ls -l
   (hang ~ 30 seconds)
ls: cannot access gfs-mount: Transport endpoint is not connected
total 0
d????????? ? ? ? ?                ? gfs-mount

# ls -l
   (immediate)
ls: cannot access gfs-mount: Transport endpoint is not connected
total 0
d????????? ? ? ? ?                ? gfs-mount

   (user wait ~ 5 seconds)

# ls -l
total 8
drwxr-xr-x 2 root root 4096 2008-04-03 09:43 gfs-mount

It would appear that the "recovery" time, regardless of whether the
timeout is set to 10 or 20, is around 35 to 40 seconds - though, at the
very least, it recovered.  Is there any reasonable way to bring this
period of time down ?
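
One way to put a number on the recovery window described above, rather than estimating it by hand, is to poll the mountpoint and report how long it takes to become accessible again. The sketch below is an illustration only, not something from the thread: the function name wait_for_path is made up, and it assumes a date command that supports +%s (GNU and BSD date do).

```shell
#!/bin/sh
# wait_for_path PATH MAX_SECONDS  (hypothetical helper, not from the thread)
# Polls PATH until stat succeeds or roughly MAX_SECONDS have elapsed.
# Prints the elapsed time; returns 0 on success, 1 on timeout.
# A stat call on a hung mount may itself block, so the elapsed time
# is measured after each attempt, not before it.
wait_for_path() {
    path=$1
    max=$2
    start=$(date +%s)
    while :; do
        if stat "$path" >/dev/null 2>&1; then
            echo "accessible after $(( $(date +%s) - start ))s"
            return 0
        fi
        elapsed=$(( $(date +%s) - start ))
        if [ "$elapsed" -ge "$max" ]; then
            echo "still inaccessible after ${elapsed}s"
            return 1
        fi
        sleep 1
    done
}
```

Running wait_for_path /opt/gfs-mount 120 right after unplugging the node would give a figure to compare across different transport-timeout values.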

Thank you all so much for your feedback on this topic !



--
Met vriendelijke groet,

Guido Smit
ComLog B.V.

Televisieweg 133
1322 BE Almere
T. 036 5470500
F. 036 5470481

