Daniel/Guido,

Can you paste the relevant logs, from the moment the cable was unplugged until the end of the experiment?

avati

2008/4/3, Daniel Maher <dma+gluster@xxxxxxxxx>:
>
> On Thu, 3 Apr 2008 14:55:48 +0530 "Anand Avati" <avati@xxxxxxxxxxxxx>
> wrote:
>
> > Daniel,
> > maybe it is just taking a long time to detect the connection failure.
> > Can you try with 'option transport-timeout 20' (sets the response
> > timeout to 20 seconds) in all your protocol/client volumes and see if
> > you still face the 'hang'?
>
> My simple test case is as follows:
> 1. Unplug one of the nodes (dfsD).
> 2. Attempt to ls -l /opt/ (which contains gfs-mount/, the mountpoint).
>
> I set the timeout option on every client instance in both the client
> and server configs. I tested timeout settings of 10 and 20 seconds
> (just to see). In both cases, the 'hang' releases after a while
> (approx. 30 seconds), but the results are odd. For example:
>
> # ls -l
> (hang ~ 30 seconds)
> ls: cannot access gfs-mount: Transport endpoint is not connected
> total 0
> d????????? ? ? ? ? ? gfs-mount
>
> # ls -l
> (immediate)
> ls: cannot access gfs-mount: Transport endpoint is not connected
> total 0
> d????????? ? ? ? ? ? gfs-mount
>
> (user wait ~ 5 seconds)
>
> # ls -l
> total 8
> drwxr-xr-x 2 root root 4096 2008-04-03 09:43 gfs-mount
>
> It would appear that the "recovery" time, whether the timeout is set to
> 10 or 20 seconds, is around 35 to 40 seconds, though at the very least
> it recovered. Is there any reasonable way to bring this period down?
>
> Thank you all so much for your feedback on this topic!
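To compare transport-timeout values more precisely than by eyeballing repeated `ls -l` runs, one could time how long the mountpoint takes to become listable again after the cable is pulled. This is a minimal sketch, assuming Python 3 on the client machine and the mountpoint /opt/gfs-mount from the test case above; the function `wait_for_mount` and its `poll_interval`/`max_wait` parameters are hypothetical helpers, not part of GlusterFS.

```python
import os
import time


def wait_for_mount(path, poll_interval=1.0, max_wait=120.0):
    """Poll `path` until it can be listed again.

    Returns the number of seconds waited, or None if `max_wait` elapses first.
    Hypothetical helper for measuring recovery time; not a GlusterFS API.
    """
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    while True:
        try:
            # On a hung FUSE mount this call may itself block (the ~30 s hang);
            # that blocking time is included in the measurement.
            os.listdir(path)
            return time.monotonic() - start
        except OSError:
            # ENOTCONN ("Transport endpoint is not connected") and similar errors
            pass
        if time.monotonic() - start >= max_wait:
            return None
        time.sleep(poll_interval)
```

Calling `wait_for_mount("/opt/gfs-mount")` right after unplugging dfsD, once with each timeout setting, would give comparable numbers for the 35 to 40 second recovery window described above.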