On Sun, Aug 23, 2009 at 3:17 AM, Stephan von Krawczynski <skraw@xxxxxxxxxx> wrote:
> On Sat, 22 Aug 2009 10:24:48 -0700
> Anand Avati <avati@xxxxxxxxxxx> wrote:
>
>> [... long technical explanation ...]
>> As you rightly summarized,
>> Your theory: glusterfs is buggy (cause) and results in all fuse
>> mountpoints hanging, and also results in server2's backend fs hanging
>> (effect)
>>
>> My theory: your backend fs is buggy (cause) and hangs, and results in
>> all fuse mountpoints hanging (effect), which happens because of the
>> reasons explained above
>>
>> I maintain that my theory is right because glusterfsd just cannot
>> cause a backend filesystem to hang, and if it indeed did, the bug is
>> in the backend fs, because glusterfsd only performs system calls to
>> access it.
>
> Let's assume your theory is right. Then I obviously managed to create a
> scenario where the bail-out decisions for servers are clearly bad. In fact
> they are so bad that the whole service breaks down. This is of course a
> no-go for an application whose sole (or primary) purpose is to keep your
> file service up, no matter which servers in the backend crash or vanish.
> As long as there is a theoretical way of performing the needed file
> service, it should be up and running. Even if your theory were right,
> glusterfs still does not handle the situation as well as it could
> (read: as a user would expect).

OK, first of all, this is now a very different issue we are trying to address. Correct me if I'm wrong: the new problem definition is 'when glusterfs is presented with a backend filesystem which hangs FS calls, the replicate module does not provide FS service' (and no longer, as you previously described it, 'glusterfs has not been able to run bonnie even for an hour on all 2.0.x releases because of lack of attention towards stability and concentration on featurism').
Please do understand that this is not at all a (regular) crash of the filesystem, as described, which can be reliably reproduced within an hour and which the dev team does not care to fix. The problem does not deserve such an attack.

The reason why this issue persists is that there is no reliable way to even detect this hang programmatically. The right way to "deal" with it would be to translate the "disk hang" into a "subvolume down" event, but that is hard, because: Has the server stopped responding? No, ping-pong replies are coming just fine. Has the backend disk started returning IO errors? No, the FS calls just hang, exactly like a deadlock. Detecting hardware failures can be done with reasonable reliability; detecting buggy software lockups and such deadlocks is a very hard (theoretical) problem. The simplest way around it is having timeouts at a higher layer.

And it is for a reason that the current call timeouts are 1800 seconds: we have seen in our QA lab that a truncate() call on a multi-terabyte file on ext3 takes more than 20 minutes to complete, and during that period all other calls happening on that filesystem also freeze. Programmatically this situation is no different from the hang you face. The 1800-second timeout currently used is based on experimental measurement, not arbitrary choice.

If you can come up with a better way of reliably detecting that the backend FS has hung itself (even considering the delay situations explained above), we are willing to use that technique, provided it is reasonable enough (do consider situations where the backend fs could be an NFS mount that has temporarily blocked for several minutes while its server reboots, etc.).

Avati