> well, that never happened before when using nfs with the same
> computers, same disk, etc ... for almost 2 years, so it's more
> than possible that glusterfs is the one triggering this
> supposed ext3 bug, but apart from this:
>
> a) documentation says "All operations that do not modify the file
> or directory are sent to all the subvolumes and the first successful
> reply is returned to the application", why is it blocking then?
> it's supposed that the reply from the non-blocked server will
> come first and nothing will block, but clients are blocking on
> a simple ls operation

The calls that are hanging (as you have seen in the logs as well) are
lookup calls, which have to be sent to all subvolumes to ensure all the
copies are in sync.

> b) server1 (the non-blocked one) also has the volumes mounted like
> any other client, but with option read-subvolume set to the local
> volume, yet it also hangs when it was supposed to read from the local
> volume, not from the hung one

The read calls are indeed served from read-subvolume, but that applies
only to read() system calls, so that you can avoid bulk data transfer on
the network. Calls like lookup() have to be sent to all subvolumes as
long as they report themselves as "up". The problem is that in the
current version there is no way to translate a "hanging backend fs" into
a "down subvolume".

> c) doesn't glusterfs ping the servers periodically to see if they
> are available or not? if so, why doesn't it detect that situation?

It does, but in this case the server is up and running and replying with
pongs. The current ping-pong only checks network reachability to the
server process; it does not check whether the backend filesystem is
responding.

Avati