> [...]
> Glusterfs log only shows lines like this ones:
>
> [2009-08-28 09:19:28] E [client-protocol.c:292:call_bail] data2: bailing
> out frame LOOKUP(32) frame sent = 2009-08-28 08:49:18. frame-timeout = 1800
> [2009-08-28 09:23:38] E [client-protocol.c:292:call_bail] data2: bailing
> out frame LOOKUP(32) frame sent = 2009-08-28 08:53:28. frame-timeout = 1800
>
> Once server2 has been rebooted all gluster fs become available
> again on all clients and the hanged df and ls processes terminate,
> but difficult to understand why a replicated share that must survive
> to failure on one server does not.

You are hitting the problem we discussed on the list a few days ago.
If the local filesystem on one server deadlocks for some reason, glusterfs
is currently unable to cope with the situation and simply _waits_ for it
to resolve. That needlessly deadlocks your clients as well.
Your experience supports my criticism of how these situations are handled.

--
Regards,
Stephan
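
P.S. To put numbers on how long each request hung before the client gave
up, here is a minimal Python sketch. It assumes that each call_bail message
sits on a single line in the actual log file (the quoted lines are wrapped
by the mail client) and that the format matches the two quoted lines.
The names BAIL_RE and parse_bail are made up for illustration; they are
not part of glusterfs.

```python
import re
from datetime import datetime

# Pattern assumed from the two quoted log lines only; other glusterfs
# versions may format call_bail messages differently.
BAIL_RE = re.compile(
    r"\[(?P<logged>[\d-]+ [\d:]+)\] E \[client-protocol\.c:\d+:call_bail\] "
    r"(?P<volume>\S+): bailing out frame (?P<op>\w+)\(\d+\) "
    r"frame sent = (?P<sent>[\d-]+ [\d:]+)\. frame-timeout = (?P<timeout>\d+)"
)

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_bail(line):
    """Return details of one call_bail log line, or None if it does not match."""
    match = BAIL_RE.search(line)
    if match is None:
        return None
    logged = datetime.strptime(match["logged"], TIME_FORMAT)
    sent = datetime.strptime(match["sent"], TIME_FORMAT)
    return {
        "volume": match["volume"],
        "op": match["op"],
        # Seconds between sending the frame and the client bailing out.
        "waited": int((logged - sent).total_seconds()),
        "timeout": int(match["timeout"]),
    }


# The two quoted messages, rejoined onto one line each.
SAMPLE = [
    "[2009-08-28 09:19:28] E [client-protocol.c:292:call_bail] data2: bailing "
    "out frame LOOKUP(32) frame sent = 2009-08-28 08:49:18. frame-timeout = 1800",
    "[2009-08-28 09:23:38] E [client-protocol.c:292:call_bail] data2: bailing "
    "out frame LOOKUP(32) frame sent = 2009-08-28 08:53:28. frame-timeout = 1800",
]

for entry in map(parse_bail, SAMPLE):
    print(entry["volume"], entry["op"], entry["waited"], entry["timeout"])
```

For both quoted lines the difference between the two timestamps is 1810
seconds against a frame-timeout of 1800, so each LOOKUP went unanswered
for about half an hour before the client bailed out.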