On Fri, 28 Aug 2009 14:28:51 +0200 David Saez Padros <david at ols.es> wrote:

> Hi
>
> well, that never happened before when using nfs with the same
> computers, same disk, etc. for almost 2 years, so it's more
> than possible that glusterfs is the one triggering this
> supposed ext3 bug, but apart from this:

I can assure you that you will never get agreement on this point on
this list; this happens to be the only bug-free software in the
universe according to its authors ;-)

> a) documentation says "All operations that do not modify the file
> or directory are sent to all the subvolumes and the first successful
> reply is returned to the application", so why is it blocking?
> The reply from the non-blocked server should come first and nothing
> should block, yet clients are blocking on a simple ls operation.

My impression is that you have to imagine the setup as a serialized
queue on the server. If one operation hangs, all subsequent ones will
hang, too.

> b) server1 (the non-blocked one) also has the volumes mounted like
> any other client, but with option read-subvolume set to the local
> volume; yet it also hangs when it is supposed to read from the local
> volume, not from the hung one.

This is exactly my experience. You cannot make it work either way.
There seems to be some locking across all used servers.

> c) doesn't glusterfs ping the servers periodically to see whether
> they are available? If so, why does it not detect that situation?

Well, this ping-pong procedure only seems to detect offline servers
(i.e. network down); it is obviously not able to tell whether a server
is still operational.

My idea of a solution would be to implement something like a bail-out
timeout, configurable in the client vol file for every brick.
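To sketch what I mean, per brick in the client vol file (host and
volume names below are made up for illustration; frame-timeout already
exists per protocol/client volume, as the 1800 second default in the
call_bail log lines suggests, but the point would be to tune it much
lower per brick):

```
# hypothetical client vol file fragment -- names are illustrative only
volume data2
  type protocol/client
  option transport-type tcp
  option remote-host server2
  option remote-subvolume brick
  # bail out of this brick after 30 seconds instead of the
  # 1800 second frame-timeout visible in the logs
  option frame-timeout 30
end-volume
```

A nearby brick could then carry a short timeout while a distant one
over a slow link keeps a longer one.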
This would allow intermixing slow and fast servers, and it would cope
with a situation where some clients are far away over slow connections
while others are nearby with a very fast connection to the same
servers. The biggest problem is probably not bailing servers out, but
re-integrating them. Currently there seems to be no userspace tool to
tell a client to re-integrate a formerly dead server. Obviously this
should not happen auto-magically, to prevent flapping.

> >> [...]
> >> Glusterfs log only shows lines like these:
> >>
> >> [2009-08-28 09:19:28] E [client-protocol.c:292:call_bail] data2: bailing
> >> out frame LOOKUP(32) frame sent = 2009-08-28 08:49:18. frame-timeout = 1800
> >> [2009-08-28 09:23:38] E [client-protocol.c:292:call_bail] data2: bailing
> >> out frame LOOKUP(32) frame sent = 2009-08-28 08:53:28. frame-timeout = 1800
> >>
> >> Once server2 has been rebooted, all gluster filesystems become
> >> available again on all clients and the hung df and ls processes
> >> terminate, but it is difficult to understand why a replicated share
> >> that must survive the failure of one server does not.
> >
> > You are suffering from the problem we talked about a few days ago on
> > the list. If your local fs somehow produces a deadlock on one server,
> > glusterfs is currently unable to cope with the situation and just
> > _waits_ for things to come. This deadlocks your clients, too, without
> > any need.
> > Your experience backs my criticism of the handling of these
> > situations.
>
> --
> Best regards ...
>
> ----------------------------------------------------------------
>    David Saez Padros                    http://www.ols.es
>    On-Line Services 2000 S.L.           telf    +34 902 50 29 75
> ----------------------------------------------------------------


--
Regards,
Stephan