Re: Unable to make HA work; mounts hang on remote node reboot

Kingsley <gluster@xxxxxxxxxxxxxxxxxxx> · Sun, 26 Apr 2015 11:13:19 +0100

On Thu, 2015-04-16 at 17:05 -0600, CJ Baar wrote:
> Also, I have realized that the problem is deeper than I originally
> thought.  It’s not just the mount that is hanging when a node reboots…
> it appears to be the entire system.  I cannot use my SSH connection,
> no matter where I am in the system, and services such as httpd become
> unresponsive.  I can ping the “surviving” system, but other than that
> it appears pretty unusable.  This is a major drawback to using
> gluster.  I can’t afford to lose two entire systems if one dies.

Out of interest, what is the longest amount of time you have waited for
gluster to become responsive again after a node goes down?

On our setup, if a node becomes inaccessible, I usually see gluster stop
responding for around 30 seconds (I have never actually timed it). After
that, things start working again, even if the inaccessible node stays
down. If it comes back later, updates are automatically synced to it
from the other nodes.
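
Since the 30 seconds above is only an estimate, here is a minimal sketch
of one way to time the stall, assuming it shows up as a slow filesystem
call on the client mount. The names time_probe and longest_stall and the
path /mnt/gluster are made up for illustration and are not part of
gluster.

```python
import time


def time_probe(probe, clock=time.monotonic):
    """Return how many seconds a single call to probe() took."""
    start = clock()
    try:
        probe()
    except OSError:
        # An error still means the call returned, so its duration counts.
        pass
    return clock() - start


def longest_stall(probe, attempts, pause=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Call probe() `attempts` times and return the slowest call in seconds."""
    worst = 0.0
    for _ in range(attempts):
        worst = max(worst, time_probe(probe, clock))
        sleep(pause)
    return worst


# Example use (hypothetical mount point); start it, then reboot a node:
#   import os
#   print(longest_stall(lambda: os.listdir("/mnt/gluster"), attempts=120))
```

Run on a client while one node is rebooted, the slowest call gives a
rough figure for how long the mount was frozen. If the whole system
hangs, as described in the quoted message, the script itself may stall
too, so treat the result as a lower bound.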

In our case, although having the whole thing freeze for 30 seconds
isn't ideal, it is an acceptable trade-off, given that a node failure
should be a relatively rare event.

-- 
Cheers,
Kingsley.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users