On Thu, 2015-04-16 at 17:05 -0600, CJ Baar wrote:
> Also, I have realized that the problem is deeper than I originally
> thought. It’s not just the mount that is hanging when a node reboots…
> it appears to be the entire system. I cannot use my SSH connection,
> no matter where I am in the system, and services such as httpd become
> unresponsive. I can ping the “surviving” system, but other than that
> it appears pretty unusable. This is a major drawback to using
> gluster. I can’t afford to lose two entire systems if one dies.

Out of interest, what is the longest amount of time you have waited for
gluster to become responsive again after a node goes down?

On our setup, if a node becomes inaccessible, I usually see gluster stop
responding for around 30 seconds (I have never actually timed it) before
things start working again. This happens even if the inaccessible node
stays down. If the node comes back later, updates are automatically
synced to it from the other nodes.

Having the whole thing freeze for 30 seconds isn't ideal, but for us it
is an acceptable trade-off, given that a node failure should be a
relatively rare event.

-- 
Cheers,
Kingsley.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users