Re: One node goes offline, the other node loses its connection to its local Gluster volume

On 02/22/2014 05:44 PM, Greg Scott wrote:

I have 2 nodes named fw1 and fw2. When I ifdown the NIC I’m using for Gluster on either node, that node cannot see its Gluster volume, but the other node can see it after a timeout. As soon as I ifup that NIC, everyone can see everything again.

 

Is this expected behavior? When that interconnect drops, I want both nodes to see their own local copy and then sync everything back up when the interconnect connects again.

If a client loses communication on an open TCP connection to a server, it waits for a timeout period (the ping-timeout, 42 seconds by default) for the communication to resume. It waits because dropping and re-establishing hundreds, or potentially tens of thousands, of file descriptors and locks is a very expensive process that disrupts the entire environment.

With the test you're describing, the clients connect to both servers (hopefully via hostname resolution) at IP addresses on the same network. When you down a NIC, its address is no longer available. Not only can the remote client not connect to it, your local client cannot either, because the address no longer exists.

In your real-life scenario, a failed interconnect would not remove either machine's IP address, so after the ping-timeout, operations would resume on each node independently, in a split-brain configuration. As long as no changes were made to the same file on both servers, when the connection is reestablished, the self-heal will do exactly what you expect.

However, what you're counting on is the most common cause of split-brain: clients connected to different servers each independently modify the same file. When the connection is reestablished, the self-heal runs and that file is marked as split-brain, inaccessible from the client mount until an admin resolves it.
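
To make the failure mode concrete, here is a toy model in Python of the two outcomes described above. It is only a sketch of the idea, not how Gluster tracks changes internally: the Replica class, the dirty set, and the heal function are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy stand-in for one server's copy of the data (hypothetical, not Gluster's data model)."""
    files: dict = field(default_factory=dict)  # filename -> contents
    dirty: set = field(default_factory=set)    # files written while the peer was unreachable

    def write(self, name, data, peer_reachable):
        self.files[name] = data
        if not peer_reachable:
            # Remember that this copy changed without the peer seeing it.
            self.dirty.add(name)


def heal(a, b):
    """Reconcile two replicas after reconnect and return the files left in split-brain."""
    conflicted = a.dirty & b.dirty
    # Changed on one side only: there is a clear source, so copy it across.
    for name in a.dirty - b.dirty:
        b.files[name] = a.files[name]
    for name in b.dirty - a.dirty:
        a.files[name] = b.files[name]
    # Changed on both sides: no safe winner, so leave both copies for an admin.
    a.dirty = set(conflicted)
    b.dirty = set(conflicted)
    return sorted(conflicted)


fw1, fw2 = Replica(), Replica()
fw1.write("rules.conf", "from fw1", peer_reachable=False)
fw2.write("rules.conf", "from fw2", peer_reachable=False)
fw1.write("fw1.log", "only fw1 touched this", peer_reachable=False)
print(heal(fw1, fw2))  # ['rules.conf']
```

The file touched only on fw1 heals cleanly, while the file both nodes modified during the outage is the one that needs manual resolution.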

You can avoid split-brain using a couple of quorum techniques. The one that would seem to satisfy your requirements leaves your volume read-only for the duration of the outage.
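
As a rough illustration of why a quorum rule prevents the conflict, the Python sketch below assumes a rule of the form "allow writes only while at least quorum_count replicas are reachable". The function name and parameters are hypothetical and are not Gluster options; check the documentation for your version for the actual settings and their exact behavior.

```python
def volume_mode(reachable_replicas, quorum_count):
    """Return the access mode under a simple 'at least quorum_count reachable' rule.

    Both parameters are illustrative assumptions, not Gluster settings.
    """
    if reachable_replicas >= quorum_count:
        return "read-write"
    return "read-only"


# Two-node replica where both replicas are required for writes.
print(volume_mode(reachable_replicas=2, quorum_count=2))  # read-write
print(volume_mode(reachable_replicas=1, quorum_count=2))  # read-only
```

With the interconnect down, each node reaches only its own replica, so neither side accepts writes and no file can diverge. The cost is that neither node can write until the link is back.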
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
