On Thu, Jan 28, 2010 at 10:25 AM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
> The problem you have is that you have no way of enacting fencing if the
> connectivity between the sites fails. If a node fails, any cluster file
> system (GFS included) will mandate a fencing action to ensure that one of
> the nodes gets taken down and stays down. If you have lost cross-site
> connectivity, the nodes won't be able to fence each other, and GFS will
> simply block until connectivity is restored and fencing succeeds. The
> chances are that when this happens, it'll also cause a fencing shoot-out and
> both nodes may well end up getting fenced.
>
> You could use some kind of cheat-fencing, say, by setting a firewall rule
> that will prevent the nodes from re-connecting (you'd need to write your own
> fencing agent, but that's not particularly difficult), but then you would be
> pretty much guaranteeing a split-brain situation, where the nodes would end
> up operating independently without any hope of ever re-synchronising.
>
> The bottom line is that you need a reliable out-of-band fencing mechanism. If
> you have GSM/wireless signal in both areas you could rig up a separate,
> small fencing "server" on each site with a GSM modem, and write a fencing
> agent that sends a fencing request by SMS. When the fencing server receives
> a fencing request, you'd have to make it issue a local fencing action using
> one of the more standard fencing agents. Note that in this case, due to high
> latency of things like SMS, you'd need to implement accurate time stamping
> and deliberately semi-randomize the delay between fencing requests being
> sent so that you could check time stamps and the fencing servers could
> sensibly decide whether to obey the local fencing request or the remote one.
>
> You have to get a little creative about it and write a few lines of code to
> glue it together. I've been meaning to implement something like this for a
> while, but I haven't gotten around to it yet.
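The timestamp arbitration described in the quoted message could look roughly like the following Python sketch. Everything in it is hypothetical: `FenceRequest`, `send_delay`, `should_obey` and the node names are illustrative only. It assumes the two sites' clocks are synchronized closely enough (e.g. via NTP or GPS) for timestamps to be compared, and it leaves out both the SMS transport and the hand-off to a standard local fence agent.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FenceRequest:
    sender: str       # node that issued the request (hypothetical naming)
    victim: str       # node the sender wants fenced
    timestamp: float  # sender's clock; ASSUMES clocks at both sites are synchronized


def send_delay(base: float = 5.0, jitter: float = 10.0) -> float:
    """Hypothetical semi-random wait before a node sends its fencing request.

    The random component makes it unlikely that both sites stamp their
    requests with the same time after a cross-site link failure.
    """
    return base + random.uniform(0.0, jitter)


def should_obey(incoming: FenceRequest,
                local_outgoing: Optional[FenceRequest],
                local_node: str) -> bool:
    """Decide whether this site's fencing server should fence its own node.

    incoming:       request that arrived from the remote site (e.g. by SMS).
    local_outgoing: request our own node sent to fence the remote node, if any.
    local_node:     name of the node this fencing server protects.
    """
    if incoming.victim != local_node:
        return False  # not addressed to the node this server controls
    if local_outgoing is None:
        return True   # our node never tried to fence the other side
    if incoming.timestamp != local_outgoing.timestamp:
        # The earlier request wins; the later requester gets fenced.
        return incoming.timestamp < local_outgoing.timestamp
    # Identical timestamps: deterministic tie-break both sites agree on.
    return incoming.sender < local_outgoing.sender
```

Assuming both requests are delivered to both sites, each fencing server applies the same rule to the same pair of requests, so exactly one of them should agree to fence its own node; the sender-name tie-break only matters if the timestamps collide despite the random delay.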
This seems to address the split-brain condition that can occur during network outages in a 2-node cluster.

https://bugzilla.redhat.com/show_bug.cgi?id=372901

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster