On Wed, Aug 04, 2004 at 08:12:51AM +0200, Schumacher, Bernd wrote:
> So, what I have learned from all the answers is very bad news for me.
> It seems what happened is what most of you expected. But this means:
>
> -----------------------------------------------------------------------
> --- One single point of failure in one node can stop the whole gfs. ---
> -----------------------------------------------------------------------
>
> The single point of failure is:
> The LAN card specified in "nodes.ccs:ip_interfaces" stops working on
> one node, no matter whether this node was master or slave.
>
> The whole gfs is stopped:
> The rest of the cluster needs time to form a new cluster. The bad node
> does not need as much time to switch to arbitrating mode, so the bad
> node has enough time to fence all the other nodes before it would be
> fenced by the new master.
>
> The bad node lives, but it cannot form a cluster. GFS is not working.
>
> Now all the other nodes reboot. But after rebooting they cannot join
> the cluster, because they cannot contact the bad node. The LAN card is
> still broken. GFS is not working.
>
> Did I miss something?
> Please tell me that I am wrong!

Although it's still in development/testing, what you're looking for is
the way cman/fenced works. When there's a network partition, the group
with quorum will fence the group without quorum. If neither side has
quorum, then no one is fenced and neither side can run (see the sketch
below).

Gulm could probably be designed to do fencing differently, but I'm not
sure how likely that is at this point.

-- 
Dave Teigland  <teigland@xxxxxxxxxx>
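
For illustration, here is a minimal sketch of the quorum rule described
above. This is Python pseudocode, not cman's actual C implementation;
the function names and vote counts are hypothetical, and it assumes the
simple case of one vote per node with majority quorum:

def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Majority quorum: strictly more than half of all votes."""
    return votes_present > total_votes // 2

def may_fence(partition_size: int, cluster_size: int) -> bool:
    # After a partition, only the side holding quorum is allowed
    # to fence the other side.
    return has_quorum(partition_size, cluster_size)

if __name__ == "__main__":
    # 5-node cluster splits 3/2: the 3-node side has quorum and
    # may fence; the 2-node side may not.
    print(may_fence(3, 5))  # True
    print(may_fence(2, 5))  # False

    # 4-node cluster splits 2/2: neither side has quorum, so no
    # one fences and neither side can run.
    print(may_fence(2, 4))  # False

Under this rule, a single node cut off by a failed LAN card can never
hold quorum in a cluster of three or more nodes, so it cannot fence the
healthy majority, unlike the gulm behavior described in the quoted
scenario.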