On Tue, 01 Sep 2009 12:29:36 +0200
"Marc - A. Dahlhaus [ Administration | Westermann GmbH ]" <mad@xxxxxx> wrote:

> It isn't misbehaving at all here.
>
> The job of RHCS in this case is to save your data against failure.
>
> If fenced can't fence a node successfully, RHCS will wait in stalled
> mode (because it doesn't get a successful response from the
> fence-agent) until someone who knows what he is doing comes around to
> fix up the problem. If it didn't do it that way, a separated node
> could eat up your data. It is the job of fenced to stop all
> activities until fencing is in a working shape again.
>
> This behaviour is perfectly fine IMO...

Isn't that the mission of quorum? For example, if you have quorum you
will run services; if you don't have quorum you won't. If there is a
qdisk and a single node out of three is missing, the missing node
can't have quorum, so it can't run services, right?

OK, I understand that this is the safer way... But that's why I was
asking in the first place for a command to flag a node as completely
missing, so that I can avoid all reconfigurations. Reconfiguring while
a node is missing will trigger odd behavior when the node comes back:
the node will be fenced constantly because it has the wrong config
version.

> - You use system-dependent fencing like "HP iLO", which will be
>   missing if your system is missing, and no independent fencing like
>   an APC PowerSwitch...

Yes, but those are the only devices I have available for fencing. That
is a limitation of the hardware, over which I don't have any influence
in this case. I already know that the fence devices are currently my
only SPOF... But I can't help it.

> Think about a power surge which kills both of your PSUs on a system:
> a system-dependent management device would be missing from your
> network in this case, leading to exactly the problem you're faced
> with.

I will take a look at whether APC UPSes have something like killpower
for certain ports; if not, I will set up false manual fencing to get
around this problem.
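The quorum question above can be made concrete with a little vote arithmetic. This is a minimal sketch, not how cman/qdiskd are implemented: it assumes every node carries one vote, the quorum disk carries (nodes - 1) votes, only one partition can hold the qdisk votes, and quorum is the simple majority expected_votes // 2 + 1. The function names are hypothetical, and real clusters add heuristics and the two_node special case on top of this.

```python
"""Hypothetical sketch of majority-quorum vote arithmetic.

Assumptions (not taken from the thread): one vote per node, the quorum
disk carries (nodes - 1) votes, only one partition can hold the qdisk
votes, and quorum is the simple majority expected_votes // 2 + 1.
"""


def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def partition_is_quorate(node_votes: int, holds_qdisk: bool,
                         qdisk_votes: int, expected_votes: int) -> bool:
    """True if a partition's votes (nodes plus qdisk, if held) reach quorum."""
    votes = node_votes + (qdisk_votes if holds_qdisk else 0)
    return votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    nodes = 3
    qdisk_votes = nodes - 1             # assumed: N - 1 votes for the qdisk
    expected = nodes + qdisk_votes      # 3 + 2 = 5, so quorum is 3

    # Two surviving nodes that still hold the quorum disk: 2 + 2 = 4 >= 3.
    print(partition_is_quorate(2, True, qdisk_votes, expected))   # True
    # The missing node on its own, without the quorum disk: 1 < 3.
    print(partition_is_quorate(1, False, qdisk_votes, expected))  # False
```

Under these assumptions the isolated node can never reach quorum on its own; the sketch says nothing about whether that node might still be issuing I/O to shared storage, which is the case fencing is meant to cover.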
Thank you.

> Your mistake is that you started fenced in normal mode, in which it
> will fence all nodes that it can't reach to get around a possible
> split-brain scenario. You need to start fenced in "clean start"
> mode, without fencing (read the fenced manpage, as it is documented
> there), because you know everything is right.

Adding clean_start again presumes reconfiguring, just like removing a
node and declaring the cluster two_node, and I wanted to avoid
reconfigurations...

Thank you very much.

--
| Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D |
=================================================================
| start fighting cancer -> http://www.worldcommunitygrid.org/ |

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster