On Tue, 2005-11-08 at 16:46 -0800, Michael Will wrote:
> I was more thinking along these lines:
>
> 1. node A fails
> 2. node B reboots node A
> 3. node A fails again because it has not been fixed.
>
> Now we could have a 2-3-2 loop. The worst case is that 3. is
> actually:
>
> 3.1 node A comes up and starts reacquiring its resources
> 3.2 node A fails again because it has not been fixed
> 3.3 goto 2
>
> Your recommendation f/g is exactly what I was wondering about as an
> alternative. I know it is possible, but I am trying to understand
> why it would not be the default behavior.
>
> In active/passive heartbeat-style setups I set the nice-failback
> option so a node does not try to reclaim resources unless the other
> node fails, but I wonder what the best path is in a multinode
> active/active setup.

IMHO, an auto reboot is never a good option. Theoretically, node A
failed for some reason, and a human should examine it to find out what
the problem is/was. Recovering a fenced node should require manual
operator intervention, if for no other reason than to verify that a
reboot will not cause a repeat of the incident. Fencing should
a) turn off the fenced node's ability to reacquire resources;
b) power down the fenced node (if possible); and c) alert the
operator that fencing occurred.

> Lon Hohberger wrote:
> > On Tue, 2005-11-08 at 07:52 -0800, Michael Will wrote:
> >
> >>> Power-cycle.
> >>>
> >> I always wondered about this. If the node has a problem, chances
> >> are that rebooting does not fix it. Now if the node comes up
> >> semi-functional and attempts to regain control over the resources
> >> that it owned before, then that could be bad. Should it not rather
> >> be shut down so a human can fix it before it is made operational
> >> again?
> >
> > This is a bit long, but maybe it will clear some things up a
> > little. As far as a node taking over a resource it thinks it still
> > has after a reboot (without notifying the other nodes of its
> > intentions), that would be a bug in the cluster software, and a
> > really *bad* one too!
> >
> > A couple of things to remember when thinking about failures and
> > fencing:
> >
> > (a) Failures are rare. A decent PC has something like 99.95%
> > uptime (I wish I knew where I heard/read this long ago), with no
> > redundancy at all. A server with ECC RAM, RAID for internal disks,
> > etc. probably has a higher uptime.
> >
> > (b) The hardware component most likely to fail is a hard disk
> > (moving parts). If that's the root hard disk, the machine probably
> > won't boot again. If it's the shared RAID set, then the whole
> > cluster will likely have problems.
> >
> > (c) I hate to say this, but the kernel is probably more likely to
> > fail (panic, hang) than any single piece of hardware.
> >
> > (d) Consider this (I think this is an example of what you said?):
> >
> > 1. Node A fails
> > 2. Node B reboots node A
> > 3. Node A correctly boots and rejoins the cluster
> > 4. Node A mounts a GFS file system correctly
> > 5. Node A corrupts the GFS file system
> >
> > What is the chance that 5 will happen if data corruption was not
> > already occurring before step 1? Very slim, but nonzero - which
> > brings me to my next point...
> >
> > (e) Always make backups of critical data, no matter what sort of
> > block device or cluster technology you are using. A bad RAM chip
> > (e.g. a parity RAM chip missing a double-bit error) can cause
> > periodic, quiet data corruption. The chances of this happening are
> > also very slim, but again, nonzero. Probably at least as likely to
> > happen as (d).
> >
> > (f) If you're worried about (d) and are willing to take the
> > expected uptime hit for a given node when that node fails, even
> > given (c), you can always change the cluster configuration to turn
> > a node "off" instead of rebooting it. :)
> >
> > (g) You can chkconfig --del the cluster components so that they
> > don't automatically start on reboot; same effect as (f): the node
> > won't reacquire the resources if it never rejoins the cluster...
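For what it's worth, Lon's (f), plus the "alert the operator" step
from my list above, is only a few lines of shell. A rough sketch,
assuming an APC power switch driven by the stock fence_apc agent (the
switch address, login, plug numbers and mail recipient here are all
made up; check your agent's man page for its actual flags):

    #!/bin/sh
    # fence-off.sh - hold a failed node DOWN instead of power-cycling
    # it, then page a human. Usage: fence-off.sh <plug> <nodename>
    PLUG=$1
    NODE=$2
    # "-o off" powers the outlet off and leaves it off, so the node
    # cannot come back up and reacquire resources on its own.
    fence_apc -a apc-switch.example.com -l apc -p secret \
              -n "$PLUG" -o off || exit 1
    # Alert the operator; the node stays down until a human has
    # examined it and powers it back on deliberately.
    echo "$NODE fenced (powered off) at $(date)" | \
        mail -s "CLUSTER: $NODE fenced, manual recovery needed" \
             operator@example.com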
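And (g) is just a matter of keeping the cluster init scripts from
starting at boot. Something like the following, assuming RHEL4-era
service names (run chkconfig --list on your own nodes; the names vary
by release):

    #!/bin/sh
    # Keep a fenced/repaired node from rejoining the cluster at boot.
    # Services are removed in reverse start order.
    for svc in rgmanager gfs clvmd fenced cman ccsd; do
        chkconfig --del $svc
    done

Once a human has signed off on the node, chkconfig --add the services
back and reboot; only then does the node rejoin the cluster and become
eligible to reacquire resources.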
> >> I/O fencing instead of power fencing kind of works like this: you
> >> undo the I/O block once you know the node is fine again.
> >
> > Typically, we refer to that as "fabric level fencing" vs. "power
> > level fencing"; both fit the I/O fencing paradigm of preventing a
> > node from flushing buffers after it has misbehaved.
> >
> > Note that typically the only way to be 100% positive a node has no
> > buffers waiting after it has been fenced at the fabric level is a
> > hard reboot.
> >
> > Many administrators will reboot a failed node as a first attempt
> > to fix it anyway - so we're just saving them a step :) (Again, if
> > you want, you can always do (f) or (g) above...)
> >
> > -- Lon

----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens@xxxxxxxxxxxxxxx -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
- "Hello. My PID is Inigo Montoya.  You `kill -9'-ed my parent       -
-  process.  Prepare to vi."                                         -
----------------------------------------------------------------------

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster