On Tue, 2005-11-08 at 07:52 -0800, Michael Will wrote:
> > Power-cycle.
>
> I always wondered about this. If the node has a problem, chances are
> that rebooting does not fix it. Now if the node comes up semi-functional
> and attempts to regain control over the resource that it owned before,
> then that could be bad. Should it not rather be shut down so human
> intervention can fix it before it is made operational again?

This is a bit long, but maybe it will clear some things up a little.

As far as a node taking over a resource it thinks it still has after a
reboot (without notifying the other nodes of its intentions), that would
be a bug in the cluster software, and a really *bad* one too!

A couple of things to remember when thinking about failures and fencing:

(a) Failures are rare. A decent PC has something like 99.95% uptime (I
wish I knew where I heard/read this long ago) - with no redundancy at
all. A server with ECC RAM, RAID for internal disks, etc. probably has
higher uptime than that.

(b) The hardware component most likely to fail is a hard disk (moving
parts). If that's the root hard disk, the machine probably won't boot
again. If it's the shared RAID set, then the whole cluster will likely
have problems.

(c) I hate to say this, but the kernel is probably more likely to fail
(panic, hang) than any single piece of hardware.

(d) Consider this (I think this is an example of what you said?):

    1. Node A fails
    2. Node B reboots node A
    3. Node A correctly boots and rejoins the cluster
    4. Node A mounts a GFS file system correctly
    5. Node A corrupts the GFS file system

What is the chance that 5 will happen without data corruption occurring
during or before step 1? Very slim, but nonzero - which brings me to my
next point...

(e) Always make backups of critical data, no matter what sort of block
device or cluster technology you are using. A bad RAM chip (e.g. a
parity RAM chip missing a double-bit error) can cause periodic, quiet
data corruption. The chances of this happening are also very slim, but
again, nonzero - probably at least as likely as (d).

(f) If you're worried about (d) and are willing to take the expected
uptime hit for a given node when that node fails, even given (c), you
can always change the cluster configuration to turn a node "off" instead
of rebooting it. :)

(g) You can chkconfig --del the cluster components so that they don't
automatically start on reboot; same effect as (f): the node won't
reacquire the resources if it never rejoins the cluster... (a rough
sketch of this is at the very bottom of this message)

> I/O fencing instead of power fencing kind of works like this, you undo
> the i/o block once you know the node is fine again.

Typically, we refer to that as "fabric level fencing" vs. "power level
fencing"; both fit the I/O fencing paradigm of preventing a node from
flushing buffers after it has misbehaved.

Note that typically the only way to be 100% positive a node has no
buffers waiting after it has been fenced at the fabric level is a hard
reboot. Many administrators will reboot a failed node as a first attempt
to fix it anyway - so we're just saving them a step :)

(Again, if you want, you can always do (f) or (g) above...)

-- Lon

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
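A rough sketch of (g). The service names here are just the usual Red Hat
Cluster Suite init scripts of this era (ccsd, cman, fenced, clvmd, gfs,
rgmanager) - check /etc/init.d on your nodes and adjust to whatever is
actually installed:

    # Stop the cluster services from starting automatically at boot, so
    # a fenced/rebooted node stays out of the cluster until a human has
    # looked at it.  Run this on the node you want to keep quarantined.
    chkconfig --del rgmanager
    chkconfig --del gfs
    chkconfig --del clvmd
    chkconfig --del fenced
    chkconfig --del cman
    chkconfig --del ccsd

The node will still boot normally; it just won't rejoin the cluster (and
therefore won't reacquire any resources) until you chkconfig --add the
services back and start them by hand. For (f), the equivalent idea is to
configure your fence device/agent to power a failed node off rather than
reboot it; how that is spelled depends on the fence agent you use.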