On Thu, 2008-02-21 at 11:43 -0800, Jonathan Biggar wrote:
> I've got a deployment scenario for a two node cluster (where services
> are configured as active and standby) where the customer is concerned
> that the external power fencing device I am using (WTI) becomes a
> single point of failure.  If the WTI for the active node dies, taking
> down the active node, the standby cannot bring up services because it
> cannot successfully fence the failed node.  This leaves the cluster
> down.

Correct.  Although, if you plug in a serial terminal server, I have a
patch to talk to the WTI switch through the terminal server, in case
the switch gets unjacked.

> In the setup, storage fencing is not feasible as a backup for power
> fencing.

Not even using fence_scsi (SCSI-3 reservations)?  That's unfortunate :(

> I think I've worked out a scenario using qdiskd and the internal
> hardware watchdog timers in our nodes to use as a backup for power
> fencing that I hope will eliminate the single point of failure.

Hardware watchdog timers = good stuff.

> Here's how I see it working:
>
> 2. Create a heuristic (besides the usual network reachability test)
> for qdisk that resets the node's hardware watchdog timer.  (I'll have
> to do some additional work to ensure that the watchdog gets turned
> off if I am gracefully shutting down the node's qdisk daemon.)

There's a watchdog daemon (userspace code) that lets you configure
heuristics for it.  Most are internal to it - and are therefore
superior to how qdiskd does heuristics from an HA / memory-neutrality
perspective.  If some heuristic(s) are not met, the daemon can, at your
option, stop touching the watchdog device.

There's an open bugzilla to provide an integration path between qdiskd
and watchdogd - so that you can configure heuristics for watchdogd and
have qdiskd base its state on those.
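The watchdog-tickling heuristic in point 2 above can be sketched as a
tiny shell function (illustrative only - the script and device path are
assumptions, not anything shipped with qdiskd).  On the standard Linux
watchdog API, any write to the device counts as a keepalive ping, so if
the heuristic stops running, the hardware timer expires and the node
reboots itself:

```shell
#!/bin/sh
# Hypothetical qdiskd heuristic: pass (exit 0) only while the hardware
# watchdog device can be tickled.  Any write to the device is a
# keepalive on the Linux watchdog API; if this stops being called, the
# hardware timer eventually fires and resets the node.
tickle_watchdog() {
    wdt="${1:-/dev/watchdog}"          # default device path; override for testing
    [ -w "$wdt" ] || return 1          # fail the heuristic if the device is gone
    printf 'k' > "$wdt"                # keepalive ping: resets the hardware timer
}
```

qdiskd would run something like this via a heuristic stanza in
cluster.conf, e.g. `<heuristic program="/usr/local/sbin/wdt-tickle.sh"
interval="2" tko="3"/>` (path hypothetical).  For the graceful-shutdown
concern in point 2: writing the magic 'V' character to the device
before closing it is what tells the driver to disarm the timer
(assuming the kernel isn't built with NOWAYOUT).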
For example, if watchdogd says "ok, we're not updating the watchdog
driver because of X", qdiskd could trigger a self-demotion off of that,
or maybe even write an "if you don't hear from me in X seconds,
consider me dead" message to disk...?

> 3. Create a custom fencing script that is run if power fencing fails
> that examines qdisk's state to see if the node that needs to be
> fenced is no longer updating the quorum disk.

I think the easiest thing to do is make a quick, small-footprint API or
utility to talk to qdiskd to get states...

> (I'm not sure how to do this--I hope that the information stored in
> qdisk's status_file will be sufficient to determine this; if not, I
> might have to modify qdisk to supply what I need.)

... because status_file is *sketchy* at best (really, it's a debugging
tool). ;)

> The standby node then should be sure that the active node has
> rebooted itself either by qdiskd's action or via the watchdog timer,
> or else it is power dead.
>
> Can anyone see a weakness in this approach I haven't thought of?

It's good from a best-effort standpoint.  We don't have anything that
does "best effort" fencing - it's mostly all black/white.

A question that comes up is: if we use the watchdog + watchdog daemon,
do we need qdisk at all?  I mean, if there's an "eventual timeout"
anyway, based on the expectation that the watchdog timer will fire, and
we rely on it - why bother with the intermediate steps?  Hardware
watchdog timers are going to be more reliable than just about anything
qdiskd could provide.

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster