On Fri, Aug 17, 2007 at 01:03:03PM +0200, Maciej Bogucki wrote:
> 1. Watchdog is a piece of code which runs in user space, so you don't
> have a 100% guarantee that it will run correctly.

Some clarification: Traditionally, a watchdog is a piece of hardware which a
userland daemon writes to periodically. Failure to write to the hardware
within a set time causes a system reset (the application holding the watchdog
device open crashing is one obvious way to trigger this). The Linux kernel
also has a software watchdog (called softdog) which operates in the kernel,
using the same API the kernel exposes for hardware watchdogs.

The watchdog daemon (shipped with Debian, RHEL 5.1, etc.) is one well-known
implementation of the userland side, and is often confused with being a
watchdog timer itself. It monitors administrator-defined resources and
touches the watchdog timer device periodically while things are "ok"; it
stops if things go bad (and stopping causes the watchdog to fire).

The point here is that it doesn't matter if the userspace code fails, blows
up, or otherwise misbehaves - the *failure* mode for a watchdog timer is to
reset the system.

> 2. Watchdog fencing can't protect you against split-brain situations,
> where the consequence could be corruption of your data. That is where
> external fencing comes in.

You can (at least mostly) solve this if you have alternative mechanisms for
cluster communications (e.g. a quorum disk on a SAN and/or external
tie-breakers/ping-nodes/whatever). However - more in line with your point -
it's not simple, and it relies on a lot of assumptions.

> There is another point of view about Linux clusters versus other commercial
> clusters (e.g. Sun Cluster). Linux Cluster resides in user space, so you
> don't have a guarantee that local fencing will run ok, and you need
> external fencing to resolve this main problem. Sun Cluster resides in
> kernel space, so when one node loses quorum it does a "kernel panic" and
> you have a 100% guarantee that it will succeed.
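To make the earlier point concrete, here is a rough Python sketch of the
userland half - the loop that "pets" the watchdog device. The function name
and parameters are illustrative, not taken from any real daemon; the device
path and the magic-close 'V' follow the Linux watchdog API, and the path is
a parameter only so the sketch can be tried against an ordinary file. If
this process dies or hangs, the writes stop and the timer fires regardless:

```python
import os
import time

# Conventional Linux watchdog device node; softdog or a hardware driver
# sits behind it. Parameterized here only so the sketch can be exercised
# without real hardware.
WATCHDOG_DEV = "/dev/watchdog"

def pet_watchdog(path=WATCHDOG_DEV, interval=1.0, beats=3):
    """Periodically 'pet' the watchdog device. If the writes ever stop,
    the timer expires and the hardware (or softdog) resets the system."""
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(beats):
            os.write(fd, b"\0")   # any write restarts the countdown
            time.sleep(interval)
    finally:
        # Magic close: writing 'V' immediately before close disarms the
        # timer (ignored if the driver is built with nowayout).
        os.write(fd, b"V")
        os.close(fd)
```

A real daemon would loop forever and only skip the write (letting the timer
expire) when its health checks fail.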
FWIW, the hardware watchdog timer is outside of the operating system
entirely. The entire kernel could hang or crash and the watchdog would
still fire.

Most of the reason for fencing (at all) is the possibility of a live-hang
of indefinite duration - where a node just stops for a few seconds due to
a kernel bug or for some other reason. If the whole kernel stops for a few
seconds, the node won't know it's no longer in the quorum, or calling
panic() could be delayed. There are kernel hangcheck timers, but as I
understand it, they're racy: you cannot guarantee that the hang-check will
complete before an outstanding I/O is flushed to disk. I could be wrong
here.

> For me, network fencing (IPMI, DRAC, ...) isn't good, because you have
> to connect via the network and that could fail, and so on. The best
> fencing mechanism is fence_scsi, which is an I/O fencing agent. It can
> be used with SCSI devices that support persistent reservations (SPC-2
> or greater). In most cases your shared storage supports SPC-2 or SPC-3.

Yup. You can also use FC zoning, in addition to fence_scsi, if you want.

The biggest thing about not using watchdog timers as 'fencing' is that
it's complex and difficult to do correctly/reliably, especially in the
two-node case.

-- Lon

--
Lon Hohberger - Software Engineer - Red Hat, Inc.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster