On Mon, 2005-10-17 at 09:16 +0300, Omer Faruk Sen wrote:
> (maybe the node wasn't dead and will try to write
> something to shared storage which can cause catastrophic damage if GFS is
> not used) write something to file system.

Correct, except it causes catastrophic damage in any case, regardless of
whether or not GFS is used. GFS requires fencing in order to operate.

> It does this using power
> switches or other methods such as IPMI or ILO. (I heard there was a new
> module for fencing that uses vmware)

GFS can use fabric-level fencing - that is, you can tell the iSCSI server
to cut a node off, or ask the Fibre Channel switch to disable a port. This
is in addition to "power-cycle" fencing.

> Thus I think this fencing concept is the same as STONITH in linux-ha.org
> which means Shoot The Other Node In The Head(Heart)....

STONITH, STOMITH, etc. are indeed implementations of I/O fencing.

Fencing is the act of forcefully preventing a node from accessing
resources after that node has been evicted from the cluster, in an attempt
to avoid corruption.

The canonical example of when it is needed is the live-hang scenario, as
you described:

1. node A hangs with I/Os pending to a shared file system
2. node B and node C decide that node A is dead and recover resources
   allocated on node A (including the shared file system)
3. node A resumes normal operation
4. node A completes I/Os to shared file system

At this point, the shared file system is probably corrupt. If you're
lucky, fsck will fix it -- if you're not, you'll need to restore from
backup. I/O fencing (STONITH, or whatever we want to call it) prevents the
last step (step 4) from happening.

How fencing is done (power cycling via external switch, SCSI reservations,
FC zoning, integrated methods like IPMI, iLO, manual intervention, etc.) is
unimportant - so long as whatever method is used can guarantee that step 4
cannot complete.
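
To make the four steps concrete, here is a toy Python sketch of the
live-hang scenario. Everything in it (SharedStorage, live_hang, the
"fenced" set) is a made-up name for illustration only. It is not how GFS
or any real fence agent works; it just assumes a storage layer that
refuses writes from a fenced node, which is roughly what fabric-level
fencing gives you.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Hypothetical shared disk that can refuse I/O from fenced nodes."""
    fenced: set = field(default_factory=set)
    recovered: set = field(default_factory=set)
    corrupt: bool = False

    def fence(self, node):
        # Stand-in for a power cycle, FC zoning change, SCSI reservation, etc.
        self.fenced.add(node)

    def recover(self, dead_node):
        # Surviving nodes take over the resources the evicted node held.
        self.recovered.add(dead_node)

    def write(self, node, data):
        if node in self.fenced:
            return False  # I/O never reaches the disk
        if node in self.recovered:
            # A stale write landing after recovery is what corrupts things.
            self.corrupt = True
        return True


def live_hang(use_fencing):
    """Walk through steps 1-4 and report whether the file system got corrupted."""
    disk = SharedStorage()
    pending = ("A", "stale block")   # 1. node A hangs with I/O pending
    if use_fencing:
        disk.fence("A")              # fence BEFORE recovering
    disk.recover("A")                # 2. B and C recover A's resources
    disk.write(*pending)             # 3+4. A resumes and completes its I/O
    return disk.corrupt


if __name__ == "__main__":
    print("without fencing, corrupt:", live_hang(use_fencing=False))
    print("with fencing, corrupt:   ", live_hang(use_fencing=True))
```

The only thing that matters in the sketch is the ordering: the fence call
has to succeed before recovery starts, so the stale write in step 4 is
rejected no matter when node A wakes up.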
-- Lon

--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster