Digimer wrote:
>>> Fencing and Stonith are two names for the same thing; fencing was
>>> traditionally used in Red Hat clusters and STONITH in
>>> heartbeat/pacemaker clusters. It's arguable which is preferable, but
>>> I personally prefer "fencing", as it more directly describes the goal
>>> of "fencing off" (isolating) a failed node from the rest of the
>>> cluster.
>>
>> Yes, but "STONITH" is a wonderful acronym.
>>
>>> Now let's talk about how fencing fits.
>>>
>>> Let's assume that Node 1 hangs or dies while it still holds the lock.
>>> The fenced daemon will be triggered and it will notify DLM that there
>>> is a problem, and DLM will block all further requests. Next, fenced
>>> tries to fence the node using one of its configured fence methods. It
>>> will try the first, then the second, then the first again, looping
>>> forever until one of the fence calls succeeds.
>>>
>>> Once a fence call succeeds, fenced notifies DLM that the node is
>>> gone, and DLM then cleans up any locks formerly held by Node 1. After
>>> this, Node 2 can get a lock, even though Node 1 never released it
>>> itself.
>>>
>>> Now, let's imagine that a fence agent returned success but the node
>>> wasn't actually fenced. Let's also assume that Node 1 was hung, not
>>> dead.
>>>
>>> So DLM thinks that Node 1 was fenced, clears its old locks and gives
>>> a new one to Node 2. Node 2 goes about recovering the filesystem and
>>> then proceeds to write new data. At some point later, Node 1
>>> unfreezes, thinks it still has an exclusive lock on the LV and
>>> finishes writing to the disk.
>>
>> But you said "So DLM thinks that Node 1 was fenced, clears its old
>> locks and gives a new one to Node 2". How can Node 1 get access after
>> unfreezing, when the lock is cleared?
>
> DLM clears the lock, but it has no way of telling Node 1 that the lock
> is no longer valid (remember, it thinks the node has been ejected from
> the cluster, removing any communication). Meanwhile, Node 1 has no
> reason to think that the lock it holds is no longer valid, so it just
> goes ahead and accesses the storage, figuring it still has exclusive
> access.

But does DLM not prevent Node 1 from accessing the filesystem in this
situation? DLM "knows" that the lock from Node 1 has been cleared.
Can't DLM "say" to Node 1: "You think you have a valid lock, but you
don't. Sorry, no access!"?

Bernd

Helmholtz Zentrum München
Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
Ingolstädter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
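
To make the retry loop Digimer describes concrete ("try the first, then
the second, then the first again, looping forever"), here is a minimal
Python sketch. It is illustrative only: the real fenced is a C daemon,
and try_agent() and the method names below are hypothetical stand-ins,
not a real API.

    # fence_loop.py -- sketch of the fenced retry behaviour described
    # above; try_agent() and the method names are hypothetical.
    import itertools
    import time

    ATTEMPTS = {"count": 0}

    def try_agent(node: str, method: str) -> bool:
        """Stand-in for invoking a fence agent (e.g. an IPMI power-off)."""
        ATTEMPTS["count"] += 1
        print(f"attempt {ATTEMPTS['count']}: fencing {node} via {method}")
        return ATTEMPTS["count"] >= 3          # pretend it works on try 3

    def fence_node(node: str, methods: list[str]) -> str:
        """Cycle through the configured methods until one succeeds."""
        for method in itertools.cycle(methods):  # first, second, first, ...
            if try_agent(node, method):
                return method                    # node is (believed) fenced
            time.sleep(1)                        # brief pause, then retry

    if __name__ == "__main__":
        winner = fence_node("node1", ["ipmi", "switched-pdu"])
        print(f"fenced via {winner}; DLM can now clear node1's locks")

The key property is that fence_node() never gives up and never returns
until some agent reports success, which is why a lying agent is so
dangerous: its "success" is what unblocks DLM.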
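
The race in the last exchange, and the answer to Bernd's question, can
also be sketched as a toy model. This is not the real libdlm API, just
an illustration of the point Digimer makes: the hung node's belief that
it holds the lock is local state, and once communication is cut, DLM has
no channel over which to revoke it.

    # stale_lock.py -- toy model: DLM clears Node 1's lock after a
    # (falsely) successful fence, but cannot tell the hung node.
    class ToyDLM:
        def __init__(self):
            self.holder = None            # cluster-wide view of the lock

        def grant(self, node):
            assert self.holder is None    # only one exclusive holder
            self.holder = node

        def node_fenced(self, node):
            if self.holder == node:       # clean up the dead node's locks
                self.holder = None

    class Node:
        def __init__(self, name):
            self.name = name
            self.cached_lock = False      # the node's *local* belief

        def acquire(self, dlm):
            dlm.grant(self.name)
            self.cached_lock = True

        def write(self, disk):
            if self.cached_lock:          # no cluster round-trip here!
                disk.append(self.name)

    disk = []
    dlm = ToyDLM()
    n1, n2 = Node("node1"), Node("node2")

    n1.acquire(dlm)                       # Node 1 takes the lock, then hangs
    dlm.node_fenced("node1")              # fence agent lied: lock cleared
    n2.acquire(dlm)                       # Node 2 recovers and writes
    n2.write(disk)
    n1.write(disk)                        # Node 1 unfreezes: still writes!
    print(disk)                           # ['node2', 'node1'] -> corruption

Note that write() consults only the node's cached_lock, never ToyDLM:
that is the crux. DLM cannot "say" anything to Node 1 because, from the
cluster's point of view, Node 1 is gone, and a genuinely hung node would
not hear the message anyway. Only a real fence (power off, or cutting
the node's storage path) closes that window.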