> > > Will self healing prevent an inconsistent cluster
> > > from happening? I.e. a two-node cluster, A+B:
> > >
> > > 1) Node A goes down
> > > 2) Write occurs on Node B
> > > 3) Node B goes down (cluster is down)
> > > 4) Node A comes up -> cluster is inconsistent since B
> > > is not yet available. Cluster should still be "down".
> >
> > This is not assured to work. The intersection of the
> > two subsets of subvolumes before and after a group
> > (subset) of nodes is added or removed should not be
> > empty.
>
> Hmm, that is what I feared! Are there any plans to
> ensure that this condition is met? Without this, how
> do people currently trust AFR? Do they simply assume
> that their cluster never cold boots?
>
> Since it does not sound like self healing will ensure
> cluster consistency, is there another planned
> task/feature that will? If not, is it because it is
> viewed as impossible/difficult? It seems like in the
> extreme case it would be at least simple enough to
> track and prevent, wouldn't it?
>
> Once a cluster is up and running, any remaining
> running nodes should be consistent? So it seems like
> the tricky part is dealing with cold boots (when no
> running, consistent cluster exists).

Cold boot is not a problem. Let me explain with an example:

1. Node A and Node B are up.
2. Node B goes down.
3. Node A gets changes.
4. Node A goes down.

Now,

5a. Nodes A and B come back together - no problem.
5b. Node A alone comes back - no problem.
5c. Node B alone comes back - potential problem if the same files or
    directories changed in step 3 are accessed.
5d. Node B alone comes back and, before the data is accessed, Node A
    comes back too - no problem.

Supporting 5c requires quite a bit of new framework code, which is
currently not our highest priority.

Are the above restrictions unacceptable in your case?

avati
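
The case split above boils down to one question: does any node that is
still down hold writes the surviving node never saw? Below is a minimal
sketch of that check in Python, illustrative only -- the Node class, the
"unsynced" flag, and safe_to_serve() are hypothetical names for this
example, not AFR's actual changelog mechanism.

    # Illustrative sketch, not GlusterFS/AFR code.  It models the
    # scenario above with a per-node "holds writes the peer never saw"
    # flag, roughly the role AFR's changelog plays.

    class Node:
        def __init__(self, name):
            self.name = name
            self.up = False
            self.unsynced = False   # holds writes the other node has not seen

    def safe_to_serve(nodes):
        # Serving data is safe only if no *down* node still holds unsynced writes.
        return not any(n.unsynced for n in nodes if not n.up)

    a, b = Node("A"), Node("B")
    a.up = b.up = True          # 1. A and B are up
    b.up = False                # 2. B goes down
    a.unsynced = True           # 3. A gets changes B never sees
    a.up = False                # 4. A goes down

    a.up = True                 # 5b. A alone comes back
    print("5b safe:", safe_to_serve([a, b]))   # True: the changed copies are online

    a.up, b.up = False, True    # 5c. B alone comes back
    print("5c safe:", safe_to_serve([a, b]))   # False: A's newer copies are offline

Running it prints True for 5b and False for 5c, matching the case
analysis in the reply.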