> > > Will self healing prevent an inconsistent cluster
> > > from happening? I.e. a two-node cluster, A+B:
> > >
> > > 1) Node A goes down
> > > 2) Write occurs on Node B
> > > 3) Node B goes down (cluster is down)
> > > 4) Node A comes up -> cluster is inconsistent since B
> > > is not yet available. Cluster should still be "down".
> >
> > This is not assured to work. The intersection of the
> > two subsets of subvolumes before and after a group
> > (subset) of nodes is added or removed should not be
> > empty.
>
> Hmm, that is what I feared! Are there any plans to
> ensure that this condition is met? Without this, how
> do people currently trust AFR? Do they simply assume
> that their cluster never cold boots?
>
> Since it does not sound like self healing will ensure
> cluster consistency, is there another planned
> task/feature that will? If not, is it because it is
> viewed as impossible/difficult? It seems like in the
> extreme case it would be at least simple enough to
> track and prevent, wouldn't it?
>
> Once a cluster is up and running, any remaining
> running nodes should be consistent? So it seems like
> the tricky part is dealing with cold boots (when no
> running, consistent cluster exists).

Cold boot is not a problem. Let me explain with an example:

1. Node A and Node B are up.
2. Node B goes down.
3. Node A gets changes.
4. Node A goes down.

Now,

5a. Nodes A and B come back together - no problem.
5b. Node A alone comes back - no problem.
5c. Node B alone comes back - potential problem if the same files or
    directories changed in step 3 are accessed.
5d. Node B alone comes back and, before the data is accessed, Node A
    comes back too - no problem.

Supporting 5c requires quite a bit of new framework code, which is
currently not our highest priority.

Are the above restrictions unacceptable in your case?

avati
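
The case split above boils down to one question: does any node that is
still down hold writes the surviving node never saw? Below is a minimal
sketch of that check in Python, illustrative only -- the Node class, the
"unsynced" flag, and safe_to_serve() are hypothetical names for this
example, not AFR's actual changelog mechanism.

    # Illustrative sketch, not GlusterFS/AFR code.  It models the
    # scenario above with a per-node "holds writes the peer never saw"
    # flag, roughly the role AFR's changelog plays.

    class Node:
        def __init__(self, name):
            self.name = name
            self.up = False
            self.unsynced = False   # holds writes the other node has not seen

    def safe_to_serve(nodes):
        # Serving data is safe only if no *down* node still holds unsynced writes.
        return not any(n.unsynced for n in nodes if not n.up)

    a, b = Node("A"), Node("B")
    a.up = b.up = True          # 1. A and B are up
    b.up = False                # 2. B goes down
    a.unsynced = True           # 3. A gets changes B never sees
    a.up = False                # 4. A goes down

    a.up = True                 # 5b. A alone comes back
    print("5b safe:", safe_to_serve([a, b]))   # True: the changed copies are online

    a.up, b.up = False, True    # 5c. B alone comes back
    print("5c safe:", safe_to_serve([a, b]))   # False: A's newer copies are offline

Running it prints True for 5b and False for 5c, matching the case
analysis in the reply.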