[Gluster-devel] Re: AFR documentation

vikas at zresearch.com (Vikas Gorur) · Fri, 19 Dec 2008 23:17:52 +0530

2008/12/18 Martin Fick <mogulguy at yahoo.com>:
On Thu, 12/18/08, Martin Fick wrote:

> I have some questions about things that were perhaps implied in the
> document, but not really discussed.  It sounds like the 5 step write
> process lays the foundations for transaction based writes and
> efficient self healing?  I am curious how many of the failure cases
> are currently dealt with besides the mentioned split brain
> problems.  Are there any rollback procedures implemented?  It seems
> likely the intent, I am just trying to clarify the current
> functionality.

First off, it would be a misnomer to call it a "transaction" (even
though you will see the term used in the code), because in actual
database transactions a complete record is kept of *what changed*. In
AFR we only keep track of the fact that *something changed*, and it is
up to the self-heal logic to figure out what changed (permissions,
existence of the file itself, or file contents) and do the appropriate
thing.
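
To illustrate the distinction, here is a minimal Python sketch. It is
not the AFR code: `Copy`, `what_differs` and `self_heal` are hypothetical
names made up for this example. The flag only says that something
changed; the heal step has to compare the copies to learn what.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One replica's copy of a file (hypothetical model, not an AFR structure)."""
    exists: bool = True
    mode: int = 0o644
    data: bytes = b""


def what_differs(source, stale):
    """Compare a good copy with a stale one to find out what actually changed."""
    if source.exists != stale.exists:
        return ["existence"]
    diffs = []
    if source.mode != stale.mode:
        diffs.append("permissions")
    if source.data != stale.data:
        diffs.append("contents")
    return diffs


def self_heal(changed_flag, source, stale):
    """The flag carries no detail; only the comparison reveals what to fix."""
    return what_differs(source, stale) if changed_flag else []


print(self_heal(True, Copy(mode=0o600, data=b"new"), Copy(data=b"old")))
```

A database-style transaction log would instead store the new mode and
the written bytes themselves, so no comparison would be needed.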

>  For example:  what happens if the client dies between step 2 and
> 3? The client has 1. locked file (or directory) on all of the lock
> servers and 2. written the change log entries on all servers, and
> then dies.  Will the lock timeout?

Yes, the lock will be released if and when the client dies.

> If so, does another client then know how to (is it capable of)
> either complete the write or roll it back at this point?  What if
> the client failure occurs after or during any of the other steps, can
> the entire process be either moved forward or rolled back (yet)?
> With this 5 step process, it seems like a guaranteed rollback
> should be possible anytime before step 3 and a commit should be
> possible anytime after step 2 (even if only completed on one
> server). Is that correct?

What is guaranteed by the 5-step algorithm is that:

"as long as an operation (i.e., the actual file operation like write or
create) succeeds on at least one node, the self-heal logic will bring
all the other nodes up-to-date as and when they come up"

If the procedure fails on all nodes before it gets to stage 3 (i.e.,
the actual operation like write or create), then it will be as if
nothing ever happened. (This situation will trigger a spurious
self-heal, but since none of the copies have been modified, this is
harmless.)
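
The guarantee can be sketched with a toy model in Python. This is only
an illustration under stated assumptions, not how AFR is implemented:
`Node`, `replicated_write` and `self_heal` are hypothetical names, the
per-node `pending` marks stand in for the change log, and steps 4 and 5
(clearing the marks and unlocking) are assumed here rather than taken
from this thread.

```python
class Node:
    """Toy replica: a data blob plus 'pending' marks about every replica."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = b""
        self.pending = {}  # replica name -> count of possibly missed operations


def replicated_write(nodes, data, die_before_op=False):
    live = [n for n in nodes if n.up]
    # Step 1 (lock) is left out of this model.
    # Step 2: on every reachable node, mark all replicas as possibly changing.
    for n in live:
        for peer in nodes:
            n.pending[peer.name] = n.pending.get(peer.name, 0) + 1
    if die_before_op:
        return  # the client died between steps 2 and 3
    # Step 3: the actual operation.
    for n in live:
        n.data = data
    # Step 4 (assumed): clear the marks for replicas where the operation succeeded.
    for n in live:
        for peer in live:
            n.pending[peer.name] -= 1
    # Step 5 (assumed): unlock, also left out.


def self_heal(nodes):
    """Sync reachable replicas; return the names of replicas whose data changed."""
    live = [n for n in nodes if n.up]
    live_names = {n.name for n in live}
    accused = {name for n in live for name, count in n.pending.items()
               if count > 0 and name in live_names}
    sources = [n for n in live if n.name not in accused]
    if not sources:
        if len({n.data for n in live}) > 1:
            raise RuntimeError("every copy is marked and the copies differ")
        sources = live[:1]  # all copies identical: a spurious heal
    healed = []
    for n in live:
        if sources and n.data != sources[0].data:
            n.data = sources[0].data
            healed.append(n.name)
        for name in live_names:
            n.pending[name] = 0
    return healed


# Case 1: the write succeeds on one node while the other is down.
a, b = Node("a"), Node("b")
b.up = False
replicated_write([a, b], b"v1")
b.up = True
print(self_heal([a, b]), b.data)

# Case 2: the client dies before step 3; the heal finds nothing to copy.
c, d = Node("c"), Node("d")
replicated_write([c, d], b"v2", die_before_op=True)
print(self_heal([c, d]), c.data, d.data)
```

In case 1 the node that was down is brought up to date from the node
that took the write. In case 2 every copy is marked, but the copies are
identical, so the heal changes nothing.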

The important caveat here is the split-brain case (as explained in the
documentation).

If a split-brain case happens (either the network being split in two,
with two independent clients writing to the two pieces, or the network
being split in time, that is, one server goes down and, when it comes
back up, the other server goes down), then there is nothing AFR can do
on its own to decide which copy is correct.
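
A rough way to picture why the "split in time" case cannot be resolved
is the sketch below. It assumes, purely for illustration, that each
server records which other servers may have missed a change; the
`pending` layout and the `classify` function are hypothetical and not
taken from the AFR source.

```python
def classify(pending):
    """pending[x][y] > 0: node x recorded changes that node y may have missed."""
    names = sorted(pending)
    accused = {y for x in names for y in names
               if x != y and pending[x].get(y, 0) > 0}
    if not accused:
        return "clean"
    if len(accused) < len(names):
        return "heal"        # at least one unaccused copy can act as the source
    return "split-brain"     # every copy is accused by another one


# Split in time: B is down while a client writes to A, then A is down
# while a client writes to B. Each server now blames the other.
pending = {"A": {"B": 1}, "B": {"A": 1}}
print(classify(pending))
```

When only one side blames the other, the unblamed copy is the obvious
source for self-heal. When both blame each other, neither copy can be
trusted over the other without outside input.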

If the split-brain case happens, AFR gives you two options:

1) Disallow opening of the file (file open fails with "I/O error" and
a log message telling you to manually delete one copy). This is the
default because we don't want GlusterFS to lose user data
accidentally.

or

2) You can specify one of the subvolumes as the "favorite-child" and
the copy on that server/subvolume will be used as the definitive one
and self-heal will sync all other copies with the "favorite-child"
copy.
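
The two policies can be sketched in Python as follows. This is a
hypothetical illustration, not AFR code: `open_split_brain_file` and
the subvolume names `brick1` and `brick2` are invented for the example,
and `copies` is assumed to hold file contents already known to be in
split-brain.

```python
import errno


def open_split_brain_file(copies, favorite_child=None):
    """copies maps subvolume name -> file contents already known to conflict."""
    if favorite_child is None:
        # Default policy: refuse to open rather than risk discarding user data.
        raise OSError(errno.EIO, "split-brain detected; delete one copy manually")
    winner = copies[favorite_child]
    # Self-heal: overwrite every copy with the favorite child's version.
    for name in copies:
        copies[name] = winner
    return winner


copies = {"brick1": b"written via client 1", "brick2": b"written via client 2"}
try:
    open_split_brain_file(copies)
except OSError as err:
    print("open failed:", err.strerror)

print(open_split_brain_file(copies, favorite_child="brick2"))
```

Note that the second policy silently discards whatever was written to
the non-favorite copies, which is why it is not the default.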

In the future, we hope to make this process easier, through either the
web interface or some scripts/tools that help you perform the
deletion.

Vikas
-- 
Engineer - Z Research
http://gluster.com/