2008/12/18 Martin Fick <mogulguy@xxxxxxxxx>:
On Thu, 12/18/08, Martin Fick wrote:
> I have some questions about things that were perhaps implied in the
> document, but not really discussed. It sounds like the 5 step write
> process lays the foundations for transaction based writes and
> efficient self healing? I am curious how many of the failure cases
> are currently dealt with besides the mentioned split brain
> problems. Are there any rollback procedures implemented? It seems
> likely the intent, I am just trying to clarify the current
> functionality.

First off, it would be a misnomer to call it a "transaction" (even
though you will see the term used in the code), because in an actual
database transaction a complete record is kept of *what changed*. In
AFR we only keep track of the fact that *something changed*, and it is
up to the self-heal logic to figure out what changed (permissions, the
existence of the file itself, or the file contents) and do the
appropriate thing.

> For example: what happens if the client dies between step 2 and
> 3? The client has 1. locked the file (or directory) on all of the lock
> servers and 2. written the change log entries on all servers, and
> then dies. Will the lock time out?

Yes, the lock will be released if and when the client dies.

> If so, does another client then know how to (is it capable of)
> either complete the write or roll it back at this point? What if
> the client failure occurs after or during any of the other steps, can
> the entire process be either moved forward or rolled back (yet)?
> With this 5 step process, it seems like a guaranteed rollback
> should be possible anytime before step 3 and a commit should be
> possible anytime after step 2 (even if only completed on one
> server). Is that correct?
What is guaranteed by the 5-step algorithm is this: "as long as an
operation (i.e., the actual file operation, like write or create)
succeeds on at least one node, the self-heal logic will bring all the
other nodes up to date as and when they come up."

If the procedure fails on all nodes before it gets to stage 3 (the
actual operation, like write or create), it will be as if nothing ever
happened. This situation triggers a spurious self-heal, but since none
of the copies have been modified, that is harmless.

The important caveat here is the split-brain case (as explained in the
documentation). A split brain happens either when the network is split
into two and two independent clients write to the two pieces, or when
the network is split in time, that is, one server goes down and, when
it comes back up, the other server goes down. In that case there is
nothing AFR can do on its own to decide which copy is correct, so it
gives you two options:

1) Disallow opening of the file: the open fails with "I/O error" and a
   log message tells you to manually delete one copy. This is the
   default, because we don't want GlusterFS to lose user data
   accidentally.

2) Specify one of the subvolumes as the "favorite-child": the copy on
   that server/subvolume will be used as the definitive one, and
   self-heal will sync all other copies with the "favorite-child" copy.

In the future we hope to make this process easier, through either the
web interface or some scripts/tools that let you perform the deletion
more easily.

Vikas
-- 
Engineer - Z Research
http://gluster.com/
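A toy model may help make the guarantee and the split-brain case above concrete. This is a minimal sketch under stated assumptions, not GlusterFS code: `Replica`, `write`, `accuses`, `self_heal`, `SplitBrainError` and the counter rules are all hypothetical, only loosely inspired by AFR keeping a per-server change log. Locking (steps 1 and 5) is elided, and the rule that a replica's accusation is measured relative to its own pending mark is an assumption chosen so that a write abandoned before stage 3 cancels out, as described above.

```python
from dataclasses import dataclass, field


class SplitBrainError(IOError):
    """No copy can be trusted and no favorite child was configured."""


@dataclass
class Replica:
    name: str
    data: bytes = b""
    up: bool = True
    # Hypothetical change log: pending[peer] counts updates this replica
    # believes `peer` may have missed. Only *that* something changed is
    # recorded, never *what* changed.
    pending: dict = field(default_factory=dict)


def write(replicas, data, crash_before_op=False):
    """One write through the 5 steps; steps 1 and 5 (lock/unlock) are elided."""
    live = [r for r in replicas if r.up]
    # Step 2 (pre-op): each live replica marks every replica, itself included.
    for r in live:
        for peer in replicas:
            r.pending[peer.name] = r.pending.get(peer.name, 0) + 1
    if crash_before_op:
        return  # the client died: its locks go away, but the marks remain
    # Step 3: the actual operation, which only reaches live replicas.
    for r in live:
        r.data = data
    # Step 4 (post-op): undo the marks for every replica the op reached.
    for r in live:
        for peer in live:
            r.pending[peer.name] -= 1


def accuses(a, b):
    """True if replica a records more pending updates for b than for itself."""
    # Assumption of this toy: subtracting a's own mark cancels an unfinished
    # pre-op, which raised every counter on a by the same amount.
    return a.pending.get(b.name, 0) - a.pending.get(a.name, 0) > 0


def self_heal(replicas, favorite_child=None):
    """Pick a source copy, sync the others to it, and clear the change logs.

    Assumes every replica is reachable again when this runs.
    """
    accused = {b.name for b in replicas for a in replicas
               if a is not b and accuses(a, b)}
    sources = [r for r in replicas if r.name not in accused]
    if sources:
        source = sources[0]
    elif favorite_child is not None:
        # Split brain, resolved by the configured favorite child.
        source = next(r for r in replicas if r.name == favorite_child)
    else:
        # Split brain, default behaviour: refuse rather than lose data.
        raise SplitBrainError("split brain: delete one copy manually")
    for r in replicas:
        r.data = source.data
        r.pending = {}
    return source.name


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    b.up = False                      # b misses the write
    write([a, b], b"v1")
    b.up = True
    print(self_heal([a, b]), b.data)  # a b'v1'
```

In this model, a write that succeeds on a while b is down leaves a accusing b, so b is healed from a. A split in time (each server writing while the other is down) leaves the two replicas accusing each other, which raises `SplitBrainError` unless a `favorite_child` is given. The sketch does not model a crash between steps 3 and 4 or an operation failing on a live replica, and it copies the whole value instead of working out what changed (permissions, existence, or contents) the way the self-heal logic has to.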