--- Krishna Srinivas <krishna@xxxxxxxxxxxxx> wrote:

> * replay of the journaling will cause race
> conditions. (If we consider 2 or more
> clients, each client writes to same offset)

Access could be serialized if the journal were abstracted
out of AFR into a separate journal translator and this layer
were placed above the subvolume which it journaled requiring
all access to the subvolume to go through the journaling
layer. Would this serialization prevent the race conditions
you are describing?

This would now look like this:

Client AFR     Client AFR
    |\            /|
    |      \/      |
    |/            \|
Journal A      Journal B
    |              |
  Sub A          Sub B

> A better solution would be to maintain a list of
> dirty blocks and use it during selfheal.

Agreed, but why not make it infinitely granular and keep a
list of dirty file spans instead of blocks? This should be
extremely space efficient.

>
> In terms of work, I'm guessing each write
> operation would need to put an additional
> (serial,path,offset,bytes,data) to the journal
> volume ..

Actually just (version, path, offset and bytes), the 'data'
does not need to be put in the journal since it is in the
subvolume and can be recalled at any moment.

> each data volume would need to keep track
> of it's most recent serial, then mount would need to
> check the journal and run playbacks for each
> sub-volume who's serial isn't up to the most recent
> in the journal serial ...

I was envisioning that it would work similar to the way it
works now in that when AFR reads a file it would ask the
lower levels (which in this case are the journal layers)
what the latest version of the file in each subvolume is
and sync on a mismatch.

>
> If all this is done in a journal translator .. it
> doesn't "sound" too onerous or that it would involve
> changing any other code ... ??

Well, changes would have to be done in the journal
translator AND in the AFR translator in order to be able to
recall data from the journal.
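
To make the metadata-only journal idea concrete, here is a rough
sketch in Python. It is only an illustration under assumptions:
SpanJournal, log_write, latest_version and dirty_spans are made-up
names, not GlusterFS APIs, and the per-file version counter is a
guess at how versions might be tracked. Each record holds
(version, offset, bytes) per path and no file data; overlapping
writes are coalesced into dirty spans when queried.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open byte range [start, end)


def merge_spans(spans: List[Span]) -> List[Span]:
    """Coalesce overlapping or touching spans into a sorted list."""
    out: List[Span] = []
    for start, end in sorted(spans):
        if out and start <= out[-1][1]:
            # Overlaps or touches the previous span: extend it.
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out


@dataclass
class SpanJournal:
    """Hypothetical per-subvolume journal: metadata only, no data."""
    versions: Dict[str, int] = field(default_factory=dict)
    # path -> list of (version, offset, nbytes)
    records: Dict[str, List[Tuple[int, int, int]]] = field(default_factory=dict)

    def log_write(self, path: str, offset: int, nbytes: int) -> int:
        """Record a write and return the new version of the file."""
        version = self.versions.get(path, 0) + 1
        self.versions[path] = version
        self.records.setdefault(path, []).append((version, offset, nbytes))
        return version

    def latest_version(self, path: str) -> int:
        """Version a caller would compare across subvolumes."""
        return self.versions.get(path, 0)

    def dirty_spans(self, path: str, since_version: int) -> List[Span]:
        """Spans written after since_version, merged."""
        spans = [(off, off + n)
                 for ver, off, n in self.records.get(path, [])
                 if ver > since_version]
        return merge_spans(spans)


journal = SpanJournal()
journal.log_write("/a", 0, 100)     # version 1
journal.log_write("/a", 50, 100)    # version 2, overlaps the first write
journal.log_write("/a", 4096, 10)   # version 3
print(journal.dirty_spans("/a", 0))  # [(0, 150), (4096, 4106)]
print(journal.dirty_spans("/a", 2))  # [(4096, 4106)]
```

A subvolume that is at version 2 of "/a" would only need the
single span returned by the last call, and the bytes themselves
would be read from the up-to-date subvolume at heal time.
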
Currently, when the AFR translator needs to request a file
from node A to heal node B, it just needs to ask the
subvolume for the whole file. With a journal translator, AFR
needs to be smarter and ask for the changed sections of the
file instead. But these specific AFR changes would need to
be done whether using a journal OR an rsync layer. A
separate rsync layer would need to be created that looked
similar to my first diagram, like this:

Client AFR     Client AFR
    |\            /|
    |      \/      |
    |/            \|
 Rsync A        Rsync B
    |              |
  Sub A          Sub B

The rsync layer would need to reside on the subvolume hosts
and cannot be in the client AFR, or you would be thrashing
the network anyway.

> > Splitbrain handling of this would be impossible,
> > and one version would always have to win. But other
> > than that, I can see that would work.

Splitbrain with the journal should be exactly the same as
without it.

-Martin