--- Derek Price <derek@xxxxxxxxxxx> wrote:

> Okay, I think we may be operating with slightly
> different assumptions about the way things are
> currently happening, so to start off:

Perhaps, I certainly have been making plenty of assumptions! :)

> 2. I had the understanding due to a comment in
> another thread that the only operations that were
> going across the AFR wire in the case of a
> rename was a remove and then a create. If this
> assumption is wrong, somebody please correct me.

I don't know if this is correct, but on a normal POSIX filesystem I believe that the reverse would usually happen: first the new name would be linked to the inode, and then the old name would be unlinked. I am indeed assuming that this is what glusterfs is doing, but perhaps it is not?

...

> If, however, the only operations permitted on
> directory listings is to present a new list of
> the current files and directories (basically adds
> and removes), then your method breaks down because
> if a directory version changes then all children of
> that directory must be assumed to be changed.
> For example, in the second FS state shown below,
> there is no way to distinguish which of the three
> files has changed. Given:
>
> /:2 a/:2 b/:2 c/:4 /A:1
>                    /B:1
>                    /C:1

Well, you do not appear to be using my versioning scheme here: you are not using the parent ids on creation. The parent id versioning scheme would look something more like:

Starting with

/a/b/c/A       /a/b/c/B       /a/b/c/C
/:2            /:2            /:2
a:2/2          a:2/2          a:2/2
b:2/2/2        b:2/2/2        b:2/2/2
c:2/2/2/4      c:2/2/2/5      c:2/2/2/6
A:2/2/2/4/1    B:2/2/2/4/1    C:2/2/2/6/1

B and C would not be able to have the same versions as A since c would get bumped between each add (this is not important in this example though, just mentioning it to be consistent).

> $ cd /a/b/c
> $ rm C

/a/b/c
/:2
a:2/2
b:2/2/2
c:2/2/2/7

> $ echo new content >C

/a/b/c
/:2
a:2/2
b:2/2/2
c:2/2/2/8
C:2/2/2/8/1

If we now look at the three files, it is easy to see which ones have been modified.
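
To make that comparison concrete, here is a rough Python sketch. It is only an illustration under my own assumptions: the names parse_version and classify are made up for this mail, version paths are taken to be slash-separated integers, and a change confined to the last number is treated as an in-place modification while any change in the parent part is treated as the file having been created anew. It is not a description of what glusterfs does today.

```python
def parse_version(path):
    """Split a version path such as '2/2/2/4/1' into a tuple of ints."""
    return tuple(int(part) for part in path.split("/"))


def classify(old, new):
    """Compare two version paths recorded for the same file name.

    Hypothetical rule, following the parent id idea:
      'unchanged' - the paths are identical
      'modified'  - only the file's own (final) number differs
      'recreated' - some parent number differs, so the file was
                    created under a later version of its directory
    """
    old_v, new_v = parse_version(old), parse_version(new)
    if old_v == new_v:
        return "unchanged"
    if old_v[:-1] == new_v[:-1]:
        return "modified"
    return "recreated"


# Versions before and after "rm C; echo new content >C" in /a/b/c.
before = {"A": "2/2/2/4/1", "B": "2/2/2/4/1", "C": "2/2/2/6/1"}
after = {"A": "2/2/2/4/1", "B": "2/2/2/4/1", "C": "2/2/2/8/1"}

report = {name: classify(before[name], after[name]) for name in before}
print(report)
```

With the numbers from the example, A and B come out as unchanged and C as recreated, without anyone having to bump A or B when c changed.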
In the beginning above we had:

A:2/2/2/4/1
B:2/2/2/4/1
C:2/2/2/6/1

Now we have:

A:2/2/2/4/1
B:2/2/2/4/1
C:2/2/2/8/1

This means that C is a completely new file and is not even a candidate for a quick sync, since more than just the final number (1) has changed. Note that A and B have not changed; their versions do not need to be bumped simply because c has changed! Does this make more sense?

> renders:
>
> /:2 a/:2 b/:2 c/:6 /A:1
>                    /B:1
>                    /C:1

Again, you are not using the parent ids on creation here.

> My solution solves this last problem (if it, in
> fact, even exists),

As shown above, I believe it does not exist. :)

> though not the efficient rename issue (if it, in
> fact, even exists).

It does exist, doesn't it? What is in question is whether the parent id solves it, right?

> I was leaving the transaction journaling issue
> (basically what you represent is already being
> maintained on a per-directory basis), as a
> future update which would be easier to integrate
> once the first problem was solved.

Well, I was not assuming any form of journaling. I was assuming that renames are efficient and that they consist of link/unlink sequences, which do indeed need to be serialized to work. This is the real question: could the directory healing scheme potentially reorder the link/unlink operations? If so, then renames will not work efficiently.

Let's walk my previous example through the directory healing process.
Before the heal we would have ended with server 1 like this:

/a/b/c/        -> /a/b/c/file
/:v4           /:v4
a:v4/2         a:v4/2
b:v4/2/2       b:v4/2/1
c:v4/2/2/1     c:v4/2/2/2
               file:v4/2/2/2/1

Server 2 did not get these updates, so it still looks like this:

/              -> /a/         -> /a/b/       -> /a/b/c/     -> /a/b/c/file
/:v1           /:v2           /:v2           /:v2           /:v2
               a:v2/1         a:v2/2         a:v2/2         a:v2/2
                              b:v2/2/1       b:v2/2/2       b:v2/2/1
                                             c:v2/2/2/1     c:v2/2/2/2
                                                            file:v2/2/2/2/1

When server 2 rejoins and the cat happens:

> cat /a/b/c/file

/ is out of date so it is updated to:

/a/b/c/file -> /z/b/c/file

although this was probably a mistake in my original scheme, since it would in fact be two steps, not one. These steps would be:

/a -> /a & /z
/a & /z -> /z

which would have bumped the / version twice. However, I do not believe this would matter. Ultimately, on healing, the directory would simply see the latest of the two versions, which would allow the rename to persist since the inode for 'a' was never deleted; it just ended up linked at a different spot, at z, in /.

However, this was a simple "nice" rename: we renamed a file in the same directory, so on healing the inode will not be lost. But what happens if we move a file/directory from a high level directory to a lower level directory? I fear that the higher level directory will probably get healed before the lower level directory, which will cause the inode to be lost during the heal! I am not sure how this can be solved without a journal. Perhaps someone who understands AFR renames could help out here?

Cheers,

-Martin