--- Derek Price <derek@xxxxxxxxxxx> wrote:
> If you increment directory version numbers on all
> directory listing changes, I still see a major
> problem:
>
> 1. Adding, renaming, or removing a file or
> directory in ANY directory now cascades the version
> number change up to the root directory,

No, there is no need to cascade a version change up the chain. What purpose would that serve? I was not suggesting this, only that when assigning a version # to a directory/file, we be sure to include all the version #s of the parents, so that we can be sure we are talking about the same version of the element when it was created/edited.

In fact, I now realize that this constraint could even be relaxed to simply ensure that the creation version has every parent's version at the time of creation. When the file is updated, there is no need to update the version # to any current parents' versions (naturally the parents must be healed though); simply bump the tail of the version # (the file portion). The rest of the version # can stay the same and need no longer match what the parents' versions are. This makes things quicker for file modifications.

> effectively incrementing the version
> number of ALL files and marking them
> as dirty/needing update to all other
> servers.

No no, they are not dirty simply because the parents' version #s have changed. This was the false conclusion that I originally made. You don't care if any of the parents have changed as long as you are talking about the same file, which is reflected in the parents' versions recorded when the file was created! Think of the parents' portion of the version # as just a unique ID chosen on file creation. The parents can change all they want, but if this unique ID hasn't changed on either server, we are talking about the same file. If only the file portion changes, we just have a different version of the same file, and it is a candidate for extent-based quick healing.
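The comparison rule described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Version` class, `parse`, `bump`, `compare`, and the three result strings are hypothetical names made up for this sketch, not anything from the actual code base. It treats the parents' portion of a version # as an opaque ID fixed at creation and the last component as the file portion.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Version:
    """Hypothetical model of a 'vP1/P2/.../F' version number."""
    parents: Tuple[int, ...]  # parents' versions at creation time, root first
    rev: int                  # the file portion (tail), bumped on each update

    @classmethod
    def parse(cls, text: str) -> "Version":
        parts = [int(p) for p in text.lstrip("v").split("/")]
        return cls(tuple(parts[:-1]), parts[-1])

    def bump(self) -> "Version":
        # A file update only touches the tail; the parents' portion is kept.
        return Version(self.parents, self.rev + 1)


def compare(local: Version, remote: Version) -> str:
    """Classify two copies using only their own version numbers."""
    if local.parents != remote.parents:
        # The creation-time ID differs, so these are not the same element.
        return "different file"
    if local.rev == remote.rev:
        return "in sync"
    # Same element, different tail: candidate for extent-based healing.
    return "same file, different version"


server2 = Version.parse("v2/2/2/2/1")
server1 = Version.parse("v4/2/2/2/1")
print(compare(server2, server1))         # different file
print(compare(server2, server2.bump()))  # same file, different version
```

Note that nothing in `compare` looks at the parents' current versions, which is the point: a parent can change without marking the file dirty.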
> I believe that this cascade and healing is necessary
> is illustrated in
> the following example: given a synchronized
> /a/b/c/file, against server 1:

OK, to get to this point, the version graph I am suggesting would look like this on both servers (minimal version #s; they could naturally be higher if other events occurred):

/     ->  /a/     ->  /a/b/     ->  /a/b/c/     ->  /a/b/c/file
/:v1      /:v2        /:v2          /:v2            /:v2
          a:v2/1      a:v2/2        a:v2/2          a:v2/2
                      b:v2/2/1      b:v2/2/2        b:v2/2/1
                                    c:v2/2/2/1      c:v2/2/2/2
                                                    file:v2/2/2/2/1

So:

> $ cd /
> $ mv a z

/a/b/c/file       ->  /z/b/c/file
/:v2                  /:v3
a:v2/2                z:v2/2
b:v2/2/1              b:v2/2/1
c:v2/2/2/2            c:v2/2/2/2
file:v2/2/2/2/1       file:v2/2/2/2/1

> $ mkdir -p a/b/c

/     ->  /a/     ->  /a/b/     ->  /a/b/c/
/:v3      /:v4        /:v4          /:v4
          a:v4/1      a:v4/2        a:v4/2
                      b:v4/2/1      b:v4/2/2
                                    c:v4/2/2/1

> $ echo whatever >file

/a/b/c/           ->  /a/b/c/file
/:v4                  /:v4
a:v4/2                a:v4/2
b:v4/2/2              b:v4/2/1
c:v4/2/2/1            c:v4/2/2/2
                      file:v4/2/2/2/1

> Then, against server 2:
>
> $ cat /a/b/c/file

OK, we need to start with the original synchronized version #s here again, so now on server 2 the version # of /a/b/c/file is v2/2/2/2/1, while on server 1 it is v4/2/2/2/1.

> Would have to know to heal directory listings all
> the way up to its root directory listing to give the
> correct answer here.

I agree, it would have to know this, but it does, doesn't it? In order to read (cat) /a/b/c/file, a lookup is first done on /, right? This would cause / to be healed before it could even look up a. This healing would cascade down until we are ready to read /a/b/c/file. I see now that directory healing does not require modified file data to be healed; only file adds/deletes/moves need to be recorded. The file data can be healed when the file is accessed. Added files can be added as empty version 0 files, signifying that they need to heal (perhaps this already happens?).
I admit, this probably assumes that moves are recorded as moves, and not just as adds/deletes, which might cause things to fail or to have the same performance problem that I point out below in the "global version#" solution.

> I think the single, global version number I
> mentioned in the "Client side AFR race conditions"
> provides an interesting solution here.
> Consider the following commands and their
> corresponding file system states starting with an
> empty root. In this model, changing the
> content/version number of any child element is
> considered to change the directory listing of the
> parent, and renames update the version number
> of all children of the renamed element:
>
> / v1
>
> $ mkdir /a
> / v2
> /a v2
>
> $ mkdir /b
> / v3
> /a v2
> /b v3
>
> $ echo whatever > /a/1
> / v4
> /a v4
> /a/1 v4
> /b v3
>
> $ echo whatever > /a/2
> / v5
> /a v5
> /a/1 v4
> /a/2 v5
> /b v3
>
> $ mv /a /z
> / v6
> /b v3
> /z v6
> /z/1 v6
> /z/2 v6

This would force an unneeded resync on /z, /z/1 and /z/2, wouldn't it? That could be very expensive since 1 and 2 could be large files!

> $ rm /z/2
> / v7
> /b v3
> /a v7
> /a/1 v6

The "a"s should be "z"s here, I assume.

> This glosses over the locking issues we were
> discussing in the other thread, but in this
> model, a client can quickly determine whether
> its copy of any directory listing or file is
> up to date based on solely that file or
> directory's own version number (locally and
> on the server), and giving a parent directory
> a new version number does not invalidate the
> data of all its children.

This seems like it would mostly work, except that directory renames would require the entire subtree to be resynced needlessly! A directory rename should normally (on Unix) be a very small operation; this would bring us back to the old DOS days where, if I recall correctly, it meant copying the entire subtree.
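To make the rename cost concrete, here is a rough Python simulation of the quoted "global version #" model. It is a sketch under assumptions, not the real implementation: the `GlobalVersionFS` class and its `create`, `rename`, and `remove` methods are hypothetical names, and the rules are taken only from the quoted description (any change bumps one global counter, stamps the changed element and all of its ancestors, and a rename re-stamps every element under the renamed directory).

```python
class GlobalVersionFS:
    """Hypothetical toy model of the quoted global version number scheme."""

    def __init__(self):
        self.clock = 1              # the single, global version number
        self.versions = {"/": 1}    # path -> version stamp

    def _tick(self):
        self.clock += 1
        return self.clock

    @staticmethod
    def _ancestors(path):
        # "/a/1" -> ["/", "/a"]
        parts = path.strip("/").split("/")
        return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

    def create(self, path):
        # mkdir or file write: stamp the element and every ancestor.
        v = self._tick()
        for p in self._ancestors(path) + [path]:
            self.versions[p] = v

    def rename(self, old, new):
        # Re-stamp the renamed element and everything beneath it.
        v = self._tick()
        moved = [p for p in self.versions if p == old or p.startswith(old + "/")]
        for p in moved:
            del self.versions[p]
            self.versions[new + p[len(old):]] = v
        for p in self._ancestors(new):
            self.versions[p] = v
        return len(moved)           # number of elements that now look changed

    def remove(self, path):
        v = self._tick()
        for p in [q for q in self.versions if q == path or q.startswith(path + "/")]:
            del self.versions[p]
        for p in self._ancestors(path):
            self.versions[p] = v


fs = GlobalVersionFS()
fs.create("/a")                     # mkdir /a
fs.create("/b")                     # mkdir /b
fs.create("/a/1")                   # echo whatever > /a/1
fs.create("/a/2")                   # echo whatever > /a/2
restamped = fs.rename("/a", "/z")   # mv /a /z
fs.remove("/z/2")                   # rm /z/2
print(restamped, sorted(fs.versions.items()))
```

In this toy model the rename re-stamps three elements (/z, /z/1 and /z/2) even though no file data changed, and the count grows with the size of the subtree. The final state matches the quoted listing once the "a"s are read as "z"s.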
;)

If you think that there are still problems/holes in the "full parent tree version" solution, perhaps there is another minor tweak to your "global version #" solution which will make it work more efficiently on directory renames?

-Martin