Re: trusted.glusterfs.version xattr

Martin Fick <mogulguy@xxxxxxxxx> · Thu, 8 May 2008 11:40:25 -0700 (PDT)

--- Derek Price <derek@xxxxxxxxxxx> wrote:
> If you increment directory version numbers on all
> directory listing changes, I still see a major 
> problem:
> 
> 1.  Adding, renaming, or removing a file or
> directory in ANY directory now cascades the version 
> number change up to the root directory, 

No, there is no need to cascade a version change 
up the chain.  What purpose would that serve?
I was not suggesting this, only that when
assigning a version # to a directory/file
we be sure to include all the version #s of the
parents, so that we can be sure we are talking
about the same version of the element as when it
was created/edited.  

In fact, I now realize that this constraint 
could even be relaxed to simply ensure that 
the creation version includes every parent's
version at the time of creation.  When 
the file is updated, there is no need
to update the version # to match the current 
parents' versions (naturally the parents 
must be healed though); simply bump the 
tail of the version # (the file portion). 
The rest of the version # can stay the 
same and need no longer match what the 
parents' versions are.  This makes things 
quicker for file modifications.

> effectively incrementing the version 
> number of ALL files and marking them 
> as dirty/needing update to all other 
> servers. 

No no, they are not dirty simply because
the parents' version #s have changed.  This 
was the false conclusion that I originally
made.  You don't care if any of the
parents have changed as long as you
are talking about the same file, which
is identified by the parents' 
versions at the time the file was created!

Think of the parents' portion of the 
version # as just a unique ID chosen on 
file creation.  The parents can change 
all they want, but if this unique ID 
hasn't changed on either server, we are 
talking about the same file.  If only 
the file portion changes, we just have 
a different version of the same file
and it is a candidate for extent-based
quick healing.
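
To make the comparison rule above concrete, here is a minimal
Python sketch.  The names (Version, parse, compare) and the
result strings are made up for illustration; nothing here is
existing GlusterFS code.  It assumes a version # is written as
"v" followed by slash-separated integers, where everything but
the last number is the creation ID and the last number is the
file portion.

```python
from typing import NamedTuple, Tuple


class Version(NamedTuple):
    creation_id: Tuple[int, ...]  # parents' versions when the file was created
    revision: int                 # the file portion, bumped on every update


def parse(text: str) -> Version:
    """Split e.g. 'v2/2/2/2/1' into creation ID (2, 2, 2, 2) and revision 1."""
    parts = tuple(int(p) for p in text.lstrip("v").split("/"))
    return Version(creation_id=parts[:-1], revision=parts[-1])


def compare(local: str, remote: str) -> str:
    """Classify two copies found under the same path on two servers."""
    a, b = parse(local), parse(remote)
    if a.creation_id != b.creation_id:
        # Not the same file at all: it was created under different parents.
        return "different file"
    if a.revision == b.revision:
        return "in sync"
    # Same file, different revision: candidate for extent-based healing.
    return "same file, stale copy"


print(compare("v2/2/2/2/1", "v2/2/2/2/3"))  # same creation ID, newer revision
print(compare("v2/2/2/2/1", "v4/2/2/2/1"))  # creation IDs differ
```

Note that the parents' current versions never enter the
comparison; only the ID frozen at creation time does.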

> I believe that this cascade and healing is necessary
> is illustrated in 
> the following example:  given a synchronized
> /a/b/c/file, against server 1:

OK, to get to this point, the version graph 
I am suggesting would look like this on both
servers (minimal version #s, they could 
naturally be higher if other events occurred):

/   -> /a/  -> /a/b/ -> /a/b/c/ -> /a/b/c/file
/:v1   /:v2    /:v2     /:v2       /:v2
       a:v2/1  a:v2/2   a:v2/2     a:v2/2
               b:v2/2/1 b:v2/2/2   b:v2/2/2
                        c:v2/2/2/1 c:v2/2/2/2
                                   file:v2/2/2/2/1

So:
> 	$ cd /
> 	$ mv a z

 /a/b/c/file        ->  /z/b/c/file        
 /:v2                   /:v3
 a:v2/2                 z:v2/2  
 b:v2/2/2               b:v2/2/2
 c:v2/2/2/2             c:v2/2/2/2
 file:v2/2/2/2/1        file:v2/2/2/2/1

> 	$ mkdir -p a/b/c

 / -> /a/  -> /a/b/ -> /a/b/c/ 
 /:v3 /:v4    /:v4     /:v4
      a:v4/1  a:v4/2   a:v4/2
              b:v4/2/1 b:v4/2/2
                       c:v4/2/2/1

> 	$ echo whatever >file

 /a/b/c/ -> /a/b/c/file
 /:v4       /:v4
 a:v4/2     a:v4/2
 b:v4/2/2   b:v4/2/2
 c:v4/2/2/1 c:v4/2/2/2
            file:v4/2/2/2/1

> Then, against server 2:
> 
> 	$ cat /a/b/c/file

OK, we need to start with the original
synchronized version #s here again, so
now on server 2 the version # of 
/a/b/c/file is v2/2/2/2/1 while on
server 1 it is v4/2/2/2/1.
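
The numbering can be replayed mechanically.  The following is
only a sketch in Python under my reading of the scheme: any
listing change bumps the tail of the directory's version #,
and a new entry gets the parent's (already bumped) version #
plus "/1".  The Node class and its methods are hypothetical
names, not anything from GlusterFS.

```python
class Node:
    """A file or directory carrying a hierarchical version number."""

    def __init__(self, name, version):
        self.name = name
        self.version = version   # tuple of ints, (2, 2, 1) means v2/2/1
        self.children = {}       # stays empty for plain files

    def bump(self):
        # Only the tail changes; the creation prefix is never rewritten.
        self.version = self.version[:-1] + (self.version[-1] + 1,)

    def create(self, name):
        # Adding an entry changes this listing, so bump first; the child's
        # creation version then embeds the parent's version at that moment.
        self.bump()
        child = Node(name, self.version + (1,))
        self.children[name] = child
        return child

    def rename_child(self, old, new):
        # A rename changes only this listing; the subtree keeps its versions.
        self.bump()
        child = self.children.pop(old)
        child.name = new
        self.children[new] = child


def fmt(node):
    return "v" + "/".join(str(n) for n in node.version)


def build_path(root):
    """mkdir -p a/b/c followed by creating 'file'; returns the file node."""
    node = root
    for name in ("a", "b", "c", "file"):
        node = node.create(name)
    return node


root = Node("/", (1,))
old_file = build_path(root)     # the synchronized state
root.rename_child("a", "z")     # mv a z
new_file = build_path(root)     # mkdir -p a/b/c; echo whatever >file

print(fmt(root), fmt(old_file), fmt(new_file))
```

The old file, now reachable as /z/b/c/file, keeps its version
#, while the new /a/b/c/file carries a different creation
prefix, which is what tells server 2 that its copy is not
merely an older revision.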

> Would have to know to heal directory listings all
> the way up to its root directory listing to give the
> correct answer here.

I agree, it would have to know this, but it does,
doesn't it?  In order to read (cat) /a/b/c/file,
a lookup is first done on /, right?  This would
cause / to be healed before it could even look up
a.  This healing would cascade down until we
are ready to read /a/b/c/file.  I see now that
directory healing indeed does not require 
modified file data to be healed; only file 
adds/deletes/moves need to be recorded.  The
file data can be healed when the file is 
accessed.  Added files can be added as empty
version 0 files, signifying that they need
to heal (perhaps this already happens?).
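
Here is a rough Python sketch of that top-down walk, to show
that only listings are healed along the path and that file
data moves only for the file actually read.  Everything in it
is an assumption made for illustration: the dict layout, the
helper names, and the use of version 0 as the "needs healing"
marker.  It handles adds and deletes only; moves and entries
that change type are not modeled.

```python
def make_dir(version):
    return {"version": version, "entries": {}}


def make_file(version, data):
    return {"version": version, "data": data}


def heal_dir(stale, fresh):
    """Bring one directory listing up to date without copying file data."""
    if stale["version"] == fresh["version"]:
        return
    for name in list(stale["entries"]):
        if name not in fresh["entries"]:
            del stale["entries"][name]          # entry was deleted
    for name, entry in fresh["entries"].items():
        if name not in stale["entries"]:
            # New entries arrive as empty version 0 placeholders.
            is_dir = "entries" in entry
            stale["entries"][name] = make_dir(0) if is_dir else make_file(0, None)
    stale["version"] = fresh["version"]


def lookup(stale_root, fresh_root, path):
    """Read a file at 'path', healing each directory listing on the way down."""
    stale, fresh = stale_root, fresh_root
    heal_dir(stale, fresh)
    parts = [p for p in path.split("/") if p]
    for name in parts[:-1]:
        stale, fresh = stale["entries"][name], fresh["entries"][name]
        heal_dir(stale, fresh)
    s, f = stale["entries"][parts[-1]], fresh["entries"][parts[-1]]
    if s["version"] != f["version"]:
        # File data is healed only now, when the file is accessed.
        s["data"], s["version"] = f["data"], f["version"]
    return s["data"]


# Up-to-date server: /a/b/c/ holds 'file' and a sibling 'big'.
fresh = make_dir("v4")
node = fresh
for name, ver in (("a", "v4/2"), ("b", "v4/2/2"), ("c", "v4/2/2/3")):
    node["entries"][name] = make_dir(ver)
    node = node["entries"][name]
node["entries"]["file"] = make_file("v4/2/2/2/1", "whatever\n")
node["entries"]["big"] = make_file("v4/2/2/3/1", "lots of data")

stale = make_dir("v1")   # out-of-date server with an empty root
print(repr(lookup(stale, fresh, "/a/b/c/file")))
```

After the lookup, "big" exists on the stale side only as a
version 0 placeholder with no data, so nothing was copied
for it.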

I admit, this probably assumes that moves 
are recorded as moves, and not just as adds/
deletes, which might cause things to fail
or to have the same performance problem that
I point out below in the "global version #"
solution.

> I think the single, global version number I
> mentioned in the "Client side AFR race conditions" 
> provides an interesting solution here. 
> Consider the following commands and their
> corresponding file system states starting with an 
> empty root.  In this model, changing the 
> content/version number of any child element is
> considered to change the directory listing of the 
> parent, and renames update the version number 
> of all children of the renamed element:
> 
> /			v1
> 
> 	$ mkdir /a
> /			v2
> /a			v2
> 
> 	$ mkdir /b
> /			v3
> /a			v2
> /b			v3
> 
> 	$ echo whatever > /a/1
> /			v4
> /a			v4
> /a/1			v4
> /b			v3
> 
> 	$ echo whatever > /a/2
> /			v5
> /a			v5
> /a/1			v4
> /a/2			v5
> /b			v3
> 
> 	$ mv /a /z
> /			v6
> /b			v3
> /z			v6
> /z/1			v6
> /z/2			v6

This would force an unneeded resync of 
/z, /z/1, and /z/2, wouldn't it?  That could 
be very expensive since 1 and 2 could be 
large files!
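
To see how much gets re-versioned, the quoted model can be
replayed with a small Python sketch.  This is my reading of
the rules as stated (every change takes the next global
number, stamps the changed element and all its ancestors, and
a rename stamps the whole renamed subtree); the class and
method names are hypothetical.

```python
class GlobalVersionFS:
    """Toy model of the single, global version number scheme."""

    def __init__(self):
        self.clock = 1
        self.versions = {"/": 1}

    @staticmethod
    def _ancestors(path):
        # "/a/1" -> ["/", "/a"];  "/a" -> ["/"]
        parts = path.strip("/").split("/")
        return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

    def _stamp(self, paths):
        # One new global version number per operation.
        self.clock += 1
        for p in paths:
            self.versions[p] = self.clock

    def create(self, path):
        # mkdir or writing a file: stamp the element and every ancestor.
        self._stamp(self._ancestors(path) + [path])

    def remove(self, path):
        del self.versions[path]
        self._stamp(self._ancestors(path))

    def rename(self, old, new):
        moved = [p for p in self.versions if p == old or p.startswith(old + "/")]
        for p in moved:
            del self.versions[p]
        renamed = [new + p[len(old):] for p in moved]
        # The whole subtree gets the new number, not just the directory.
        self._stamp(self._ancestors(new) + renamed)
        return renamed


fs = GlobalVersionFS()
fs.create("/a")
fs.create("/b")
fs.create("/a/1")
fs.create("/a/2")
restamped = fs.rename("/a", "/z")
print(sorted(fs.versions.items()))
print("re-versioned by the rename:", restamped)
```

The rename alone re-versions /z, /z/1 and /z/2, so a server
comparing version #s would see all three as out of date even
though no file data changed.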

> 	$ rm /z/2
> /			v7
> /b			v3
> /a			v7
> /a/1			v6

I assume the "a"s here should be "z"s.

> This glosses over the locking issues we were
> discussing in the other thread, but in this 
> model, a client can quickly determine whether 
> its copy of any directory listing or file is 
> up to date based on solely that file or 
> directory's own version number (locally and
> on the server), and giving a parent directory 
> a new version number does not invalidate the 
> data of all its children.

This seems like it would mostly work, except that
directory renames would require the
entire subtree to be resynced needlessly!  A 
directory rename should normally (on unix) be
a very small operation.  This would bring us
back to the old DOS days, where, if I recall
correctly, it meant copying the entire 
subtree. ;)

If you think that there are still problems/holes
in the "full parent tree version" solution, perhaps
there is another minor tweak to your "global 
version #" solution which will make it work more
efficiently on directory renames?

-Martin
