Martin Fick wrote:
--- Gordan Bobic <gordan@xxxxxxxxxx> wrote:
Martin Fick wrote:
/dir1/dir2/file
file and dir2 are deleted
dir2 is re-added
file is re-added and dir2 version is now the
same as it was before it was deleted.
Same problem as before but one level higher.
You would need to version all the way to the
root, "/", for this to work, wouldn't you?
No. You just need to treat directories the same as
any other file. When a create/delete operation
happens, the directory version gets bumped up.
So removing and creating a subdirectory (or a file)
would result in the created file/directory having a
major version 2 versions higher than the
previous instance.
Think parent directory, not subdirectory. If the
parent directory of the file and the file itself
are deleted and recreated you may end up with
the same versions of both the parent directory and
file!
No - because the parent directory will have the incremented major
version due to its parent directory's minor version having been upped
when the previous version of the subdirectory is deleted.
Original creation process and versioning:
/
v1
/dir1/
v2 v1
/dir1/dir2/
v2 v2 v1
/dir1/dir2/file
v2 v2 v2 v1
Mirror goes off-line with version #s of dir2 and file
as: v2/v1.
-> file deleted
/dir1/dir2/
v2 v2 v3
-> dir2 deleted
/dir1/
v2 v3
-> dir2 recreated
/dir1/dir2/
v2 v4 v1
-> file recreated
/dir1/dir2/file
v2 v4 v2 v1
Uh oh, now if I look just at the version #s of dir2
and file I get v2/v1. These are the same as they
were above when our mirror went off-line, so the file
looks like it is the same when in fact both it and
its parent directory have been recreated!
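For anyone who wants to replay the sequence above, here is a minimal Python sketch. The `Node` class and the `versions` helper are made up for illustration and are not taken from any real implementation. The only rule assumed is the one the listing implies: every new object starts at v1, and creating or deleting an entry bumps the parent's version by one.

```python
class Node:
    """A file or directory carrying a single version counter (hypothetical model)."""

    def __init__(self):
        self.version = 1      # every new object starts at v1
        self.children = {}    # name -> Node; stays empty for plain files

    def create(self, name):
        # Creating an entry bumps this directory's version.
        self.version += 1
        self.children[name] = Node()
        return self.children[name]

    def delete(self, name):
        # Deleting an entry bumps this directory's version as well.
        self.version += 1
        del self.children[name]


def versions(root, path):
    """Return the version of every node from the root down to the end of path."""
    node, chain = root, [root.version]
    for name in path:
        node = node.children[name]
        chain.append(node.version)
    return chain


PATH = ["dir1", "dir2", "file"]

root = Node()
dir1 = root.create("dir1")
dir2 = dir1.create("dir2")
dir2.create("file")
before = versions(root, PATH)      # state when the mirror goes off-line

dir2.delete("file")
dir1.delete("dir2")
dir2 = dir1.create("dir2")
dir2.create("file")
after = versions(root, PATH)

print(before)                      # [2, 2, 2, 1]
print(after)                       # [2, 4, 2, 1]
print(before[-2:] == after[-2:])   # True: dir2/file look unchanged
```

Only the dir1 entry differs between the two chains, so a check limited to the file and its immediate parent cannot tell the two states apart.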
You're missing the point. The proposal was for having TWO version
numbers: major and minor.
Major is the parent directory's minor version number. This changes on
the parent directory whenever a file/directory in it is created or
deleted. This parent's minor version gets assigned as the file's major
version at creation time only. Further updates to the file cause its
minor version to increase. When the file gets deleted, this falls under
the creation/deletion, and causes the parent's minor version to increase.
That means that a newly created subdirectory or file gets a higher major
version, so even if the minor version ends up the same, it'll
still be a different total version.
Thinking about it, I don't think it's necessary to increase the parent's
minor version on file deletes, only on file creates, provided this is
done before the new file gets tagged with the major version (which is
the same as the parent's minor version at creation time).
Hmm, OK, I think I just reached the same conclusion as you - parent's
minor version isn't sufficient - it'd have to be the complete version
number of the parent, at which point there's a version chaining problem.
Well spotted, Martin. :)
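A rough Python sketch of the major/minor proposal shows where it breaks. The `Node` class and `tag` method are hypothetical names used only for this illustration. Assumptions: a new object starts with minor version 1, the parent's minor version is bumped before the child is tagged, and the root's major version is an arbitrary placeholder because it has no parent.

```python
class Node:
    """Object tagged with (major, minor) as in the proposal (hypothetical model)."""

    def __init__(self, major):
        self.major = major    # parent's minor version at creation time
        self.minor = 1        # bumped on updates, or on child create/delete
        self.children = {}

    def create(self, name):
        self.minor += 1                  # bump the parent first ...
        child = Node(major=self.minor)   # ... then tag the new child with it
        self.children[name] = child
        return child

    def delete(self, name):
        self.minor += 1
        del self.children[name]

    def tag(self):
        return (self.major, self.minor)


root = Node(major=0)                     # placeholder major: root has no parent
dir1 = root.create("dir1")
dir2 = dir1.create("dir2")
f = dir2.create("file")
old_dir2, old_file = dir2.tag(), f.tag()

# Delete file and dir2, then recreate both.
dir2.delete("file")
dir1.delete("dir2")
dir2 = dir1.create("dir2")
f = dir2.create("file")
new_dir2, new_file = dir2.tag(), f.tag()

print(old_dir2, new_dir2)   # (2, 2) (4, 2): the directory is distinguishable
print(old_file, new_file)   # (2, 1) (2, 1): the file is not
```

The recreated directory gets a new major version from dir1, but the file only inherits dir2's minor version, which restarts from the same value. Telling the two files apart would require the parent's complete version, which is the chaining problem described above.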
However, if we were looking at the versions all the
way to the root, when the mirror went off-line we
would have had: /v2/v2/v2/v1 and now we have:
/v2/v4/v2/v1. There is a chance that we are
talking about different files now. Of course, the
problem I see now is that the files could in fact
have been the same even though the version number is
different with this scheme! Since the only version
# that is different is that of dir1 (v4), this could
have been caused by simply adding two new files to
that directory!
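The false positive can be reproduced with a small Python sketch. As before, `Node`, `chain` and `fresh_tree` are hypothetical helpers, the file names `other1` and `other2` are invented, and the assumed rule is that a create or delete bumps the parent's version by one.

```python
class Node:
    """File or directory with one version counter (hypothetical model)."""

    def __init__(self):
        self.version = 1
        self.children = {}

    def create(self, name):
        self.version += 1                 # create bumps the parent
        child = self.children[name] = Node()
        return child

    def delete(self, name):
        self.version += 1                 # delete bumps the parent
        del self.children[name]


def chain(root, path):
    """Versions from the root down to the end of path."""
    node, out = root, [root.version]
    for name in path:
        node = node.children[name]
        out.append(node.version)
    return out


def fresh_tree():
    root = Node()
    root.create("dir1").create("dir2").create("file")
    return root


PATH = ["dir1", "dir2", "file"]

# History A: file and dir2 are deleted and recreated.
a = fresh_tree()
dir1 = a.children["dir1"]
dir1.children["dir2"].delete("file")
dir1.delete("dir2")
dir1.create("dir2").create("file")

# History B: the file is untouched; two unrelated files are added to dir1.
b = fresh_tree()
b.children["dir1"].create("other1")
b.children["dir1"].create("other2")

print(chain(a, PATH))   # [2, 4, 2, 1]
print(chain(b, PATH))   # [2, 4, 2, 1]
```

Both histories end with the same chain, so a mirror comparing against /v2/v2/v2/v1 has to resync in either case, even though history B never touched the file.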
Indeed, then we have to resync the files that may well be the same. I
was saying in jest before that we might as well be using a DHT if
reliability and consistency aren't required, but now I'm thinking that
perhaps tagging the file with its SHA hash might be a possible way.
Then we don't check if the version is newer, merely if it's different.
Of course, the problem there is that calculating the hash of a big file
is _expensive_ if we have to do it on each write.
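As a rough illustration of the hash idea, here is a Python sketch using the standard `hashlib` module. SHA-256 is an assumption, since no particular SHA variant was named, and `file_digest` and `needs_resync` are hypothetical helper names. The whole file is read on every call, which is exactly the cost being worried about.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_resync(local_path, mirror_digest):
    """Resync when the digests differ; there is no notion of 'newer' here."""
    return file_digest(local_path) != mirror_digest
```

The comparison only says that two copies differ, not which one is more recent, so choosing the copy to keep would still need some other mechanism.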
We now have the reverse problem, possible resyncing
when not needed. This means that possibly every
single subdirectory/file of a directory needs to be
resynced. Yikes, this problem would also be
prevalent (although less intense) even when just
using the parent's version # wouldn't it? Every time
a directory is reversioned, all the files in it are
now reversioned?
Yes, I think so. Not really workable.
> > Directory moves could create a similar problem:
> >
> > /dir1/dir2/file
> > /dir1/dir3/file
> >
> > /file and dir2 deleted.
> > dir3 moved to dir2 and happened to match file
> > and dir2 version #s.
> >
> > but I think that versioning to the root would
> > again solve this?
You don't need versioning up to the root, you just
have to treat moves the same way as copy+delete from
the versioning point of view. This doesn't mean you
actually have to copy+delete - you just have to
update the metadata as if you did.
I wasn't implying that a move was a special case,
rather that by moving the parent directory to a file
you could end up with the same problem that I describe
above where a file is a completely different file than
on the mirror, but both the file and parent
directory's version #s are the same.
Indeed, you've convinced me. I agree now. I don't think this is the
solution.
Gordan