Re: [wishlist] git-archive -L

Pierre Habouzit <madcoder@xxxxxxxxxx> · Thu, 05 Feb 2009 16:04:29 +0100

On Wed, Feb 04, 2009 at 11:00:18PM +0000, René Scharfe wrote:
> René Scharfe wrote:
> > Anyway, I'll try to resurrect my old, incomplete symlink following code,
> > but I don't have much time, either. :-/
> 
> After a second and a third look I don't see any salvageable parts in the
> old code any more.  It was just a prototype that taught me something I
> should have been able to find out by thinking alone: that to follow
> links within tracked content we can't simply jump to the target, but we
> have to walk the whole path step by step.
> 
> E.g., consider a repository with these four entries:
> 
> 	Type	Name	Target
> 	-------	-------	------
> 	file	a/f
> 	symlink	a/x	f
>	symlink	a/y	../b/f
> 	symlink	b	a
> 
> Let's say our goal is to follow symlinks pointing to tracked content.
> 
> We can easily follow "a/x" to get to its target "f" by concatenating the
> directory part of the symlink's path ("a/") with the target ("f"), i.e.
> we only need to do a simple string operation.
> 
> If we do the same for "a/y", we'd arrive at "b/f", which is not a
> tracked file by itself, though.  We need to look up each path element
> one by one and follow symlinks at each step.  That can't be done with
> our existing tree walkers, AFAICS, so we'd need to write a new one.
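
To make the difference concrete, here is a minimal Python sketch. It assumes the tree is modelled as a flat dict from tracked path to (type, target), and naive_target() and resolve() are hypothetical helpers for illustration, not git's tree walkers. The string-only join sends a/y to b/f, which is not tracked, while the element-by-element walk goes through the b -> a link and reaches a/f:

```python
import posixpath

# The example repository above; the directory "a" is implied by "a/f".
ENTRIES = {
    "a": ("tree", None),
    "a/f": ("blob", None),
    "a/x": ("symlink", "f"),
    "a/y": ("symlink", "../b/f"),
    "b": ("symlink", "a"),
}

def naive_target(link, entries):
    """String-only resolution: the link's directory joined with its target."""
    _, target = entries[link]
    return posixpath.normpath(posixpath.join(posixpath.dirname(link), target))

def resolve(path, entries, max_links=40):
    """Walk `path` one element at a time, following symlinks at each step.

    Returns the tracked path reached, or None if the walk leaves the
    repository, hits something untracked, or follows too many links.
    """
    pending = [c for c in path.split("/") if c]  # elements still to walk
    done = []                                    # real directories walked so far
    links = 0
    while pending:
        elem = pending.pop(0)
        if elem == ".":
            continue
        if elem == "..":
            if not done:
                return None  # would climb above the repository root
            done.pop()
            continue
        kind, target = entries.get("/".join(done + [elem]), (None, None))
        if kind is None:
            return None  # not tracked
        if kind == "symlink":
            links += 1
            if links > max_links or target.startswith("/"):
                return None  # arbitrary loop guard, or absolute target outside the repo
            # The target is relative to the directory holding the link (`done`),
            # so splice its elements in front of the ones still pending.
            pending = [c for c in target.split("/") if c] + pending
            continue
        if kind == "blob" and pending:
            return None  # a file used as a directory
        done.append(elem)
    return "/".join(done)
```

With this, resolve("a/y", ENTRIES) returns "a/f", whereas naive_target("a/y", ENTRIES) gives the untracked "b/f".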

I stumbled on mostly those issues too, before giving up for lack of time
to understand how the tree walkers work :/

And of course, our symlinks are precisely symlinks to directories, so
not supporting them is unacceptable for us.

> The decision to follow a link can be made by the callback and passed to
> read_tree_recursive() as a return value, with, e.g., READ_TREE_FOLLOW
> and READ_TREE_FOLLOW_NON_MATCHES meaning to follow all internal symlinks
> and to follow only those whose target doesn't match the specified paths,
> respectively.

It has to be cleverer than that. Consider something like:

    symlink a/b   .

Or funnier:

    symlink a/b   ../c
    symlink c/d   ../a

If you don't pay attention, you end up with a nice busy loop, and really
really really long path names (a/b/b/b/b/b.... for the first one,
and a/b/d/b/d/b/d/b/d/b/... for the latter).
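
A small Python sketch of that blow-up. It assumes link targets are resolved relative to the directory that holds the link (as POSIX symlinks are) and that the walker blindly descends into every directory symlink; naive_expand() and its max_depth cap are hypothetical, and the cap is the only thing that stops it. LOOP1 and LOOP2 use directory-relative targets (a/b -> . and a/b -> ../c, c/d -> ../a) that produce the two chains described above:

```python
import posixpath

def link_dir(link, target):
    """Directory a link points to, with `target` taken relative to the
    directory holding `link`; "" stands for the top level."""
    d = posixpath.normpath(posixpath.join(posixpath.dirname(link), target))
    return "" if d == "." else d

def naive_expand(entries, real_dir="", shown_dir="", depth=0, max_depth=6):
    """Yield the paths a walker that enters every directory symlink would
    emit. Without `max_depth` this generator would never finish on a loop."""
    if depth == max_depth:
        return
    for name in sorted(entries):
        if posixpath.dirname(name) != real_dir:
            continue  # only direct children of real_dir
        kind, target = entries[name]
        shown = posixpath.join(shown_dir, posixpath.basename(name))
        yield shown
        if kind == "tree":
            yield from naive_expand(entries, name, shown, depth + 1, max_depth)
        elif kind == "symlink":
            yield from naive_expand(entries, link_dir(name, target), shown,
                                    depth + 1, max_depth)

LOOP1 = {"a": ("tree", None), "a/b": ("symlink", ".")}
LOOP2 = {"a": ("tree", None), "a/b": ("symlink", "../c"),
         "c": ("tree", None), "c/d": ("symlink", "../a")}
```

Raising max_depth just makes the last paths longer (a/b/b/b/..., a/b/d/b/d/...); nothing in the walk itself ever decides to stop.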

That's why I was thinking of a more straightforward approach, basically
doing this:
  * when meeting a symlink to a blob, check whether that blob is tracked,
    and whether its "real" path in the repository is inside what we're
    archiving. Then match that against what the user asked for (follow
    any symlink, which, if we offer it at all, looks like a pretty big
    security risk to me, and I see no good reason for it; follow only
    symlinks whose tracked target lies outside the archived paths; or
    follow every symlink to tracked content), and act accordingly.

    This is the (almost) easy bit.

  * when meeting a symlink to a directory, look at the pointee and, as
    for files, check whether it is "tracked" (IOW, contains tracked
    files) and whether the user wants symlink replacement. If so,
    remember the current <path, pointed directory inside the repository>
    pair and put it in a worklist.
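
The per-symlink decision from the two points above might look roughly like this Python sketch. Policy, handle_symlink() and the "replace"/"defer"/"keep" results are hypothetical names, and the "follow any symlink" choice is deliberately left out. The target is resolved purely lexically, which assumes none of its parent directories is itself a symlink (otherwise it needs the step-by-step walk René describes):

```python
import posixpath
from enum import Enum

class Policy(Enum):
    # Hypothetical names for the two tracked-only choices.
    TRACKED = "tracked"          # follow every symlink to tracked content
    TRACKED_OUTSIDE = "outside"  # only if the target is outside the archived paths

def is_inside(path, prefixes):
    return any(path == p or path.startswith(p + "/") for p in prefixes)

def handle_symlink(link, target, tracked_files, archived, policy, worklist):
    """Return "replace" (store the blob's content under `link`), "defer"
    (directory queued on `worklist`) or "keep" (leave the symlink alone)."""
    if target.startswith("/"):
        return "keep"  # absolute target: outside the repository
    # Lexical resolution only: assumes no symlinks among the target's parents.
    dest = posixpath.normpath(posixpath.join(posixpath.dirname(link), target))
    if dest in (".", "..") or dest.startswith("../"):
        return "keep"  # top-level directory, or outside the repository
    is_blob = dest in tracked_files
    is_dir = any(f.startswith(dest + "/") for f in tracked_files)
    if not (is_blob or is_dir):
        return "keep"  # not tracked
    if policy is Policy.TRACKED_OUTSIDE and is_inside(dest, archived):
        return "keep"  # already part of the archive
    if is_blob:
        return "replace"
    worklist.append((link, dest))  # archive `dest` again later, named `link`
    return "defer"
```

Blobs are handled on the spot, while directories only leave a <path, pointed directory> pair behind for a later pass.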

When the first archiving "pass" is finished, run a new archiving pass
based on the worklist. Repeat this a few times, and if you don't converge
to a fixed point where the worklist is empty, you are likely in a
situation like the ones I depicted earlier. *phew*.
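
A Python sketch of the pass loop, assuming the same flat path -> (type, target) model. archive() and MAX_PASSES are hypothetical, since the mail only says "a few times". Each pass walks the queued real directories under their symlink names and queues any directory symlinks it meets; if the worklist is still non-empty after the last pass, it gives up:

```python
import posixpath

MAX_PASSES = 8  # hypothetical bound for "a few times"

def archive(entries, roots, max_passes=MAX_PASSES):
    """Archive the (non-empty) paths in `roots`, then re-archive queued
    directory symlinks pass after pass until the worklist is empty.

    Returns the archived names in output order; raises RuntimeError if no
    fixed point is reached within `max_passes` passes.
    """
    out = []
    worklist = [(r, r) for r in roots]  # (name in the archive, real directory)
    for _ in range(max_passes):
        if not worklist:
            return out
        next_worklist = []
        for shown_root, real_root in worklist:
            for name in sorted(entries):
                if name != real_root and not name.startswith(real_root + "/"):
                    continue  # outside the directory handled by this item
                shown = shown_root + name[len(real_root):]
                kind, target = entries[name]
                if kind == "symlink":
                    dest = posixpath.normpath(
                        posixpath.join(posixpath.dirname(name), target))
                    if entries.get(dest, (None, None))[0] == "tree":
                        next_worklist.append((shown, dest))  # next pass
                        continue
                out.append(shown)
        worklist = next_worklist
    if worklist:
        raise RuntimeError("worklist did not converge: symlink loop?")
    return out
```

Entries found in later passes are simply appended after the earlier ones, which suits tar since it does not need its entries sorted.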

This needs quite a bit of re-engineering of the code, though, and I had
no time for it (and still don't really have any). But I think this is
the straightforward approach that would work easily (I'm not sure about
zip, but in tar, where entries aren't really sorted, it should work).

-- 
·O·  Pierre Habouzit
··O                                                madcoder@xxxxxxxxxx
OOO                                                http://www.madism.org