On Wed, Feb 04, 2009 at 11:00:18PM +0000, René Scharfe wrote:
> René Scharfe wrote:
> > Anyway, I'll try to resurrect my old, incomplete symlink following code,
> > but I don't have much time, either. :-/
>
> After a second and a third look I don't see any salvageable parts in the
> old code any more.  It was just a prototype that taught me something I
> should have been able to find out by thinking alone: that to follow
> links within tracked content we can't simply jump to the target, but we
> have to walk the whole path step by step.
>
> E.g., consider a repository with these four entries:
>
>    Type     Name  Target
>    -------  ----  ------
>    file     a/f
>    symlink  a/x   f
>    symlink  a/y   ../b/f
>    symlink  b     a
>
> Let's say our goal is to follow symlinks pointing to tracked content.
>
> We can easily follow "a/x" to get to its target "f" by concatenating the
> directory part of the symlink's path ("a/") with the target ("f"), i.e.
> we only need to do a simple string operation.
>
> If we do the same for "a/y", we'd arrive at "b/f", which is not a
> tracked file by itself, though.  We need to look up each path element
> one by one and follow symlinks at each step.  That can't be done with
> our existing tree walkers, AFAICS, so we'd need to write a new one.

I mostly stumbled on those issues before I gave up, having no time to
understand how the tree walkers work :/

And of course, our symlinks are precisely symlinks to directories, so
not supporting them is unacceptable to us.

> The decision to follow a link can be made by the callback and passed to
> read_tree_recursive() as a return value, with, e.g., READ_TREE_FOLLOW
> and READ_TREE_FOLLOW_NON_MATCHES meaning to follow all internal symlinks
> and to follow only those whose target doesn't match the specified paths,
> respectively.

It has to be more clever.  If you consider something like:

    symlink a/b ..

Or funnier:

    symlink a/b ../../c
    symlink c/d ../../a

If you don't pay attention, you end up with a nice busy loop, and
really, really, really long path names (a/b/b/b/b/b.... for the first
one, and a/b/d/b/d/b/d/b/d/b/... for the latter).

That's why I was thinking of a more straightforward approach, basically
doing this:

  * when meeting a symlink to a blob, see if that blob is tracked or
    not, and if its "real" path in the repository is inside what we're
    archiving or not.  Then match that against what the user asked for
    (following any symlink, which looks like a pretty big security risk
    to me and for which I see no good reason; only tracked symlinks
    pointing outside of the archived paths; or only tracked symlinks no
    matter what), and do it.  This one is the almost easy bit.

  * when meeting a symlink to a directory, look at the pointee and, as
    for the file, see if it's "tracked" (IOW contains tracked files) and
    see if the user wants symlink replacement or not.  If yes, then
    remember the current <path, pointed directory inside the repository>
    pair and put it in a worklist.

    When finishing the first "pass" of archiving, run a new archiving
    pass based on the worklist.  Do it a few times, and if you don't
    converge to a fixed point where the worklist is empty, then you are
    likely in a situation like the ones I described earlier.

*phew*.

This needs quite a bit of reengineering of the code though, and I had
(and still don't really have) no time for it.  But I think this is the
straightforward approach that would work easily (I don't know about zip,
but in tar, where entries are not really sorted, it should work).

-- 
·O·  Pierre Habouzit
··O  madcoder@xxxxxxxxxx
OOO  http://www.madism.org
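
For illustration only, the two ideas discussed above (walking a path one
element at a time while following tracked symlinks, and a bounded
worklist of <archive path, pointed directory> pairs) can be sketched in
Python.  This is not git code: INDEX, resolve(), expand(), MAX_HOPS and
max_passes are hypothetical names, the "index" is a plain dict standing
in for the tracked entries, and the hop and pass limits are arbitrary.

```python
import posixpath

# Hypothetical stand-in for the tracked entries (not a git data structure):
# path -> symlink target, or None for a regular file.
INDEX = {"a/f": None, "a/x": "f", "a/y": "../b/f", "b": "a"}

MAX_HOPS = 40  # arbitrary bound on symlinks followed per lookup


def is_dir(index, path):
    """A directory counts as tracked if a tracked entry lives below it."""
    return path == "" or any(p.startswith(path + "/") for p in index)


def resolve(index, path, max_hops=MAX_HOPS):
    """Walk `path` element by element, following tracked symlinks.

    Returns the symlink-free tracked path ("" for the top directory), or
    None if the path is untracked, leaves the repository, or loops.
    """
    todo = path.split("/")
    done = []  # elements already resolved, free of symlinks
    hops = 0
    while todo:
        part = todo.pop(0)
        if part in ("", "."):
            continue
        if part == "..":
            if not done:
                return None  # would escape the top of the repository
            done.pop()
            continue
        cur = "/".join(done + [part])
        target = index.get(cur)
        if target is not None:  # tracked symlink: splice its target in
            hops += 1
            if hops > max_hops or target.startswith("/"):
                return None
            todo = target.split("/") + todo
        elif cur in index or is_dir(index, cur):
            done.append(part)
        else:
            return None  # not tracked
    return "/".join(done)


def expand(index, max_passes=8):
    """Worklist passes: list (archive path, real path) pairs with tracked
    symlinks replaced by what they point to."""
    out = []
    work = [("", "")]  # (directory in the archive, real directory)
    for _ in range(max_passes):
        nxt = []
        for arc_dir, real_dir in work:
            prefix = real_dir + "/" if real_dir else ""
            for p in sorted(index):
                if not p.startswith(prefix):
                    continue
                arc = posixpath.join(arc_dir, p[len(prefix):])
                if index[p] is None:
                    out.append((arc, p))  # regular file
                    continue
                real = resolve(index, p)
                if real is None:
                    continue  # this sketch simply skips such links
                if real in index:
                    out.append((arc, real))  # symlink to a tracked file
                else:
                    nxt.append((arc, real))  # symlink to a directory
        work = nxt
        if not work:
            return out  # fixed point: nothing left to expand
    raise RuntimeError("worklist did not converge; symlink loop?")
```

With the four entries from René's table, resolve(INDEX, "a/y") walks
a, y, .., b (a symlink to a), then f, and ends at "a/f"; expand(INDEX)
needs a second pass to emit the entries under "b/".  An index such as
{"a/b": "..", "a/f": None} never empties the worklist, so expand()
gives up after max_passes instead of producing ever longer path names.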