Re: 'git clone' doesn't use alternates automatically?

On Sat, Jan 31, 2009 at 05:19:31PM -0800, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> >   - without either, copy alternates from origin, but _don't_ use
> >     alternates while cloning
> 
> Are you talking about a local clone optimization that does hardlink from
> the source repository?

Sorry, I was wrong about what was happening. From reading James' posts
and not doing any experimenting or looking, I had the impression that
doing this:

  # plain repo
  mkdir repo1 &&
    (cd repo1 && git init &&
     echo content >file && git add . && git commit -m one)

  # repo with alternates, but extra content
  git clone -s repo1 repo2 &&
    (cd repo2 &&
     echo content >>file && git commit -a -m two)

  # clone of repo w/ alternates
  git clone repo2 repo3

would cause the final clone to set up the alternate to repo1, but still
pull in the objects. But that isn't the case, of course. Either:

  1. It is a local hardlink clone, in which case we just pull in the
     objects from repo2.

  2. It isn't, in which case we don't copy over the alternates.
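
To see which of the two cases applied to a given clone, one can look for
an objects/info/alternates file inside it. A minimal sketch, assuming a
non-bare repository with a .git directory; the helper name
show_alternates is made up for illustration:

```shell
# Hypothetical helper: print the alternates recorded for a non-bare repository.
show_alternates() {
    alt_file="$1/.git/objects/info/alternates"
    if [ -s "$alt_file" ]; then
        # One object directory per line.
        cat "$alt_file"
    else
        echo "no alternates"
    fi
}
```

For the example above, "show_alternates repo3" would show whether repo3
ended up pointing at repo1.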

> I am fairly certain that copying alternates from the source repository was
> not an intended behaviour but was a consequence of lazy coding of how we
> copy (or link) everything from it.  The original was literally the simple
> matter of:
> 
>     find objects ! -type d -print | cpio $cpio_quiet_flag -pumd$l "$GIT_DIR/"
> 
> whose intention was to copy objects/?? and objects/pack/. and it wasn't
> even part of the design consideration to worry about what would happen to
> the alternates the source repository might have in objects/info/.

Right, I think that is what is going on. And what I was suggesting in my
other email is that it is actively harmful to have this behavior,
because now repo3 depends on repo1, without the user having explicitly
asked for such a relationship (and they might not even be aware of
repo1).

I was tempted to suggest simply not copying the alternates from repo2
to repo3. But you can't do that: repo2 is _missing_ objects (the ones
it borrows from repo1), so repo3 won't have them either. Without the
alternates file pointing to repo1, repo3 is corrupt. So one would have
to actually pull the missing objects in from the alternate before
dropping the alternates file.
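
Done by hand in an existing repository, pulling the borrowed objects in
and then dropping the alternates file could look roughly like the
following. This is a sketch, not what "git clone" does internally; the
helper name dissociate is made up, and it assumes a non-bare repository:

```shell
# Hypothetical helper: make a repository self-contained, then drop its alternates.
dissociate() {
    (
        cd "$1" || exit 1
        # Without -l, repack also packs objects currently borrowed from alternates.
        git repack -a -d || exit 1
        rm -f .git/objects/info/alternates
        # Check that no objects are missing now that the alternate is gone.
        git fsck
    )
}
```

Running "dissociate repo2" before cloning would leave repo2 independent
of repo1, at the cost of the disk space the alternate was saving.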

But actually, I think there is even more breakage in hardlinking the
alternates file: alternates files can be relative paths. So if repo2
points to "../../../repo1/.git/objects" (which it doesn't in the example
above, as "clone -s" uses absolute paths -- but it is easy enough to
construct a broken case), then repo3 will gain that alternate pointer,
but may be in a totally different directory where that relative path is
broken. And then repo3 is corrupt. So the alternates must be copied and
any relative paths munged for it to work reliably.
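
The munging itself is not complicated. Relative entries in an alternates
file are resolved against the objects directory that contains it, so a
sketch of the rewrite might look like this (the helper name
absolute_alternates is made up, and a non-bare source repository is
assumed):

```shell
# Hypothetical helper: print a repository's alternates with relative
# entries rewritten as absolute paths.
absolute_alternates() {
    objdir="$1/.git/objects"
    [ -f "$objdir/info/alternates" ] || return 0
    while IFS= read -r alt; do
        case "$alt" in
            '') continue ;;                 # skip blank lines
            /*) printf '%s\n' "$alt" ;;     # already absolute
            *)  # resolve against the source's objects directory
                (cd "$objdir/$alt" && pwd) ;;
        esac
    done <"$objdir/info/alternates"
}
```

The output could then be written into the new clone, e.g.
"absolute_alternates repo2 >repo3/.git/objects/info/alternates".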

The hardlink code operates by default because it was thought to be a
safe optimization that couldn't bite people. But it interacts badly with
the concept of alternates. So I think a sane fix would be to disable
hardlinking if the parent repo is using alternates at all. Then a
vanilla "git clone repo2 repo3" will do the safe but more costly
behavior of actually copying the objects. If the user wants to accept
the risks of alternates, they can give "-s" explicitly, and git will
track the alternates recursively through repo2 to repo1 at runtime.
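
Until something like that exists in git itself, a user-side
approximation is possible. The wrapper below is only a sketch of the
idea, not a proposed patch; the name safe_clone is made up, and it
assumes a non-bare source repository. It relies on the fact that a
file:// URL bypasses the local-clone shortcut and goes through the
normal transport:

```shell
# Hypothetical wrapper: clone a local repository, avoiding the hardlink
# shortcut when the source borrows objects through alternates.
safe_clone() {
    src="$1"
    dst="$2"
    if [ -s "$src/.git/objects/info/alternates" ]; then
        # A file:// URL makes git use its normal transport, which copies objects.
        git clone "file://$(cd "$src" && pwd)" "$dst"
    else
        git clone "$src" "$dst"
    fi
}
```

With the example repositories, "safe_clone repo2 repo3" should give a
repo3 that holds all of its own objects and has no pointer to repo1.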

-Peff