Re: drop connectivity check for local clones

[I'm cc-ing the list, since I think this answer is of general interest.]

On Wed, Nov 04, 2015 at 11:00:34AM +0100, Matej Buday wrote:

> I have a question somewhat regarding this old commit of yours:
> https://github.com/git/git/commit/125a05fd0b45416558923b753f6418c24208d443
> 
> Let me preface this by saying that I don't completely understand what the
> connectivity check does...

One of the invariants git tries to maintain in the repository is that for
any object reachable from a ref (i.e., a branch or tag), we have all of
the ancestor objects. So if you have commit 125a05, you also have the
parent, and its parent, and so on, down to the root.

When we fetch or clone from a remote repository, it sends us some
objects, and we plan to point one of our refs at one of them. But rather than
trust that the remote sent us everything we need to maintain that
invariant, we actually walk the graph to make sure that is the case.

This can catch bugs or transfer errors early. So the operation is safer,
at the expense of spending some CPU time.
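
To make the walk concrete, here is a toy sketch in Python. It is not how
git implements the check; the dict-based object store and the names
find_missing, complete and broken are made up for illustration. It
starts from the proposed ref tips and reports any object that is
referenced but not present:

```python
# Toy model of a connectivity check; NOT git's actual implementation.
# `objects` maps an object id to the ids it refers to (for a commit,
# think of its parents). All names here are hypothetical.

def find_missing(objects, tips):
    """Walk from each tip; return ids that are referenced but absent."""
    missing = set()
    seen = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid not in objects:
            # Something points at this object, but we never received it.
            missing.add(oid)
            continue
        stack.extend(objects[oid])
    return missing

complete = {"c3": ["c2"], "c2": ["c1"], "c1": []}
broken = {"c3": ["c2"], "c1": []}  # c2 was never transferred

print(find_missing(complete, ["c3"]))  # set()
print(find_missing(broken, ["c3"]))    # {'c2'}
```

The cost of the walk grows with the amount of history reachable from
the tips, which is the CPU time mentioned above.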

We skip it for local disk-to-disk clones. We trust the source repository
more than a remote, and since the point of a local clone is to be very
fast, the safety/CPU tradeoff doesn't make as much sense.

> Well, the question is -- is this check necessary
> for local clones that use the --reference option?

Sort of. If you say:

  git clone --reference /some/local/repo git://some-remote-repo

Then we do check the incoming objects from some-remote-repo. However,
there is an optimization we don't do: we could assume that everything in
/some/local/repo is fine, and stop traversing there. So if you fetch
only a few objects from the remote, that is all you would check.
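
As a rough illustration of why stopping at the reference repository
helps, here is a toy sketch in Python. It is only a model under stated
assumptions, not git's code: the object store is a plain dict, and the
`trusted` set stands in for "objects the reference repository's refs
point at". The names check, history and trusted are hypothetical.

```python
# Toy model of "stop traversing at trusted objects"; NOT git's code.
# `objects` maps an object id to the ids it refers to (e.g. parents).

def check(objects, tips, trusted=frozenset()):
    """Walk from tips, not descending past trusted ids.

    Returns (checked, missing): the ids actually visited, and the ids
    that were referenced but absent from `objects`.
    """
    checked, missing = set(), set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in checked or oid in trusted:
            # Trusted objects are assumed complete; do not walk them.
            continue
        checked.add(oid)
        if oid not in objects:
            missing.add(oid)
            continue
        stack.extend(objects[oid])
    return checked, missing

# A linear history c0 <- c1 <- ... <- c1000.
history = {f"c{i}": [f"c{i - 1}"] for i in range(1, 1001)}
history["c0"] = []

full_checked, _ = check(history, ["c1000"])
fast_checked, _ = check(history, ["c1000"], trusted={"c997"})

print(len(full_checked))  # 1001
print(len(fast_checked))  # 3
```

In this model the reference repository already has everything up to
c997, so only the three newly fetched commits get walked. Note that the
sketch treats building `trusted` as free, which is not true in
practice.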

The optimization would look something like this:

  https://github.com/peff/git/commit/1254ff54b49eff19ec8a09c36e3edd24d490cae1

I wrote that last year, but haven't actually submitted the patch yet.
There are two reasons:

  1. It needs minor cleanup due to the sha1/oid transition that is
     ongoing (see the "ugh" comment). I think this could be fixed by
     refactoring some of the callback interfaces, but I haven't gotten
     around to it.

  2. Using alternates to optimize can backfire at a certain scale. If
     you have a very large number of refs in the alternate repository,
     just accessing and processing those refs can be more expensive than
     walking the history graph in the first place.

     This is the case for us at GitHub, where our alternates have the
     refs for _all_ of the forks of a given project. So I would want
     some flag to turn this behavior off.

     Of course, we are in an exceptional circumstance at GitHub, and
     that is no reason the topic cannot go upstream (we already carry
     custom patches to disable alternates for things like receive-pack,
     and could do the same here).

     So that is not a good reason not to submit, only an explanation why
     I have not yet bothered to spend the time on it. :)

-Peff