[I'm cc-ing the list, since I think this answer is of general interest.]

On Wed, Nov 04, 2015 at 11:00:34AM +0100, Matej Buday wrote:

> I have a question somewhat regarding this old commit of yours:
> https://github.com/git/git/commit/125a05fd0b45416558923b753f6418c24208d443
>
> Let me preface this by saying that I don't completely understand what the
> connectivity check does...

One of the invariants git tries to maintain in the repository is that for
any object reachable from a ref (i.e., a branch or tag), we have all of
the ancestor objects. So if you have commit 125a05, you also have the
parent, and its parent, and so on, down to the root.

When we fetch or clone from a remote repository, it sends us some
objects, and we plan to point one of our refs at one of them. But rather
than trust that the remote sent us everything we need to maintain that
invariant, we actually walk the graph to make sure that is the case.
This can catch bugs or transfer errors early. So the operation is safer,
at the expense of spending some CPU time.

We skip it for local disk-to-disk clones. We trust the source clone
more, and since the point of a local clone is to be very fast, the
safety/CPU tradeoff doesn't make as much sense.

> Well, the question is -- is this check necessary
> for local clones that use the --reference option?

Sort of. If you say:

  git clone --reference /some/local/repo git://some-remote-repo

then we do check the incoming objects from some-remote-repo. However,
there is an optimization we don't do: we could assume that everything
in /some/local/repo is fine, and stop traversing there. So if you fetch
only a few objects from the remote, that is all you would check.

The optimization would look something like this:

  https://github.com/peff/git/commit/1254ff54b49eff19ec8a09c36e3edd24d490cae1

I wrote that last year, but haven't actually submitted the patch yet.
There are two reasons:

  1. It needs minor cleanup due to the sha1/oid transition that is
     ongoing (see the "ugh" comment).
     I think this could be fixed by refactoring some of the callback
     interfaces, but I haven't gotten around to it.

  2. Using alternates to optimize can backfire at a certain scale. If
     you have a very large number of refs in the alternate repository,
     just accessing and processing those refs can be more expensive
     than walking the history graph in the first place.

     This is the case for us at GitHub, where our alternates have the
     refs for _all_ of the forks of a given project. So I would want
     some flag to turn this behavior off.

     Of course, we are in an exceptional circumstance at GitHub, and
     that is no reason the topic cannot go upstream (we already carry
     custom patches to disable alternates for things like receive-pack,
     and could do the same here). So that is not a good reason not to
     submit, only an explanation why I have not yet bothered to spend
     the time on it. :)

-Peff
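To make the connectivity check and the proposed --reference shortcut
concrete, here is a toy model in Python. It is only a sketch under
simplifying assumptions, not git's C implementation: object ids are
plain strings, each id maps to the ids it points to (only commit parents
here, whereas git also walks trees and blobs), and the function
`check_connectivity` and its `trusted` parameter are hypothetical names.
`trusted` stands in for objects we choose to assume are complete, such
as the tips of the refs in the reference repository.

```python
from collections import deque


def check_connectivity(objects, new_tips, trusted=frozenset()):
    """Toy connectivity check (hypothetical helper, not git's API).

    objects:  dict mapping an object id to the ids it points to
              (here, just a commit's parents).
    new_tips: ids we are about to point refs at.
    trusted:  ids assumed complete (e.g. tips of refs in a --reference
              repository); the walk stops at them.

    Returns (missing, visited): the set of reachable ids absent from
    `objects`, and how many ids the walk examined.
    """
    missing = set()
    seen = set()
    queue = deque(new_tips)
    while queue:
        oid = queue.popleft()
        # Skip anything already walked, or anything we have chosen to trust.
        if oid in seen or oid in trusted:
            continue
        seen.add(oid)
        if oid not in objects:
            # Reachable but not present: the invariant is broken.
            missing.add(oid)
            continue
        # Keep walking toward the root.
        queue.extend(objects[oid])
    return missing, len(seen)


# History a <- b <- c <- d, with each commit listing its parent.
store = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
print(check_connectivity(store, ["d"]))                 # full walk
print(check_connectivity(store, ["d"], trusted={"b"}))  # stop at trusted tip
```

With `trusted` empty the walk examines all four commits and finds nothing
missing; with `trusted={"b"}` it stops at `b` and examines only two. That
is the saving the --reference optimization is after. The model also hints
at the scale problem in point 2: in real use, building the trusted set
means reading every ref in the alternate, which costs time in proportion
to the number of refs.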