> On Wed, Apr 03, 2019 at 10:27:21AM -0700, Josh Steadmon wrote: > > > For partial clones, doing a full connectivity check is wasteful; we skip > > promisor objects (which, for a partial clone, is all known objects), and > > excluding them all from the connectivity check can take a significant > > amount of time on large repos. > > > > At most, we want to make sure that we get the objects referred to by any > > wanted refs. For partial clones, just check that these objects were > > transferred. > > This isn't strictly true, since we could get objects from elsewhere via > --shared or --reference. Those might not be promisor objects. I don't think local clones (which --shared or --reference implies) can be partial, but the bigger point is below. > Shouldn't we be able to stop a traversal as soon as we see that an > object is in a promisor pack? > > I.e., here: > > > + if (opt->check_refs_only) { > > + /* > > + * For partial clones, we don't want to walk the full commit > > + * graph because we're skipping promisor objects anyway. We > > + * should just check that objects referenced by wanted refs were > > + * transferred. > > + */ > > + do { > > + if (!repo_has_object_file(the_repository, &oid)) > > + return 1; > > + } while (!fn(cb_data, &oid)); > > + return 0; > > + } > > for each object where you ask "do we have this?" we could, for the same > cost, ask "do we have this in a promisor pack?". And the answer would be > yes for each in a partial clone. > > But that would also cleanly handle --shared, etc, that I mentioned. And > more importantly, it would _also_ work on fetches. If I fetch from you > and get a promisor pack, then there is no point in me enumerating every > tree you sent. As long as you gave me a tip tree, then you have promised > that you'd give me all the others if I ask. > > So it seems like this should be a feature of the child rev-list, to stop > walking the graph at any object that is in a promisor pack. We currently already do a less optimal version of this - we pass --exclude-promisor-objects to rev-list, which indeed stops traversal at any promisor objects (whether in a promisor pack or referenced by one). As far as I know, the problem is that to do so, we currently enumerate all the objects in all promisor packs, and all objects that those objects reference (which means we inflate them too) - so that we have an oidset that we can check objects against. A partial solution is for is_promisor_object() to first check if the given object is in a promisor pack, avoiding generating the set of promisor objects until necessary. This would work in a blob:none clone with the refs pointing all to commits or all to blobs, but would not work in a tree:none clone (or maybe, in this case, the clone would be small enough that performance is not a concern, hmm). Maybe the ideal solution is for rev-list to check if an object is in a promisor pack and if --exclude-promisor-objects is active, we do not follow any outgoing links.