On 2019.04.04 19:08, Jeff King wrote: > On Thu, Apr 04, 2019 at 03:53:56PM -0700, Josh Steadmon wrote: > > > For large repositories, enumerating the list of all promisor objects (in > > order to exclude them from a rev-list walk) can take a significant > > amount of time). > > > > When --exclude-promisor-objects is passed to rev-list, don't enumerate > > the promisor objects. Instead, filter them (and any children objects) > > during the actual graph walk. > > Yeah, this is definitely the approach I was thinking of. > > Did you (or anybody else) have any thoughts on the case where a given > object is referred to both by a promisor and a non-promisor (and we > don't have it)? That's the "shortcut" I think we're taking here: we > would no longer realize that it's available via the promisor when we > traverse to it from the non-promisor. I'm just not clear on whether that > can ever happen. I am not sure either. In process_blob() and process_tree() there are additional checks for whether missing blobs/trees are promisor objects using is_promisor_object()... but if we call that we undo the performance gains from this change. > > Helped-By: Jonathan Tan <jonathantanmy@xxxxxxxxxx> > > Helped-By: Jeff King <peff@xxxxxxxx> > > Helped-By: Jonathan Nieder <jrnieder@xxxxxxxxx> > > > > Signed-off-by: Josh Steadmon <steadmon@xxxxxxxxxx> > > Minor nit, but these should all be part of the same block. Will fix in v3. > > diff --git a/list-objects.c b/list-objects.c > > index dc77361e11..d1eaa0999e 100644 > > --- a/list-objects.c > > +++ b/list-objects.c > > @@ -30,6 +30,7 @@ static void process_blob(struct traversal_context *ctx, > > struct object *obj = &blob->object; > > size_t pathlen; > > enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW; > > + struct object_info oi = OBJECT_INFO_INIT; > > > > if (!ctx->revs->blob_objects) > > return; > > @@ -37,6 +38,11 @@ static void process_blob(struct traversal_context *ctx, > > die("bad blob object"); > > if (obj->flags & (UNINTERESTING | SEEN)) > > return; > > + if (ctx->revs->exclude_promisor_objects && > > + !oid_object_info_extended(the_repository, &obj->oid, &oi, 0) && > > + oi.whence == OI_PACKED && > > + oi.u.packed.pack->pack_promisor) > > + return; > > This conditional gets repeated a lot in your patch. Perhaps it's worth a > helper so we can say: > > if (skip_promisor_object(&ctx->revs, &obj->oid)) > return; > > in each place? Will fix in v3. > One other possible small optimization: we don't look up the object > unless the caller asked to exclude promisors, which is good. But we > could also keep a single flag for "is there a promisor pack at all?". > When there isn't, we know there's no point in looking for the object. > > It might not matter much in practice. The main caller here is going to > be check_connected(), and it only passes --exclude-promisor-objects if > it's in a partial clone. I'm not necessarily opposed, but I'm leaning towards the "won't matter much" side. Where would such a flag live, in this case, and who would be responsible for initializing it? I guess it would only matter for rev-list, so we could initialize it in cmd_rev_list() if --exclude-promisor-objects is passed? > > [...] > > I didn't see any tweaks to the callers, which makes sense; we're already > passing --exclude-promisor-objects as necessary. Which means by itself, > this patch should be making things faster, right? Do you have timings to > show that off? Yeah, for a partial clone of a large-ish Android repo [1], we see the connectivity check go from >180s to ~7s. [1]: https://android.googlesource.com/platform/frameworks/base/