Jeff King <peff@xxxxxxxx> writes: > On Mon, Apr 05, 2021 at 03:02:33AM +0200, Rafael Silva wrote: > >> I'm not entirely sure about this (not this late in the day), but it seems to >> me that we should simply skip the "missing" (promisor) files when >> operating on a partial clone. >> >> Perhaps something like: >> >> --- >8 --- >> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c >> index 525c2d8552..fedf58323d 100644 >> --- a/builtin/pack-objects.c >> +++ b/builtin/pack-objects.c >> @@ -3468,6 +3468,8 @@ static int loosened_object_can_be_discarded(const struct object_id *oid, >> { >> if (!unpack_unreachable_expiration) >> return 0; >> + if (exclude_promisor_objects && is_promisor_object(oid)) >> + return 1; >> if (mtime > unpack_unreachable_expiration) >> return 0; >> if (oid_array_lookup(&recent_objects, oid) >= 0) >> --- >8 --- > > you are avoiding writing out the file. But we should realize much > earlier that it is not something we need to even consider loosening. > > In the loop in loosen_unused_packed_objects(), we skip packs that are > marked as "keep", so we'd skip the new promisor pack entirely. But we'd > still see all these objects in the _old_ promisor pack. However, for > each object there, we call has_sha1_pack_kept_or_nonlocal(), so that > would likewise realize that each object is already being kept in the > other pack. > Agreed. Realizing sooner that we shouldn't even consider loosening the objects from the packfile it's better solution. > Something like this seems to work, but I only lightly tested it, and it > could probably use some refactoring to make it less horrible: > > diff --git a/builtin/repack.c b/builtin/repack.c > index fdee8e4578..457525953a 100644 > --- a/builtin/repack.c > +++ b/builtin/repack.c > @@ -574,6 +574,23 @@ int cmd_repack(int argc, const char **argv, const char *prefix) > repack_promisor_objects(&po_args, &names); > > if (existing_packs.nr && delete_redundant) { > + /* > + * tell pack-objects about our new promisor pack, which > + * we will also be keeping > + */ > + for_each_string_list_item(item, &names) { > + /* > + * yuck, we seem to only have the name with the > + * packdir prefixed > + */ > + const char *prefix; > + if (!skip_prefix(packtmp, packdir, &prefix) || > + *prefix++ != '/') > + BUG("confused by packtmp"); > + strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack", > + prefix, item->string); > + } > + > if (unpack_unreachable) { > strvec_pushf(&cmd.args, > "--unpack-unreachable=%s", > > Do you want to try to work with that? > Yes, I'll try to work with that, together with refactoring that you mentioned in the code and the other replies. Thanks for the suggestion. >> A quick benchmark did show some promising result: >> >> # built from: 2e36527f23 (The sixth batch, 2021-04-02) >> Benchmark #1: ./bin-wrappers/git -C git.git gc >> Time (mean ± σ): 135.669 s ± 0.665 s [User: 42.789 s, System: 91.332 s] >> Range (min … max): 134.905 s … 136.115 s 3 runs >> >> # built from: 2e36527f23 + minor patch (from above) >> Benchmark #2: ./bin-wrappers/git -C git.git gc >> Time (mean ± σ): 12.586 s ± 0.031 s [User: 11.462 s, System: 1.365 s] >> Range (min … max): 12.553 s … 12.616 s 3 runs >> >> Summary: >> 'Benchmark #2' ran 10.78 ± 0.06 times faster than 'Benchmark #1' > > It's still quite a bit slower than a non-partial clone because the > traversal with --exclude-promisor-objects is slow. I think that's > because it has to open up all of the objects in the promisor pack to see > what they refer to. I don't know if we can do better (and it's largely > an orthogonal problem to what you're solving here, so it probably makes > sense to just punt on it for now). > > -Peff Make sense. -- Thanks Rafael