> On Mon, Apr 05, 2021 at 03:02:33AM +0200, Rafael Silva wrote: > > > When I was digging into the code and adding trace2_region_*() calls, I > > notice most of the time spent on the `git gc` (for the reported > > situation) was in: > > > > # In builtin/pack-objects.c > > static void get_object_list(int ac, const char **av) > > { > > ... > > if (unpack_unreachable) > > loosen_unused_packed_objects(); > > ... > > } > > Yeah, good find. Agreed! > This is my first time looking at the repacking strategy for partial > clones. It looks like we run an initial pack-objects to cover all the > promisor objects, and then do the "real" repack for everything else, > with "--exclude-promisor-objects". That is correct - we need two separate packs because one of them needs to be accompanied by a separate ".promisor" file to show that those objects are from the promisor remote. > The purpose of loosen_unused_packed_objects() is to catch any objects > that will be lost when our caller deletes all of the packs. But in this > case, those promisor objects are in a pack which won't be deleted, so > they should not be included. Makes sense. > > I'm not entirely sure about this (not this late in the day), but it seems to > > me that we should simply skip the "missing" (promisor) files when > > operating on a partial clone. > > > > Perhaps something like: > > > > --- >8 --- > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > > index 525c2d8552..fedf58323d 100644 > > --- a/builtin/pack-objects.c > > +++ b/builtin/pack-objects.c > > @@ -3468,6 +3468,8 @@ static int loosened_object_can_be_discarded(const struct object_id *oid, > > { > > if (!unpack_unreachable_expiration) > > return 0; > > + if (exclude_promisor_objects && is_promisor_object(oid)) > > + return 1; > > if (mtime > unpack_unreachable_expiration) > > return 0; > > if (oid_array_lookup(&recent_objects, oid) >= 0) > > --- >8 --- I think this will work. The only thing we might need to watch out for is that if we repack with --exclude-promisor-objects, some might say that every excluded object should be loosened. I don't think that's true in this case, because (1) the object is not unreachable and the argument talks about loosening unreachable objects, (2) promisor objects can be re-obtained from the promisor remote anyway. So I think we're fine. But I think Peff suggested something better below. > In the loop in loosen_unused_packed_objects(), we skip packs that are > marked as "keep", so we'd skip the new promisor pack entirely. But we'd > still see all these objects in the _old_ promisor pack. However, for > each object there, we call has_sha1_pack_kept_or_nonlocal(), so that > would likewise realize that each object is already being kept in the > other pack. > > Something like this seems to work, but I only lightly tested it, and it > could probably use some refactoring to make it less horrible: > > diff --git a/builtin/repack.c b/builtin/repack.c > index fdee8e4578..457525953a 100644 > --- a/builtin/repack.c > +++ b/builtin/repack.c > @@ -574,6 +574,23 @@ int cmd_repack(int argc, const char **argv, const char *prefix) > repack_promisor_objects(&po_args, &names); > > if (existing_packs.nr && delete_redundant) { > + /* > + * tell pack-objects about our new promisor pack, which > + * we will also be keeping > + */ > + for_each_string_list_item(item, &names) { > + /* > + * yuck, we seem to only have the name with the > + * packdir prefixed > + */ > + const char *prefix; > + if (!skip_prefix(packtmp, packdir, &prefix) || > + *prefix++ != '/') > + BUG("confused by packtmp"); > + strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack", > + prefix, item->string); > + } > + > if (unpack_unreachable) { > strvec_pushf(&cmd.args, > "--unpack-unreachable=%s", > > Do you want to try to work with that? It seems to me that this would work. The only part I was confused about is "packtmp", but that is just the pack directory plus a specific prefix, so "pack-objects" will indeed see that packfile as being part of the repository - no problem. repack_promisor_objects() might be able to be refactored to provide the names in a format that we want, but looking at it, I don't think it's possible (it just uses "packtmp", so we have the same "packtmp" problem).