On Fri, May 20, 2022 at 01:41:49PM -0700, Junio C Hamano wrote: > Taylor Blau <me@xxxxxxxxxxxx> writes: > > >> *geometry_p = xcalloc(1, sizeof(struct pack_geometry)); > >> geometry = *geometry_p; > >> > >> + string_list_sort(existing_kept_packs); > > > > Would it be worth sorting this as early as in collect_pack_filenames()? > > For our purposes in this patch, this works as-is, but it may be > > defensive to try and minimize the time that list has unsorted contents. > > It does not matter too much but after reading the latest one before > coming back to this original thread, I actually thought that sorting > it too early was questionable. Nobody other than this code cares > about the list being sorted, and if it turns out that the look-up in > the loop need to be optimized, it is plausible that we do not need > this list sorted when we populate a hashmap out of its contents, for > example. My thinking was simply that sorting it as often as possible (by doing so immediately after appending all of our newly written packs into it) was the most defensive approach that we could take. At least in the sense of another caller which _does_ depend on `names` sorted, and makes the same mistake of forgetting to call `string_list_sort()` itself. I suspect that this is mostly academic anyway, and that either would be fine, so if others feel strongly I'm happy to send an updated version (though TBH I would be just as happy with what we already have in the series I posted here). > >> for (p = get_all_packs(the_repository); p; p = p->next) { > >> - if (!pack_kept_objects && p->pack_keep) > >> - continue; > >> + if (!pack_kept_objects) { > >> + if (p->pack_keep) > >> + continue; > > > > (You mentioned this to me off-list, but I'll repeat it here since it > > wasn't obvious to me on first read): this check for `p->pack_keep` isn't > > strictly necessary, since any packs that have their `pack_keep` bit set > > will appear in the `existing_kept_packs` list. > > > > But it does give us a fast path to avoid having to check that list, so > > it's worth checking that bit to avoid a slightly more expensive check > > where possible. > > That invites a related but different question. Doesn't p->pack_keep > and string_list_has_string(existing_kept_packs, name_of_pack(p)) > mirror each other? Can a pack appear on the existing_kept_packs > list without having .pack_keep bit set and under what condition? Unfortunately not. `repack` is much more keen to manipulate the results of calling readdir() over $GIT_DIR/objects/pack than it is to, say, call `get_all_packs()` and mark `p->pack_keep_in_core`. In a future where `repack` does not interact with the packdir at such a low level, I don't think we would have a use for a function like `collect_pack_filenames()` at all. But that is a much larger change than this one ;). > It is answered by the comment below, I presume? > > >> + /* > >> + * The pack may be kept via the --keep-pack option; > >> + * check 'existing_kept_packs' to determine whether to > >> + * ignore it. > >> + */ > > OK. So there are two classes of packs we want to exclude from the > geometry repacking. Those that already have .pack_keep bit set, and > those that are _are_ newly making into kept packs that do not yet > have .pack_keep bit set. Makes sense. Yep (and also "yep" to your note below about adding cruft packs into the mix). Thanks, Taylor