On Thu, Nov 30, 2023 at 11:18:57AM +0100, Patrick Steinhardt wrote:
> > Instead, teach `pack-objects` a special `--ignore-disjoint` which is the
> > moral equivalent of marking the set of disjoint packs as kept, and
> > ignoring their contents, even if it would have otherwise been packed. In
> > fact, this similarity extends down to the implementation, where each
> > disjoint pack is first loaded, then has its `pack_keep_in_core` bit set.
> >
> > With this in place, we can use the kept-pack cache from 20b031fede
> > (packfile: add kept-pack cache for find_kept_pack_entry(), 2021-02-22),
> > which looks up objects first in a cache containing just the set of kept
> > (in this case, disjoint) packs. Assuming that the set of disjoint packs
> > is a relatively small portion of the entire repository (which should be
> > a safe assumption to make), each object lookup will be very inexpensive.
>
> This caught me by surprise a bit. I'd have expected that in the end,
> most of the packfiles in a repository would be disjoint. Using for
> example geometric repacks, my expectation was that all of the packs that
> get written via geometric repacking would eventually become disjoint
> whereas new packs added to the repository would initially not be.

Which part are you referring to here? If you're referring to the part
where I say that the set of disjoint packs is relatively small in
proportion to the rest of the packs, I think I know where the confusion
is.

I'm not saying that the set of disjoint packs is small in comparison to
the rest of the repository by object count, but rather by count of packs
overall.

You're right that packs from pushes will not be guaranteed to be
disjoint upon entering the repository, but will become disjoint when
geometrically repacked (assuming that the caller uses --ignore-disjoint
when repacking).

Thanks,
Taylor