On Tue, Nov 28, 2023 at 02:08:18PM -0500, Taylor Blau wrote: > Now that we can generate packs which are disjoint with respect to the > set of currently-disjoint packs, implement a mode of `git repack` which > extends the set of disjoint packs with any new (non-cruft) pack(s) > generated during the repack. > > The idea is mostly straightforward, with a couple of gotcha's. The > straightforward part is to make sure that any new packs are disjoint > with respect to the set of currently disjoint packs which are _not_ > being removed from the repository as a result of the repack. > > If a pack which is currently marked as disjoint is, on the other hand, > about to be removed from the repository, it is OK (and expected) that > new pack(s) will contain some or all of its objects. Since the pack > originally marked as disjoint will be removed, it will necessarily leave > the disjoint set, making room for new packs with its same objects to > take its place. In other words, the resulting set of disjoint packs will > be disjoint with respect to one another. > > The gotchas mostly have to do with making sure that we do not generate a > disjoint pack in the following scenarios: Okay, let me verify whether I understand the reasons: > - promisor packs Which is because promisor packs actually don't contain any objects? > - cruft packs (which may necessarily need to include an object from a > disjoint pack in order to freshen it in certain circumstances) This one took me a while to figure out. If we'd mark crufts as disjoint, then it would mean that new packfiles cannot be marked as disjoint if objects which were previously unreachable do become reachable again. So we'd be pessimizing packfiles for live objects in favor of others which aren't. > - all-into-one repacks without '-d' Because here the old packfiles that this would make redundant aren't deleted and thus the objects are duplicate now. > - `--filter-to`, which conceptually could work with the new > `--extend-disjoint` option, but only in limited circumstances We're probably also not properly set up to handle the new alternate object directory and exclude objects that are part of a potentially disjoint packfile that exists already. Also, the current MIDX may not even cover the alternate. > Otherwise, we mark which packs were created as disjoint by using a new > bit in the `generated_pack_data` struct, and then marking those pack(s) > as disjoint accordingly when generating the MIDX. Non-deleted packs > which are marked as disjoint are retained as such by passing the > equivalent of `--retain-disjoint` when calling the MIDX API to update > the MIDX. Okay. I had a bit of trouble to sift through the various different flags like "--retain-disjoint", "--extend-disjoint", "--ignore-disjoint" and so on. But well, they do different things and it's been a few days since I've reviewed the preceding patches, so this is probably fine. One thing I wondered: do we need to consider the `-l` flag? When using an alternate object directory it is totally feasible that the alternate may be creating new disjoint packages without us knowing, and thus we may not be able to guarantee the disjoint property anymore. Patrick
Attachment:
signature.asc
Description: PGP signature