On Thu, Nov 30, 2023 at 11:18:45AM +0100, Patrick Steinhardt wrote:
> > diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
> > index 9fcb29a9c8..658682ddd5 100644
> > --- a/Documentation/gitformat-pack.txt
> > +++ b/Documentation/gitformat-pack.txt
> > @@ -396,6 +396,22 @@ CHUNK DATA:
> >          is padded at the end with between 0 and 3 NUL bytes to make the
> >          chunk size a multiple of 4 bytes.
> >
> > +    Disjoint Packfiles (ID: {'D', 'I', 'S', 'P'})
> > +        Stores a table of three 4-byte unsigned integers in network order.
> > +        Each table entry corresponds to a single pack (in the order that
> > +        they appear above in the `PNAM` chunk). The values for each table
> > +        entry are as follows:
> > +        - The first bit position (in psuedo-pack order, see below) to
>
> s/psuedo/pseudo/

Good catch, thanks. Not sure how that escaped my spell-checker...

> > +=== `DISP` chunk and disjoint packs
> > +
> > +The Disjoint Packfiles (`DISP`) chunk encodes additional information
> > +about the objects in the multi-pack index's reachability bitmap. Recall
> > +that objects from the MIDX are arranged in "pseudo-pack" order (see:
>
> The colon feels a bit out-of-place here, so: s/see:/see/

Thanks, I'll fix that up.

> > +above) for reachability bitmaps.
> > +
> > +From the example above, suppose we have packs "a", "b", and "c", with
> > +10, 15, and 20 objects, respectively. In pseudo-pack order, those would
> > +be arranged as follows:
> > +
> > +    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
> > +
> > +When working with single-pack bitmaps (or, equivalently, multi-pack
> > +reachability bitmaps without any packs marked as disjoint),
> > +linkgit:git-pack-objects[1] performs ``verbatim'' reuse, attempting to
> > +reuse chunks of the existing packfile instead of adding objects to the
> > +packing list.
>
> I'm not sure I full understand this paragraph. In the context of a
> single pack bitmap it's clear enough.
> But I stumbled over the MIDX case,
> because here we potentially have multiple packfiles, so it's not exactly
> clear to me what you refer to with "the existing packfile" in that case.
> I'd think that we perform verbatim reuse of the preferred packfile,
> right? If so, we might want to make that a bit more explicit.

Yep, sorry, I can see how that would be confusing. Since we're talking
about the existing behavior at this point in the series (before
multi-pack reuse is implemented), I changed this to:

    "reuse chunks of the bitmapped or preferred packfile [...]"

Thanks for carefully reading and spotting my errors ;-).

> > +object. This introduces an additional constraint over the set of packs
> > +we may want to reuse. The most straightforward approach is to mandate
> > +that the set of packs is disjoint with respect to the set of objects
> > +contained in each pack. In other words, for each object `o` in the union
> > +of all objects stored by the disjoint set of packs, `o` is contained in
> > +exactly one pack from the disjoint set.
>
> Is this a property that usually holds for our normal housekeeping, or
> does it require careful managing by the user/admin? How about geometric
> repacking?

At this point in the series, it would require careful managing to ensure
that this is the case. In practice MIDX'd packs generated with a
geometric repack are mostly disjoint, but definitely not guaranteed to
be.

Further down in this series we'll introduce new options to generate
packs which are guaranteed to be disjoint with respect to the
currently-marked set of packs in the DISP chunk.

> > @@ -764,14 +807,22 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
> >           * Take only the first duplicate.
> >           */
> >          for (cur_object = 0; cur_object < fanout.nr; cur_object++) {
> > -            if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
> > -                                    &fanout.entries[cur_object].oid))
> > -                continue;
> > +            struct pack_midx_entry *ours = &fanout.entries[cur_object];
> > +            if (cur_object) {
> > +                struct pack_midx_entry *prev = &fanout.entries[cur_object - 1];
> > +                if (oideq(&prev->oid, &ours->oid)) {
> > +                    if (prev->disjoint && ours->disjoint)
> > +                        die(_("duplicate object '%s' among disjoint packs '%s', '%s'"),
> > +                            oid_to_hex(&prev->oid),
> > +                            info[prev->pack_int_id].pack_name,
> > +                            info[ours->pack_int_id].pack_name);
>
> Shouldn't we die if `prev->disjoint || ours->disjoint` instead of `&&`?
> Even if one of the packs isn't marked as disjoint, it's still wrong if
> the other one is and one of its objects exists in multiple packs.
>
> Or am I misunderstanding, and we only guarantee the disjoint property
> across packfiles that are actually marked as such?

Right, we only guarantee disjointed-ness among the set of packs that are
marked disjoint. It's fine for the same object to appear in a disjoint
and non-disjoint pack, and for both of those packs to end up in the
MIDX.

But that is only because we'll use the disjoint copy in our bitmap. If
there were two packs that are marked as supposedly disjoint, but contain
at least one duplicate of an object, then we will reject those packs as
non-disjoint.

Thanks,
Taylor
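
For anyone following along, the rule above (a duplicate is only an error
when two or more copies live in packs marked disjoint) can be sketched
as a small standalone C function. This is an illustration only, not the
code from the patch: `struct entry` and `find_disjoint_conflict()` are
hypothetical names, object IDs are plain strings, and the input is
assumed to be sorted by object ID so that duplicates sit next to each
other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a MIDX entry; not git's pack_midx_entry. */
struct entry {
	const char *oid;   /* object ID as a string (illustrative only) */
	const char *pack;  /* name of the pack holding this copy */
	int disjoint;      /* nonzero if that pack is marked disjoint */
};

/*
 * Walk entries that are sorted by oid. Return the index of the first
 * entry that is a second disjoint copy of some object, or -1 if no
 * object has more than one copy among the disjoint packs.
 */
static int find_disjoint_conflict(const struct entry *e, size_t nr)
{
	size_t i;
	int seen = 0; /* disjoint copies seen in the current run of equal oids */

	assert(e != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		/* A new object ID starts a new run, so reset the counter. */
		if (i && strcmp(e[i - 1].oid, e[i].oid))
			seen = 0;
		if (e[i].disjoint && ++seen > 1)
			return (int)i;
	}
	return -1;
}
```

With this sketch, an object that appears once in a disjoint pack and
once in a non-disjoint pack is accepted, while an object that appears in
two disjoint packs is reported, which matches the `&&` condition
discussed above rather than the `||` alternative.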