On Fri, Mar 14, 2025 at 1:18 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> Prepare to implement support for reachability bitmaps for the new
> incremental multi-pack index (MIDX) feature over the following commits.
>
> This commit begins by first describing the relevant format and usage
> details for incremental MIDX bitmaps.
>
> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
> ---
>  Documentation/technical/multi-pack-index.adoc | 71 +++++++++++++++++++
>  1 file changed, 71 insertions(+)
>
> diff --git a/Documentation/technical/multi-pack-index.adoc b/Documentation/technical/multi-pack-index.adoc
> index cc063b30be..ab98ecfeb9 100644
> --- a/Documentation/technical/multi-pack-index.adoc
> +++ b/Documentation/technical/multi-pack-index.adoc
> @@ -164,6 +164,77 @@ objects_nr($H2) + objects_nr($H1) + i
>  (in the C implementation, this is often computed as `i +
>  m->num_objects_in_base`).
>
> +=== Pseudo-pack order for incremental MIDXs
> +
> +The original implementation of multi-pack reachability bitmaps defined
> +the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
> +titled "multi-pack-index reverse indexes") roughly as follows:
> +
> +____
> +In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
> +objects in packs stored by the MIDX, laid out in pack order, and the
> +packs arranged in MIDX order (with the preferred pack coming first).
> +____
> +
> +In the incremental MIDX design, we extend this definition to include
> +objects from multiple layers of the MIDX chain. The pseudo-pack order
> +for incremental MIDXs is determined by concatenating the pseudo-pack
> +ordering for each layer of the MIDX chain in order. Formally two objects
> +`o1` and `o2` are compared as follows:
> +
> +1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
> +   `o1` is considered less than `o2`.

For sorting order, 'less than' doesn't tell us if you are sorting
smallest to greatest or greatest to smallest.
Maybe "less than (so its order is earlier than) `o2`"?

> +
> +2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
> +   MIDX layer has no base, then if one of `pack(o1)` and `pack(o2)` is
> +   preferred and the other is not, then the preferred one sorts first. If
> +   there is a base layer (i.e. the MIDX layer is not the first layer in
> +   the chain), then if `pack(o1)` appears earlier in that MIDX layer's
> +   pack order, than `o1` is less than `o2`. Likewise if `pack(o2)`

s/than/then/

> +   appears earlier, than the opposite is true.

s/than/then/

> +
> +3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
> +   same MIDX layer. Sort `o1` and `o2` by their offset within their
> +   containing packfile.
> +
> +Note that the preferred pack is a property of the MIDX chain, not the
> +individual layers themselves. Fundamentally we could introduce a
> +per-layer preferred pack, but this is less relevant now that we can
> +perform multi-pack reuse across the set of packs in a MIDX.
> +
> +=== Reachability bitmaps and incremental MIDXs
> +
> +Each layer of an incremental MIDX chain may have its objects (and the
> +objects from any previous layer in the same MIDX chain) represented in
> +its own `*.bitmap` file.
> +
> +The structure of a `*.bitmap` file belonging to an incremental MIDX
> +chain is identical to that of a non-incremental MIDX bitmap, or a
> +classic single-pack bitmap. Since objects are added to the end of the
> +incremental MIDX's pseudo-pack order (see: above), it is possible to

drop the colon?

> +extend a bitmap when appending to the end of a MIDX chain.
> +
> +(Note: it is possible likewise to compress a contiguous sequence of MIDX
> +incremental layers, and their `*.bitmap`(s) into a single layer and
> +`*.bitmap`, but this is not yet implemented.)

"`*.bitmap`(s)" feels slightly awkward and only saves 2 characters.
Maybe just "`*.bitmap` files"?

> +
> +The object positions used are global within the pseudo-pack order, so
> +subsequent layers will have, for example, `m->num_objects_in_base`
> +number of `0` bits in each of their four type bitmaps. This follows from
> +the fact that we only write type bitmap entries for objects present in
> +the layer immediately corresponding to the bitmap).
> +
> +Note also that only the bitmap pertaining to the most recent layer in an
> +incremental MIDX chain is used to store reachability information about
> +the interesting and uninteresting objects in a reachability query.
> +Earlier bitmap layers are only used to look up commit and pseudo-merge
> +bitmaps from that layer, as well as the type-level bitmaps for objects
> +in that layer.
> +
> +To simplify the implementation, type-level bitmaps are iterated
> +simultaneously, and their results are OR'd together to avoid recursively
> +calling internal bitmap functions.
> +
>  Future Work
>  -----------

Should the patch also remove the first item from Future Work, since this
series is implementing it?

> --
> 2.49.0.13.gd0d564685b
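
For what it's worth, here is how I read the three comparison rules, as a
minimal C sketch where a negative return value means "sorts earlier".
Everything in it is hypothetical and for illustration only: `struct
obj_pos`, its fields, and `pseudo_pack_cmp()` are not names from the
series. It also assumes that two non-preferred packs in the base layer
fall back to pack order, which the quoted text does not spell out.

```c
#include <stdint.h>

/* Hypothetical, simplified description of where an object lives. */
struct obj_pos {
	uint32_t layer;      /* 0 is the base (oldest) layer of the chain */
	uint32_t pack;       /* position of the pack in that layer's pack order */
	int pack_preferred;  /* non-zero if the pack is the preferred pack */
	uint64_t offset;     /* byte offset within the containing pack */
};

static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Returns <0 if `a` sorts before `b`, >0 if after, 0 if equal. */
int pseudo_pack_cmp(const struct obj_pos *a, const struct obj_pos *b)
{
	/* Rule 1: objects in earlier layers sort first. */
	if (a->layer != b->layer)
		return cmp_u64(a->layer, b->layer);

	/* Rule 2: in the layer with no base, the preferred pack sorts first. */
	if (a->layer == 0 && !a->pack_preferred != !b->pack_preferred)
		return a->pack_preferred ? -1 : 1;

	/* Rule 2, continued: otherwise use the layer's pack order (assumed
	 * to apply to non-preferred packs in the base layer as well). */
	if (a->pack != b->pack)
		return cmp_u64(a->pack, b->pack);

	/* Rule 3: same pack, so order by offset within the pack. */
	return cmp_u64(a->offset, b->offset);
}
```

If that matches the intent, then "less than" in the text means "earlier
in the pseudo-pack order", which is what the suggested rewording of item
1 tries to make explicit.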