Re: [PATCH v3 01/13] Documentation: describe incremental MIDX bitmaps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 28, 2025 at 11:01:04AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:19PM -0500, Taylor Blau wrote:
> > diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
> > index cc063b30bea..a063262c360 100644
> > --- a/Documentation/technical/multi-pack-index.txt
> > +++ b/Documentation/technical/multi-pack-index.txt
> > @@ -164,6 +164,70 @@ objects_nr($H2) + objects_nr($H1) + i
> >  (in the C implementation, this is often computed as `i +
> >  m->num_objects_in_base`).
> >
> > +=== Pseudo-pack order for incremental MIDXs
> > +
> > +The original implementation of multi-pack reachability bitmaps defined
> > +the pseudo-pack order in linkgit:gitformat-pack[5] (see the section
> > +titled "multi-pack-index reverse indexes") roughly as follows:
> > +
> > +____
> > +In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
> > +objects in packs stored by the MIDX, laid out in pack order, and the
> > +packs arranged in MIDX order (with the preferred pack coming first).
> > +____
> > +
> > +In the incremental MIDX design, we extend this definition to include
> > +objects from multiple layers of the MIDX chain. The pseudo-pack order
> > +for incremental MIDXs is determined by concatenating the pseudo-pack
> > +ordering for each layer of the MIDX chain in order. Formally two objects
> > +`o1` and `o2` are compared as follows:
> > +
> > +1. If `o1` appears in an earlier layer of the MIDX chain than `o2`, then
> > +  `o1` is considered less than `o2`.
>
> Just as a refresher for myself: what is the consequence of an object
> `o1` sorting earlier than `o2`? In the case where those refer to
> different objects it is only used to establish the pseudo-pack order so
> that we know how to interpret the bitmaps. But in the case where those
> two objects refer to the same underlying object, e.g. because the object
> is contained in two packs, it also impacts which of both objects would
> be preferred e.g. during a clone, right?

Great question -- the pseudo-pack order here is how we translate the set
of objects in a MIDX into their corresponding bit positions in the
bitmap.

So if "o1" sorts ahead of "o2", that means that "o1" will appear in an
earlier bit position than "o2". But note that we're talking about
objects in a MIDX chain here, comprised of objects from each MIDX'd layer of
that chain. So by that point the duplicates have already been filtered
out, since:

  - The MIDX only stores one copy of an object in any given MIDX, and

  - The incremental MIDX design avoids putting objects from earlier
    layers in later ones.

I tried to get at this a few lines up with "[...] a MIDX's pseudo-pack
is the de-duplicated concatenation of [...]" to make clear that o1 != o2
here. But let me know if you think I should clarify or emphasize that
point further.

> > +2. Otherwise, if `o1` and `o2` appear in the same MIDX layer, and that
> > +  MIDX layer has no base, then If one of `pack(o1)` and `pack(o2)` is
>
> s/If/if

Great catch, thanks!

> > +  preferred and the other is not, then the preferred one sorts first. If
> > +  there is a base layer (i.e. the MIDX layer is not the first layer in
> > +  the chain), then if `pack(o1)` appears earlier in that MIDX layer's
> > +  pack order, than `o1` is less than `o2`. Likewise if `pack(o2)`
> > +  appears earlier, than the opposite is true.
>
> Another question for my own understanding: why is it relevant whether we
> have a base or not? I would have expected that the case where the
> objects appear in two different layers is already covered by (1), so
> from thereon we only need to care about two objects existing in the same
> layer.

Good question. Throughout this design I'm trying to get rid of the
concept of a "preferred" pack with respect to the MIDX. Before
multi-pack reuse existed, the idea behind having a preferred pack was
that it was a way to indicate which pack we should prioritize for
pack-reuse.

But now that we can reuse objects from any pack stored in a MIDX, the
concept of a preferred pack doesn't need to exist. It still makes some
sense (if you want to use single-pack reuse for some reason, etc.) but
I'm trying to push us away from it.

So to answer your question of "why does it matter if there is a base
or not?" the reason is that this series treats the preferred pack as a
property of the chain instead of the individual layers. And the
mention of it here is to differentiate between how we compare packs in
the base (favoring the preferred pack) versus subsequent layers
(reflecting the pack order within that layer).

> > +3. Otherwise, `o1` and `o2` appear in the same pack, and thus in the
> > +  same MIDX layer. Sort `o1` and `o2` by their offset within their
> > +  containing packfile.
> > +
> > +=== Reachability bitmaps and incremental MIDXs
> > +
> > +Each layer of an incremental MIDX chain may have its objects (and the
> > +objects from any previous layer in the same MIDX chain) represented in
> > +its own `*.bitmap` file.
> > +
> > +The structure of a `*.bitmap` file belonging to an incremental MIDX
> > +chain is identical to that of a non-incremental MIDX bitmap, or a
> > +classic single-pack bitmap. Since objects are added to the end of the
> > +incremental MIDX's pseudo-pack order (see: above), it is possible to
> > +extend a bitmap when appending to the end of a MIDX chain.
> > +
> > +(Note: it is possible likewise to compress a contiguous sequence of MIDX
> > +incremental layers, and their `*.bitmap`(s) into a single layer and
> > +`*.bitmap`, but this is not yet implemented.)
>
> Fair. What do we currently do in this context? Do we just keep on
> appending layer after layer?

That's right. At this point in the project we only know how to append
layers and compress the whole chain into a single layer. But
fundamentally it is possible to compress any contiguous sub-sequence of
the chain into a single layer, and part three of this project will do
just that.

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux