Re: [PATCH v3 02/13] pack-revindex: prepare for incremental MIDX bitmaps

Taylor Blau <me@xxxxxxxxxxxx> · Fri, 28 Feb 2025 18:39:07 -0500

On Fri, Feb 28, 2025 at 11:01:12AM +0100, Patrick Steinhardt wrote:
> On Tue, Nov 19, 2024 at 05:07:22PM -0500, Taylor Blau wrote:
> > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > index 4fa9dfc771a..bba9c6a905a 100644
> > --- a/pack-bitmap.c
> > +++ b/pack-bitmap.c
> > @@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
> >  	return read_bitmap(index->map, index->map_size, &index->map_pos);
> >  }
> >
> > +static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
> > +{
> > +	if (index->midx) {
> > +		struct multi_pack_index *m = index->midx;
> > +		return m->num_objects + m->num_objects_in_base;
> > +	}
> > +	return index->pack->num_objects;
> > +}
> >
> >  static uint32_t bitmap_num_objects(struct bitmap_index *index)
> >  {
> >  	if (index->midx)
>
> Okay, despite counting our own objects, we also need to account for any
> objects that the MIDX layer that we depend on may refer to. I assume
> that this is basically recursive, and that the base itself would also
> account for its next layer, if any.
>
> What is interesting to see after this commit is what callsites remain
> for `bitmap_num_objects()`. Most of them are converted, but some still
> exist:
>
>   - `load_bitmap_header()`, where we use it to determine the size of the
>     hash cache. Makes sense.
>
>   - `pseudo_merge_bitmap_for_commit()`, where we use it to compute the
>     merged bitmap of a specific commit. This one feels weird to me, I
>     would have expected to use `bitmap_non_extended_bits()` here.

Great question. The reason is that this function determines a bitmap
which identifies the parents of a given commit, and that bitmap is
compared against the set of pseudo-merge commits we know about in a
given layer to determine whether or not we have a matching pseudo-merge.

I left a comment to that effect nearby since this is far from obvious
(including to me -- I had to take a few minutes to remember how all of
this works!).

>   - `filter_bitmap_blob_limit()`, where we seem to filter through the
>     bitmap of the current layer. I _think_ it makes sense to retain.
>
>   - `create_bitmap_mapping()`, which feels like it should be converted?

I *think* that this is OK because we are only remapping one layer at a
time, but I'll have to double check. I'd do so now, but I'm trying to
respond as much as I can before my week is over ;-).

> It would be nice to document in the commit message why those functions
> don't need to be converted to help guide the reader a bit.

The conversions here are case-by-case, so I lean towards documenting any
non-obvious ones inline with a brief comment (like the adjustment I made
above for pseudo_merge_bitmap_for_commit()).

> > @@ -491,7 +499,8 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
> >  	const struct midx_pack_key *key = va;
> >  	struct multi_pack_index *midx = key->midx;
> >
> > -	uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
> > +	size_t pos = (uint32_t*)vb - (const uint32_t *)midx->revindex_data;
>
> Micronit: missing space between `uint32_t` and `*`.

Great catch, thanks!

> > +	uint32_t versus = pack_pos_to_midx(midx, pos + midx->num_objects_in_base);
> >  	uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
> >  	off_t versus_offset;
> >
>
> Okay, the calculation to calculate the position is basically the same,
> but we now also offset the position by the number of objects in
> preceding layers. Makes sense.

Exactly.

Thanks,
Taylor