Re: [PATCH v3 02/13] pack-revindex: prepare for incremental MIDX bitmaps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 19, 2024 at 05:07:22PM -0500, Taylor Blau wrote:
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 4fa9dfc771a..bba9c6a905a 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -170,6 +170,15 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
>  	return read_bitmap(index->map, index->map_size, &index->map_pos);
>  }
>  
> +static uint32_t bitmap_non_extended_bits(struct bitmap_index *index)
> +{
> +	if (index->midx) {
> +		struct multi_pack_index *m = index->midx;
> +		return m->num_objects + m->num_objects_in_base;
> +	}
> +	return index->pack->num_objects;
> +}
> 
>  static uint32_t bitmap_num_objects(struct bitmap_index *index)
>  {
>  	if (index->midx)

Okay, despite counting our own objects, we also need to account for any
objects that the MIDX layer that we depend on may refer to. I assume
that this is basically recursive, and that the base itself would also
account for its next layer, if any.

What is interesting to see after this commit is what callsites remain
for `bitmap_num_objects()`. Most of them are converted, but some still
exist:

  - `load_bitmap_header()`, where we use it to determine the size of the
    hash cache. Makes sense.

  - `pseudo_merge_bitmap_for_commit()`, where we use it to compute the
    merged bitmap of a specific commit. This one feels weird to me, I
    would have expected to use `bitmap_non_extended_bits()` here.

  - `filter_bitmap_blob_limit()`, where we seem to filter through the
    bitmap of the current layer. I _think_ it makes sense to retain.

  - `create_bitmap_mapping()`, which feels like it should be converted?

It would be nice to document in the commit message why those functions
don't need to be converted to help guide the reader a bit.

> diff --git a/pack-revindex.c b/pack-revindex.c
> index 22d3c234648..ce3f7ae2149 100644
> --- a/pack-revindex.c
> +++ b/pack-revindex.c
> @@ -383,8 +383,12 @@ int load_midx_revindex(struct multi_pack_index *m)
>  	trace2_data_string("load_midx_revindex", the_repository,
>  			   "source", "rev");
>  
> -	get_midx_filename_ext(&revindex_name, m->object_dir,
> -			      get_midx_checksum(m), MIDX_EXT_REV);
> +	if (m->has_chain)
> +		get_split_midx_filename_ext(&revindex_name, m->object_dir,
> +					    get_midx_checksum(m), MIDX_EXT_REV);
> +	else
> +		get_midx_filename_ext(&revindex_name, m->object_dir,
> +				      get_midx_checksum(m), MIDX_EXT_REV);
>  
>  	ret = load_revindex_from_disk(revindex_name.buf,
>  				      m->num_objects,

Here we teach the reverse index to read indices from MIDX chains.

> @@ -471,11 +475,15 @@ off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
>  
>  uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos)
>  {
> +	while (m && pos < m->num_objects_in_base)
> +		m = m->base_midx;
> +	if (!m)
> +		BUG("NULL multi-pack-index for object position: %"PRIu32, pos);
>  	if (!m->revindex_data)
>  		BUG("pack_pos_to_midx: reverse index not yet loaded");
> -	if (m->num_objects <= pos)
> +	if (m->num_objects + m->num_objects_in_base <= pos)
>  		BUG("pack_pos_to_midx: out-of-bounds object at %"PRIu32, pos);
> -	return get_be32(m->revindex_data + pos);
> +	return get_be32(m->revindex_data + pos - m->num_objects_in_base);
>  }
>  
>  struct midx_pack_key {

Here we teach the reverse index logic to walk the MIDX layers so that we
find the one that is supposed to contain a given position.

> @@ -491,7 +499,8 @@ static int midx_pack_order_cmp(const void *va, const void *vb)
>  	const struct midx_pack_key *key = va;
>  	struct multi_pack_index *midx = key->midx;
>  
> -	uint32_t versus = pack_pos_to_midx(midx, (uint32_t*)vb - (const uint32_t *)midx->revindex_data);
> +	size_t pos = (uint32_t*)vb - (const uint32_t *)midx->revindex_data;

Micronit: missing space between `uint32_t` and `*`.

> +	uint32_t versus = pack_pos_to_midx(midx, pos + midx->num_objects_in_base);
>  	uint32_t versus_pack = nth_midxed_pack_int_id(midx, versus);
>  	off_t versus_offset;
>  

Okay, the calculation to calculate the position is basically the same,
but we now also offset the position by the number of objects in
preceding layers. Makes sense.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux