Re: [PATCH 7/9] pack-revindex: read multi-pack reverse indexes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/10/21 6:03 PM, Taylor Blau wrote:
> Implement reading for multi-pack reverse indexes, as described in the
> previous patch.
> 
> Note that these functions don't yet have any callers, and won't until
> multi-pack reachability bitmaps are introduced in a later patch series.
> In the meantime, this patch implements some of the infrastructure
> necessary to support multi-pack bitmaps.
> 
> There are three new functions exposed by the revindex API:
> 
>   - load_midx_revindex(): loads the reverse index corresponding to the
>     given multi-pack index.
> 
>   - midx_to_pack_pos() and pack_pos_to_midx(): these convert between the
>     multi-pack index and pseudo-pack order.
> 
> load_midx_revindex() and pack_pos_to_midx() are both relatively
> straightforward.
> 
> load_midx_revindex() needs a few functions to be exposed from the midx
> API. One to get the checksum of a midx, and another to get the .rev's
> filename. Similar to recent changes in the packed_git struct, three new
> fields are added to the multi_pack_index struct: one to keep track of
> the size, one to keep track of the mmap'd pointer, and another to point
> past the header and at the reverse index's data.
> 
> pack_pos_to_midx() simply reads the corresponding entry out of the
> table.
> 
> midx_to_pack_pos() is the trickiest, since it needs to find an object's
> position in the psuedo-pack order, but that order can only be recovered
> in the .rev file itself. This mapping can be implemented with a binary
> search, but note that the thing we're binary searching over isn't an
> array, but rather a _permutation_.
> 
> So, when comparing two items, it's helpful to keep in mind the
> difference. Instead of a traditional binary search, where you are
> comparing two things directly, here we're comparing a (pack, offset)
> tuple with an index into the multi-pack index. That index describes
> another (pack, offset) tuple, and it is _those_ two tuples that are
> compared.
> 
> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
> ---
>  midx.c          |  11 +++++
>  midx.h          |   6 +++
>  pack-revindex.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++
>  pack-revindex.h |  46 ++++++++++++++++++++
>  packfile.c      |   3 ++
>  5 files changed, 178 insertions(+)
> 
> diff --git a/midx.c b/midx.c
> index bf258c4fde..12bfce8bb1 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -48,11 +48,22 @@ static uint8_t oid_version(void)
>  	}
>  }
>  
> +static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
> +{
> +	return m->data + m->data_len - the_hash_algo->rawsz;

'struct multi_pack_index' has a 'hash_len' member that you could
use here. It would allow a different hash length in the stored
file than the one required by the repository. Except...

> +}
> +
>  static char *get_midx_filename(const char *object_dir)
>  {
>  	return xstrfmt("%s/pack/multi-pack-index", object_dir);
>  }
>  
> +char *get_midx_rev_filename(struct multi_pack_index *m)
> +{
> +	return xstrfmt("%s/pack/multi-pack-index-%s.rev",
> +		       m->object_dir, hash_to_hex(get_midx_checksum(m)));

...this assumes the hash is of the same length as the_hash_algo,
so you are doing the right thing. Currently, I think we check
that 'm->hash_len == the_hash_algo->rawsz' on load. We'll need
to check this again later when in the transition phase of the
new hash work.

(No changes are needed to your patch.)

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux