On 2/10/21 6:03 PM, Taylor Blau wrote: > Implement reading for multi-pack reverse indexes, as described in the > previous patch. > > Note that these functions don't yet have any callers, and won't until > multi-pack reachability bitmaps are introduced in a later patch series. > In the meantime, this patch implements some of the infrastructure > necessary to support multi-pack bitmaps. > > There are three new functions exposed by the revindex API: > > - load_midx_revindex(): loads the reverse index corresponding to the > given multi-pack index. > > - midx_to_pack_pos() and pack_pos_to_midx(): these convert between the > multi-pack index and pseudo-pack order. > > load_midx_revindex() and pack_pos_to_midx() are both relatively > straightforward. > > load_midx_revindex() needs a few functions to be exposed from the midx > API. One to get the checksum of a midx, and another to get the .rev's > filename. Similar to recent changes in the packed_git struct, three new > fields are added to the multi_pack_index struct: one to keep track of > the size, one to keep track of the mmap'd pointer, and another to point > past the header and at the reverse index's data. > > pack_pos_to_midx() simply reads the corresponding entry out of the > table. > > midx_to_pack_pos() is the trickiest, since it needs to find an object's > position in the psuedo-pack order, but that order can only be recovered > in the .rev file itself. This mapping can be implemented with a binary > search, but note that the thing we're binary searching over isn't an > array, but rather a _permutation_. > > So, when comparing two items, it's helpful to keep in mind the > difference. Instead of a traditional binary search, where you are > comparing two things directly, here we're comparing a (pack, offset) > tuple with an index into the multi-pack index. That index describes > another (pack, offset) tuple, and it is _those_ two tuples that are > compared. > > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > --- > midx.c | 11 +++++ > midx.h | 6 +++ > pack-revindex.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++ > pack-revindex.h | 46 ++++++++++++++++++++ > packfile.c | 3 ++ > 5 files changed, 178 insertions(+) > > diff --git a/midx.c b/midx.c > index bf258c4fde..12bfce8bb1 100644 > --- a/midx.c > +++ b/midx.c > @@ -48,11 +48,22 @@ static uint8_t oid_version(void) > } > } > > +static const unsigned char *get_midx_checksum(struct multi_pack_index *m) > +{ > + return m->data + m->data_len - the_hash_algo->rawsz; 'struct multi_pack_index' has a 'hash_len' member that you could use here. It would allow a different hash length in the stored file than the one required by the repository. Except... > +} > + > static char *get_midx_filename(const char *object_dir) > { > return xstrfmt("%s/pack/multi-pack-index", object_dir); > } > > +char *get_midx_rev_filename(struct multi_pack_index *m) > +{ > + return xstrfmt("%s/pack/multi-pack-index-%s.rev", > + m->object_dir, hash_to_hex(get_midx_checksum(m))); ...this assumes the hash is of the same length as the_hash_algo, so you are doing the right thing. Currently, I think we check that 'm->hash_len == the_hash_algo->rawsz' on load. We'll need to check this again later when in the transition phase of the new hash work. (No changes are needed to your patch.) Thanks, -Stolee