On Wed, 13 May 2020 at 02:56, brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote: > > Both v2 pack index files and the v3 format specified as part of the > NewHash work have similar data starting at the CRC table. Much of the > existing code wants to read either this table or the offset entries > following it, and in doing so computes the offset each time. > > In order to share as much code between v2 and v3, compute the offset of > the CRC table and store it when the pack is opened. Use this value to > compute offsets to not only the CRC table, but to the offset entries > beyond it. > --- a/builtin/index-pack.c > +++ b/builtin/index-pack.c > @@ -1555,13 +1555,9 @@ static void read_v2_anomalous_offsets(struct packed_git *p, > { > const uint32_t *idx1, *idx2; > uint32_t i; > - const uint32_t hashwords = the_hash_algo->rawsz / sizeof(uint32_t); > > /* The address of the 4-byte offset table */ > - idx1 = (((const uint32_t *)p->index_data) > - + 2 /* 8-byte header */ > - + 256 /* fan out */ > - + hashwords * p->num_objects /* object ID table */ > + idx1 = (((const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset)) > + p->num_objects /* CRC32 table */ > ); This counts in four-byte words (so `+ 2` skips ahead 8B as the comment notes). And that's why we need to use "rawsz/4". Not new in this patch, but that outer pair of parenthesis just makes this harder to read, IMHO. I keep scanning back and forth wondering, "where is this whole thing going to get multiplied or something?" idx1 = (const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset) + p->num_objects /* CRC32 table */; The double-casting can be avoided with something like this, but I'm not sure it's really any better: idx1 = (const uint32_t *)p->index_data + p->crc_offset/sizeof(uint32_t) + p->num_objects /* CRC32 table */; > --- a/packfile.c > +++ b/packfile.c > @@ -178,6 +178,7 @@ int load_idx(const char *path, const unsigned int hashsz, void *idx_map, > */ > (sizeof(off_t) <= 4)) > return error("pack too large for current definition of off_t in %s", path); > + p->crc_offset = 8 + 4 * 256 + nr * hashsz; > } > > p->index_version = version; It doesn't fit in the context, but `nr` will be assigned to `p->num_objects`. And now we can just use `hashsz` without dividing by 4, so this does the same calculation as the old one above. Martin