Re: [PATCH v2 01/24] builtin/pack-objects: make hash agnostic

Jeff King <peff@xxxxxxxx> · Sun, 23 Feb 2020 16:57:59 -0500

On Sat, Feb 22, 2020 at 08:17:26PM +0000, brian m. carlson wrote:

> Avoid hard-coding a hash size, instead preferring to use the_hash_algo.
> [...]
> -			hashwrite(out, base_sha1, 20);
> +			hashwrite(out, base_sha1, the_hash_algo->rawsz);

Yeah, looks obviously correct (and I think this is new from the
pack-reuse patches of mine that Christian sent recently).

The name "base_sha1" is clearly not great either. It could be changed
trivially, but the real sin is that it comes from
nth_packed_object_sha1(). It could be replaced with
nth_packed_object_oid() at the cost of an extra hash copy, which isn't
too bad.

It would be nice to get rid of that function entirely. In most cases,
it's either free to do so (we end up copying the result into an oid
anyway) or we pay for one extra hashcpy (out of the mmap into a local
struct). But the one in pack-check.c:verify_packfile() is tricky; we
store a pointer per object into the index mmap, and we'd have to switch
to storing an oid per object. Given that the code isn't commonly run
(and other operations like _generating_ the index in the first place are
clearly OK with the extra memory hit), I think I'd be OK with that in
the name of cleanliness.

All of that is sort of orthogonal to your goals, I think, so I don't
mind at all calling it out of scope for your series. As long as we use
the_hash_algo->rawsz, these bare pointer accesses aren't wrong (it's
just that it's easy to accidentally get it wrong, as this code shows).

I think it would also be correct to cast the mmap'd bytes to a "struct
object_id" given that the struct contains the hash bytes as the first
member. I worry a little that we'd get no compiler warning of the
breakage if that assumption changes, but it also seems unlikely to do
so.

-Peff