Re: [PATCH 35/53] Convert the verify_pack callback to struct object_id

Jeff King <peff@xxxxxxxx> · Thu, 27 Apr 2017 01:52:09 -0400

On Sun, Apr 23, 2017 at 09:34:35PM +0000, brian m. carlson wrote:

> Make the verify_pack_callback take a pointer to struct object_id.

That sounds good.

> Use a
> struct object_id to hold the pack checksum, even though it is not
> strictly an object ID.  Doing so ensures resilience against future hash
> size changes, and allows us to remove hard-coded assumptions about how
> big the buffer needs to be.

But this part seems questionable to me. Sure, we may change the pack
checksum in the future. But there is a reasonable chance that it won't
follow the same rules for selecting a hash as object_id. And even if it
does, calling it object_id just seems misleading.

What's the gain in converting it here? I know we want to get rid of the
bare "20", but we could switch it out for GIT_SHA1_RAWSZ. I suspect you
prefer in the long run to get rid of even those GIT_SHA1_RAWSZ defines,
though.  Could we define a new struct csumfile_hash, csumfile_cmp, etc
(and arguably change the name of "struct sha1file" and friends).  They'd
be boring wrappers around sha1 now, but I think the endgame will involve
us being able to read multiple versions of packs, with distinct
checksum algorithms.

> Also, use a union to convert the pointer from nth_packed_object_sha1 to
> to a pointer to struct object_id.  This behavior is compatible with GCC
> and clang and explicitly sanctioned by C11.  The alternatives are to
> just perform a cast, which would run afoul of strict aliasing rules, but
> should just work, and changing the pointer into an instance of struct
> object_id and copying the value.  The latter operation could seriously
> bloat memory usage on fsck, which already uses a lot of memory on some
> repositories.

Nasty, but it's probably the least-bad option.

-Peff