Re: t9210-scalar.sh fails with SANITIZE=undefined

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 22 Sep 2022 15:56:21 -0700

Jeff King <peff@xxxxxxxx> writes:

> I also wondered why other versions do not have a similar problem. After
> all, cache entries contain pathnames which are going to be of varying
> lengths. But this seems telling:
>
>   $ git grep -m1 -B1 -A2 align_padding_size
>   read-cache.c-/* These are only used for v3 or lower */
>   read-cache.c:#define align_padding_size(size, len) ((size + (len) + 8) & ~7) - (size + len)
>   read-cache.c-#define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,data) + (len) + 8) & ~7)
>   read-cache.c-#define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len)
>
> So we actually pad the entries in earlier versions to align them, but
> don't in v4. I'm not sure if that was a conscious choice to save space,
> or an unintended consequence (though it is mentioned in the docs, I
> think that came after the code).

I think we didn't even have on-disk vs in-core distinction in the
early index code.  The active_cache[] array used to be an array of
pointers into the (read-only) mmapped memory, peeking into on-disk
index we just "read".  Back when v4 was introduced, that arrangement
was (thought to be) long gone---we iterated over the mmapped memory
and used create_from_disk() to munge the on-disk representation into
a machine native form.  At that point, there was no point in having
the padding---we are supposed to be using get_beXX() and stuff
without having to worry about alignment requirements.