Am 08.08.21 um 00:49 schrieb Eric Wong: > René Scharfe <l.s.r@xxxxxx> wrote: >> Am 06.08.21 um 17:31 schrieb Andrzej Hunt: >>> On 29/06/2021 22:53, Eric Wong wrote: >>>> [...snip...] >>>> diff --git a/oidtree.c b/oidtree.c >>>> new file mode 100644 >>>> index 0000000000..c1188d8f48 >>>> --- /dev/null >>>> +++ b/oidtree.c > >>>> +struct oidtree_node { >>>> + /* n.k[] is used to store "struct object_id" */ >>>> + struct cb_node n; >>>> +}; >>>> + >>>> [... snip ...] >>>> + >>>> +void oidtree_insert(struct oidtree *ot, const struct object_id *oid) >>>> +{ >>>> + struct oidtree_node *on; >>>> + >>>> + if (!ot->mempool) >>>> + ot->mempool = allocate_alloc_state(); >>>> + if (!oid->algo) >>>> + BUG("oidtree_insert requires oid->algo"); >>>> + >>>> + on = alloc_from_state(ot->mempool, sizeof(*on) + sizeof(*oid)); >>>> + oidcpy_with_padding((struct object_id *)on->n.k, oid); >>> >>> I think this object_id cast introduced undefined behaviour - here's >>> my layperson's interepretation of what's going on (full UBSAN output >>> is pasted below): >>> >>> cb_node.k is a uint8_t[], and hence can be 1-byte aligned (on my >>> machine: offsetof(struct cb_node, k) == 21). We're casting its >>> pointer to "struct object_id *", and later try to access >>> object_id.hash within oidcpy_with_padding. My compiler assumes that >>> an object_id pointer needs to be 4-byte aligned, and reading from a >>> misaligned pointer means we hit undefined behaviour. (I think the >>> 4-byte alignment requirement comes from the fact that object_id's >>> largest member is an int?) > > I seem to recall struct alignment requirements being > architecture-dependent; and x86/x86-64 are the most liberal > w.r.t alignment requirements. > >>> I'm not sure what an elegant and idiomatic fix might be - IIUC it's >>> hard to guarantee misaligned access can't happen with a flex array >>> that's being used for arbitrary data (you would presumably have to >>> declare it as an array of whatever the largest supported type is, so >>> that you can guarantee correct alignment even when cbtree is used >>> with that type) - which might imply that k needs to be declared as a >>> void pointer? That in turn would make cbtree.c harder to read. >> >> C11 has alignas. We could also make the member before the flex array, >> otherbits, wider, e.g. promote it to uint32_t. > > Ugh, no. cb_node should be as small as possible and (for our > current purposes) ->byte could be uint8_t. Well, we can make both byte and otherbits uint16_t. That would require a good comment explaining the reasoning and probably some rework later, but might be the least intrusive solution for now. >> A more parsimonious solution would be to turn the int member of struct >> object_id, algo, into an unsigned char for now and reconsider the issue >> once we support our 200th algorithm or so. > > Yes, making struct object_id smaller would benefit all git users > (at least for the next few centuries :P). True, we're currently using 4 bytes to distinguish between SHA-1 and SHA-256, i.e. to represent a single bit. Reducing the size of struct object_id from 36 bytes to 33 bytes seems quite significant. I don't know how important the 4-byte alignment is, though. cf0983213c (hash: add an algo member to struct object_id, 2021-04-26) doesn't mention it, but the notes code seems to rely on it -- strange. Overall this seems to be a good way to go -- after the next release. René