René Scharfe <l.s.r@xxxxxx> wrote: > Am 06.08.21 um 17:31 schrieb Andrzej Hunt: > > On 29/06/2021 22:53, Eric Wong wrote: > >> [...snip...] > >> diff --git a/oidtree.c b/oidtree.c > >> new file mode 100644 > >> index 0000000000..c1188d8f48 > >> --- /dev/null > >> +++ b/oidtree.c > >> +struct oidtree_node { > >> + /* n.k[] is used to store "struct object_id" */ > >> + struct cb_node n; > >> +}; > >> + > >> [... snip ...] > >> + > >> +void oidtree_insert(struct oidtree *ot, const struct object_id *oid) > >> +{ > >> + struct oidtree_node *on; > >> + > >> + if (!ot->mempool) > >> + ot->mempool = allocate_alloc_state(); > >> + if (!oid->algo) > >> + BUG("oidtree_insert requires oid->algo"); > >> + > >> + on = alloc_from_state(ot->mempool, sizeof(*on) + sizeof(*oid)); > >> + oidcpy_with_padding((struct object_id *)on->n.k, oid); > > > > I think this object_id cast introduced undefined behaviour - here's > > my layperson's interepretation of what's going on (full UBSAN output > > is pasted below): > > > > cb_node.k is a uint8_t[], and hence can be 1-byte aligned (on my > > machine: offsetof(struct cb_node, k) == 21). We're casting its > > pointer to "struct object_id *", and later try to access > > object_id.hash within oidcpy_with_padding. My compiler assumes that > > an object_id pointer needs to be 4-byte aligned, and reading from a > > misaligned pointer means we hit undefined behaviour. (I think the > > 4-byte alignment requirement comes from the fact that object_id's > > largest member is an int?) I seem to recall struct alignment requirements being architecture-dependent; and x86/x86-64 are the most liberal w.r.t alignment requirements. > > I'm not sure what an elegant and idiomatic fix might be - IIUC it's > > hard to guarantee misaligned access can't happen with a flex array > > that's being used for arbitrary data (you would presumably have to > > declare it as an array of whatever the largest supported type is, so > > that you can guarantee correct alignment even when cbtree is used > > with that type) - which might imply that k needs to be declared as a > > void pointer? That in turn would make cbtree.c harder to read. > > C11 has alignas. We could also make the member before the flex array, > otherbits, wider, e.g. promote it to uint32_t. Ugh, no. cb_node should be as small as possible and (for our current purposes) ->byte could be uint8_t. > A more parsimonious solution would be to turn the int member of struct > object_id, algo, into an unsigned char for now and reconsider the issue > once we support our 200th algorithm or so. Yes, making struct object_id smaller would benefit all git users (at least for the next few centuries :P). > This breaks notes, though. > Its GET_PTR_TYPE seems to require struct leaf_node to have 4-byte > alignment for some reason. That can be ensured by adding an int member. Adding a 4-byte int to leaf_node after shaving 6-bytes off two object_id structs would mean a net savings of 2 bytes; sounds good to me. I don't know much about notes nor the associated code, but I also wonder if crit-bit tree can be used there, too. > Anyway, with either of these fixes UBSan is still unhappy about a > different issue. Here's a patch for that: Thanks <snip>