Am 06.08.21 um 17:31 schrieb Andrzej Hunt: > > > On 29/06/2021 22:53, Eric Wong wrote: >> [...snip...] >> diff --git a/oidtree.c b/oidtree.c >> new file mode 100644 >> index 0000000000..c1188d8f48 >> --- /dev/null >> +++ b/oidtree.c >> @@ -0,0 +1,94 @@ >> +/* >> + * A wrapper around cbtree which stores oids >> + * May be used to replace oid-array for prefix (abbreviation) matches >> + */ >> +#include "oidtree.h" >> +#include "alloc.h" >> +#include "hash.h" >> + >> +struct oidtree_node { >> + /* n.k[] is used to store "struct object_id" */ >> + struct cb_node n; >> +}; >> + >> [... snip ...] >> + >> +void oidtree_insert(struct oidtree *ot, const struct object_id *oid) >> +{ >> + struct oidtree_node *on; >> + >> + if (!ot->mempool) >> + ot->mempool = allocate_alloc_state(); >> + if (!oid->algo) >> + BUG("oidtree_insert requires oid->algo"); >> + >> + on = alloc_from_state(ot->mempool, sizeof(*on) + sizeof(*oid)); >> + oidcpy_with_padding((struct object_id *)on->n.k, oid); > > I think this object_id cast introduced undefined behaviour - here's > my layperson's interepretation of what's going on (full UBSAN output > is pasted below): > > cb_node.k is a uint8_t[], and hence can be 1-byte aligned (on my > machine: offsetof(struct cb_node, k) == 21). We're casting its > pointer to "struct object_id *", and later try to access > object_id.hash within oidcpy_with_padding. My compiler assumes that > an object_id pointer needs to be 4-byte aligned, and reading from a > misaligned pointer means we hit undefined behaviour. (I think the > 4-byte alignment requirement comes from the fact that object_id's > largest member is an int?) > > I'm not sure what an elegant and idiomatic fix might be - IIUC it's > hard to guarantee misaligned access can't happen with a flex array > that's being used for arbitrary data (you would presumably have to > declare it as an array of whatever the largest supported type is, so > that you can guarantee correct alignment even when cbtree is used > with that type) - which might imply that k needs to be declared as a > void pointer? That in turn would make cbtree.c harder to read. C11 has alignas. We could also make the member before the flex array, otherbits, wider, e.g. promote it to uint32_t. A more parsimonious solution would be to turn the int member of struct object_id, algo, into an unsigned char for now and reconsider the issue once we support our 200th algorithm or so. This breaks notes, though. Its GET_PTR_TYPE seems to require struct leaf_node to have 4-byte alignment for some reason. That can be ensured by adding an int member. Anyway, with either of these fixes UBSan is still unhappy about a different issue. Here's a patch for that: --- >8 --- Subject: [PATCH] object-file: use unsigned arithmetic with bit mask 33f379eee6 (make object_directory.loose_objects_subdir_seen a bitmap, 2021-07-07) replaced a wasteful 256-byte array with a 32-byte array and bit operations. The mask calculation shifts a literal 1 of type int left by anything between 0 and 31. UndefinedBehaviorSanitizer doesn't like that and reports: object-file.c:2477:18: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' Make sure to use an unsigned 1 instead to avoid the issue. Signed-off-by: René Scharfe <l.s.r@xxxxxx> --- object-file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/object-file.c b/object-file.c index 3d27dc8dea..a8be899481 100644 --- a/object-file.c +++ b/object-file.c @@ -2474,7 +2474,7 @@ struct oidtree *odb_loose_cache(struct object_directory *odb, struct strbuf buf = STRBUF_INIT; size_t word_bits = bitsizeof(odb->loose_objects_subdir_seen[0]); size_t word_index = subdir_nr / word_bits; - size_t mask = 1 << (subdir_nr % word_bits); + size_t mask = 1u << (subdir_nr % word_bits); uint32_t *bitmap; if (subdir_nr < 0 || -- 2.32.0