Jeff King wrote: > On Fri, Sep 23, 2022 at 07:43:55PM +0000, Victoria Dye via GitGitGadget wrote: >> @@ -1883,7 +1883,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool, >> size_t len; >> const char *name; >> const unsigned hashsz = the_hash_algo->rawsz; >> - const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz); >> + const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz; > > Now we use the "const char *" pointer instead of the cast to the > ondisk_cache_entry struct, which is good, and is what fixes the > alignment question. > > But we also convert flagsp from being a uint16_t into a byte pointer. > I'm not sure if that's strictly necessary from an alignment perspective, > as we'd dereference it only via get_be16(), which handles alignment and > type conversion itself. > > I'd imagine the standard probably says that even forming such a pointer > is illegal, so in that sense, it probably is undefined behavior. But I > think it's one of those things that's OK in practice. Yep, per the C standard §6.3.2.3 #7 [1]: A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. To your point, it is probably fine in practice, but I'd lean towards sticking with a 'char *' to play it safe. [1] https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf >> @@ -1935,20 +1935,24 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool, >> >> ce = mem_pool__ce_alloc(ce_mem_pool, len); >> >> - ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec); >> [...] >> + ce->ce_stat_data.sd_ctime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime) >> + + offsetof(struct cache_time, sec)); > > I had figured we'd be able to drop ondisk_cache_entry entirely. But here > you're using it essentially as a template for a set of constants > retrieved via offsetof(). > > That's OK from an alignment perspective. It does mean we'd be in trouble > if a compiler ever decided to introduce padding into the struct. That's > probably unlikely. We don't use __attribute__((packed)) because it's not > portable, and our existing uses have generally been OK, because our > data structures are organized around 8-byte alignment. We might have > problems on a theoretical 128-bit processor or something. In addition to portability, using '__attribute__((packed))' could hurt performance (and, in a large index, that might have a noticeable effect). As for dropping 'ondisk_cache_entry()', I didn't want to drop it only from the "read" operation (and use something like the "parse left-to-right" strategy below) while leaving it in "write." And, as you mentioned later, changing 'ce_write_entry()' is a lot more invasive than what's already in this patch and possibly out-of-scope. > Another strategy is to just parse left-to-right, advancing the byte > pointer. Like: > > ce->ce_state_data.sd_ctime.sec = get_be32(ondisk); > ondisk += sizeof(uint32_t); > ce->ce_state_data.sd_mtime.sec = get_be32(ondisk); > ondisk += sizeof(uint32_t); > ...etc... > > You can even stick that in a helper function that does the get_b32() and > advances, so you know they're always done in sync. See pack-bitmap.c's > read_be32(), etc. IMHO this produces a nice result because the reading > code itself becomes the source of truth for the format. > ... > One final note, though: > >> + ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime) >> + + offsetof(struct cache_time, sec)); > > Here (and elsewhere), you can assume that the offsetof() "sec" in > cache_time is 0, for two reasons: > > - I didn't look up chapter and verse, but I'm pretty sure the standard > does guarantee that the first field of a struct is at the beginning. > > - If there's any padding, this whole scheme is hosed anyway, because > it means sizeof(cache_time) is bigger than we expect, which messes > up the offsetof() the entry after us (in this case sd_dev). > > So this can just be: > > ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)); > > which is mercifully shorter. > > Assuming we dismiss the rest of what I said as not worth it for a > minimal fix, I do think that simplification is worth rolling a v2. That makes sense from a technical perspective, but I included the starting entry offset for readability reasons. It might be confusing to someone unfamiliar with C struct memory alignment to see every other 'get_be32' refer to the exact entry it's reading via the 'offsetof()', but have that information absent only for a few entries. And, the double 'offsetof()' would still be used by the 'mtime.nsec'/'ctime.nsec' fields anyway. In any case, if this patch is intended to be a short-lived change on the way to a more complete refactor and/or I'm being overzealous on the readability, I'd be happy to change it. :) Thanks! > > -Peff > > PS BTW, I mentioned earlier "can we just get rid of ondisk_cache_entry". > We also use it for the writing side, of course. That doesn't have > alignment issues, but it does have the same "I hope there's never any > padding" question. In an ideal world, it would be using the > equivalent put_be32(), but again, that's getting out of the "minimal > fix" territory.