Jeff King wrote: > On Thu, Sep 22, 2022 at 08:35:22AM -0400, Derrick Stolee wrote: > >>> I didn't dig further. It's obviously new in v2.38.0-rc1, but I'm not >>> sure it's a show-stopper. It _might_ have been there all along, and is >>> just now surfacing. Or it might be in an existing experimental feature >>> that just wasn't exercised enough in the tests. Or if it really is new >>> in scalar, then it will only hurt people using scalar, which didn't >>> exist before. So I don't think it's a regression in the strictest sense, >>> but it might be nice to get a more accurate understanding of the problem >>> before the release. >> >> Interesting find! >> >> Here are the index-related settings that Scalar sets as of -rc1: >> >> * core.preloadIndex=true >> * index.threads=true >> * index.version=4 >> >> My gut feeling is that index.version=4 might be the culprit. I thought >> we tested GIT_TEST_INDEX_VERSION=4 in some CI modes, but apparently we >> do not. Do you get the same error in other tests with that environment >> variable? > > Yeah, that seems by far the most likely of those three. And indeed, > running with GIT_TEST_INDEX_VERSION=4 causes even t0000 to fail with the > same problem. A minimal reproduction in git.git is just: > > make SANITIZE=undefined > git clone . tmp > cd tmp > rm .git/index > export GIT_INDEX_VERSION=4 > ../git reset --hard ;# ok, writes v4 index > ../git reset --hard ;# fails reading unaligned v4 index > > So it seems like a problem with the v4 format in general. Which...yikes. Other than allowing us to use a (non-packed) 'struct ondisk_cache_entry' to parse the index entries, is there any reason why the on-disk index entries should (or need to be) 4-byte aligned? If not, we could update how we read the 'ondisk' index entry in 'create_from_disk()' to avoid the misalignment. E.g.: ------------------8<------------------8<------------------8<------------------ diff --git a/read-cache.c b/read-cache.c index b09128b188..f132a3f256 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1875,7 +1875,7 @@ static int read_index_extension(struct index_state *istate, static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool, unsigned int version, - struct ondisk_cache_entry *ondisk, + const char *ondisk, unsigned long *ent_size, const struct cache_entry *previous_ce) { @@ -1883,7 +1883,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool, size_t len; const char *name; const unsigned hashsz = the_hash_algo->rawsz; - const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz); + const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz; unsigned int flags; size_t copy_len = 0; /* ------------------>8------------------>8------------------>8------------------ the do the same sort of conversion with the rest of the function. It's certainly uglier than just using the 'struct ondisk_index_entry *' pointer, but it should avoid the misaligned addressing error. > > -Peff