On 8/23/2021 2:18 PM, Jeff Hostetler wrote: > On 8/23/21 10:25 AM, Ævar Arnfjörð Bjarmason wrote: >> >> On Mon, Aug 16 2021, Derrick Stolee via GitGitGadget wrote: >> >>> From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> >>> >>> Copy the 'index_state->dir_hash' back to the real istate after expanding >>> a sparse index. >>> >>> A crash was observed in 'git status' during some hashmap lookups with >>> corrupted hashmap entries. During an index expansion, new cache-entries >>> are added to the 'index_state->name_hash' and the 'dir_hash' in a >>> temporary 'index_state' variable 'full'. However, only the 'name_hash' >>> hashmap from this temp variable was copied back into the real 'istate' >>> variable. The original copy of the 'dir_hash' was incorrectly >>> preserved. If the table in the 'full->dir_hash' hashmap were realloced, >>> the stale version (in 'istate') would be corrupted. >>> >>> The test suite does not operate on index sizes sufficiently large to >>> trigger this reallocation, so they do not cover this behavior. >>> Increasing the test suite to cover such scale is fragile and likely >>> wasteful. >> >> How large does the index need to be to trigger this? I don't know if a >> test here is useful, but FWIW if we had such a test then the EXPENSIVE >> prereq + GIT_TEST_LONG=true might be a good fit for it. >> > > There would need to be enough directories in the repo to cause > the dir_hash to grow during an insert into the dir_hash. > IIRC the hashmap table is initialized to 64 and auto reallocs when > it hits 80% capacity, so somewhere in the area of 50 directories > at minimum. Further complicating this is that the sparse index size matters relative to the full index size, since the dir_hash has a given size going into ensure_full_index() that depends on the sparse-index size and the reallocation patterns. > Whether the error is observed would also depend on free() trashing > the contents of the memory and/or whether the memory was recycled > by something else. Issues like this cause this to be hard to reproduce. We did not catch it in the performance test p2000-sparse-operations.sh, which aims to measure how the index works at scale. This makes me incredibly pessimistic that such a test will be effective at preventing a regression, let alone catching similar issues in the future. Thanks, -Stolee