On Mon, Aug 23, 2021 at 6:14 AM Derrick Stolee <stolee@xxxxxxxxx> wrote: > > On 8/16/2021 1:48 PM, Derrick Stolee via GitGitGadget wrote: > > From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> > > > > Copy the 'index_state->dir_hash' back to the real istate after expanding > > a sparse index. > > > > A crash was observed in 'git status' during some hashmap lookups with > > corrupted hashmap entries. During an index expansion, new cache-entries > > are added to the 'index_state->name_hash' and the 'dir_hash' in a > > temporary 'index_state' variable 'full'. However, only the 'name_hash' > > hashmap from this temp variable was copied back into the real 'istate' > > variable. The original copy of the 'dir_hash' was incorrectly > > preserved. If the table in the 'full->dir_hash' hashmap were realloced, > > the stale version (in 'istate') would be corrupted. > > > > The test suite does not operate on index sizes sufficiently large to > > trigger this reallocation, so they do not cover this behavior. > > Increasing the test suite to cover such scale is fragile and likely > > wasteful. > > > > Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > --- > > sparse-index: copy dir_hash in ensure_full_index() > > > > This fix is an issue we discovered in our first experimental release of > > the sparse index in the microsoft/git fork. We fixed it in the latest > > experimental release [1] and then I almost forgot about it until we > > started rebasing sparse-index work on top of the 2.33.0 release > > candidates. > > > > [1] https://github.com/microsoft/git/releases/tag/v2.32.0.vfs.0.102.exp > > > > This is a change that can be taken anywhere since 4300f8 (sparse-index: > > implement ensure_full_index(), 2021-03-30), but this version is based on > > v2.33.0-rc2. > > I sent this patch on the day of v2.33.0, so I'm not surprised that it > got lost in the shuffle. Could someone please take a look? > > It also has not been picked up for the What's Cooking email. Explanation and the code make sense to me. This function uses some pretty low-level knowledge of how index_state is structured to do its job, and if index_state is updated in some significant way (which isn't likely), then someone would have to update this function. I was trying to think whether there was a way to future proof this function against such possibilities to ensure the need to update it is noticed, but I'm not sure there is one. But index_state doesn't seem particularly likely to change, so this is probably fine. So, here's my ack.