Re: [PATCH 1/3] sparse-index: sparse index is disallowed when split index is active

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Sat, 22 Jan 2022 10:30:55 +0100 (CET)

Hi Junio,

On Fri, 21 Jan 2022, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@xxxxxxxxx>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@xxxxxx>
> >
> > In 6e773527b6b (sparse-index: convert from full to sparse, 2021-03-30),
> > we introduced initial support for a sparse index, and were careful to
> > avoid converting to a sparse index in the presence of a split index.
> >
> > However, when we _just_ read a freshly-initialized index, it might not
> > contain a split index even if _writing_ it will add one by virtue of
> > being asked for via the `GIT_TEST_SPLIT_INDEX` variable.
> >
> > We did not notice any problems with checking _only_ for `split_index`
> > (and not `GIT_TEST_SPLIT_INDEX`) right until both
> > `vd/sparse-sparsity-fix-on-read` _and_ `vd/sparse-reset` were merged.
> >
> > Those two topics' interplay triggers a bug in conjunction with running
> > t1091.15 when `GIT_TEST_SPLIT_INDEX=true` in the following way:
> > `vd/sparse-sparsity-fix-on-read` ensures that the index is made sparse
> > right after reading, and `vd/sparse-reset` ensures that the index is
> > made non-sparse again unless running in the `--soft` mode. Since the
> > split index feature is incompatible with the sparse index feature, we
> > see a symptom like this:
> >
> > 	fatal: position for replacement 4 exceeds base index size 4
> >
> > Let's fix this by avoiding the conversion to a sparse index when
> > `GIT_TEST_SPLIT_INDEX=true`.
>
> Does [2/3] allow you to sidestep that issue entirely by skipping
> 1091 altogether?

You are right that reverting this patch after applying the next patch
_still_ works around the issue. That's because currently only t1091
exposes the bug where `GIT_TEST_SPLIT_INDEX` writes a split index that was
initially non-split (because it was just created).

My intention here was to avoid any problems in case _another_ test script
triggers the same condition. After all, we do have an entire CI job that
forces `GIT_TEST_SPLIT_INDEX=1`, and in microsoft/git we enable sparse
index by default. The entire test suite is therefore surface for the bug
to show.

> There are 4 hits of "if (istate->split_index" in the codebase, and
> this patch makes me wonder why it is suffice to patch only one of
> them.

I tried to catch only those locations where we run the danger of trying to
make a split index also sparse, or a sparse index also split.

> I also wondered why we test both split_index and environment
> separately, instead of splitting the index very early when the
> environment variable is set, so that the rest of the runtime does
> not have to worry about the environment, but is the reason why such
> an approach was not taken was because GIT_TEST_SPLIT_INDEX can later
> allow the index to be splitted, even if istate->split_index is still
> NULL right now when this function is called?

Indeed. A freshly created index will never be split, even if
`GIT_TEST_SPLIT_INDEX` is set. Only when we _write_ this to disk will it
be turned into a split index. At this point, we might have mistakenly
converted the index into a sparse one, though, and this here patch wants
to prevent that from happening.

> If that is the reason, it leads to another question.  Even if we
> ignore GIT_TEST_SPLIT_INDEX and concentrate only on real workload,
> if the in-core index can be NULL when this function is called and
> then later can become split, there still must be somebody who
> notices that sparse index is unallowed when (or after) such a split
> happens, no?  If there is no such code, then this patch would not be
> the whole fix and there needs more change to do so, no?

Indeed. Hence 3/3 which tries specifically to prevent the user from
turning an already-sparse index into a split index. You will note that
3/3 calls die() instead BUG() in such a case because it would constitute a
pilot error, not a bug in Git's code.

Ciao,
Dscho