Re: [PATCH v4 0/3] Use default values from settings instead of config

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 12 Oct 2021 22:34:59 +0200

On Tue, Oct 12 2021, Glen Choo wrote:

> Hi everyone! This patch was created in response to something we observed in
> Google, where fsck failed to detect that the commit graph was invalid. We
> initially assumed that fsck never checked the commit graph, but it turns out
> that it does so only when core.commitgraph is set, even though we set defaults
> for "whether to use the commit graph" in the repository settings.
>
> Instead of using the config, let's use repository settings where
> available. Replace core.commitGraph and core.multiPackIndex with their
> equivalent repository settings in fsck and gc.
>
> This re-roll is primarily motivated by the CI failures noted by Junio in [1].
> The underlying cause is that GIT_TEST_COMMIT_GRAPH=1 (enabled in the linux-gcc
> job) causes commands like "git commit" to write commit-graphs, but certain tests
> like t/t0410-partial-clone.sh and t/t3800-mktag.sh intentionally create
> corruptions that cause commit-graphs to become out-of-sync/invalid. Patch 1
> fixes a bug where fsck did not check commit-graphs by default, which means that
> these tests will now fail because they have invalid commit-graphs. The easiest
> solution I found is to disable this confusing and noisy behavior with
> GIT_TEST_COMMIT_GRAPH=0.

That's easy, but...

> And since I am re-rolling, I incorporated Ævar's feedback regarding the commit
> messages (thanks!). I considered combining patches 1 and 2, but patch 1 has a
> grown a little to fix the CI issues, so I've decided to keep patches 1 and 2
> separate.
>
> This series has also seen a healthy amount of interest in test style and
> coverage. We haven't converged on a path for the future, but I'd like to think
> that the tests here are still a step in (approximately) the right direction.
> Hopefully this version LGT us while we figure out what to do with tests :)
>
> [1] https://lore.kernel.org/git/xmqqfstafyox.fsf@gitster.g/
>
> Changes since v3
> * Disable GIT_TEST_COMMIT_GRAPH in tests that intentionally corrupt things in a
>   way that is incompatible with commit-graphs.
> * Make patch 1 and 2's commit messages more concise (thanks Ævar!).

...how isn't disabling those t3800-mktag.sh tests just plasting over
corruption that we're noticing because of your changes to (rightly) fix
the bug where "fsck" wasn't checking the graph at all?

IOW haven't we just found exactly the sort of bug that
"GIT_TEST_COMMIT_GRAPH" is put in place to find for us, but now instead
of fixing it we're hiding it?

If I comment yout your addition of GIT_TEST_COMMIT_GRAPH=0 in that file
I see that we fail N number of tests, but all of them are actually
fallout of just this test:

        git replace $head_parent $head && 
        git replace -f $tree $blob 

I.e. we've created a replacement object replacing a tree with a blob, as
part of tests I added to test how mktag handles those sorts of weird
edge cases.

This then causes the graph verify code to throw a hissy fit with:

    root tree OID for commit 0ddfaf193ff13d6ab39b7cbd9eed645e3ee2f050 in
    commit-graph is da5497437fd67ca928333aab79c4b4b55036ea66 !=
    0fbca9850869684085d654f9e1380c9780802570

I.e. when we wrote the graph we somehow didn't notice that the root tree
node we wrote is to an object that's not actually a tree? Isn't this a
bug where some part of the commit graph writing should be doing its own
extended OID lookup that's replacement-object aware, it didn't, and we
wrote a corrupt graph as a result?

If there is a legitimate reason why we're not just hiding a bug we've
turned up with these fixes let's disable that one test, not the entire
test file.

If you don't run the one test that fails (which is split up into 3
individual pieces) there's still 143 other tests that are run, all of
those presumably benefit from finding future bugs with
GIT_TEST_COMMIT_GRAPH=true, particularly since the test file seems to
just have turned up one just now...