Re: [PATCH v6 18/20] Reftable support for git-core

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 16 Apr 2021 16:55:40 +0200

On Wed, Apr 14 2021, Han-Wen Nienhuys wrote:

> On Tue, Apr 13, 2021 at 9:18 AM Ævar Arnfjörð Bjarmason
> <avarab@xxxxxxxxx> wrote:
> [...]
>> I understand that getting this series to land has been a pain, but this
>> sort of thing doesn't inspire confidence.
>>
>> Manually running it under reftable anyway I have 17 out of 81 tests
>> failing. A lot of tests won't run under reftable because they're of the
>> form:
>>
>>         test_when_finished "remove_object $tag" &&
>>         echo $tag >.git/refs/tags/wrong &&
>>         test_when_finished "git update-ref -d refs/tags/wrong" &&
>>
>> I.e. we have some hack to get around update-ref guarding things for us.
>>
>> I would think that a better approach here would be to start with some
>> (per-se unrelated) series to teach update-ref some mode like
>> hash-object's --literally, i.e. "YOLO this ref update".
>
> I disagree.  I think this would be a job better suited to a
> test-helper. Then we don't put tools into users' hands that
> potentially corrupt the repository. I don't understand why hash-object
> --literally is not a test helper either.

I don't feel too strongly either way, and in any case that's a
discussion for any such mass test update series, and separate from the
reftable series per-se.

I do lean on the side of shipping a thing like that as part of our
plumbing, of course with very prominent warnings, just as we do
--literally.

It is useful for not only our test suite, but also for others to
reproduce issues locally with the ref and object store going out of
sync.

Until now there hasn't been much of a need in practice, the file backend
format is trivially understood and you can just echo a bad value to a
file.

But if I or anyone else encounters some Heisenbug in the wild where we
suspect the ref and object store went out of syncy, and we're using
reftable, it would be very handy to test that with an "update-ref
--literally", instead of having to locally recompile git, or otherwise
munge the opaque reftable format.

>> Then adjust our various tests that move, clobber, or intentionally
>> corrupt things in the refstore to use that helper/option.
>>
>> At that point we can actually run things like t1450-fsck.sh, maybe we
>> simply have to skip some of them because we think the sort of corruption
>> we're testing/worried about is impossible with reftable, but that in
>> itself is *very* interesting.
>
> One reason for skipping some files wholesale, is that quite a few
> tests are not isolated in their test functions, so skipping an
> individual test functions affects follow-on tests. I can't remember if
> t1450 is one of them, though.

[...]

>> I.e. do we not have to handle certain edge cases at all in fsck and
>> friends because of inherent properties of the reftable backend (e.g. I'd
>> assume, but don't know, that we're not likely to get one corrupt ref
>> entry with it), but of course we can have a ref pointing to a loose
>> object that then gets removed, and is thus corrupt.
>>
>> Anyway. I'm speculating, but that's also my point, that I'm having to
>> speculate. That comes down to us having a new GIT_TEST_* mode that
>> doesn't pass the test suite._
>
> The speculation is warranted, but there are currently about 80 files
> that have some sort of test failure, and some of these are for quite
> common workflows. I would rather focus on those before looking at
> tests like fsck. Functional failures may have far reaching
> implications on the code structure (example: worktrees considerably
> complicated both the reftable library and its git-core glue), while
> corruption issues if they occur, will be due to localized bugs.

I'm not sure from reading this if/where we disagree, so let's enumerate
and see where we land point-by-point:

 1. The test suite must pass 100% for any new GIT_TEST_X mode as it
    lands.

 2. It must do so in a meaningful way (i.e. not skipping entire files,
    if some/many of the tests are relevant / stress the codepath
    involved)

 3. #2 should be there as the series lands.

My comments on that:

 1: I feel strongly that #1 is an absolute prerequisite for integrating
    this or any other major core surgery.

    E.g. now we can't do any meaningful cross-platform tests for this to
    see how it breaks on our various supported platforms.

    We're also in V6 of this and apparently finding an easily
    discoverable segfault now, if v7 of this had a GIT_TEST_REFTABLE in
    the relevant ci/* code ...

    I realize that that's hard, e.g. we had to do large refactoring for
    all of the commit-graph, protocol v2, sha256 etc. modes, but it
    really pays off. E.g. for protocol v2 upload-pack not paying
    attention to hideRefs was discovered that way.

 2-3: I do think we should have this too, but would be willing to be
      convinced otherwise. E.g. we have a "gcov" target, once we have #1
      passing do we se that under GIT_TEST_REFTABLE=1 we have the same
      (or trivially explainable) test coverage, or are we simply
      skipping a large part of the fsck etc. code (some/much of which
      will have to do with refs).

      Even more than test coverage I think this is interesting to see
      what we *don't* cover, as I noted upthread. I.e. can we skip large
      parts of some sanity checking "if (reftable)"? I suspect so, but
      without the coverage we won't be able to tell.

I realize the above would be a lot of work, probably a >50 patch series
to t/* if previous major GIT_TEST_* refactorings are any indication.

But if we agree that's a viable way forward that's not all work that
you'd have to do.