Re: [PATCH 00/11] t: reduce direct disk access to data structures

Patrick Steinhardt <ps@xxxxxx> · Mon, 23 Oct 2023 15:58:12 +0200

On Thu, Oct 19, 2023 at 12:13:12PM +0200, Han-Wen Nienhuys wrote:
> On Wed, Oct 18, 2023 at 5:32 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > > this patch series refactors a bunch of our tests to perform less direct
> > > disk access to on-disk data structures. Instead, the tests are converted
> > > to use Git tools or our test-tool to access data to the best extent
> > > possible. This serves two benefits:
> >
> > Laudable goal.
> >
> > >     - We increase test coverage of our own code base.
> >
> > Meaning the new code added to test-tool for this series will also
> > get tested and bugs spotted?

For now all the helpers of the test-tool only have implicit test
coverage, but I get your point. If we decide to instead transform these
helpers into production-level code as you suggested (e.g.
`git rev-parse --exists`) then this becomes less of a discussion point
as we would of course have proper test coverage for it.

> > >     - We become less dependent on the actual on-disk format.
> >
> > Yes, this is very desirable.  Without looking at the implementation,
> > I see some issues aiming for this goal may involve. [a] using the
> > production code for validation would mean our expectation to be
> > compared to the reality to be validated can be affected by the same
> > bug, making two wrongs to appear right; [b] using a separate
> > implementation used only for validation would at least mean we will
> > have to make the same mistake in unique part of both implementations
> > that is less likely to miss bugs compared to [a], but bugs in shared
> > part of the production code and validation code will be hidden the
> > same way as [a].
> 
> I think it would be really great if there were separate unittests for
> the ref backend API. Some of the reftable work was needlessly
> difficult because the contract of the API was underspecified. The API
> is well compartmentalized in refs-internal.h, and a lot of the API
> behavior can be tested as a black box, eg.
> 
> * setup symref HEAD pointing to R1
> * setup transaction updating ref R1 from C1 to C2
> * commit transaction, check that it succeeds
> * read ref R1, check if it is C2
> * read reflog for R1, see that it has a C1 => C2 update
> * read reflog for HEAD, see that it has a C1 => C2 update
> 
> Tests for the loose/packed backend could directly mess with the
> on-disk files to test failure scenarios.
> 
> With unittests like that, the tests can zoom in on the functionality
> of the ref backend, and provide more convenient coverage for
> dynamic/static analysis.

Agreed. Ideally, I think our tests should be split up into two layers:

    1. Low-level tests which are backend specific. These allow us to
       assert the on-disk state of the respective backend thoroughly and
       also allow us to explicitly test for edge cases that are only of
       relevance to this specific backend.

       The reftable backend in its current (non-upstream) form already
       has such tests, but we don't really have explicit tests for the
       files backend. This is a gap that should ideally be filled at
       some point in time.

    2. Higher-level tests should then be allowed to assume that the
       underlying logic works as expected. These tests are thus free to
       use plumbing tools that tie into the reference backends.
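
To make the low-level layer a bit more concrete, here is a rough sketch
of the black-box sequence from the quoted list (symref HEAD -> R1,
transaction C1 => C2, then ref and reflog checks). It runs against a toy
in-memory backend written only for illustration: `ToyRefBackend`,
`commit_transaction` and `check_backend_contract` are made-up names and
do not correspond to anything in refs-internal.h. The point is only that
the checks go through the backend's interface and never look at its
storage.

```python
# Toy in-memory ref backend. Hypothetical illustration only: none of
# these names exist in Git; a real test would drive the actual backend.
class ToyRefBackend:
    def __init__(self):
        self.refs = {}     # refname -> object ID
        self.symrefs = {}  # symref name -> target refname
        self.reflogs = {}  # refname -> list of (old, new) entries

    def set_symref(self, name, target):
        self.symrefs[name] = target

    def read_ref(self, name):
        # Resolve one level of symref, then look up the value.
        name = self.symrefs.get(name, name)
        return self.refs.get(name)

    def reflog(self, name):
        return list(self.reflogs.get(name, []))

    def commit_transaction(self, updates):
        """updates: list of (refname, expected_old, new); all-or-nothing."""
        # Verify every expected old value before touching any state.
        for name, old, _new in updates:
            if self.refs.get(name) != old:
                return False
        for name, old, new in updates:
            self.refs[name] = new
            self.reflogs.setdefault(name, []).append((old, new))
            # Symrefs pointing at the updated ref get a reflog entry too.
            for sym, target in self.symrefs.items():
                if target == name:
                    self.reflogs.setdefault(sym, []).append((old, new))
        return True


def check_backend_contract(backend):
    """Black-box checks that only use the backend's public interface."""
    r1 = "refs/heads/R1"
    backend.set_symref("HEAD", r1)
    # Seed R1 with C1 (it must not exist yet), then update C1 => C2.
    assert backend.commit_transaction([(r1, None, "C1")])
    assert backend.commit_transaction([(r1, "C1", "C2")])
    assert backend.read_ref(r1) == "C2"
    assert backend.reflog(r1)[-1] == ("C1", "C2")
    assert backend.reflog("HEAD")[-1] == ("C1", "C2")
    # A transaction with a stale old value must fail and change nothing.
    assert not backend.commit_transaction([(r1, "C1", "C3")])
    assert backend.read_ref(r1) == "C2"
    return True
```

Because `check_backend_contract` takes the backend as a parameter, the
same checks could in principle be run against every backend, with the
backend-specific on-disk assertions kept in separate tests.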

Patrick