Re: [PATCH 2/3] t1405: mark test that checks existence as REFFILES

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Thu, 03 Feb 2022 18:39:29 +0100

On Thu, Feb 03 2022, Han-Wen Nienhuys wrote:

> On Tue, Feb 1, 2022 at 11:12 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>
>> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>>
>> > We could surely add magic record types, but how would such a dance be
>> > performed while keeping compatibility with existing JGit clients?
>>
>> Yes.  It is exactly the point of the question I asked.  If it is
>> simple and easy to add such a new type that is ignored/skipped by
>> existing clients, then we can go that route.  If it is simple and
>> easy to add a new bit per ref that existing clients would not barf,
>> we can use that as an alternative implementation strategy.
>
> I'm not sure that there are any JGit clients: I committed reftable
> support at the end of 2019. Before that time, we were running it
> internally at Google, but only ref storage, and without the posix
> part. Reflogs were never stored in refable, and I actually found a
> couple of bugs in Shawn's Java code.
>
> Gerrit has increasingly started using Git as a database, and the
> packed/loose system is just not a very good database, so that
> motivates the work reftable in general. But the folks who run Gerrit
> on a POSIX filesystem want to be sure that isn't a fringe feature, so
> they only want to start using it once Git itself supports it. So there
> is a chicken & egg problem.
>
> It's sad that we have to introduce an existence bit to make things
> work, but overall it is probably easier for me to do than trying to
> make sense of sequencer.c and how it uses refs/stash@{0}.
>
> Technically, the only obstacle I see is that we'd need to treat an
> existence entry especially for the purpose of compaction/gc: we can
> discard older entries, but we shouldn't discard the existence bit, no
> matter how old it is.

Ah, that's very informative. I had been assuming (or misremembered) that
reftable was already seeing production use at Google. Perhaps I
remembering the now-dead Google Code (or whatever it was called). Maybe
not.

In any case, not being locked into the format as specified is very
nice. So is it basically seeing no (production) use anywhere as far as
you know? Whether that's in production at Google, or some third parties
via JGit-something (maybe as editor libraries?).

Taking a bit of a step back.

I do think that generally speaking parts of this series are putting the
cart before the horse in seemingly trying to get the test suite clean
before we have the integration in-tree.

Not everything you have here, but some of it.

I know I'm the one who started encouraging you to work towards getting
the test mode passing, but I think that while it's good to mark some
obviously file-only tests beforehand, anything where we have different
behavior under reftable should really come after.

Because then we can positively assert what we do differently, not just
skip an existing test.

And yes, for many tests that will require rewriting their setup, because
they conflate things that are backend-independent, such as the general
question of "can we ask about reflog existence?" with the implementation
detail of the test setup, which oftentimes is file-backend specific.

Of course that will mean we'll have some interim period where our test
suite is a dumpster fire under GIT_TEST_REFTABLE=true, and I think
that's fine, as long as we work towards getting it passing, and as long
as the non-stability of the nascent backend is very prominently
advertised in the interim.

I.e. I think *the* issue with the original series you had in this regard
was that git-init.txt (or whatever it was) basically just discussed
enabling reftable matter-of-factly, when we were still failing
tens/hundreds of tests, which is just setting up a big bear trap for
users to step into.

But if we just changed those docs a bit to note "!!WARNING WARNING!!
EXPERIMENTAL AND UNSTABLE !!WARNING WARNING!!" or whatever we could
merge the API integration parts sooner than later, even with a lot of
known-broken tests.

We could then whitelist the broken parts, and work on narrowing that set
down. Similar what the SANITIZE=leak mode is currently doing for memory
leaks.

I think that would make things a lot easier when reviewing submissions
like these, in that we have reftable/* in-tree already, but with the
"real" integration we could check how files/reftable backends behave,
add the diverging behavior to tests etc.

What do you think?