Re: [PATCH 2/3] t1405: mark test that checks existence as REFFILES

Han-Wen Nienhuys <hanwen@xxxxxxxxxx> · Thu, 3 Feb 2022 19:10:43 +0100

On Thu, Feb 3, 2022 at 6:53 PM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
> >> Yes.  It is exactly the point of the question I asked.  If it is
> >> simple and easy to add such a new type that is ignored/skipped by
> >> existing clients, then we can go that route.  If it is simple and
> >> easy to add a new bit per ref that existing clients would not barf,
> >> we can use that as an alternative implementation strategy.
> >
> > I'm not sure that there are any JGit clients: I committed reftable
> > support at the end of 2019. Before that time, we were running it
> > internally at Google, but only ref storage, and without the posix
> > part. Reflogs were never stored in refable, and I actually found a
> > couple of bugs in Shawn's Java code.
> >
> > Gerrit has increasingly started using Git as a database, and the
> > packed/loose system is just not a very good database, so that
> > motivates the work reftable in general. But the folks who run Gerrit
> > on a POSIX filesystem want to be sure that isn't a fringe feature, so
> > they only want to start using it once Git itself supports it. So there
> > is a chicken & egg problem.
> >
> > It's sad that we have to introduce an existence bit to make things
> > work, but overall it is probably easier for me to do than trying to
> > make sense of sequencer.c and how it uses refs/stash@{0}.
> >
> > Technically, the only obstacle I see is that we'd need to treat an
> > existence entry especially for the purpose of compaction/gc: we can
> > discard older entries, but we shouldn't discard the existence bit, no
> > matter how old it is.
>
> Ah, that's very informative. I had been assuming (or misremembered) that
> reftable was already seeing production use at Google. Perhaps I
> remembering the now-dead Google Code (or whatever it was called). Maybe
> not.

We use the format (the JGit code) at Google, but we only use it to
store refs, that is, the refname => {SHA1, symref, tag} mapping. We
currently don't store reflog data in reftable, and the bugs I found
were just in the reflog parts.

We store the tables in bigtable (among others), so the part that does
the POSIX file locking is new (basically, everything in
reftable/stack.c and its equivalent in JGit).

So we are locked into the format to some degree.

For the existence bit, I think I could simply record a $zeroid =>
$zeroid ref update in ref log and treat that specially.

> In any case, not being locked into the format as specified is very
> nice. So is it basically seeing no (production) use anywhere as far as
> you know? Whether that's in production at Google, or some third parties
> via JGit-something (maybe as editor libraries?).

I think our friends at Gerritforge have been experimenting with it,
but not in a production setting, AFAIK. Luca might confirm.

> Taking a bit of a step back.
>
> I do think that generally speaking parts of this series are putting the
> cart before the horse in seemingly trying to get the test suite clean
> before we have the integration in-tree.
>
> Not everything you have here, but some of it.
>
> I know I'm the one who started encouraging you to work towards getting
> the test mode passing, but I think that while it's good to mark some
> obviously file-only tests beforehand, anything where we have different
> behavior under reftable should really come after.

(I can't parse your last sentence)

> Of course that will mean we'll have some interim period where our test
> suite is a dumpster fire under GIT_TEST_REFTABLE=true, and I think

It's actually not that bad. By my last count, there were 38 files with
test failures.

> that's fine, as long as we work towards getting it passing, and as long
> as the non-stability of the nascent backend is very prominently
> advertised in the interim.
>
> I.e. I think *the* issue with the original series you had in this regard
> was that git-init.txt (or whatever it was) basically just discussed
> enabling reftable matter-of-factly, when we were still failing
> tens/hundreds of tests, which is just setting up a big bear trap for
> users to step into.

I read that comment, and I removed that long ago. Right now the only
way to get a reftable is to say GIT_TEST_REFTABLE=1 on init.

> But if we just changed those docs a bit to note "!!WARNING WARNING!!
> EXPERIMENTAL AND UNSTABLE !!WARNING WARNING!!" or whatever we could
> merge the API integration parts sooner than later, even with a lot of
> known-broken tests.
>
> We could then whitelist the broken parts, and work on narrowing that set
> down. Similar what the SANITIZE=leak mode is currently doing for memory
> leaks.
>
> I think that would make things a lot easier when reviewing submissions
> like these, in that we have reftable/* in-tree already, but with the
> "real" integration we could check how files/reftable backends behave,
> add the diverging behavior to tests etc.
>
> What do you think?

Sounds more fun than the current model :-)

The latest version of the code is here:
https://github.com/hanwen/git/tree/merged-seen-20220117

What do you need besides the "RFC: reftable backend" commit?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado