On Thu, Feb 03 2022, Han-Wen Nienhuys wrote: > On Tue, Feb 1, 2022 at 11:12 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: >> >> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes: >> >> > We could surely add magic record types, but how would such a dance be >> > performed while keeping compatibility with existing JGit clients? >> >> Yes. It is exactly the point of the question I asked. If it is >> simple and easy to add such a new type that is ignored/skipped by >> existing clients, then we can go that route. If it is simple and >> easy to add a new bit per ref that existing clients would not barf, >> we can use that as an alternative implementation strategy. > > I'm not sure that there are any JGit clients: I committed reftable > support at the end of 2019. Before that time, we were running it > internally at Google, but only ref storage, and without the posix > part. Reflogs were never stored in refable, and I actually found a > couple of bugs in Shawn's Java code. > > Gerrit has increasingly started using Git as a database, and the > packed/loose system is just not a very good database, so that > motivates the work reftable in general. But the folks who run Gerrit > on a POSIX filesystem want to be sure that isn't a fringe feature, so > they only want to start using it once Git itself supports it. So there > is a chicken & egg problem. > > It's sad that we have to introduce an existence bit to make things > work, but overall it is probably easier for me to do than trying to > make sense of sequencer.c and how it uses refs/stash@{0}. > > Technically, the only obstacle I see is that we'd need to treat an > existence entry especially for the purpose of compaction/gc: we can > discard older entries, but we shouldn't discard the existence bit, no > matter how old it is. Ah, that's very informative. I had been assuming (or misremembered) that reftable was already seeing production use at Google. Perhaps I remembering the now-dead Google Code (or whatever it was called). Maybe not. In any case, not being locked into the format as specified is very nice. So is it basically seeing no (production) use anywhere as far as you know? Whether that's in production at Google, or some third parties via JGit-something (maybe as editor libraries?). Taking a bit of a step back. I do think that generally speaking parts of this series are putting the cart before the horse in seemingly trying to get the test suite clean before we have the integration in-tree. Not everything you have here, but some of it. I know I'm the one who started encouraging you to work towards getting the test mode passing, but I think that while it's good to mark some obviously file-only tests beforehand, anything where we have different behavior under reftable should really come after. Because then we can positively assert what we do differently, not just skip an existing test. And yes, for many tests that will require rewriting their setup, because they conflate things that are backend-independent, such as the general question of "can we ask about reflog existence?" with the implementation detail of the test setup, which oftentimes is file-backend specific. Of course that will mean we'll have some interim period where our test suite is a dumpster fire under GIT_TEST_REFTABLE=true, and I think that's fine, as long as we work towards getting it passing, and as long as the non-stability of the nascent backend is very prominently advertised in the interim. I.e. I think *the* issue with the original series you had in this regard was that git-init.txt (or whatever it was) basically just discussed enabling reftable matter-of-factly, when we were still failing tens/hundreds of tests, which is just setting up a big bear trap for users to step into. But if we just changed those docs a bit to note "!!WARNING WARNING!! EXPERIMENTAL AND UNSTABLE !!WARNING WARNING!!" or whatever we could merge the API integration parts sooner than later, even with a lot of known-broken tests. We could then whitelist the broken parts, and work on narrowing that set down. Similar what the SANITIZE=leak mode is currently doing for memory leaks. I think that would make things a lot easier when reviewing submissions like these, in that we have reftable/* in-tree already, but with the "real" integration we could check how files/reftable backends behave, add the diverging behavior to tests etc. What do you think?