On Thu, Feb 3, 2022 at 6:53 PM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: > >> Yes. It is exactly the point of the question I asked. If it is > >> simple and easy to add such a new type that is ignored/skipped by > >> existing clients, then we can go that route. If it is simple and > >> easy to add a new bit per ref that existing clients would not barf, > >> we can use that as an alternative implementation strategy. > > > > I'm not sure that there are any JGit clients: I committed reftable > > support at the end of 2019. Before that time, we were running it > > internally at Google, but only ref storage, and without the posix > > part. Reflogs were never stored in refable, and I actually found a > > couple of bugs in Shawn's Java code. > > > > Gerrit has increasingly started using Git as a database, and the > > packed/loose system is just not a very good database, so that > > motivates the work reftable in general. But the folks who run Gerrit > > on a POSIX filesystem want to be sure that isn't a fringe feature, so > > they only want to start using it once Git itself supports it. So there > > is a chicken & egg problem. > > > > It's sad that we have to introduce an existence bit to make things > > work, but overall it is probably easier for me to do than trying to > > make sense of sequencer.c and how it uses refs/stash@{0}. > > > > Technically, the only obstacle I see is that we'd need to treat an > > existence entry especially for the purpose of compaction/gc: we can > > discard older entries, but we shouldn't discard the existence bit, no > > matter how old it is. > > Ah, that's very informative. I had been assuming (or misremembered) that > reftable was already seeing production use at Google. Perhaps I > remembering the now-dead Google Code (or whatever it was called). Maybe > not. We use the format (the JGit code) at Google, but we only use it to store refs, that is, the refname => {SHA1, symref, tag} mapping. We currently don't store reflog data in reftable, and the bugs I found were just in the reflog parts. We store the tables in bigtable (among others), so the part that does the POSIX file locking is new (basically, everything in reftable/stack.c and its equivalent in JGit). So we are locked into the format to some degree. For the existence bit, I think I could simply record a $zeroid => $zeroid ref update in ref log and treat that specially. > In any case, not being locked into the format as specified is very > nice. So is it basically seeing no (production) use anywhere as far as > you know? Whether that's in production at Google, or some third parties > via JGit-something (maybe as editor libraries?). I think our friends at Gerritforge have been experimenting with it, but not in a production setting, AFAIK. Luca might confirm. > Taking a bit of a step back. > > I do think that generally speaking parts of this series are putting the > cart before the horse in seemingly trying to get the test suite clean > before we have the integration in-tree. > > Not everything you have here, but some of it. > > I know I'm the one who started encouraging you to work towards getting > the test mode passing, but I think that while it's good to mark some > obviously file-only tests beforehand, anything where we have different > behavior under reftable should really come after. (I can't parse your last sentence) > Of course that will mean we'll have some interim period where our test > suite is a dumpster fire under GIT_TEST_REFTABLE=true, and I think It's actually not that bad. By my last count, there were 38 files with test failures. > that's fine, as long as we work towards getting it passing, and as long > as the non-stability of the nascent backend is very prominently > advertised in the interim. > > I.e. I think *the* issue with the original series you had in this regard > was that git-init.txt (or whatever it was) basically just discussed > enabling reftable matter-of-factly, when we were still failing > tens/hundreds of tests, which is just setting up a big bear trap for > users to step into. I read that comment, and I removed that long ago. Right now the only way to get a reftable is to say GIT_TEST_REFTABLE=1 on init. > But if we just changed those docs a bit to note "!!WARNING WARNING!! > EXPERIMENTAL AND UNSTABLE !!WARNING WARNING!!" or whatever we could > merge the API integration parts sooner than later, even with a lot of > known-broken tests. > > We could then whitelist the broken parts, and work on narrowing that set > down. Similar what the SANITIZE=leak mode is currently doing for memory > leaks. > > I think that would make things a lot easier when reviewing submissions > like these, in that we have reftable/* in-tree already, but with the > "real" integration we could check how files/reftable backends behave, > add the diverging behavior to tests etc. > > What do you think? Sounds more fun than the current model :-) The latest version of the code is here: https://github.com/hanwen/git/tree/merged-seen-20220117 What do you need besides the "RFC: reftable backend" commit? -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg Geschäftsführer: Paul Manicle, Halimah DeLaine Prado