Re: Implementing reftable in Git

David Turner <novalis@xxxxxxxxxxx> · Fri, 11 May 2018 18:21:34 -0400

On Fri, 2018-05-11 at 11:31 +0200, Michael Haggerty wrote:
> On Wed, May 9, 2018 at 4:33 PM, Christian Couder
> <christian.couder@xxxxxxxxx> wrote:
> > I might start working on implementing reftable in Git soon.
> > [...]
> 
> Nice. It'll be great to have a reftable implementation in git core
> (and ideally libgit2, as well). It seems to me that it could someday
> become the new default reference storage method. The file format is
> considerably more complicated than the current loose/packed scheme,
> which is definitely a disadvantage (for example, for other Git
> implementations). But implementing it *with good performance and
> without races* might be no more complicated than the current scheme.

I am somewhat concerned about perf, because as I recall, we have a
bunch of code which effectively loads all refs, which will be more
expensive with reftable than with packed-refs (though maybe cheaper
than with loose refs).  But maybe we have eliminated this code or can
work around it.

> Testing will be important. There are already many tests specifically
> about testing loose/packed reference storage. These will always have
> to run against repositories that are forced to use that reference
> scheme. And there will need to be new tests specifically about the
> reftable scheme. Both classes of tests should be run every time. That
> much is pretty obvious.
> 
> But currently, there are a lot of tests that assume the loose/packed
> reference format on disk even though the tests are not really related
> to references at all. ISTM that these should be converted to work at a
> higher level, for example using `for-each-ref`, `rev-parse`, etc. to
> examine references rather than reading reference files directly. That
> way the tests should run correctly regardless of which scheme is in
> use.

I agree with that, and I think some of my patches from years ago
attempted to do that.  I probably should have broken those out into a
separate series so that they could have been applied separately.
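
As a minimal sketch of the difference (hypothetical, not taken from
those patches or from Git's test suite), the script below reads a
branch tip once by opening the loose ref file and once through
`rev-parse`.  The temporary repository, the variable names, and the
test identity are assumptions made up for the illustration; it also
assumes the repository is created with the loose/packed ("files")
backend.

```shell
#!/bin/sh
# Illustrative only: a throwaway repository with a single empty commit.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git -c user.name=test -c user.email=test@example.com \
    commit -q --allow-empty -m initial
branch=$(git symbolic-ref --short HEAD)

# Backend-specific: only works while the ref exists as a loose file.
# The read fails (leaving $loose empty) under any other storage.
loose=$(cat ".git/refs/heads/$branch" 2>/dev/null)

# Backend-neutral: let Git resolve the ref, wherever it is stored.
resolved=$(git rev-parse --verify "refs/heads/$branch")

# By default pack-refs prunes the loose files it has packed, so a test
# that reads the file directly would break here; rev-parse still works.
git pack-refs --all
after_pack=$(git rev-parse --verify "refs/heads/$branch")

echo "loose file: ${loose:-<unreadable>}"
echo "rev-parse:  $resolved"
echo "after pack: $after_pack"
```

A test written in the second style makes no assumption about where the
ref lives on disk, so it should not need to change when the storage
scheme does.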

> And since it's too expensive to run the whole test suite with both
> reference storage schemes, it seems to me that the reference storage
> scheme that is used while running the scheme-neutral tests should be
> easy to choose at runtime.

I ran the whole suite with both schemes during my testing, and I think
it was quite valuable in flushing out bugs.

> David Turner did some analogous work for wiring up and testing his
> proposed LMDB ref storage backend that might be useful [1]. I'm CCing
> him, since he might have thoughts on this topic.

Inline, above.