Re: [PATCH v7 0/6] Reftable support git-core

Jonathan Nieder <jrnieder@xxxxxxxxx> · Mon, 23 Mar 2020 23:06:28 -0700

Hi!

Han-Wen Nienhuys wrote:

> This adds the reftable library, and hooks it up as a ref backend.
>
> Feedback wanted:
>
>  * spots marked with XXX in the Git source code.
>
>  * what is a good test strategy? Could we have a CI flavor where we flip the
>    default to reftable, and see how it fares?
[...]
> Han-Wen Nienhuys (5):
>   refs.h: clarify reflog iteration order
>   create .git/refs in files-backend.c
>   refs: document how ref_iterator_advance_fn should handle symrefs
>   Add reftable library
>   Reftable support for git-core
>
> Jonathan Nieder (1):
>   reftable: file format documentation
[...]
>  51 files changed, 8547 insertions(+), 32 deletions(-)

This series was discussed at the Git Contributor Summit, and I
promised to provide a summary here.  Sorry for the delay.

Contribution format
~~~~~~~~~~~~~~~~~~~
This is a bit of an unusual contribution to Git:

- the heart of the series is in a single commit
- it is under a different license than most of Git
- the code doesn't use the same dialect as and "look like" the rest of
  Git

All three of these are because the code is meant to be a standalone
library that can be used by Git, libgit2, and any other project that
wants to understand reftables.  But they create obstacles to reviewing
(one reviewer said he had spent a two-hour block looking at the patch
and not made much headway).

On the other hand, people mostly agreed that having the ability to
share this code with libgit2 is beneficial.  So how can we get a
substantial review without losing that benefit?

 1. Most importantly, people would like a series of multiple patches
    that tell a story.  If review of an early patch in the series
    reveals significant changes that should happen in reftable
    upstream, that's fine --- upstream, that can happen in a patch on
    top, whereas in the series being contributed, the patch would be
    amended.

    In the end, there would be two different series of commits: the
    contributed series, and the upstream history.  The upstream
    history might be messier, and that's fine.  The contributed series
    would be applied to git.git.

    If I understood correctly, this is more important for the initial
    contribution than for later changes.  Git includes some code (e.g.
    in compat/) with outside upstreams and gets all-in-one-commit
    updates.  As long as upstream has a functional review process,
    this is fine --- here, reftable upstream (you :)) wants review on
    the initial contribution and splitting into a series that tells a
    story is the cost of making that review feasible to provide.

 2. Both Git and libgit2 have abstractions like strbuf.  We'd like
    reftable to make use of similar abstractions where they make the
    code cleaner.

 3. libgit2 has a git_malloc allocator.  reftable doesn't necessarily
    have to use it or make it pluggable --- at worst, we can #define
    malloc to use it.  But it's something to be aware of.

Maintenance
~~~~~~~~~~~
There's some interest in the maintenance story: if we run into an
issue with the reftable library, do we report and fix it on the git
mailing list or does the reftable library have its own upstream forums
for that?

Portability
~~~~~~~~~~~
There was some confusion about the scope of what the reftable library
provides.  It provides a reader and a writer and the caller is
responsible for connecting those to files.

There were some questions about mmap usage here (there is none) and
whether the library needs some kind of seeking reader interface for
abstracting the OS interface to files.

I think the upshot is

 4. Some summary documentation would be helpful e.g. in the README.md,
    to point people to what header file a person should start with to
    understand the library

 5. People are also interested in the file-oriented part of reftable,
    even though this library doesn't implement it (that's patch 6,
    which is Git-specific).

Testing
~~~~~~~
Git's testsuite has various GIT_TEST_* knobs.  A GIT_TEST_REF_BACKEND
knob would be helpful, to allow running the full testsuite with
reftable.  That's a good way to suss out edge cases.

The testsuite should *also* include some specific tests designed for
reftable.  They can demonstrate edge cases and demonstrate some of the
benefits

- case sensitivity
- handling of directory/file conflicts

That provides another entry point for people to understand reftable.

We also discussed table-driven tests that can be shared between
implementations but didn't end up saying anything too concrete about
that.

Summary
~~~~~~~
Breaking down the library into multiple patches is likely to be a
signficant amount of work, but it's a good way to break the review
into manageable pieces that get the project engaged.

Integration testing is also likely to be helpful.

People were very excited about the benefits reftable brings and it
feels a little strange to ask so much when you've already written this
code, but this seemed like the best way to make it into something the
project understands well and to get any kinks ironed out.  I'm happy
to help in any way I can.

Thoughts?

Thanks,
Jonathan