Re: [gitgitgadget/git] Reftable support git-core (#539)

Han-Wen Nienhuys <hanwenn@xxxxxxxxx> · Fri, 27 Mar 2020 12:20:48 +0100

On Tue, Mar 24, 2020 at 7:09 AM gitgitgadget[bot]
<notifications@xxxxxxxxxx> wrote:
> Contribution format
> ~~~~~~~~~~~~~~~~~~~
> This is a bit of an unusual contribution to Git:
>
> - the heart of the series is in a single commit
> - it is under a different license than most of Git
> - the code doesn't use the same dialect as and "look like" the rest of
>   Git
>
> All three of these are because the code is meant to be a standalone
> library that can be used by Git, libgit2, and any other project that
> wants to understand reftables.  But they create obstacles to reviewing
> (one reviewer said he had spent a two-hour block looking at the patch
> and not made much headway).
>
> On the other hand, people mostly agreed that having the ability to
> share this code with libgit2 is beneficial.  So how can we get a
> substantial review without losing that benefit?
>
>  1. Most importantly, people would like a series of multiple patches
>     that tell a story.  If review of an early patch in the series
>     reveals significant changes that should happen in reftable
>     upstream, that's fine --- upstream, that can happen in a patch on
>     top, whereas in the series being contributed, the patch would be
>     amended.
>
>     In the end, there would be two different series of commits: the
>     contributed series, and the upstream history.  The upstream
>     history might be messier, and that's fine.  The contributed series
>     would be applied to git.git.

This not very practical. I currently sync the reftable library into
git by means of the reftable/update.sh script,
and creating a layered commits is a lot of busywork.  Also, getting
the entire code in one go lets reviewers see how the whole thing fits
together.

I do understand the complaint, though. The good news is that there
already is a story in the source code; you just have to read it in the
right order. That order is (bottom-up):

* record.{c,h} - (de)serialized single records
* block.{c,h} - read and write blocks
* writer.c - write a reftable
* reader.c - read a reftable
* merged.{c,h}, pq.{c,h} - reading a stack of reftables
* stack.{c,h} - writing and compacting stacks of reftable on the filesystem.

before anyone attempts to review the code, they should read the
specification of the format very carefully.

>  2. Both Git and libgit2 have abstractions like strbuf.  We'd like
>     reftable to make use of similar abstractions where they make the
>     code cleaner.

Have you looked at slice.{c,h} ? Is there any specific code that
anyone had an issue with?

>  3. libgit2 has a git_malloc allocator.  reftable doesn't necessarily
>     have to use it or make it pluggable --- at worst, we can #define
>     malloc to use it.  But it's something to be aware of.

I added pluggable malloc routines.

> Maintenance
> ~~~~~~~~~~~
> There's some interest in the maintenance story: if we run into an
> issue with the reftable library, do we report and fix it on the git
> mailing list or does the reftable library have its own upstream forums
> for that?

There is no specific forum. The most practical solution would be
report to the git mailing list with cc to hanwen@xxxxxxxxxx.

Alternatively, an issue on the github project would also work.

I don't foresee much need for maintenance, though. (Or do you foresee
a need to store more types of data besides refs and reflogs?)

> Portability
> ~~~~~~~~~~~
> There was some confusion about the scope of what the reftable library
> provides.  It provides a reader and a writer and the caller is
> responsible for connecting those to files.

No, it also does transactional updates to a stack of reftables on the
filesystem.

> There were some questions about mmap usage here (there is none) and

The format is suitable for mmap, but it seems premature optimization,
given the (lack of) efficiency of loose+packed refs storage that it
indends to replace.

> whether the library needs some kind of seeking reader interface for
> abstracting the OS interface to files.

Look at reftable_block_source in reftable.h

> I think the upshot is
>
>  4. Some summary documentation would be helpful e.g. in the README.md,
>     to point people to what header file a person should start with to
>     understand the library

reftable.h is a good start, as it declares the public interface, and
is extensively documented. It also addresses many of the points that
this email raised.

>  5. People are also interested in the file-oriented part of reftable,
>     even though this library doesn't implement it (that's patch 6,
>     which is Git-specific).

No, this is incorrect.

> Testing
> ~~~~~~~
> Git's testsuite has various GIT_TEST_* knobs.  A GIT_TEST_REF_BACKEND
> knob would be helpful, to allow running the full testsuite with
> reftable.  That's a good way to suss out edge cases.

I agree, but this outside my personal area of expertise. I think this
is best tackled by someone else.

> The testsuite should *also* include some specific tests designed for
> reftable.  They can demonstrate edge cases and demonstrate some of the
> benefits

There are unittests in the upstream project. Edge cases in the library
itself should be covered by unittests. Is there interest in adding
these to the git project itself?

I can add some Git tests for directory/file conflicts and case sensitivity.

> We also discussed table-driven tests that can be shared between
> implementations but didn't end up saying anything too concrete about
> that.

My plan is to add a cross-language tests, that passes data between
JGit, Go and C implementations to verify that they interoperate.

--
Han-Wen Nienhuys - hanwenn@xxxxxxxxx - http://www.xs4all.nl/~hanwen