Re: [PATCH v2 05/13] reftable: utility functions

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Fri, 23 Oct 2020 11:13:51 +0200

On Sat, Oct 10 2020, Han-Wen Nienhuys wrote:

> On Thu, Oct 8, 2020 at 3:48 AM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
>>
>> > From: Han-Wen Nienhuys <hanwen@xxxxxxxxxx>
>> >
>> > This commit provides basic utility classes for the reftable library.
>> >
>> > Since the reftable library must compile standalone, there may be some overlap
>> > with git-core utility functions.
>
>> I think duplicating things like strbuf is an unnecessary burden if Git
>> is to maintain this library. Something like "reftable will only import
>> git-compat-util.h and strbuf.h, and any project that wants to use
>> reftable must make sure that these functions and data structures are
>> available" would be more plausible.
>
> Sure, but how do we ensure that the directory won't take on
> dependencies beyond these headers? I am worried that I will be
> involved in a tedious back & forth process to keep updates going into
> libgit2 and/or also have to keep maintaining
> github.com/google/reftable.
>
> FWIW, the duplication is really tiny: according to
>
>  $ wc $(grep -l REFTABLE_STANDALONE *[ch])
>
> it's just 431 lines of code.

Why import the dead code at all? If I add this to your update script:

    # Not a general solution, but works for the specific code here.
    perl -0777 -pi -e 's[(#ifndef REFTABLE_STANDALONE\n.*?\n#else\n).*?(?=\n#endif)][$1/* Removed upstream code for git.git import */]gs' reftable/system.h
    perl -0777 -pi -e 's[(?<=#ifdef REFTABLE_STANDALONE\n).*?(?=\n#endif)][/* Removed upstream code for git.git import */]gs' reftable/strbuf.c reftable/compat.h
    perl -0777 -pi -e 's[(?<=#ifdef REFTABLE_STANDALONE\n).*?(?=\n#else)][/* Removed upstream code for git.git import */]gs' reftable/compat.c reftable/strbuf.h

It's now 157 lines instead of 431.

I think doing that with a tiny bit more complexity in the update.sh
script is a much lower maintenance burden.

It's not just about the number of lines, but things coming up in grep,
and now unique code really stands out (e.g. strbuf_add_void, should
probably be just added to the main strbuf.h if it's needed...), and of
course the cost of attention of eyeballing an already big series on the
ML & the churn every time we have updates.

Overall I'm all for having this carved out in its own directory at a
cost of a bit more maintenance burden if it can be shared with libgit2 &
be a standalone library.

I am concerned that it seems this code can't be maintained in git.git by
anyone not willing to sign a contract with Google. I sent a tiny PR for
a typo fix at [1] and got directed to sign [2] before someone at Google
could look at it. I see brian raised this before in [3] but there wasn't
a reply to that point.

Is there some summary of how this part of integrating it is supposed to
work going forward?

At first glance that seems like a recipe for perma-forking this pretty
much from day one, i.e.:

 A. Upstream changes happen
 B. We get a patch to the bundled library to this ML
 ==> Google employees maintaining A can't even look at B
 ==> Patch B can't be integrated into A

1. https://github.com/google/reftable/pull/2
2. https://cla.developers.google.com/about/google-individual
3. https://lore.kernel.org/git/20200512233741.GB6605@xxxxxxxxxxxxxxxxxxxxxxxxx/