On Sat, Oct 10 2020, Han-Wen Nienhuys wrote: > On Thu, Oct 8, 2020 at 3:48 AM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote: >> >> > From: Han-Wen Nienhuys <hanwen@xxxxxxxxxx> >> > >> > This commit provides basic utility classes for the reftable library. >> > >> > Since the reftable library must compile standalone, there may be some overlap >> > with git-core utility functions. > >> I think duplicating things like strbuf is an unnecessary burden if Git >> is to maintain this library. Something like "reftable will only import >> git-compat-util.h and strbuf.h, and any project that wants to use >> reftable must make sure that these functions and data structures are >> available" would be more plausible. > > Sure, but how do we ensure that the directory won't take on > dependencies beyond these headers? I am worried that I will be > involved in a tedious back & forth process to keep updates going into > libgit2 and/or also have to keep maintaining > github.com/google/reftable. > > FWIW, the duplication is really tiny: according to > > $ wc $(grep -l REFTABLE_STANDALONE *[ch]) > > it's just 431 lines of code. Why import the dead code at all? If I add this to your update script: # Not a general solution, but works for the specific code here. perl -0777 -pi -e 's[(#ifndef REFTABLE_STANDALONE\n.*?\n#else\n).*?(?=\n#endif)][$1/* Removed upstream code for git.git import */]gs' reftable/system.h perl -0777 -pi -e 's[(?<=#ifdef REFTABLE_STANDALONE\n).*?(?=\n#endif)][/* Removed upstream code for git.git import */]gs' reftable/strbuf.c reftable/compat.h perl -0777 -pi -e 's[(?<=#ifdef REFTABLE_STANDALONE\n).*?(?=\n#else)][/* Removed upstream code for git.git import */]gs' reftable/compat.c reftable/strbuf.h It's now 157 lines instead of 431. I think doing that with a tiny bit more complexity in the update.sh script is a much lower maintenance burden. It's not just about the number of lines, but things coming up in grep, and now unique code really stands out (e.g. strbuf_add_void, should probably be just added to the main strbuf.h if it's needed...), and of course the cost of attention of eyeballing an already big series on the ML & the churn every time we have updates. Overall I'm all for having this carved out in its own directory at a cost of a bit more maintenance burden if it can be shared with libgit2 & be a standalone library. I am concerned that it seems this code can't be maintained in git.git by anyone not willing to sign a contract with Google. I sent a tiny PR for a typo fix at [1] and got directed to sign [2] before someone at Google could look at it. I see brian raised this before in [3] but there wasn't a reply to that point. Is there some summary of how this part of integrating it is supposed to work going forward? At first glance that seems like a recipe for perma-forking this pretty much from day one, i.e.: A. Upstream changes happen B. We get a patch to the bundled library to this ML ==> Google employees maintaining A can't even look at B ==> Patch B can't be integrated into A 1. https://github.com/google/reftable/pull/2 2. https://cla.developers.google.com/about/google-individual 3. https://lore.kernel.org/git/20200512233741.GB6605@xxxxxxxxxxxxxxxxxxxxxxxxx/