Re: [PATCH v2 04/22] reftable/basics: handle allocation failures in `reftable_calloc()`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 26, 2024 at 09:13:23AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> 
> >> But it illustrates why open coding is not necessarily an excellent
> >> idea in the longer term, doesn't it?  When unsigned_mult_overflows()
> >> is updated to avoid such a false positive, how would we remember
> >> that we need to update this copy we?
> >
> > I agree in general, but with the reftable library I'm stuck between a
> > rock and a hard place. My goal is to make it fully reusable by other
> > projects without first having to do surgery on their side. While having
> > something like `st_mult()` is simple enough to port over, the biggest
> > problem I have is that over time we sneak in more and more code from the
> > Git codebase. The result is death by a thousand cuts.
> 
> > Now if we had a single header that exposes a small set of functions
> > without _anything_ else it could work alright. But I rather doubt that
> > we really want to have a standalone header for `st_mult()` that we can
> > include. But without such a standalone header it is simply too easy to
> > start building on top of Git features by accident.
> >
> > So I'm instead aiming to go a different way and fully cut the reftable
> > code loose from Git. So even if we e.g. eventually want `struct strbuf`
> > return errors on failure, it would only address one part of my problem.
> 
> The dependency to "strbuf" (just as an example) was added initially
> fairly early.  Soon after 27f7ed2a (reftable: add LICENSE,
> 2021-10-07) added the reftable/ hierarchy, e303bf22 (reftable:
> (de)serialization for the polymorphic record type., 2021-10-07).  I
> somehow had an impression that reftable "library" started without
> any Git dependency and then use of our helper functions seemed
> through from the shim layer, but it was pretty much part of the
> library from day one, it seems.

I think the history goes a bit different. Initially, the reftable
library was developed completely outside of the Git tree in [1]. It did
not have any external dependencies and didn't use any of the Git code.

During the upstreaming process the discussion rather quickly swayed into
the direction of making Git the canonical upstream instead of using a
separate repository that hosts the code. See e.g. [2], which was still
tracking a VERSION file to keep track of the upstream version that the
code was pulled from. In [3] the discussion started about upstream
requiring a CLA, which was an (understandable) non-starter. I cannot
find the statement, but eventually we decided that canonical upstream
should be Git, and the VERSION file was never committed.

But with that decision, now that the reftable library was part of Git,
the reviews also started to note that it really should use functions
provided by Git. The initial version of the patch series didn't yet have
any of that. From my point of view that was a big mistake, as the result
is that Git is _not_ reusable by other projects in its current state.

[1]: https://github.com/google/reftable/tree/master/c
[2]: <2106ff286b1135f9428529d9fc392edc127e960c.1580134944.git.gitgitgadget@xxxxxxxxx>
[3]: <20200207001612.GF6573@xxxxxxxxxxxxxxxxxxxxxxxxx>

> > A couple months ago I said that I'll try to write something like shims
> > that wrap our types in reftable types such that other projects can
> > provide implementations for such shims themselves. I tried to make that
> > work, but the result was an unholy mess that really didn't make any
> > sense whatsoever. Most of the features that I need from the Git codebase
> > can be provided in a couple of lines of code (`struct strbuf` is roughly
> > 50 lines for example), plus maybe a callback function that can be
> > provided to wire things up on the user side (`register_tempfiles()` for
> > example). So once I saw that those wrappers are harder to use and result
> > in roughly the same lines of code I decided to scrap that approach and
> > instead try to convert it fully.
> >
> > So yeah, overall we shouldn't open-code things like this. But I don't
> > really see another way to do this for the reftable library.
> 
> But isn't all of the above what Libification ought to be about?  I
> was hoping that the reftable polishing would not have to be done by
> you alone, and you would recruit those who are into libification of
> other parts of Git codebase to help cleaning up these fringe (from
> the point of view of reftable) interfaces.
> 
> What do the libification folks feel about this (folks involved in
> libgit-sys CC'ed; of course all others who are interested in the
> libification topic are welcome to comment)?

I think it's orthogonal to that. The libification effort doesn't really
care about exposing low-level details like `st_mult()` as a library, or
even the `strbuf` interface. It cares about making Git functionality
available, which is on a much higher conceptual level.

Furthermore, to the best of my knowledge the intent wasn't ever to
provide the ability to take a couple of C files and have them be easy to
build as part of another project. The goal is to have proper interfaces
to our code, but whether the code itself has internal interdependencies
is not really much of a concern (globals are a different story).

My goal is different: I want to `cp -r` the "reftable/" directory into
libgit2 or any other project that wants to implement a reftable backend
without having to care about all the different dependencies that it may
pull in or having to carefully massage all of the includes.

Now if it was a single standalone file that one would have to copy in
addition to that I'd be fine. But "strbuf.c' isn't that at all: it ropes
in "gettext.h", "hex-ll.h", "string-list.h", "utf-8.h", "date.h". All of
which is not needed at all, but the person who does the porting now has
to carefully figure out what can and cannot be removed from both
"strbuf.h" and "strbuf.c".

So... it gets out of hand really fast. We would have to significantly
change the way we develop the Git codebase if we wanted to make this a
good experience. But enforcing very strict coding guidelines onto all of
our codebase doesn't feel reasonable to me, which is why I am instead
trying to reinstate the state where the reftable library was decoupled
from the rest of the Git codebase.

The difference here is roughly 100 to 200 lines of code, which I don't
think is much of a maintenance burden by itself. In fact, we'll even
start to share the maintenance burden with libgit2, because they would
be able to use the exact same copy as we do and thus contribute back
bugfixes and improvements for discoveries that their additional test
coverage brings us.

I hope that this all clarifies my point of view :) I've also Cc'd
Han-Wen in case I've made any mistakes regarding the original intent.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux