Re: [RFD] Libification proposal: separate internal and external interfaces

Kyle Lippincott <spectral@xxxxxxxxxx> · Thu, 9 May 2024 12:45:51 -0700

On Mon, Apr 22, 2024 at 9:26 AM Calvin Wan <calvinwan@xxxxxxxxxx> wrote:
>
> Thanks everyone for your initial comments on this discussion. I wanted
> to provide some examples of how an internal/external interface could
> look in practice -- originally I had intended to use git-std-lib v6 as
> that example, but found that it fell short due to feedback that only
> being able to expose a smaller subset of functions in that library would
> be insufficient for users since they should have the same tools that we
> have for building Git. In this reply, I have two examples of paths
> forward that such an interface could look like for future libraries
> (both methods would require a non-trivial amount of code change so this
> seemed like a better idea than completely refactoring git-std-lib twice).
>
> Part of the reason for wanting to expose a smaller subset of library
> functions initially was to avoid having to expose functions that do
> things a library function shouldn't, mainly those with die() calls. I
> chose `strbuf_grow()` as the example function to be libified with an
> internal/external interface since it has a die() call and in a library,
> we would want to pass that error up rather than die()ing. I have two
> ideas for how such an interface could look. For reference, this is how
> `strbuf_grow()` currently looks:
>
> void strbuf_grow(struct strbuf *sb, size_t extra)
> {
>         int new_buf = !sb->alloc;
>         if (unsigned_add_overflows(extra, 1) ||
>             unsigned_add_overflows(sb->len, extra + 1))
>                 die("you want to use way too much memory");
>         if (new_buf)
>                 sb->buf = NULL;
>         ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>         if (new_buf)
>                 sb->buf[0] = '\0';
> }
>
> The first idea involves turning `strbuf_grow()` into a wrapper function
> that invokes its equivalent library function, eg.
> `libgit_strbuf_grow()`:
>
> int libgit_strbuf_grow(struct strbuf *sb, size_t extra)
> {
>         int new_buf = !sb->alloc;
>         if (unsigned_add_overflows(extra, 1) ||
>             unsigned_add_overflows(sb->len, extra + 1))
>                 return -1;
>         if (new_buf)
>                 sb->buf = NULL;
>         ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>         if (new_buf)
>                 sb->buf[0] = '\0';
>         return 0;
> }
>
> void strbuf_grow(struct strbuf *sb, size_t extra)
> {
>         if (libgit_strbuf_grow(sb, extra))
>                 die("you want to use way too much memory");
> }
>
> (Note a context object could also be added as a parameter to
> `libgit_strbuf_grow()` for error messages and possibly global variables.)
>
> In this scenario, we would be exposing `libgit_strbuf_grow()` to
> external consumers of the library, while not having to refactor internal
> uses of `strbuf_grow()`. This method would reduce initial churn within
> the codebase, however, we would want to eventually get rid of
> `strbuf_grow()` and use `libgit_strbuf_grow()` internally as well. I
> envision that it would be easier to remove die()'s all at once, from top
> down, once libification has progressed further since top level callers
> do not have to worry about refactoring any callers to accomodate passing
> up error messages/codes.
>
> The shortfall of this approach is that we'd be carrying two different
> functions for every library function until we are able to remove all of
> them. It would also create additional toil for Git contributors to
> figure out which version of the function should be used.
>
> The second idea removes the need for two different functions by removing
> the wrapper function and instead refactoring all callers of
> `strbuf_grow()` (and subsequently callers of other library functions).
>
> int libgit_strbuf_grow(struct strbuf *sb, size_t extra)
> {
>         int new_buf = !sb->alloc;
>         if (unsigned_add_overflows(extra, 1) ||
>             unsigned_add_overflows(sb->len, extra + 1))
>                 return -1;
>         if (new_buf)
>                 sb->buf = NULL;
>         ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
>         if (new_buf)
>                 sb->buf[0] = '\0';
>         return 0;
> }
>
> void strbuf_grow_caller() {
>         strbuf *sb;
>         size_t extra;
>
>         // if only success/failure is passed up
>         if (libgit_strbuf_grow(sb, extra))
>                 die("you want to use way too much memory");
>
>         // if context object is used
>         if (libgit_strbuf_grow(sb, extra, context_obj))
>                 die(context_obj->error_msg);
>
>         // if there are multiple error codes that can be passed up
>         if (libgit_strbuf_grow(sb, extra) == -1)
>                 die("you want to use way too much memory");
>         else if (libgit_strbuf_grow(sb, extra) == -2)
>                 die("some other error");
> }

Thought about this some more last night, and I think we'll _need_ to
go with this approach for most of the libraries (all but the ones used
by builtin/*.c, which can have a wrapper that preserves existing
functionality), otherwise the libraries aren't composable. If we have
three library interfaces, with interface C depending on B which
depends on A, C needs to call the library safe versions of B, and B
needs to call the library safe versions of A, which means that the
"internal" version needs to be the library safe version. So I think
that we're back to my original proposal:

// Library-safe method, used internally
int strbuf_grow(struct strbuf *sb, size_t extra)
{
        int new_buf = !sb->alloc;
        if (unsigned_add_overflows(extra, 1) ||
            unsigned_add_overflows(sb->len, extra + 1))
                return -1;
        if (new_buf)
                sb->buf = NULL;
        ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
        if (new_buf)
                sb->buf[0] = '\0';
        return 0;
}

// "External" version
int libgit_strbuf_grow(struct strbuf *sb, size_t extra)  // Or maybe int64_t?
{
        strbuf_grow(sb, extra);
}

I don't expect there's many cases where we want to create a wrapper
that maintains the existing interface and error handling, because that
wrapper can _only_ by used by the git project binaries, not any of the
code that's in a library.

>
> One shortcoming of this approach is the need to refactor all callers of
> library functions, but that can be handled better and the churn made
> more visible with a coccinelle patch. Another shortcoming is the need
> for lengthier code blocks whenever calling a library function, however,
> it could also be seen as a benefit since the caller would understand the
> function can die(). These error messages would also ideally be passed up
> as well in the future rather than die()ing.
>
> While I tried to find a solution that avoided the shortcomings of both
> approaches, I think that answer simply does not exist so the ideas above
> are what I believe to be the least disruptive options. I'm wondering
> which interface would be more suitable, and also open to hearing if
> there are any other ideas!