Re: [RFD] Libification proposal: separate internal and external interfaces

Calvin Wan <calvinwan@xxxxxxxxxx> · Mon, 22 Apr 2024 16:26:17 +0000

Thanks everyone for your initial comments on this discussion. I wanted
to provide some examples of how an internal/external interface could
look in practice -- originally I had intended to use git-std-lib v6 as
that example, but found that it fell short due to feedback that only
being able to expose a smaller subset of functions in that library would
be insufficient for users since they should have the same tools that we
have for building Git. In this reply, I have two examples of paths
forward that such an interface could look like for future libraries
(both methods would require a non-trivial amount of code change so this
seemed like a better idea than completely refactoring git-std-lib twice).

Part of the reason for wanting to expose a smaller subset of library
functions initially was to avoid having to expose functions that do
things a library function shouldn't, mainly those with die() calls. I
chose `strbuf_grow()` as the example function to be libified with an
internal/external interface since it has a die() call and in a library,
we would want to pass that error up rather than die()ing. I have two
ideas for how such an interface could look. For reference, this is how
`strbuf_grow()` currently looks:

void strbuf_grow(struct strbuf *sb, size_t extra)
{
	int new_buf = !sb->alloc;
	if (unsigned_add_overflows(extra, 1) ||
	    unsigned_add_overflows(sb->len, extra + 1))
		die("you want to use way too much memory");
	if (new_buf)
		sb->buf = NULL;
	ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
	if (new_buf)
		sb->buf[0] = '\0';
}

The first idea involves turning `strbuf_grow()` into a wrapper function
that invokes its equivalent library function, eg.
`libgit_strbuf_grow()`:

int libgit_strbuf_grow(struct strbuf *sb, size_t extra)
{
	int new_buf = !sb->alloc;
	if (unsigned_add_overflows(extra, 1) ||
	    unsigned_add_overflows(sb->len, extra + 1))
		return -1;
	if (new_buf)
		sb->buf = NULL;
	ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
	if (new_buf)
		sb->buf[0] = '\0';
        return 0;
}

void strbuf_grow(struct strbuf *sb, size_t extra)
{
        if (libgit_strbuf_grow(sb, extra))
                die("you want to use way too much memory");
}

(Note a context object could also be added as a parameter to
`libgit_strbuf_grow()` for error messages and possibly global variables.)

In this scenario, we would be exposing `libgit_strbuf_grow()` to
external consumers of the library, while not having to refactor internal
uses of `strbuf_grow()`. This method would reduce initial churn within
the codebase, however, we would want to eventually get rid of
`strbuf_grow()` and use `libgit_strbuf_grow()` internally as well. I
envision that it would be easier to remove die()'s all at once, from top
down, once libification has progressed further since top level callers
do not have to worry about refactoring any callers to accomodate passing
up error messages/codes. 

The shortfall of this approach is that we'd be carrying two different
functions for every library function until we are able to remove all of
them. It would also create additional toil for Git contributors to
figure out which version of the function should be used.

The second idea removes the need for two different functions by removing
the wrapper function and instead refactoring all callers of
`strbuf_grow()` (and subsequently callers of other library functions).

int libgit_strbuf_grow(struct strbuf *sb, size_t extra)
{
	int new_buf = !sb->alloc;
	if (unsigned_add_overflows(extra, 1) ||
	    unsigned_add_overflows(sb->len, extra + 1))
		return -1;
	if (new_buf)
		sb->buf = NULL;
	ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc);
	if (new_buf)
		sb->buf[0] = '\0';
        return 0;
}

void strbuf_grow_caller() {
	strbuf *sb;
	size_t extra;

	// if only success/failure is passed up
	if (libgit_strbuf_grow(sb, extra))
                die("you want to use way too much memory");

	// if context object is used
	if (libgit_strbuf_grow(sb, extra, context_obj))
                die(context_obj->error_msg);

	// if there are multiple error codes that can be passed up
	if (libgit_strbuf_grow(sb, extra) == -1)
                die("you want to use way too much memory");
	else if (libgit_strbuf_grow(sb, extra) == -2)
		die("some other error");
}

One shortcoming of this approach is the need to refactor all callers of
library functions, but that can be handled better and the churn made
more visible with a coccinelle patch. Another shortcoming is the need
for lengthier code blocks whenever calling a library function, however,
it could also be seen as a benefit since the caller would understand the
function can die(). These error messages would also ideally be passed up
as well in the future rather than die()ing.

While I tried to find a solution that avoided the shortcomings of both
approaches, I think that answer simply does not exist so the ideas above
are what I believe to be the least disruptive options. I'm wondering
which interface would be more suitable, and also open to hearing if
there are any other ideas!