On Mon, Apr 22, 2024 at 9:26 AM Calvin Wan <calvinwan@xxxxxxxxxx> wrote: > > Thanks everyone for your initial comments on this discussion. I wanted > to provide some examples of how an internal/external interface could > look in practice -- originally I had intended to use git-std-lib v6 as > that example, but found that it fell short due to feedback that only > being able to expose a smaller subset of functions in that library would > be insufficient for users since they should have the same tools that we > have for building Git. In this reply, I have two examples of paths > forward that such an interface could look like for future libraries > (both methods would require a non-trivial amount of code change so this > seemed like a better idea than completely refactoring git-std-lib twice). > > Part of the reason for wanting to expose a smaller subset of library > functions initially was to avoid having to expose functions that do > things a library function shouldn't, mainly those with die() calls. I > chose `strbuf_grow()` as the example function to be libified with an > internal/external interface since it has a die() call and in a library, > we would want to pass that error up rather than die()ing. I have two > ideas for how such an interface could look. For reference, this is how > `strbuf_grow()` currently looks: > > void strbuf_grow(struct strbuf *sb, size_t extra) > { > int new_buf = !sb->alloc; > if (unsigned_add_overflows(extra, 1) || > unsigned_add_overflows(sb->len, extra + 1)) > die("you want to use way too much memory"); > if (new_buf) > sb->buf = NULL; > ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); > if (new_buf) > sb->buf[0] = '\0'; > } > > The first idea involves turning `strbuf_grow()` into a wrapper function > that invokes its equivalent library function, eg. > `libgit_strbuf_grow()`: > > int libgit_strbuf_grow(struct strbuf *sb, size_t extra) > { > int new_buf = !sb->alloc; > if (unsigned_add_overflows(extra, 1) || > unsigned_add_overflows(sb->len, extra + 1)) > return -1; > if (new_buf) > sb->buf = NULL; > ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); > if (new_buf) > sb->buf[0] = '\0'; > return 0; > } > > void strbuf_grow(struct strbuf *sb, size_t extra) > { > if (libgit_strbuf_grow(sb, extra)) > die("you want to use way too much memory"); > } > > (Note a context object could also be added as a parameter to > `libgit_strbuf_grow()` for error messages and possibly global variables.) > > In this scenario, we would be exposing `libgit_strbuf_grow()` to > external consumers of the library, while not having to refactor internal > uses of `strbuf_grow()`. This method would reduce initial churn within > the codebase, however, we would want to eventually get rid of > `strbuf_grow()` and use `libgit_strbuf_grow()` internally as well. I > envision that it would be easier to remove die()'s all at once, from top > down, once libification has progressed further since top level callers > do not have to worry about refactoring any callers to accomodate passing > up error messages/codes. > > The shortfall of this approach is that we'd be carrying two different > functions for every library function until we are able to remove all of > them. It would also create additional toil for Git contributors to > figure out which version of the function should be used. > > The second idea removes the need for two different functions by removing > the wrapper function and instead refactoring all callers of > `strbuf_grow()` (and subsequently callers of other library functions). > > int libgit_strbuf_grow(struct strbuf *sb, size_t extra) > { > int new_buf = !sb->alloc; > if (unsigned_add_overflows(extra, 1) || > unsigned_add_overflows(sb->len, extra + 1)) > return -1; > if (new_buf) > sb->buf = NULL; > ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); > if (new_buf) > sb->buf[0] = '\0'; > return 0; > } > > void strbuf_grow_caller() { > strbuf *sb; > size_t extra; > > // if only success/failure is passed up > if (libgit_strbuf_grow(sb, extra)) > die("you want to use way too much memory"); > > // if context object is used > if (libgit_strbuf_grow(sb, extra, context_obj)) > die(context_obj->error_msg); > > // if there are multiple error codes that can be passed up > if (libgit_strbuf_grow(sb, extra) == -1) > die("you want to use way too much memory"); > else if (libgit_strbuf_grow(sb, extra) == -2) > die("some other error"); > } Thought about this some more last night, and I think we'll _need_ to go with this approach for most of the libraries (all but the ones used by builtin/*.c, which can have a wrapper that preserves existing functionality), otherwise the libraries aren't composable. If we have three library interfaces, with interface C depending on B which depends on A, C needs to call the library safe versions of B, and B needs to call the library safe versions of A, which means that the "internal" version needs to be the library safe version. So I think that we're back to my original proposal: // Library-safe method, used internally int strbuf_grow(struct strbuf *sb, size_t extra) { int new_buf = !sb->alloc; if (unsigned_add_overflows(extra, 1) || unsigned_add_overflows(sb->len, extra + 1)) return -1; if (new_buf) sb->buf = NULL; ALLOC_GROW(sb->buf, sb->len + extra + 1, sb->alloc); if (new_buf) sb->buf[0] = '\0'; return 0; } // "External" version int libgit_strbuf_grow(struct strbuf *sb, size_t extra) // Or maybe int64_t? { strbuf_grow(sb, extra); } I don't expect there's many cases where we want to create a wrapper that maintains the existing interface and error handling, because that wrapper can _only_ by used by the git project binaries, not any of the code that's in a library. > > One shortcoming of this approach is the need to refactor all callers of > library functions, but that can be handled better and the churn made > more visible with a coccinelle patch. Another shortcoming is the need > for lengthier code blocks whenever calling a library function, however, > it could also be seen as a benefit since the caller would understand the > function can die(). These error messages would also ideally be passed up > as well in the future rather than die()ing. > > While I tried to find a solution that avoided the shortcomings of both > approaches, I think that answer simply does not exist so the ideas above > are what I believe to be the least disruptive options. I'm wondering > which interface would be more suitable, and also open to hearing if > there are any other ideas!