Re: [RFC PATCH 1/1] Add a type for errors

Emily Shaffer <nasamuffin@xxxxxxxxxx> · Thu, 3 Oct 2024 09:17:12 -0700

On Wed, Oct 2, 2024 at 2:54 AM Phillip Wood <phillip.wood123@xxxxxxxxx> wrote:
>
> Hi brian
>
> Thanks for working on this.
>
> On 30/09/2024 23:03, brian m. carlson wrote:
> > There is work underway to move some of the Git code out into a reusable
> > library.  In such a case, it's not very desirable to have the library
> > code write random errors to standard error, since this is an antipattern
> > which annoys terminal users.
> >
> > Instead, we will expect callers of our library function to return
> > errors.
>
> Isn't it the callers that will expect the function to return an error?
>
> > The reusability of our library will be substantially improved
> > if we return typed errors so that users can easily determine what kind
> > of error might have occurred and react differently based on different
> > contexts.  For example, if we are looking up an object in a partial
> > clone and it's not found, we might decide to download it, but we might
> > return an error to the user if the problem is instead that the revision
> > specified is not syntactically valid.
> >
> > To help the libification process and make our code more generally
> > maintainable, add an error type.  This consists of on 64-bit integer,
> > which contains bit flags and a 32-bit code, and a pointer, which depends
> > on the code.  It is designed to be passed and returned by value, not
> > pointer, and it is possible to do so in two registers on 64-bit systems.
> > Similar functionality works well for error types in Rust and for the
> > standard library's lldiv_t, so this should not pose a problem.
>
> Part of the reason it works well in rust is that it supports
> discriminated unions with pattern matching and has the "?" macro for
> early returns. In C the code ends up being quite verbose compared to
> taking a pointer to error struct as a function parameter and returning a
> boolean success/fail flag.
>
>      struct git_error e;
>      struct object_id oid;
>
>      e = repo_get_oid(r, "HEAD", &oid);
>      if (!GIT_ERROR_SUCCESS(e))
>          return e;
>
> With a boolean return we can have
>
>      struct object_id oid;
>
>      if (repo_get_oid(r, "HEAD", &oid, e))
>          return -1;
>
> where "e" is a "struct git_error*" passed into the function.

I actually don't find this complaint all that compelling; it's not
hard to write a shorter macro that can be used inline, so we can do
things like:

    ERR_VAR(e);
    if(ERR(e, repo_get_oid(...))
      return e;

or even a macro to do the return if desired:

    ERR_VAR(e); // or, i guess we can be not SO lazy and just write
struct git_error e here, whatever :) )
    RETURN_IF_ERR(e, repo_get_oid(...));

For better or worse, you can do a lot of things in a macro, so I don't
see verboseness as much of an issue because I think we can hide a lot
of it this way.

>
> > Provide the ability to specify either an errno value or a git error code
> > as the code.  This allows us to use this type generically when handling
> > errno values such as processing files, as well as express a rich set of
> > possible error codes specific to Git.  We pick an unsigned 32-bit code
> > because Windows can use the full set of 32 bits in its error values,
> > even though most Unix systems use only a small set of codes which
> > traditionally start at 1.  32 bits for Git errors also allows us plenty
> > of space to expand as we see fit.
>
> I think the design of the struct is fine. It does mean we need a global
> list of error values. GError [1] avoids this by having a "domain" field
> that is an interned string so that error codes only need to be unique
> within a given domain. I think that for a self-contained project like
> git a global list is probably fine - svn does this for example [2].
>
> [1] https://docs.gtk.org/glib/error-reporting.html
> [2]
> https://github.com/apache/subversion/blob/be229fd54f5860b3140831671efbfd3f7f6fbb0b/subversion/include/svn_error_codes.h
>
> > Allow multiple errors to be provided and wrapped in a single object,
> > which is useful in many situations, and add helpers to determine if any
> > error in the set matches a particular code.
>
> The api appears to require the caller know up front how many errors
> there will be which seems unlikely to be true in practice. I think a
> more realistic design would allow callers to push errors as they occur
> and grow the array accordingly. For example ref_transaction_prepare()
> would want to return a list of errors, one for each ref that it was
> unable to lock or which did not match the expected value but it would
> not know up-front how many errors there were going to be.
>
> It would be useful to be able to add context to an error as the stack is
> unwound. For example if unpack_trees() detects that it would overwrite
> untracked files it prints a error listing those files. The exact
> formatting of that message depends on the command being run. That is
> currently handled by calling setup_unpack_trees_porcelain() with the
> name of the command before calling unpack_trees(). In a world where
> unpack_trees() returns a error containing the list of files we would
> want to customize the message by adding some context before converting
> it to a string.
>
> > Additionally, provide error formatting functions that produce a suitable
> > localized string for ease of use.
>
> I share Emily's concern that this function has to know the details of
> how to format every error. We could mitigate that somewhat using a
> switch that calls external helper functions that do the actual formatting
>
>      switch (e.code) {
>      case GIT_ERR_OBJECT_NOT_FOUND:
>          format_object_not_found(buf, e); /* lives in another file */
>          break;
>      ...
>
> I know this is an RFC but I couldn't resist one code comment
>
> > +#define GIT_ERROR_SUCCESS(e) (((e).code == GIT_ERROR_BIT_INIT))
> > +#define GIT_ERROR_SUCCESS_CONSUME(e) ((git_error_free(&(e)).code == GIT_ERROR_BIT_INIT)
>
> git_error_free() returns the code as in integer so we don't need ".code"
> here. Also our coding guidelines would suggest git_error_clear() for the
> name of that function.
>
>
> In conclusion I like the general idea but have concerns about the
> verbosity of returning an error struct. It would be helpful to see some
> examples of this being used to see how it works in practice.
>
> Best Wishes
>
> Phillip
>