Re: [PATCH 01/30] object-file-convert: Stubs for converting from one object format to another

Eric Sunshine <sunshine@xxxxxxxxxxxxxx> · Wed, 27 Sep 2023 16:42:03 -0400

On Wed, Sep 27, 2023 at 3:55 PM Eric W. Biederman <ebiederm@xxxxxxxxx> wrote:
> Two basic functions are provided:
> - convert_object_file Takes an object file it's type and hash algorithm
>   and converts it into the equivalent object file that would
>   have been generated with hash algorithm "to".
>
>   For blob objects there is no converstion to be done and it is an
>   error to use this function on them.

s/converstion/conversion/

>   For commit, tree, and tag objects embedded oids are replaced by the
>   oids of the objects they refer to with those objects and their
>   object ids reencoded in with the hash algorithm "to".  Signatures
>   are rearranged so that they remain valid after the object has
>   been reencoded.
>
> - repo_oid_to_algop which takes an oid that refers to an object file
>   and returns the oid of the equavalent object file generated
>   with the target hash algorithm.

s/equavalent/equivalent/

> The pair of files object-file-convert.c and object-file-convert.h are
> introduced to hold as much of this logic as possible to keep this
> conversion logic cleanly separated from everything else and in the
> hopes that someday the code will be clean enough git can support
> compiling out support for sha1 and the various conversion functions.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

Just some minor comments below, many of which are subjective
style-related observations, thus not necessarily actionable, but also
one or two legitimate questions.

> diff --git a/object-file-convert.c b/object-file-convert.c
> @@ -0,0 +1,57 @@
> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> +                     const struct git_hash_algo *to, struct object_id *dest)
> +{
> +       /*
> +        * If the source alogirthm is not set, then we're using the
> +        * default hash algorithm for that object.
> +        */

s/alogirthm/algorithm/

> +       const struct git_hash_algo *from =
> +               src->algo ? &hash_algos[src->algo] : repo->hash_algo;
> +
> +       if (from == to) {
> +               if (src != dest)
> +                       oidcpy(dest, src);
> +               return 0;
> +       }
> +       return -1;
> +}

On this project, we usually get the simple cases out of the way first,
which often reduces the indentation level, making the code easier to
digest at a glance. So, it would be typical to write this as:

    if (from != to)
        return -1
    if (src != dest)
        oidcpy(dest, src);
    return 0;

or even:

    if (from != to)
        return -1
    if (src == dest)
        return 0;
    oidcpy(dest, src);
    return 0;

This way, for instance, the reader doesn't get to the end of the
function and then have to scan backward to understand the condition of
the `return -1`.

> +int convert_object_file(struct strbuf *outbuf,
> +                       const struct git_hash_algo *from,
> +                       const struct git_hash_algo *to,
> +                       const void *buf, size_t len,
> +                       enum object_type type,
> +                       int gentle)
> +{
> +       int ret;
> +
> +       /* Don't call this function when no conversion is necessary */
> +       if ((from == to) || (type == OBJ_BLOB))
> +               die("Refusing noop object file conversion");

Several comments...

Style: we usually reduce the noise level by dropping the extra parentheses:

    if (from == to || type == OBJ_BLOB)

Does this condition represent a programming error or a runtime error
triggerable by some input? If a programming error, then use BUG()
rather than die().

If a triggerable runtime error, then...

* start user-facing messages with lowercase rather than capitalized word

* make the user-facing message localizable so readers of other
languages can digest it

    die(_("refusing do-nothing object conversion"));

On the other hand, don't make BUG() messages localizable.

> +       switch (type) {
> +       case OBJ_COMMIT:
> +       case OBJ_TREE:
> +       case OBJ_TAG:
> +       default:
> +               /* Not implemented yet, so fail. */
> +               ret = -1;
> +               break;
> +       }
> +       if (!ret)
> +               return 0;
> +       if (gentle) {
> +               strbuf_release(outbuf);
> +               return ret;
> +       }

This function appears to be a mere skeleton at the moment, so it's
difficult to judge at this point whether you are using `outbuf` as a
bag of bytes or as a legitimate string container. If the latter, then
the API may be reasonable, but if you're using it as a bag-of-bytes,
then it feels like you're leaking an implementation detail into the
API.

> +       die(_("Failed to convert object from %s to %s"),
> +               from->name, to->name);

s/Failed/failed/

For people trying to diagnose this problem, would it be helpful to
present more information about the failed conversion, such as object
type and perhaps even its OID?

> diff --git a/object-file-convert.h b/object-file-convert.h
> @@ -0,0 +1,24 @@
> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> +                     const struct git_hash_algo *to, struct object_id *dest);

I suppose the function name is pretty much self-explanatory to those
familiar with the underlying concepts, but it might still be helpful
to add a comment explaining what the function does.

> +/*
> + * Convert an object file from one hash algorithm to another algorithm.
> + * Return -1 on failure, 0 on success.
> + */
> +int convert_object_file(struct strbuf *outbuf,
> +                       const struct git_hash_algo *from,
> +                       const struct git_hash_algo *to,
> +                       const void *buf, size_t len,
> +                       enum object_type type,
> +                       int gentle);