Re: [PATCH v2 03/13] hex: introduce functions to print arbitrary hashes

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 16 Oct 2018 10:54:23 +0900

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:

> diff --git a/cache.h b/cache.h
> index d508f3d4f8..a13d14ce0a 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1361,9 +1361,9 @@ extern int get_oid_hex(const char *hex, struct object_id *sha1);
>  extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
>  
>  /*
> - * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
> + * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
>   * and writes the NUL-terminated output to the buffer `out`, which must be at
> - * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
> + * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
>   * convenience.
>   *
>   * The non-`_r` variant returns a static buffer, but uses a ring of 4
> @@ -1371,10 +1371,13 @@ extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
>   *
>   *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
>   */
> -extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
> -extern char *oid_to_hex_r(char *out, const struct object_id *oid);
> -extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
> -extern char *oid_to_hex(const struct object_id *oid);	/* same static buffer as sha1_to_hex */
> +char *hash_to_hex_algo_r(char *buffer, const unsigned char *hash, int algo);
> +char *sha1_to_hex_r(char *out, const unsigned char *sha1);
> +char *oid_to_hex_r(char *out, const struct object_id *oid);
> +char *hash_to_hex_algo(const unsigned char *hash, int algo);	/* static buffer result! */
> +char *sha1_to_hex(const unsigned char *sha1);			/* same static buffer */
> +char *hash_to_hex(const unsigned char *hash);			/* same static buffer */
> +char *oid_to_hex(const struct object_id *oid);			/* same static buffer */

Even though in hex.c I see mixture of *_algo and *_algop helper
functions, I see only "algo" variants above.  Is it our official
stance to use primarily the integer index into the algo array when
specifying the hash, and when a caller into 'multi-hash' API happens
to have other things, it should use functions in 2/13 to convert it
to the canonical "int algo" beforehand?

I am not saying it is bad or good to choose the index into the algo
array as the primary way to specify the algorithm.  I think it is a
good idea to pick one and stick to it, and I wanted to make sure
that the choice we made here is clearly communicated to any future
developer who read this code.

> +char *sha1_to_hex(const unsigned char *sha1)
> +{
> +	return hash_to_hex_algo(sha1, GIT_HASH_SHA1);
> +}
> +
> +char *hash_to_hex(const unsigned char *hash)
> +{
> +	return hash_to_hex_algo(hash, hash_algo_by_ptr(the_hash_algo));
>  }
>  
>  char *oid_to_hex(const struct object_id *oid)
>  {
> -	return sha1_to_hex(oid->hash);
> +	return hash_to_hex_algo(oid->hash, hash_algo_by_ptr(the_hash_algo));
>  }

Having said the above, seeing the use of hash_algo_by_ptr() here
makes me suspect if it makes more sense to use the algop as the
primary way to specify which algorithm the caller wants to use.
IOW, making the set of helpers in 02/13 to allow quering by name,
format-id, or the integer index and have them all return a pointer
to "const struct git_hash_algo".  Two immediate downsides I can see
is that it exposes the actual structure to the callers (but is it
really a problem?  Outside callers learn hash sizes etc. by accessing
its fields anyway without treating the algo struct as opaque.), and
passing an 8-byte pointer may be more costly than passing a small
integer index that ranges between 0 and 1 at most (assuming that
we'd only use SHA-1 and "the current NewHash" in the code).