Re: [PATCH v4 5/6] get_short_oid: sort ambiguous objects by type, then SHA-1

Jeff King <peff@xxxxxxxx> · Thu, 10 May 2018 11:22:21 -0400

On Thu, May 10, 2018 at 12:43:02PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Now we'll instead show:
> 
>     hint:   e8f2650052 tag v2.17.0
>     hint:   e8f21caf94 commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name
>     hint:   e8f26250fa commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src
>     hint:   e8f2bc0c06 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries
>     hint:   e8f2093055 tree
>     hint:   e8f25a3a50 tree
>     hint:   e8f28d537c tree
>     hint:   e8f2cf6ec0 tree
>     hint:   e8f21d02f7 blob
>     hint:   e8f21d577c blob
>     hint:   e8f2867228 blob
>     hint:   e8f2a35526 blob

I said already that I like the output, but this time I'll actually read
the code. ;)

It all looks good to me, with the exception of a few documentation nits
I'll mention below.

> A note on the implementation: Derrick rightly pointed out[1] that
> we're bending over backwards here in get_short_oid() to first
> de-duplicate the list, and then emit it, but could simply do it in one
> step.
> 
> The reason for that is that oid_array_for_each_unique() doesn't
> actually require that the array be sorted by oid_array_sort(), it just
> needs to be sorted in some order that guarantees that all objects with
> the same ID are adjacent to one another, which (barring a hash
> collision, which'll be someone else's problem) the sort_ambiguous()
> function does.

If we were to go this route, I think it would make sense to add a
sorting function pointer to "struct oid_array". I'm OK with punting on
it for now, though.

> diff --git a/Documentation/technical/api-oid-array.txt b/Documentation/technical/api-oid-array.txt
> index b0c11f868d..94b529722c 100644
> --- a/Documentation/technical/api-oid-array.txt
> +++ b/Documentation/technical/api-oid-array.txt
> @@ -35,13 +35,18 @@ Functions
>  	Free all memory associated with the array and return it to the
>  	initial, empty state.
>  
> +`oid_array_for_each`::
> +	Iterate over each element of the list, executing the callback
> +	function for each one. Does not sort the list, so any custom
> +	hash order is retained. If the callback returns a non-zero
> +	value, the iteration ends immediately and the callback's
> +	return is propagated; otherwise, 0 is returned.
> +
>  `oid_array_for_each_unique`::
> -	Efficiently iterate over each unique element of the list,
> -	executing the callback function for each one. If the array is
> -	not sorted, this function has the side effect of sorting it. If
> -	the callback returns a non-zero value, the iteration ends
> -	immediately and the callback's return is propagated; otherwise,
> -	0 is returned.
> +	Iterate over each unique element of the list in sort order ,
> +	but otherwise behaves like `oid_array_for_each`. If the array
> +	is not sorted, this function has the side effect of sorting
> +	it.

Extra space in "sort order ,".

I'd probably say "sorted order", but that might be a matter of
preference.

Also, your parallel verb tenses don't agree. ;) It should be "Iterate
... but otherwise behave", not "behaves".

> +	/*
> +	 * Between object types show tags, then commits, and finally
> +	 * trees and blobs.
> +	 *
> +	 * The object_type enum is commit, tree, blob, tag, but we
> +	 * want tag, commit, tree blob. Cleverly (perhaps too
> +	 * cleverly) do that with modulus, since the enum assigns 1 to
> +	 * commit, so tag becomes 0.
> +	 */
> +	a_type_sort = a_type % 4;
> +	b_type_sort = b_type % 4;
> +	return a_type_sort > b_type_sort ? 1 : -1;

This is amusingly clever, and should be very efficient. I'm glad there's
a comment at least, though.

-Peff