Re: Re* [PATCH 1/2] sha1_name: try to use same abbrev length when core.abbrevguard is specified

Namhyung Kim <namhyung@xxxxxxxxx> · Thu, 10 Mar 2011 20:52:46 +0900

2011-03-10 (ë), 01:19 -0800, Junio C Hamano:
> Junio C Hamano <gitster@xxxxxxxxx> writes:
> 
> > I think what may be desirable is to honor the caller-supplied "len" a bit
> > better.  If an object is uniquely identifiable with only 4-hexdigit today,
> > and if the caller gives 7 as len and the guard is set to 3, we return 10
> > hexdigits with the current code.  We should instead return 7 hexdigits in
> > such a case, as that is in line with the "use 3 extra to give the
> > disambiguation we find today a better chance to survive".
> 
> And here is an attempt to do so.  I have to admit that I didn't give it
> too much thought, though, so please be careful when reviewing the logic.
> 

What if the unique length is greater than or equal to the given length?
For instance the unique length is 7 and the caller gives 7 and the guard
is 3. What do you want to return, 7 or 10? How about the unique length
of 8?

I think the meaning of the guard is somewhat vague. When this feature
was considered in LKML at first, Linus just wanted to change the default
length of commit abbreviation to 12 by making it user-configurable. [1]
And this is the same situation what I tried to tell you in the previous
email.

So I think it would be better to choose the output length using only
caller-given length and the guard length if it guarantees the uniqueness
today. It'll be simple, consistent and expected behavior IMHO.

Thanks.

> -- >8 --
> Subject: find_unique_abbrev(): honor caller-supplied "len" better
> 
> The caller of this function wants to ensure that the returned string is a
> unique abbreviation of the object name, and at least len characters long.
> When "len" is sufficiently short and we cannot ensure uniqueness with only
> "len" bytes, we returned minimally unique prefix (i.e. if you dropped the
> last character, there would be two or more objects that share that same
> prefix as their names in the repository).
> 
> An earlier change introduced core.abbrevguard configuration with a
> realization that a short prefix that is unique today may not stay unique
> forever, as new objects are added to the repository. When "len" is shorter
> than the length necessary to ensure uniqueness today, instead of returning
> a string that is only one character longer than the longest ambiguous
> prefix, we wanted to add that many extra characters to ensure uniqueness
> for longer time.
> 
> However, the code forgot that "len" may be sufficiently long.  If an
> object is uniquely identifiable with only 4-hexdigit today, and if the
> caller gives 7 as len and the guard is set to 3, we returned 10 hexdigits,
> which was 3 characters longer than necessary.  We should instead return 7
> hexdigits in such a case, as that is in line with the original intention
> of using 3 extra hexdigits to give the disambiguation we find today a
> better chance to survive.
> 
> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
> ---
>  sha1_name.c |   24 ++++++++++++++++++++++++
>  1 files changed, 24 insertions(+), 0 deletions(-)
> 
> diff --git a/sha1_name.c b/sha1_name.c
> index 709ff2e..0f81581 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -193,6 +193,23 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
>  	return status;
>  }
>  
> +/*
> + * The caller wants a unique abbreviation of the full object name in
> + * "sha1" that is at least "len" characters long.
> + *
> + * (1) If sha1 identifies an existing object, there must be no other
> + *     object that shares the returned string as the prefix of its
> + *     name;
> + *
> + * (2) If no object with the given object name exists, there must be
> + *     no object that has the returned string as the prefix of its
> + *     name.
> + *
> + * Usually we try to return as short a string as possible, but the
> + * core.abbrevguard configuration may tell us to use at least that
> + * many characters more than necessary to make the result unique,
> + * in order to keep it unique a bit longer.
> + */
>  const char *find_unique_abbrev(const unsigned char *sha1, int len)
>  {
>  	int status, exists;
> @@ -202,6 +219,13 @@ const char *find_unique_abbrev(const unsigned char *sha1, int len)
>  	memcpy(hex, sha1_to_hex(sha1), 40);
>  	if (len == 40 || !len)
>  		return hex;
> +	len -= unique_abbrev_extra_length;
> +	if (len <= 0)
> +		len = 1;

Anyway, how about

	  if (len < MINIMUM_ABBREV)
		  len = MINIMUM_ABBREV;

> +	/*
> +	 * Try to see how short a prefix we can feed to get
> +	 * the desired unique hit
> +	 */
>  	while (len < 40) {
>  		unsigned char sha1_ret[20];
>  		status = get_short_sha1(hex, len, sha1_ret, 1);

-- 
Regards,
Namhyung Kim

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html