Re: Re* [PATCH 1/2] sha1_name: try to use same abbrev length when core.abbrevguard is specified

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Namhyung Kim <namhyung@xxxxxxxxx> writes:

> 2011-03-10 (ë), 01:19 -0800, Junio C Hamano:
>> Junio C Hamano <gitster@xxxxxxxxx> writes:
>> 
>> > I think what may be desirable is to honor the caller-supplied "len" a bit
>> > better.  If an object is uniquely identifiable with only 4-hexdigit today,
>> > and if the caller gives 7 as len and the guard is set to 3, we return 10
>> > hexdigits with the current code.  We should instead return 7 hexdigits in
>> > such a case, as that is in line with the "use 3 extra to give the
>> > disambiguation we find today a better chance to survive".
>> 
>> And here is an attempt to do so.  I have to admit that I didn't give it
>> too much thought, though, so please be careful when reviewing the logic.
>> 
>
> What if the unique length is greater than or equal to the given length?
> For instance the unique length is 7 and the caller gives 7 and the guard
> is 3. What do you want to return, 7 or 10? How about the unique length
> of 8?

Perhaps I didn't state it clearly enough.

Conceptually, f-u-a works in two stages.  First find out what the shortest
prefix that can be used to uniquely identify the given object _now_.
Then make sure that the result it returns is at least "len" characters
long.  So if you code it naively, it would have looked like this:

	for (try = 0; try < 40; try++)
		if ("do first try characters name an object uniquely?")
			break; /* happy */
	if (try < len)
		return the first "len" characters;
	else
        	return the first "try" characters;

The original (before the abbrevguard patch) uses an obvious optimization
to start the trial from "len" because it didn't matter if a prefix shorter
than "len" is already unique---it will return "len" anyway.

The "guard" change was buggy.  At the conceptual level, it should have
changed the first phase of f-u-a, by dropping the "_now_" part. Given that
we will be adding more and more objects, only one extra character to make
things unique is not sufficient and having "guard" extra characters would
keep the result unique hopefully a little longer.  It should have done
this:

	for (try = 0; try < 40; try++)
		if ("do first try characters name an object uniquely?")
			break; /* happy */
       +/* try has only one extra character to make it unique */
       +if (guard)
       +	try += (guard - 1); /* could be try += guard */
	if (try < len)
		return the first "len" characters;
	else
        	return the first "try" characters;

Back to your original question, if the object needs 7 characters (iow, its
6-character prefix is shared with another object) to make it unique, and
if the caller gives you 7 as length, without guard, you would get 7
because you are saying 1 extra character is sufficiently unique by not
setting the guard.  If you set the guard to 3, you are saying that 1 extra
character is not sufficient to guard it against future collisions and you
want 3 extra instead, so you should be getting either 9 (if you define
that "guard" overrides the default "1 more characters to make it minimally
unique") or 10 (if you define that "guard" specifies how many extra
characters to use to make it safer).

If the object needs 8 characters to make it minimally unique and you give
7 as "len", you should never get a 7-character string back.  f-u-a is not
"truncate to 'len'" function but its primary job is to ensure uniqueness.
Without guard, you should get 8 (minimally unique), and with guard set to
3, you should get either 10 or 11 (again, depending on the definition of
"guard").

> I think the meaning of the guard is somewhat vague. When this feature
> was considered in LKML at first, Linus just wanted to change the default
> length of commit abbreviation to 12 by making it user-configurable. [1]

Linus's "Let's make the default 12 for our project by adding a way to
customize it" merely states "right now we know 7 is not enough and 12 is
plenty for some time, and likely to be enough to give the same length".
It does not mean he suddenly start favoring uniform length over
uniqueness.  Uniqueness requirement is a given.

And that discussion is all about what "len" to be given when calling
f-u-a; it doesn't have anything to do with what the guard is about.

Setting "len" to 12 would make sure your output you get today would have
at least 12 characters, but it doesn't have any effect on how long an
abbreviated unique object name will stay unique in the future.

When you give the name of an object you have today to somebody else by
shortening it with f-u-a, you can ensure that the object name has a better
chance of staying unique a bit longer by setting the guard to a non-zero
value even if you use shorter "len".

The "len" and "guard" are different concepts, have different purposes, and
implemented as two different variables, of course.

> So I think it would be better to choose the output length using only
> caller-given length and the guard length if it guarantees the uniqueness
> today. It'll be simple, consistent and expected behavior IMHO.

Sorry, but I see nothing consistent nor expected about what you are
saying. Perhaps it is "expected" if you don't understand what the guard is
about. It is about the future viability of the output you produce today.

If you don't care about uniqueness tomorrow, you won't even need any
guard.  You just give an interface to specify "len" and be done with it.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]