Namhyung Kim <namhyung@xxxxxxxxx> writes: > 2011-03-10 (ë), 01:19 -0800, Junio C Hamano: >> Junio C Hamano <gitster@xxxxxxxxx> writes: >> >> > I think what may be desirable is to honor the caller-supplied "len" a bit >> > better. If an object is uniquely identifiable with only 4-hexdigit today, >> > and if the caller gives 7 as len and the guard is set to 3, we return 10 >> > hexdigits with the current code. We should instead return 7 hexdigits in >> > such a case, as that is in line with the "use 3 extra to give the >> > disambiguation we find today a better chance to survive". >> >> And here is an attempt to do so. I have to admit that I didn't give it >> too much thought, though, so please be careful when reviewing the logic. >> > > What if the unique length is greater than or equal to the given length? > For instance the unique length is 7 and the caller gives 7 and the guard > is 3. What do you want to return, 7 or 10? How about the unique length > of 8? Perhaps I didn't state it clearly enough. Conceptually, f-u-a works in two stages. First find out what the shortest prefix that can be used to uniquely identify the given object _now_. Then make sure that the result it returns is at least "len" characters long. So if you code it naively, it would have looked like this: for (try = 0; try < 40; try++) if ("do first try characters name an object uniquely?") break; /* happy */ if (try < len) return the first "len" characters; else return the first "try" characters; The original (before the abbrevguard patch) uses an obvious optimization to start the trial from "len" because it didn't matter if a prefix shorter than "len" is already unique---it will return "len" anyway. The "guard" change was buggy. At the conceptual level, it should have changed the first phase of f-u-a, by dropping the "_now_" part. Given that we will be adding more and more objects, only one extra character to make things unique is not sufficient and having "guard" extra characters would keep the result unique hopefully a little longer. It should have done this: for (try = 0; try < 40; try++) if ("do first try characters name an object uniquely?") break; /* happy */ +/* try has only one extra character to make it unique */ +if (guard) + try += (guard - 1); /* could be try += guard */ if (try < len) return the first "len" characters; else return the first "try" characters; Back to your original question, if the object needs 7 characters (iow, its 6-character prefix is shared with another object) to make it unique, and if the caller gives you 7 as length, without guard, you would get 7 because you are saying 1 extra character is sufficiently unique by not setting the guard. If you set the guard to 3, you are saying that 1 extra character is not sufficient to guard it against future collisions and you want 3 extra instead, so you should be getting either 9 (if you define that "guard" overrides the default "1 more characters to make it minimally unique") or 10 (if you define that "guard" specifies how many extra characters to use to make it safer). If the object needs 8 characters to make it minimally unique and you give 7 as "len", you should never get a 7-character string back. f-u-a is not "truncate to 'len'" function but its primary job is to ensure uniqueness. Without guard, you should get 8 (minimally unique), and with guard set to 3, you should get either 10 or 11 (again, depending on the definition of "guard"). > I think the meaning of the guard is somewhat vague. When this feature > was considered in LKML at first, Linus just wanted to change the default > length of commit abbreviation to 12 by making it user-configurable. [1] Linus's "Let's make the default 12 for our project by adding a way to customize it" merely states "right now we know 7 is not enough and 12 is plenty for some time, and likely to be enough to give the same length". It does not mean he suddenly start favoring uniform length over uniqueness. Uniqueness requirement is a given. And that discussion is all about what "len" to be given when calling f-u-a; it doesn't have anything to do with what the guard is about. Setting "len" to 12 would make sure your output you get today would have at least 12 characters, but it doesn't have any effect on how long an abbreviated unique object name will stay unique in the future. When you give the name of an object you have today to somebody else by shortening it with f-u-a, you can ensure that the object name has a better chance of staying unique a bit longer by setting the guard to a non-zero value even if you use shorter "len". The "len" and "guard" are different concepts, have different purposes, and implemented as two different variables, of course. > So I think it would be better to choose the output length using only > caller-given length and the guard length if it guarantees the uniqueness > today. It'll be simple, consistent and expected behavior IMHO. Sorry, but I see nothing consistent nor expected about what you are saying. Perhaps it is "expected" if you don't understand what the guard is about. It is about the future viability of the output you produce today. If you don't care about uniqueness tomorrow, you won't even need any guard. You just give an interface to specify "len" and be done with it. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html