[PATCH] find_unique_abbrev(): honor caller-supplied "len" better

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The caller of this function wants to ensure that the returned string is a
unique abbreviation of the object name, and at least "len" characters
long.  When "len" is too short to ensure uniqueness with only that many
characters, it returns minimally unique prefix (i.e. if you dropped the
last character, there would be two or more objects that share that same
prefix as their names in the repository).

An earlier change introduced core.abbrevguard configuration with a
realization that a short prefix that is unique today may not stay unique
forever, as new objects are added to the repository. When "len" is shorter
than the length necessary to ensure uniqueness today, instead of returning
a string that is only one character longer than the longest ambiguous
prefix, we want to add that many extra characters (in addition to the "one
character that is absolutely needed to make it unique today") to ensure
uniqueness for longer time.

The code however forgot that the function may be called with a "len" that
is long enough.  If an object is uniquely identifiable with only 4 leading
characters today, and if the caller gives 7 as len and the guard is set to
3, it returned 10 hexdigits, which was 3 characters longer than necessary.
We should instead return 7 leading characters in such a case, as that is
in line with the original intention of using 3 characters more than
absolutely necessary to give the disambiguation we find today a better
chance to survive.

Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
---

 * A three-line patch, with 35 new comment lines and four paragraph commit
   log message, to fix a stupid thinko that I noticed during a separate
   discussion with Namhyung-ssi, who wanted to add an option to ensure
   uniqueness of the truncated commit name in the human-readable blame
   output.

 sha1_name.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 4a226ad..62950aa 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -191,6 +191,26 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 	return status;
 }
 
+/*
+ * The caller wants a unique abbreviation of the full object name in
+ * "sha1" that is at least "len" characters long.  A string is a unique
+ * abbrevation of the full object name when:
+ *
+ * (1) there is no other object that shares the returned string as the
+ *     prefix of its name, if sha1 identifies an existing object; or
+ *
+ * (2) there is no object that has the returned string as the prefix
+ *     of its name, if sha1 does not identify any existing object.
+ *
+ * If there exist two or more objects that share the same N characters
+ * at the beginning of their object names, N+1 leading characters is
+ * sufficient to make the abbreviation unique today.  As the number of
+ * objects in the repository grows, however, such an abbreviation
+ * won't stay unique forever.  core.abbrevguard configuration variable
+ * can be used to add extra G characters when ensuring the uniqueness
+ * of the abbreviation, i.e. making N+1+G characters the minimum in
+ * order to keep the result unique a bit longer, instead of just N+1.
+ */
 const char *find_unique_abbrev(const unsigned char *sha1, int len)
 {
 	int status, exists;
@@ -200,6 +220,24 @@ const char *find_unique_abbrev(const unsigned char *sha1, int len)
 	memcpy(hex, sha1_to_hex(sha1), 40);
 	if (len == 40 || !len)
 		return hex;
+	/*
+	 * Try to see how short a prefix we can feed to get a unique
+	 * hit.  When len is sufficiently long, we may find the
+	 * absolute minimum abbreviation that is a lot shorter than
+	 * len, so we try from (len - unique_abbrev_extra_length), to
+	 * avoid the final addition of u_a_e_l to the result getting
+	 * longer than necessary.  E.g. an object that is ambiguous
+	 * today with only 3 but is unique with 4 leading characters,
+	 * it should yield 7 character result if the caller called us
+	 * with len=7 with u_a_e_l=3 (or shorter), so we start from 4
+	 * in such a case.  When len is shorter than the minimum
+	 * required to make the result unique, the loop counts up and
+	 * finds the absolute minimum (just one character longer than
+	 * ambiguous truncation) and then we add u_a_e_l to it.
+	 */
+	len -= unique_abbrev_extra_length;
+	if (len <= 0)
+		len = 1;
 	while (len < 40) {
 		unsigned char sha1_ret[20];
 		status = get_short_sha1(hex, len, sha1_ret, 1);
-- 
1.7.4.1.373.g37629

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]