Re* [PATCH 1/2] sha1_name: try to use same abbrev length when core.abbrevguard is specified

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:

> I think what may be desirable is to honor the caller-supplied "len" a bit
> better.  If an object is uniquely identifiable with only 4-hexdigit today,
> and if the caller gives 7 as len and the guard is set to 3, we return 10
> hexdigits with the current code.  We should instead return 7 hexdigits in
> such a case, as that is in line with the "use 3 extra to give the
> disambiguation we find today a better chance to survive".

And here is an attempt to do so.  I have to admit that I didn't give it
too much thought, though, so please be careful when reviewing the logic.

-- >8 --
Subject: find_unique_abbrev(): honor caller-supplied "len" better

The caller of this function wants to ensure that the returned string is a
unique abbreviation of the object name, and at least len characters long.
When "len" is sufficiently short and we cannot ensure uniqueness with only
"len" bytes, we returned minimally unique prefix (i.e. if you dropped the
last character, there would be two or more objects that share that same
prefix as their names in the repository).

An earlier change introduced core.abbrevguard configuration with a
realization that a short prefix that is unique today may not stay unique
forever, as new objects are added to the repository. When "len" is shorter
than the length necessary to ensure uniqueness today, instead of returning
a string that is only one character longer than the longest ambiguous
prefix, we wanted to add that many extra characters to ensure uniqueness
for longer time.

However, the code forgot that "len" may be sufficiently long.  If an
object is uniquely identifiable with only 4-hexdigit today, and if the
caller gives 7 as len and the guard is set to 3, we returned 10 hexdigits,
which was 3 characters longer than necessary.  We should instead return 7
hexdigits in such a case, as that is in line with the original intention
of using 3 extra hexdigits to give the disambiguation we find today a
better chance to survive.

Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
---
 sha1_name.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 709ff2e..0f81581 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -193,6 +193,23 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 	return status;
 }
 
+/*
+ * The caller wants a unique abbreviation of the full object name in
+ * "sha1" that is at least "len" characters long.
+ *
+ * (1) If sha1 identifies an existing object, there must be no other
+ *     object that shares the returned string as the prefix of its
+ *     name;
+ *
+ * (2) If no object with the given object name exists, there must be
+ *     no object that has the returned string as the prefix of its
+ *     name.
+ *
+ * Usually we try to return as short a string as possible, but the
+ * core.abbrevguard configuration may tell us to use at least that
+ * many characters more than necessary to make the result unique,
+ * in order to keep it unique a bit longer.
+ */
 const char *find_unique_abbrev(const unsigned char *sha1, int len)
 {
 	int status, exists;
@@ -202,6 +219,13 @@ const char *find_unique_abbrev(const unsigned char *sha1, int len)
 	memcpy(hex, sha1_to_hex(sha1), 40);
 	if (len == 40 || !len)
 		return hex;
+	len -= unique_abbrev_extra_length;
+	if (len <= 0)
+		len = 1;
+	/*
+	 * Try to see how short a prefix we can feed to get
+	 * the desired unique hit
+	 */
 	while (len < 40) {
 		unsigned char sha1_ret[20];
 		status = get_short_sha1(hex, len, sha1_ret, 1);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]