Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 29, 2016 at 5:57 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Actually, all the other cases seem to be "parse a SHA1 with a known
> length", so they really don't have a negative length.  So this seems
> ok, and is easier to verify than the "what all contexts might use
> DEFAULT_ABBREV" thing. There's only a few callers, and it's a static
> function so it's easy to check it locally in sha1_name.c.

Here's my original patch with just a tiny change that instead of
starting the automatic guessing at 7 each time, it starts at
"default_automatic_abbrev", which is initialized to 7.

The difference is that if we decide that "oh, that was too small, need
to repeat", we also update that "default_automatic_abbrev" value, so
that we won't start at the number that we now know was too small.

So it still loops over the abbrev values, but now it only loops a
couple of times.

I actually verified the performance impact by doing

      time git rev-list --abbrev-commit HEAD > /dev/null

on the kernel git tree, and it does actually matter. With my original
patch, we wasted a noticeable amount of time on just the extra
looping, with this it's down to the same performance as just doing it
once at init time (it's about 12s vs 9s on my laptop).

So this patch may actually be "production ready" apart from the fact
that some tests still fail (at least t2027-worktree-list.sh) because
of different short SHA1 cases.

                     Linus
 cache.h       |  1 +
 environment.c |  2 +-
 sha1_name.c   | 26 +++++++++++++++++++++++++-
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 6e33f2f28..d2da6d186 100644
--- a/cache.h
+++ b/cache.h
@@ -1207,6 +1207,7 @@ struct object_context {
 #define GET_SHA1_TREEISH          020
 #define GET_SHA1_BLOB             040
 #define GET_SHA1_FOLLOW_SYMLINKS 0100
+#define GET_SHA1_AUTOMATIC	 0200
 #define GET_SHA1_ONLY_TO_DIE    04000
 
 #define GET_SHA1_DISAMBIGUATORS \
diff --git a/environment.c b/environment.c
index c1442df9a..fd6681e46 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_name.c b/sha1_name.c
index 3b647fd7c..1003c96ea 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -15,6 +15,7 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, void *);
 
 struct disambiguate_state {
 	int len; /* length of prefix in hex chars */
+	unsigned int nrobjects;
 	char hex_pfx[GIT_SHA1_HEXSZ + 1];
 	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
 
@@ -118,6 +119,12 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 
 			if (strlen(de->d_name) != 38)
 				continue;
+
+			// We only look at the one subdirectory, and we assume
+			// each subdirectory is roughly similar, so each object
+			// we find probably has 255 other objects in the other
+			// fan-out directories
+			ds->nrobjects += 256;
 			if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
 				continue;
 			memcpy(hex + 2, de->d_name, 38);
@@ -151,6 +158,7 @@ static void unique_in_pack(struct packed_git *p,
 
 	open_pack_index(p);
 	num = p->num_objects;
+	ds->nrobjects += num;
 	last = num;
 	while (first < last) {
 		uint32_t mid = (first + last) / 2;
@@ -380,6 +388,9 @@ static int show_ambiguous_object(const unsigned char *sha1, void *data)
 	return 0;
 }
 
+// Why seven? That's our historical default before the automatic abbreviation
+static int default_automatic_abbrev = 7;
+
 static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 			  unsigned flags)
 {
@@ -426,6 +437,14 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 		for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
 	}
 
+	if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
+		unsigned int expect_collision = 1 << (len * 2);
+		if (ds.nrobjects > expect_collision) {
+			default_automatic_abbrev = len+1;
+			return SHORT_NAME_AMBIGUOUS;
+		}
+	}
+
 	return status;
 }
 
@@ -458,14 +477,19 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
 	int status, exists;
+	int flags = GET_SHA1_QUIETLY;
 
+	if (len < 0) {
+		flags |= GET_SHA1_AUTOMATIC;
+		len = default_automatic_abbrev;
+	}
 	sha1_to_hex_r(hex, sha1);
 	if (len == 40 || !len)
 		return 40;
 	exists = has_sha1_file(sha1);
 	while (len < 40) {
 		unsigned char sha1_ret[20];
-		status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY);
+		status = get_short_sha1(hex, len, sha1_ret, flags);
 		if (exists
 		    ? !status
 		    : status == SHORT_NAME_NOT_FOUND) {

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]