On Thu, Sep 29, 2016 at 11:37 AM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > I'm playing with an early patch to make the default more dynamic. > Let's see how well it works in practice, but it looks fairly > promising. Let me test a bit more and send out an RFC patch.. Ok, this is *very* rough, and it doesn't actuall pass all the tests, and I didn't even try to look at why. But it passes the trivial smell-test, and in particular it actually makes mathematical sense... I think the patch can speak for itself, but the basic core is this section in get_short_sha1(): + if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) { + unsigned int expect_collision = 1 << (len * 2); + if (ds.nrobjects > expect_collision) + return SHORT_NAME_AMBIGUOUS; + } basically, what it says is that we will consider a sha1 ambiguous even if it was *technically* unique (that's the '!status' part of the test) if: - the length was 15 or less *and* - the number of objects we have is larger than the expected point where statistically we should start to expect to get one collision. That "expect_collision" math is actually very simple: each hex character adds four bits of range, but since we expect collisions at the square root of the maximum number of objects, we shift by just two bits per hex digits instead. The rest of the patch is a trivial change to just initialize the default short size to -1, and consider that to mean "enable the automatic size checking with a minimum of 7". And the trivial code to estimate the number of objects (which ignores duplicates between packs etc _entirely_). For the kernel, just the *math* right now actually gives 12 characters. For current git it actually seems to say that 8 is the correct number. For small projects, you'll still see 7. ANYWAY. This patch is on top of Jeff's patches in 'pu' (I think those are great regardless of this patch!), and as mentioned, it fails some tests. I suspect that the failures might be due to the abbrev_default being -1, and some other code finds that surprising now. But as mentioned, I didn't really even look at it. What do you think? It's actually a fairly simple patch and I really do think it makes sense and it seems to just DTRT automatically. Linus
cache.h | 1 + environment.c | 2 +- sha1_name.c | 21 ++++++++++++++++++++- 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/cache.h b/cache.h index 6e33f2f..d2da6d1 100644 --- a/cache.h +++ b/cache.h @@ -1207,6 +1207,7 @@ struct object_context { #define GET_SHA1_TREEISH 020 #define GET_SHA1_BLOB 040 #define GET_SHA1_FOLLOW_SYMLINKS 0100 +#define GET_SHA1_AUTOMATIC 0200 #define GET_SHA1_ONLY_TO_DIE 04000 #define GET_SHA1_DISAMBIGUATORS \ diff --git a/environment.c b/environment.c index c1442df..fd6681e 100644 --- a/environment.c +++ b/environment.c @@ -16,7 +16,7 @@ int trust_executable_bit = 1; int trust_ctime = 1; int check_stat = 1; int has_symlinks = 1; -int minimum_abbrev = 4, default_abbrev = 7; +int minimum_abbrev = 4, default_abbrev = -1; int ignore_case; int assume_unchanged; int prefer_symlink_refs; diff --git a/sha1_name.c b/sha1_name.c index 3b647fd..8791ff3 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -15,6 +15,7 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, void *); struct disambiguate_state { int len; /* length of prefix in hex chars */ + unsigned int nrobjects; char hex_pfx[GIT_SHA1_HEXSZ + 1]; unsigned char bin_pfx[GIT_SHA1_RAWSZ]; @@ -118,6 +119,12 @@ static void find_short_object_filename(struct disambiguate_state *ds) if (strlen(de->d_name) != 38) continue; + + // We only look at the one subdirectory, and we assume + // each subdirectory is roughly similar, so each object + // we find probably has 255 other objects in the other + // fan-out directories + ds->nrobjects += 256; if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2)) continue; memcpy(hex + 2, de->d_name, 38); @@ -151,6 +158,7 @@ static void unique_in_pack(struct packed_git *p, open_pack_index(p); num = p->num_objects; + ds->nrobjects += num; last = num; while (first < last) { uint32_t mid = (first + last) / 2; @@ -426,6 +434,12 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1, for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds); } + if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) { + unsigned int expect_collision = 1 << (len * 2); + if (ds.nrobjects > expect_collision) + return SHORT_NAME_AMBIGUOUS; + } + return status; } @@ -458,14 +472,19 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data) int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len) { int status, exists; + int flags = GET_SHA1_QUIETLY; + if (len < 0) { + flags |= GET_SHA1_AUTOMATIC; + len = 7; + } sha1_to_hex_r(hex, sha1); if (len == 40 || !len) return 40; exists = has_sha1_file(sha1); while (len < 40) { unsigned char sha1_ret[20]; - status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY); + status = get_short_sha1(hex, len, sha1_ret, flags); if (exists ? !status : status == SHORT_NAME_NOT_FOUND) {