On Mon, Jan 08 2018, Jeff King jotted:

> On Mon, Jan 08, 2018 at 05:20:29AM -0500, Jeff King wrote:
>
>> I.e., what if we did something like this:
>>
>> diff --git a/sha1_name.c b/sha1_name.c
>> index 611c7d24dd..04c661ba85 100644
>> --- a/sha1_name.c
>> +++ b/sha1_name.c
>> @@ -600,6 +600,15 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
>>  	if (len == GIT_SHA1_HEXSZ || !len)
>>  		return GIT_SHA1_HEXSZ;
>>
>> +	/*
>> +	 * A default length of 10 implies a repository big enough that it's
>> +	 * getting expensive to double check the ambiguity of each object,
>> +	 * and the chance that any particular object of interest has a
>> +	 * collision is low.
>> +	 */
>> +	if (len >= 10)
>> +		return len;
>> +
>
> Oops, this really needs to terminate the string in addition to returning
> the length (so it was always printing 40 characters in most cases). The
> correct patch is below, but it performs the same.
>
> diff --git a/sha1_name.c b/sha1_name.c
> index 611c7d24dd..5921298a80 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -600,6 +600,17 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
>  	if (len == GIT_SHA1_HEXSZ || !len)
>  		return GIT_SHA1_HEXSZ;
>
> +	/*
> +	 * A default length of 10 implies a repository big enough that it's
> +	 * getting expensive to double check the ambiguity of each object,
> +	 * and the chance that any particular object of interest has a
> +	 * collision is low.
> +	 */
> +	if (len >= 10) {
> +		hex[len] = 0;
> +		return len;
> +	}
> +
>  	mad.init_len = len;
>  	mad.cur_len = len;
>  	mad.hex = hex;

That looks much more sensible, leaving aside other potential benefits of
MIDX.

Given the argument Linus made in e6c587c733 ("abbrev: auto size the
default abbreviation", 2016-09-30) maybe we should add a small integer
to the length for good measure, i.e. something like:

	if (len >= 10) {
		int extra = 2; /* or just 1? or maybe 0 ... */
		hex[len + extra] = 0;
		return len + extra;
	}

I tried running:

    git log --pretty=format:%h --abbrev=7 | perl -nE 'chomp; say length'|sort|uniq -c|sort -nr

On several large repos, which forces something like the disambiguation
we had before Linus's patch, on e.g. David Turner's
2015-04-03-1M-git.git test repo it's:

     952858 7
      44541 8
       2861 9
        168 10
         17 11
          2 12

And the default abbreviation picks 12. I haven't yet found a case where
it's wrong, but if we wanted to be extra safe we could just add a byte
or two to the SHA-1.
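
For illustration, here is a standalone sketch of the "extra" variant outside of git's tree. HEXSZ and abbrev_with_margin() are made-up names for this example, not anything in sha1_name.c, and the sketch assumes the buffer already holds the full 40-character hex plus room for the NUL. The only thing it adds over the snippet above is a clamp, since len + extra could otherwise run past the end of the hex string for something like --abbrev=39:

```c
#define HEXSZ 40 /* stand-in for GIT_SHA1_HEXSZ; an assumption of this sketch */

/*
 * Hypothetical helper, not git code: truncate a full hex string in
 * place to len + extra characters, never going past HEXSZ. The caller
 * must pass a buffer of at least HEXSZ + 1 bytes that already
 * contains the full hex name.
 */
static int abbrev_with_margin(char *hex, int len, int extra)
{
	int out = len + extra;

	if (out > HEXSZ)
		out = HEXSZ;	/* clamp instead of writing past the buffer */
	hex[out] = '\0';
	return out;
}
```

With len = 12 and extra = 2 this returns 14 and leaves a 14-character string; with len = 39 it returns 40 rather than 41.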