On Sep 25, 2016, at 18:39, Linus Torvalds wrote:
> The kernel, these days, is at roughly 5 million objects, and while the
> seven hex digits are still often enough for uniqueness (and git will
> always add digits *until* it is unique), it's long been at the point
> where I tell people to do
>
>     git config --global core.abbrev 12
>
> because even though git will extend the seven hex digits until the
> object name is unique, that only reflects the *current* situation in
> the repository. With 5 million objects and a very healthy growth rate,
> a 7-8 hex digit number that is unique today is not necessarily unique
> a month or two from now, and then it gets annoying when a commit
> message has a short git ID that is no longer unique when you go back
> and try to figure out what went wrong in that commit.
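Linus's point can be seen in a toy model of abbreviation. The shell function below is a hypothetical sketch, not git's implementation: `min_unique_len` reads full object names on stdin and prints the shortest prefix length of one name (never less than 7, the default at the time) that no other name shares. Adding a single object that shares more leading digits raises the answer, which is exactly how a short ID that was unique when it was written can become ambiguous later.

```sh
#!/bin/sh
# Hypothetical helper, not part of git: print the shortest prefix length of
# NAME (at least MIN, default 7) that no other object name on stdin shares.
min_unique_len() {
	awk -v name="$1" -v min="${2:-7}" '
		# Length of the common prefix of strings a and b.
		function lcp(a, b,    i) {
			i = 0
			while (i < length(a) && substr(a, i + 1, 1) == substr(b, i + 1, 1))
				i++
			return i
		}
		# Track the longest prefix NAME shares with any other object.
		$0 != name { l = lcp($0, name); if (l > best) best = l }
		END {
			n = best + 1
			if (n < min) n = min
			print n
		}'
}
```

On a real repository the input could come from `git cat-file --batch-all-objects --batch-check='%(objectname)'`; git's own answer for a given commit is what `git rev-parse --short` prints.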
On Sep 25, 2016, at 20:46, Junio C Hamano wrote:
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>
>> I can just keep reminding kernel maintainers and developers to update
>> their git config, but maybe it would be a good idea to just admit that
>> the defaults picked in 2005 weren't necessarily the best ones
>> possible, and those could be bumped up a bit?
>
> I am not quite sure how good any new default would be, though. Just
> like any timeout is not long enough for somebody, growing projects
> will eventually hit whatever abbreviation length they start with.
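A back-of-the-envelope birthday estimate supports Junio's point. Assuming SHA-1 object names behave like uniformly random hex strings, the expected number of pairs among n objects that share their first k hex digits is E_k below; n = 5 million is Linus's figure for the kernel, and the result is only an order-of-magnitude sketch.

```latex
\begin{aligned}
E_k &= \binom{n}{2}\,16^{-k} \approx \frac{n^2}{2\cdot 16^{k}},
  \qquad \frac{n^2}{2} = 1.25\times10^{13} \text{ for } n = 5\times10^{6},\\
E_7 &\approx \frac{1.25\times10^{13}}{2.68\times10^{8}} \approx 4.7\times10^{4},\\
E_{11} &\approx \frac{1.25\times10^{13}}{1.76\times10^{13}} \approx 0.71,\\
E_{12} &\approx \frac{1.25\times10^{13}}{2.81\times10^{14}} \approx 0.044,\\
E_{12} &= 1 \iff n \approx \sqrt{2\cdot 16^{12}} \approx 2.4\times10^{7}.
\end{aligned}
```

Each extra digit divides the expected number of collisions by 16, while doubling the object count multiplies it by 4, so under this model the expected number of 12-digit collisions reaches one at roughly 24 million objects.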
This made me curious what the situation is really like. So I crunched
some data.

Using a recent clone of $korg/torvalds/linux:

  $ git rev-parse --verify d597639e203
  error: short SHA1 d597639e203 is ambiguous.
  fatal: Needed a single revision

So the kernel already has 11-character "short" SHA1s that are
ambiguous. Is a core.abbrev setting of 12 really good enough?
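The error above means that more than one object name in the repository begins with `d597639e203`; in a real repository, `git rev-parse --disambiguate=d597639e203` lists those objects. The hypothetical `resolve_prefix` function below only mimics the rule behind the error: a short name resolves when exactly one full name on stdin starts with it, and fails otherwise.

```sh
#!/bin/sh
# Hypothetical helper, not part of git: resolve PREFIX (assumed to be
# lowercase hex) against the full object names on stdin. Prints the match
# and succeeds only when exactly one name starts with PREFIX.
resolve_prefix() {
	matches=$(grep "^$1")
	if [ -z "$matches" ]; then
		echo "no object name starts with $1" >&2
		return 1
	fi
	count=$(( $(printf '%s\n' "$matches" | wc -l) ))
	if [ "$count" -gt 1 ]; then
		echo "prefix $1 is ambiguous: $count candidates" >&2
		printf '%s\n' "$matches" >&2
		return 1
	fi
	printf '%s\n' "$matches"
}
```

Extending the prefix by one more digit that only one candidate has is enough to make it resolve again.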

Here are the stats on the kernel's repository:

  Ambiguous length 11 (but not at length 12) info:
    prefixes:  2
      0 (with 1 or more commit disambiguations)

  Ambiguous length 10 (but not at length 11) info:
    prefixes:  12
      3 (with 1 or more commit disambiguations)
      0 (with 2 or more commit disambiguations)

  Ambiguous length 9 (but not at length 10) info:
    prefixes:  186
      43 (with 1 or more commit disambiguations)
      1 (with 2 or more commit disambiguations)
      0 (with 3 or more disambiguations)

  Ambiguous length 8 (but not at length 9) info:
    prefixes:  2723
      651 (with 1 or more commit disambiguations)
      40 (with 2 or more commit disambiguations)
      1 (with 3 or more disambiguations)
    maxambig:  3 (there is 1 of them)

  Ambiguous length 7 (but not at length 8) info:
    prefixes:  41864
      9842 (with 1 or more commit disambiguations)
      680 (with 2 or more commit disambiguations)
      299 (with 3 or more disambiguations)
    maxambig:  3 (there are 299 of them)

The "maxambig" value is the maximum number of objects that any single
prefix of that length disambiguates into. So for prefixes of length 7,
no prefix matches more than 3 objects, and 299 prefixes match exactly 3.
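The message does not say how these numbers were generated. As a rough, hypothetical way to produce comparable figures, the two functions below read full object names on stdin: `count_ambiguous K` counts the K-character prefixes shared by two or more objects, and `max_ambig K` prints the largest number of objects sharing one prefix followed by how many prefixes reach it. Unlike the stats above, they neither exclude prefixes that stay ambiguous at K+1 nor distinguish commits from other objects, so the numbers are not directly comparable.

```sh
#!/bin/sh
# Hypothetical helpers, not the tool used for the stats above.
# Both read full hex object names, one per line, on stdin.

# count_ambiguous K: number of distinct K-character prefixes shared by 2+ names.
count_ambiguous() {
	cut -c1-"$1" | sort | uniq -d | wc -l | tr -d ' '
}

# max_ambig K: largest number of names sharing one K-character prefix,
# followed by how many prefixes reach that maximum.
max_ambig() {
	cut -c1-"$1" | sort | uniq -c |
		awk '$1 > max { max = $1; n = 0 } $1 == max { n++ } END { print max, n }'
}
```

For example: `git cat-file --batch-all-objects --batch-check='%(objectname)' | max_ambig 7`.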

Just out of curiosity, generating stats on the Git repository gives:

  Ambiguous length 8 (but not at length 9) info:
    prefixes:  7
      3 (with 1 or more commit disambiguations)
      2 (with 2 or more commit disambiguations)
      0 (with 3 or more disambiguations)

  Ambiguous length 7 (but not at length 8) info:
    prefixes:  87
      36 (with 1 or more commit disambiguations)
      3 (with 2 or more commit disambiguations)
      0 (with 3 or more disambiguations)

Running the stats on $github/gitster/git produces some ambiguous
length-9 prefixes (one of which contains a commit disambiguation).

--Kyle