Re: Minimum git commit abbrev length (Was Re: -tip: origin tree build failure (was: [GIT PULL] ext4 update) for 2.6.37)

On Thu, Oct 28, 2010 at 11:28 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Yes. The default of 7 (I think) comes from fairly early in git
> development, when seven hex digits was a lot (it covers about 250+
> million hash values). Back then I thought that 65k revisions was a lot
> (it was what we were about to hit in BK), and each revision tends to
> be about 5-10 new objects or so, so a million objects was a big
> number.
>
> These days, the kernel isn't even the largest git project, and even
> the kernel has about 220k revisions (_much_ bigger than the BK tree
> ever was) and we are approaching two million objects. At that point,
> seven hex digits is still unique for a lot of them, but when we're
> talking about just two orders of magnitude difference between number
> of objects and the hash size, there _will_ be hash collisions. It's no
> longer even close to unrealistic - it happens all the time.

Hmm. In fact, in the kernel, we currently have about twelve thousand
objects that end up having collisions in 7 hex digits. Even in the old
historical BK kernel tree, we have over a thousand objects that
collide. In both cases each bucket gets just two objects; there are as
yet no multiple collisions, which is what you'd expect with a good
hash. You can see this with

  git rev-list --objects --all | cut -c1-7 | sort | uniq -dc

and in fact git itself has a few collisions (but currently just 44
objects ending up sharing 22 SHA1 buckets in 7 digits).
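
The same check can be parameterized by abbreviation length. What follows
is a minimal sketch, assuming a POSIX shell with awk, sort, cut and uniq
available; count_collisions and collision_report are hypothetical helper
names introduced here for illustration, not git commands.

```shell
#!/bin/sh
# count_collisions N   (hypothetical helper, not part of git)
# Reads "git rev-list --objects --all" style lines on stdin
# (object name first, optional path after it) and prints how many
# N-digit prefixes are shared by two or more distinct objects.
count_collisions() {
    digits=$1
    awk '{ print $1 }' |        # keep only the object name
        LC_ALL=C sort -u |      # one line per distinct object, sorted
        cut -c1-"$digits" |     # abbreviate; equal prefixes stay adjacent
        uniq -d |               # one line per prefix seen more than once
        wc -l | tr -d ' '       # count them, strip padding from wc
}

# collision_report   (hypothetical helper)
# Prints the bucket count for 7 through 11 digits; run it from
# inside a git repository.
collision_report() {
    for n in 7 8 9 10 11; do
        printf '%s digits: %s buckets with duplicates\n' "$n" \
            "$(git rev-list --objects --all | count_collisions "$n")"
    done
}
```

Sourcing this and running collision_report in a checkout walks the
object list once per length, which is slow on a large tree but keeps the
sketch simple.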

With each digit, you'd expect the collisions to decrease by a factor
of 16, and that is indeed roughly what happens. For my current kernel
tree I get:

 - 7 digits: 5823 buckets with duplicates (i.e. 11646 objects that aren't unique)
 - 8 digits: 406 buckets
 - 9 digits: 30 buckets
 - 10 digits: 1 bucket
 - 11 digits: 0 buckets

so 12 hex digits is indeed pretty safe for the kernel, and is likely
to remain so until the kernel history grows by a factor of 16.
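
The factor-of-16 expectation can be sketched with the usual birthday
approximation. This is a back-of-the-envelope estimate, assuming object
names behave like uniformly random values; n (number of objects), d
(number of hex digits kept) and E_d (expected colliding pairs) are
symbols introduced here, and n = 2 * 10^6 is taken from the "approaching
two million objects" figure quoted above.

```latex
% Expected number of colliding pairs among n objects at d hex digits
E_d \approx \binom{n}{2} \cdot 16^{-d} = \frac{n(n-1)}{2 \cdot 16^{d}}

% Adding one digit divides the expectation by 16
\frac{E_{d+1}}{E_d} = \frac{1}{16}

% Rough check at d = 7 with n \approx 2 \times 10^{6}
E_7 \approx \frac{(2 \times 10^{6})^{2}}{2 \cdot 16^{7}}
    = \frac{4 \times 10^{12}}{536870912}
    \approx 7.5 \times 10^{3}
```

That is the same order of magnitude as the 5823 buckets measured at 7
digits, which fits an object count somewhat below two million.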

                        Linus