On Thu, Oct 28, 2010 at 11:28 AM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Yes. The default of 7 (I think) comes from fairly early in git
> development, when seven hex digits was a lot (it covers about 250+
> million hash values). Back then I thought that 65k revisions was a lot
> (it was what we were about to hit in BK), and each revision tends to
> be about 5-10 new objects or so, so a million objects was a big
> number.
>
> These days, the kernel isn't even the largest git project, and even
> the kernel has about 220k revisions (_much_ bigger than the BK tree
> ever was) and we are approaching two million objects. At that point,
> seven hex digits is still unique for a lot of them, but when we're
> talking about just two orders of magnitude difference between number
> of objects and the hash size, there _will_ be hash collisions. It's no
> longer even close to unrealistic - it happens all the time.

Hmm. In fact, in the kernel, we currently have about twelve thousand
objects that end up having collisions in 7 hex digits. Even in the old
historical BK kernel tree, we have over a thousand objects that collide
(each bucket in both cases gets just two objects, there are as of yet no
multiple collisions, which is what you'd expect with a good hash).

See with

    git rev-list --objects --all | cut -c1-7 | sort | uniq -dc

and in fact git itself has a few collisions (but currently just 44
objects ending up sharing 22 SHA1 buckets in 7 digits).

With each digit, you'd expect the collisions to decrease by a factor of
16, and that is indeed exactly what happens. For my current kernel tree
I get:

 - 7 digits: 5823 buckets with duplicates (ie 11646 objects that aren't unique)
 - 8: 406
 - 9: 30
 - 10: 1
 - 11: 0

so 12 hex digits is indeed pretty safe for the kernel, and is likely to
remain so until the kernel history grows by a factor of 16.
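As a rough cross-check of the numbers above, here is a minimal Python sketch of the usual birthday-style estimate, assuming object names behave like uniformly random hex strings. The function names are made up for illustration, and the object count of 1.77 million is an assumption (the mail only says "approaching two million"); nothing here is part of git itself.

```python
from collections import Counter


def expected_colliding_buckets(num_objects: int, hex_digits: int) -> float:
    """Estimate how many abbreviated prefixes are shared by two objects.

    Assumes uniformly distributed hashes: there are n*(n-1)/2 pairs of
    objects, and each pair shares a given-length prefix with probability
    1 / 16**hex_digits. While collisions are rare, almost every colliding
    bucket holds exactly one pair, so pairs ~= buckets with duplicates.
    """
    buckets = 16 ** hex_digits
    return num_objects * (num_objects - 1) / 2 / buckets


def count_duplicate_prefixes(hashes, hex_digits: int) -> int:
    """Count prefixes shared by more than one distinct hash.

    Roughly what `cut -c1-N | sort | uniq -d | wc -l` reports when it is
    fed one full hex object name per line.
    """
    counts = Counter(h[:hex_digits] for h in set(hashes))
    return sum(1 for c in counts.values() if c > 1)


if __name__ == "__main__":
    assumed_objects = 1_770_000  # assumption, not a measured value
    for digits in range(7, 13):
        estimate = expected_colliding_buckets(assumed_objects, digits)
        print(f"{digits:2d} digits: ~{estimate:.2f} buckets with duplicates")
```

Under that assumed object count the estimate for 7 digits comes out in the same range as the measured 5823, and each extra digit divides it by 16, so the measured 406 / 30 / 1 / 0 sequence is about what a well-behaved hash should give.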
                Linus