Re: VCS comparison table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> I've been following the git-vs-bzr discussion, and I'd like to ask a
> question (being new to both bzr and git). How does git disambiguate SHA1
> hash collisions? I think git has an alternative way to name revisions
> (can someone please explain it in more detail, I've seen <ref>~<n>
> mentioned only in passing in this thread). It seems to me collisions are
> a good argument in favour of having two independent naming schemes, so
> that you're not solely relying on hashes being unique.
> 
> A strong argument is that a global namespace based on hashes of data is
> ideal because the names are generated from the data being named, and
> therefore are immutable. Same data => same name for that data, always
> and forever, which is desirable when merging named data from many
> sources. But the converse isn't true: one name does not necessarily map
> to only that data. Have I misunderstood? Is this a problem?

Git avoids the problem by using a very large gryptographic hash.
While recent cryptographic advances mean that it will soon be barely
possible to deliberately construct two files with the same SHA hash,
absent massivecomputational investment, it can be considered a truly
random hash function.

That means that, given N objects in the repository, there are N * (N-1)
/ 2 pairs, each of which is a chance to collide.  The probability of
having a collision is thus roughly N * (N-1) / 2^161.  Which doesn't
reach 1/2 until N == 2^80.

Suppose a git repository has 100 objects added per second, continuously,
until the Sun turns into a red giant and swallows the Earth in 5-6
billion years' time (thus halting development).

That will generate 2^64 objects in the repository, and the probability
of there being a collison at that time is 2^128/2^161 = 2^-33, or 1 in
8 billion, small enough to be negligible.

For slightly more plausible repository sizes, the odds of having a
problem are even more miniscule.

And the benefits gained from having a completely deterministic
data => name mapping are considerable, well worth running such
a minisule risk of major problems.

(There is a road map for conversion to the even larger SHA-256 if
that ever proves necessary, but it's not likely to.)



As for the <ref>~<n> syntax, see "man git-rev-parse".  The parent of <rev>
is <rev>^.  If it has multiple parents, the othersare <rev>^2, <rev>^3, etc.

<rev>^^^^ is the great-great-grandparent of the named rev, and may also
be written <rev>~4.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]