Re: About git and the use of SHA-1

"Geoffrey Irving" <irving@xxxxxxx> · Tue, 29 Apr 2008 08:34:11 -0700

On Mon, Apr 28, 2008 at 12:34 PM, Daniel Barkalow <barkalow@xxxxxxxxxxxx> wrote:
> On Mon, 28 Apr 2008, Henrik Austad wrote:
>
>  > Hi list!
>  >
>  > As far as I have gathered, the SHA-1-sum is used as a identifier for commits,
>  > and that is the primary reason for using sha1.  However, several places
>  > (including the google tech-talk featuring Linus himself) states that the id's
>  > are cryptographically secure.
>  >
>  > As discussed in [1], SHA-1 is not as secure as it once was (and this was in
>  > 2005), and I'm wondering - are there any plans for migrating to another
>  > hash-algorithm? I.e. SHA-2, whirlpool..
>
>  No. The cryptographic security we care about is that it's impractical to
>  come up with another set of content that hashes to the same value as a
>  given set of content. The known attacks on SHA-1 (and more broken earlier
>  hashes in the same general class) only allow the attacker to produce two
>  files that will collide. Now, it's true that this would allow somebody to
>  produce a commit where some people see the "good" blob and some people see
>  the "evil" blob, but (a) the "good" blob contains some large chunk of
>  random data, which is a major red flag by itself, and (b) all of these
>  people have to be taking data from the attacker.
>
>  If somebody gives you some source, and it's got some large random chunk in
>  it, and the behavior of the object depends on the content of this chunk,
>  and it's unspecified where this chunk comes from, you should be aware
>  that they might be able to swap this chunk for a different chunk. But such
>  a file is pretty blatantly malicious anyway.

This argument is invalid, since the use of git is not limited to source
code.  People can and do store unreadable binary data in git, and unless
you are completely sure that no one would ever care about the security
of that data in a way that can be attacked with a single collision, git
should be secure for that data as well.
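
To make the point concrete: git names a blob by the SHA-1 of a short
header ("blob", the byte length, a NUL) followed by the raw file bytes,
so a binary file gets exactly the same treatment as source code.  Below
is a minimal Python sketch of that computation.  The function name
git_blob_id and the sample byte strings are my own, purely for
illustration; this is not git's actual code.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git would assign to a blob with these bytes.

    git_blob_id is a hypothetical helper name used only for this sketch.
    """
    # Blob header: the word "blob", a space, the decimal length, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    # The id is the SHA-1 of the header followed by the raw, uninterpreted bytes.
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Text and binary content go through exactly the same computation.
    print(git_blob_id(b"hello\n"))
    print(git_blob_id(b"%PDF-1.4\n\x00\xff made-up binary payload"))
```

The result for a given file should match what "git hash-object <file>"
prints.  Nothing in the computation looks at what the bytes mean, which
is why a colliding pair of binary files would be indistinguishable to
git by id alone.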

For example, I just converted a 20 GB repository to git which, among
other things, contains PDF files of my tax returns.  I have looked them
over, but I have not opened them in a hex editor and looked them over at
the binary level, and I don't think git should expect me to.

Incidentally, git was the only version control system I tried, apart
from subversion, that didn't choke on that repository.  Mercurial looked
at my file renames and expanded the size past 45 GB before I killed it.
I had to fix several bugs in the bazaar conversion scripts before I
realized it was just too slow.  And svk turns out to be even more like
the Antichrist than subversion itself is (mirroring N repository copies
requires an N-fold increase in size).

Geoffrey