Re: About git and the use of SHA-1

Henrik Austad <henrikau@xxxxxxxxxxxxxx> · Mon, 28 Apr 2008 23:29:14 +0200

On Monday 28 April 2008 21:34:50 Daniel Barkalow wrote:
> On Mon, 28 Apr 2008, Henrik Austad wrote:
> > Hi list!
> >
> > As far as I have gathered, the SHA-1 sum is used as an identifier for
> > commits, and that is the primary reason for using SHA-1.  However, several
> > places (including the Google tech talk featuring Linus himself) state
> > that the IDs are cryptographically secure.
> >
> > As discussed in [1], SHA-1 is not as secure as it once was (and this was
> > in 2005), and I'm wondering - are there any plans for migrating to
> > another hash algorithm? E.g. SHA-2, Whirlpool...
>
> No. The cryptographic security we care about is that it's impractical to
> come up with another set of content that hashes to the same value as a
> given set of content. The known attacks on SHA-1 (and more broken earlier
> hashes in the same general class) only allow the attacker to produce two
> files that will collide. Now, it's true that this would allow somebody to
> produce a commit where some people see the "good" blob and some people see
> the "evil" blob, but (a) the "good" blob contains some large chunk of
> random data, which is a major red flag by itself, and (b) all of these
> people have to be taking data from the attacker.

Yes, I can see that point, but I was thinking more along the lines of:

1) clone repo
2) add malicious code
3) add a huge block of comment, ifdef-block etc. somewhere obscure in the code
   and keep adding random data until the hash matches a well-known release
4) publish repo, or even worse, change central repo
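
To make step 3 concrete, here is a minimal sketch in Python of what "keep
adding data until the hash matches" would look like. It is only an
illustration under assumptions: it looks at a single blob (a real release id
is a commit hash that also covers trees and history), it uses a counter
instead of random data so the run is repeatable, and it only tries to match
the first few hex digits. The names git_blob_id and find_padding are made up
for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Blob id as git computes it: SHA-1 over a 'blob <size>' header,
    a NUL byte, and then the file content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def find_padding(target_id, evil, prefix_len, max_tries=1_000_000):
    """Append a junk comment to `evil` until the first `prefix_len` hex
    digits of its blob id equal those of `target_id`.

    Returns (content, tries) on success, or None if max_tries is exhausted.
    Assumption: a counter stands in for the "random data" of step 3.
    """
    wanted = target_id[:prefix_len]
    for tries in range(1, max_tries + 1):
        candidate = evil + b"/* " + str(tries).encode() + b" */\n"
        if git_blob_id(candidate).startswith(wanted):
            return candidate, tries
    return None


if __name__ == "__main__":
    good = b"int main(void) { return 0; }\n"  # stand-in for the released file
    evil = b"int main(void) { return 1; }\n"  # stand-in for the modified file
    target = git_blob_id(good)
    found = find_padding(target, evil, prefix_len=3)
    if found is not None:
        content, tries = found
        print("target id:", target)
        print("forged id:", git_blob_id(content), "after", tries, "tries")
```

Each extra hex digit multiplies the expected work by 16, so matching 3 digits
takes a few thousand tries on average, while matching all 40 digits would take
on the order of 2^160 tries.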

Most users, and probably a lot of developers, never browse through the *entire*
archive looking for this, and as long as the hash checks out - why would you?
Yes, it would probably be discovered soon enough, but take the Linux kernel
as an example - if you get, say, 100 infected machines due to this, what would
this do to the reputation of the kernel?

> If somebody gives you some source, and it's got some large random chunk in
> it, and the behavior of the object depends on the content of this chunk,
> and it's unspecified where this chunk comes from, you should be aware
> that they might be able to swap this chunk for a different chunk. But such
> a file is pretty blatantly malicious anyway.

True, but this actually means you have to verify *everything*, even though the 
hash checks out.

But yes, I can see your point: it would most likely be infeasible to generate
a collision with a given hash using this approach, and changing to another
hash function would probably not add much. Basically, I was just curious and
playing around with the idea.

Thanks for the answer though :)
-- 
mvh Henrik Austad