Re: About git and the use of SHA-1

Andreas Ericsson <ae@xxxxxx> · Tue, 29 Apr 2008 08:38:37 +0200

Henrik Austad wrote:
On Monday 28 April 2008 21:34:50 Daniel Barkalow wrote:
On Mon, 28 Apr 2008, Henrik Austad wrote:
Hi list!

As far as I have gathered, the SHA-1-sum is used as a identifier for
commits, and that is the primary reason for using sha1.  However, several
places (including the google tech-talk featuring Linus himself) states
that the id's are cryptographically secure.

As discussed in [1], SHA-1 is not as secure as it once was (and this was
in 2005), and I'm wondering - are there any plans for migrating to
another hash-algorithm? I.e. SHA-2, whirlpool..
No. The cryptographic security we care about is that it's impractical to
come up with another set of content that hashes to the same value as a
given set of content. The known attacks on SHA-1 (and more broken earlier
hashes in the same general class) only allow the attacker to produce two
files that will collide. Now, it's true that this would allow somebody to
produce a commit where some people see the "good" blob and some people see
the "evil" blob, but (a) the "good" blob contains some large chunk of
random data, which is a major red flag by itself, and (b) all of these
people have to be taking data from the attacker.

yes, I can see that point, but I was thinking more along the line of:

1) clone repo
2) add malicious code
3) add a huge block of comment, ifdef-block etc somewhere obscure in the code 
and keep adding random data untill hash matches a well-known release.
4) publish repo, or even worse, change central repo

This depends greatly on git accepting objects with a colliding object-name,
which it doesn't. Once you have an object with a particular SHA1, it will
never get overwritten, ever, as git will believe it's about to do unnecessary
work. As such, you'd still have to create a new object, hashing to a new SHA1
and get that new object added to the kernel.

I think perhaps Andrew Morton and a few other "high brass" among the kernel
hackers can get away with pushing crud like that to Linus' public tree
(which is the de facto master copy of published kernel sources), but random
John Doe's such as you and me wouldn't stand a chance, as our patches would
get reviewed by someone who, at the end of the day, makes a living coding
Linux.

Most users, and probably a lot of developers never browse through the *entire* 
archive looking for this, and as long as the hash checks out - why would you? 
Yes, it would probably be discovered soon enough, but take the linux kernel 
as an example - if you get, say 100 infected machines due to this, what would 
this do to the reputation of the kernel?

That depends. If the source of it was Linus' public tree, that would not be
very good at all. If the source was a random tarball off a random webpage
or ftp site (which would be the same as fetching and, unverified, using an
unchecked git repository), I doubt it would matter much.

If somebody gives you some source, and it's got some large random chunk in
it, and the behavior of the object depends on the content of this chunk,
and it's unspecified where this chunk comes from, you should be aware
that they might be able to swap this chunk for a different chunk. But such
a file is pretty blatantly malicious anyway.

True, but this actually means you have to verify *everything*, even though the 
hash checks out.

Not really. What you need to verify is that
a) You cloned from somewhere you trust (kernel.org, fe)
b) The SHA1 of the commit you want to build from matches the SHA1 of the same
commit in the repository you originally cloned from.

Colliding objects can never enter a repository. Git is lazy and will reuse the
already existing colliding object with the same name instead.

--
Andreas Ericsson                   andreas.ericsson@xxxxxx
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html