Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

Jeff King <peff@xxxxxxxx> · Mon, 15 Oct 2012 14:34:38 -0400

On Mon, Oct 15, 2012 at 07:47:09PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Oct 15, 2012 at 6:42 PM, Elia Pinto <gitter.spiros@xxxxxxxxx> wrote:
> > Very clear analysis. Well written. Perhaps is it the time to update
> > http://git-scm.com/book/ch6-1.html (A SHORT NOTE ABOUT SHA-1) ?
> >
> > Hope useful
> >
> > http://www.schneier.com/crypto-gram-1210.html
> 
> This would be concerning if the Git security model would break down if
> someone found a SHA1 collision, but it really wouldn't.
> 
> It's one thing to find *a* collision, it's quite another to:
> 
>  1. Find a collision for the sha1 of harmless.c which I know you use,
>     and replace it with evil.c.
> 
>  2. Somehow make evil.c compile so that it actually does something
>     useful and nefarious, and doesn't just make the C compiler puke.
> 
>     If finding one arbitrary collision costs $43K in 2021, getting
>     past this point is going to take quite a large multiple of
>     $43K.

There are easier attacks than that if you can hide arbitrary bytes
inside a file. It's hard with C source code. The common one in hash
collision detection circles is to put invisible cruft into binary
document formats like PDF or Postscript. Git blobs themselves do not
have such an invisible place to put it, but you might be storing a
format that does.
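
To make the "no invisible place" point concrete, here is a minimal
sketch of how a blob's object ID is derived. The helper name
git_object_id is mine, not anything in git's source; the header layout
("<type> <size>" followed by a NUL, then the raw bytes) is git's loose
object format. Every byte of the file goes into the hash, so any hidden
cruft has to live inside the file format you are storing.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the SHA-1 object ID for `payload` stored as `obj_type`.

    Hypothetical helper for illustration: hashes the header
    "<type> <size>\\0" followed by the payload, byte for byte.
    """
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


if __name__ == "__main__":
    # Should match the output of: echo hello | git hash-object --stdin
    print(git_object_id("blob", b"hello\n"))
```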

But worse, git _commits_ have such an invisible portion. We calculate
the sha1 over the full commit, but we tend to show only the portion up
to the first NUL byte. I used that horrible trick in my "choose your own
sha1 prefix" patch. However, we could mitigate that by checking for
embedded NULs in git-fsck.
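
A rough sketch of the idea, in Python rather than git's C, and not what
git-fsck actually does: the function names and the sample commit body
below are made up for illustration. It shows that bytes after the first
NUL change the commit's sha1 without changing what is displayed, and
that spotting them is a one-line check.

```python
import hashlib


def commit_id(body: bytes) -> str:
    """SHA-1 over the full commit object, including anything after a NUL."""
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def visible_part(body: bytes) -> bytes:
    """The portion a viewer would show if it stops at the first NUL byte."""
    return body.split(b"\0", 1)[0]


def has_embedded_nul(body: bytes) -> bool:
    """The proposed check: flag any commit whose body contains a NUL."""
    return b"\0" in body


# Hypothetical commit body; the tree ID is git's well-known empty tree.
BASE = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 0 +0000\n"
    b"committer A U Thor <author@example.com> 0 +0000\n"
    b"\n"
    b"a useful commit\n"
)

if __name__ == "__main__":
    for junk in (b"junk-A", b"junk-B"):
        body = BASE + b"\0" + junk
        # Same visible text each time, different object ID.
        print(commit_id(body), has_embedded_nul(body))
```

Varying the junk after the NUL is what gives an attacker (or a vanity
sha1 prefix patch) room to search for a particular hash.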

>  3. Somehow inject the new evil object into your repository, or
>     convince you to re-clone it / clone it from somewhere you usually
>     wouldn't.

Yeah, this part is the kicker. With the commit NUL trick, you would make
a useful commit and then ask somebody to pull it, and then later replace
it with a commit pointing to an arbitrary tree. But if we assume we can
detect that easily (which I think we can), we are left with replacing
binary blobs that have hidden bits. And most projects do not take many
such blobs, and the result is that you could only replace the contents
of that particular blob, not an arbitrary part of the tree.

> It would be very interesting to see an analysis that deals with some
> actual Git-related security scenarios, instead of something that just
> assumes that if someone finds *any* SHA1 collision the sky is going to
> fall.

I agree that most of the worry is overblown. Having read the analysis
Schneier pointed to, the situation is actually not that bad. It will
take 5-10 years to reach the point where mounting a single attack is
feasible at all, and even then it will be really expensive and
extremely complex.

That doesn't seem like an emergency to me. It sounds like something we
should be thinking about (and we are). The simplest thing would be to
wait for a moment when it makes sense to break compatibility (e.g., we
decide that "git 2.0" is here, and everybody will have to rewrite to
take advantage of new features, so we can jump to sha-2). We can also
start building sha-2 history that references sha-1 history. That would
mean everybody needs to upgrade their git, but that is not a problem
that requires 5-10 years of foresight and planning.

-Peff