Re: SHA1 collisions found

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 2 Mar 2017 18:19:07 -0800

On Thu, Mar 2, 2017 at 5:50 PM, Mike Hommey <mh@xxxxxxxxxxxx> wrote:
>
> What if the "object version" is a hash of the content (as opposed to
> header + content like the normal git hash)?

It doesn't actually matter for that attack.

The concept of the attack is actually fairly simple: generate a lot of
collisions in the first hash (and they outline how you only need to
generate 't' serial collisions and turn them into 2**t collisions),
and then just check those collisions against the second hash.

If you have enough collisions in the first one, having a collision in
the second one is inevitable just from the birthday rule.

Now, *In*practice* that attack is not very easy to do. Each collision
is still hard to generate. And because of the git object rules (the
size has to match), it limits you a bit in what collisions you
generate.

But the fact that git requires the right header can be considered just
a variation on the initial state to SHA1, and then the additional
requirement might be as easy as just saying that your collision
generation function just always needs to generate a fixed-size block
(which could be just 64 bytes - the SHA1 blocking size).

So assuming you can arbitrarily generate collisions (brute force:
2**80 operations) you could make the rule be that you generate
something that starts with one 64-byte block that matches the git
rules:

   "blob 6454\0"..pad with repeating NUL bytes..

and then you generate 100 pairs of 64-byte SHA1 collisions (where the
first starts with the initial value of that fixed blob prefix, the
next with the state after the first block, etc etc).

Now you can generate 2**100 different sequences that all are exactly
6464 bytes (101 64-byte blocks) and all have the same SHA1 - all all
share that first fixed 64-byte block.

You needed "only" on the order of 100 * 2**80 SHA1 operations to do
that in theory.

An since you have 2**100 objects, you know that you will have a likely
birthday paradox even if your secondary hash is 200 bits long.

So all in all, you generated a collision in on the order of 2**100 operations.

So instead of getting the security of "160+200" bits, you only got 200
bits worth of real collision resistance.

NOTE!! All of the above is very much assuming "brute force". If you
can brute-force the hash, you can completely break any hash. The input
block to SHA1 is 64 bytes, so by definition you have 512 bits of data
to play with, and you're generating a 160-bit output: there is no
question what-so-ever that you couldn't generate any hash you want if
you brute-force things.

The place where things like having a fixed object header can help is
when the attack in question requires some repeated patterns. For
example, if you're not brute-forcing things, your attack on the hash
will likely involve using very particular patterns to change a number
of bits in certain ways, and then combining those particular patterns
to get the hash collision you wanted.  And *that* is when you may need
to add some particular data to the middle to make the end result be a
particular match.

But a brute-force attack definitely doesn't need size changes. You can
make the size be pretty much anything you want (modulo really small
inputs, of course - a one-byte input only has 256 different possible
hashes ;) if you have the resources to just go and try every possible
combination until you get the hash you wanted.

I may have overly simplified the paper, but that's the basic idea.

               Linus