On Tue, Sep 18, 2018 at 8:18 AM Joan Daemen <jda@xxxxxxxxxxx> wrote:
>
> 3) The relatively large state in the sponge construction increases the
> generic strength against attacks when the input contains redundancy or
> has a certain form. For instance, if the input is restricted to be text
> in ASCII (such as source code), then the collision-resistance grows
> higher than the nominal 2^{c/2}. Such an effect does not exist with
> narrow-pipe Merkle-Damgård. (This may be what Linus had intuitively in
> mind.)

Answering just this part:

No, what I had in mind was literally just exactly the kind of attack that broke SHA1 - attacking the internal state vector directly, and not paying any penalty for it, because the state size is the same as the final hash size.

The length extension attack is just the simplest and most trivial version of that kind of attack - because the internal state vector *is* the result, and you just continue using it.

But that trivial length extension thing is not the real problem, it's just the absolutely simplest symptom of the real problem.

I think that the model where the internal state of the hash is the same width as the final result is simply broken. It was what broke SHA1, and that problem is shared with SHA2.

"Length extension" is just the simplest way to say "broken by design", imho. Because the length extension attack is just the most trivial attack, but it isn't the fundamental problem. It was just the first and the cheapest attack found, but it was also the most special-cased and least interesting.

You need to have a very special case (with that secret at the beginning etc) to make the pure length extension attack interesting. And git has no secrets, so in that sense "length extension" by itself is totally immaterial. But the basic problem of internal hash size obviously wasn't.

So I would say that length extension is a direct result of the _real_ problem, which is that the hash exposes _all_ of the internal data.
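[Editor's note: a minimal sketch of the point above. This is a toy narrow-pipe Merkle-Damgård hash, not any real algorithm: the compression function just borrows SHA-256 over (state || block) for illustration, and the padding is simplified. It shows that when the published digest *is* the full final state, anyone can resume hashing from it, and that publishing only part of a wider state (the SHA-512/256-style idea mentioned below) blocks that resumption.]

```python
import hashlib

BLOCK = 64          # block size in bytes
STATE = 32          # internal state size in bytes (narrow pipe: == digest size)
IV = b"\x00" * STATE

def compress(state, block):
    # Toy compression function; a real design uses a dedicated primitive.
    return hashlib.sha256(state + block).digest()

def md_pad(msg_len):
    # Simplified MD strengthening: 0x80, zero fill, 8-byte big-endian length.
    zeros = (BLOCK - (msg_len + 9) % BLOCK) % BLOCK
    return b"\x80" + b"\x00" * zeros + msg_len.to_bytes(8, "big")

def absorb(state, data):
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(msg):
    # Narrow pipe: the whole final state is handed out as the digest.
    return absorb(IV, msg + md_pad(len(msg)))

# Length extension: knowing only toy_hash(secret) and len(secret), forge
# the hash of secret || padding || suffix without ever seeing the secret.
secret = b"known only to the victim"          # hypothetical example data
digest = toy_hash(secret)

suffix = b", plus attacker data"
glue = md_pad(len(secret))
total = len(secret) + len(glue) + len(suffix)

forged = absorb(digest, suffix + md_pad(total))  # resume from exposed state
real = toy_hash(secret + glue + suffix)
assert forged == real   # extension succeeds: nothing was hidden

# Truncated / wide-pipe variant: publish only half of the final state.
# An attacker holding just this digest cannot resume, because the other
# 128 bits of internal state stay hidden.
def toy_hash_truncated(msg):
    return toy_hash(msg)[:16]   # 128-bit digest from a 256-bit state
```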
That is what makes length extension possible - because you can just continue from a known state, and there is absolutely nothing hidden - and yes, that's a really easy special case where you don't even need to actually break the hash at all.

But I argue that it's _also_ one big part of what made SHAttered practical, and I think the underlying problem is exactly the same. When the internal state is the same size as the hash, you can attack the internal state itself for basically the same cost as attacking the whole hash. So you can pick and choose the weakest point.

Which is basically exactly what SHAttered did. No, it wasn't the trivial "just add to the end", but it used the exact same underlying weakness as one part of the attack.

*This* is why I dislike SHA2. It has basically the exact same basic weakness that we already know SHA1 fell for. The hashing details are different, and hopefully that means that there aren't the same kinds of patterns that can be generated to do the "attack the internal hash state" part, but I don't understand why people seem to ignore that other fundamental issue.

Something like SHA-512/256 would have been better, but I think almost nobody does that in hardware, and hardware support was one of the big advantages of plain SHA2.

The main reason I think SHA2 is acceptable is simply that 256 bits is a lot. So even if somebody comes up with a shortcut that weakens it by tens of bits, nobody really cares. Plus I'm obviously not a cryptographer, so I didn't feel like I was going to fight it a lot.

But yes, I'd have probably gone with any of the other alternatives, because I think it's a bit silly that we're switching hashes to another hash that has (at least in part) the *exact* same issue as the one people call broken.

(And yes, the hashing details are different, so it's "exactly the same" only wrt that internal state part - not the bit-pattern finding part that made the attack on the internal state much cheaper.
Real cryptographers obviously found the "figure out the weakness of the hashing" part to be the more interesting and novel one, over the trivial internal hash size part).

That said.. The real reason I think SHA2 is the right choice is simply that there needs to be a decision, and none of the choices were *wrong*. Sometimes just the _act_ of making a decision is more important than _what_ the decision is.

And hey, it is also likely that the reason _I_ get hung up on just the size of the internal state is exactly because I am _not_ a cryptographer: that kind of high-level stuff is the part I understand. When you start talking about why the exact rules of Merkle–Damgård constructions work, my eyes just glaze over.

So I'm probably - no, certainly - myopic and looking at only one part of the issue to begin with.

The end result is that I argued for more bits in the internal state (and apparently "wide vs narrow pipe" is the technical term), and I would have seen parallel algorithms as a bonus for the large-file case. None of which argued for SHA2. But see above on why I think SHA2 is, if not *the* right choice, at least *a* right choice.

            Linus