Re: RFC: Another proposed hash function transition plan

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Mon, 6 Mar 2017 11:22:17 -0800

On Mon, Mar 6, 2017 at 10:39 AM, Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
>
> I think "nohash" can be explained in 2 points:

I do think that that was my least favorite part of the suggestion. Not
just "nohash", but all the special "hash" lines too.

I would honestly hope that the design should not be about "other
hashes". If you plan your expectations around the new hash being
broken, something is wrong to begin with.

I do wonder if things wouldn't be simpler if the new format just
included the SHA1 object name in the new object. Put it in the
"header" line of the object, so that every time you look up an object,
you just _see_ the SHA1 of that object. You can even think of it as an
additional protection.

Btw, the multi-collision attack referenced earlier does _not_ work for
an iterated hash that has a bigger internal state than the final hash.
Which is actually a real argument against sha-256: the internal state
of sha-256 is 256 bits, so if an attack can find collisions due to
some weakness, you really can then generate exponentially many collisions
by chaining together a linear number of collision searches.
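A toy illustration of the chaining argument, presumably the Joux-style multicollision construction referenced earlier (that attribution is an assumption). The hash below is hypothetical: a 16-bit "narrow-pipe" iterated hash built from truncated SHA-256, so its internal state is no wider than its output, which is the property the paragraph attributes to sha-256. It is not sha-256 itself, just small enough that each birthday search finishes instantly.

```python
import hashlib
import itertools

# Hypothetical narrow-pipe hash: chaining state is STATE_BYTES wide, same as the output.
STATE_BYTES = 2  # 16-bit state, so a birthday search needs only a few hundred tries
IV = b"\x00" * STATE_BYTES

def compress(state: bytes, block: bytes) -> bytes:
    # Toy compression function: SHA-256 of (state || block), truncated to the state width.
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def toy_hash(blocks) -> bytes:
    state = IV
    for block in blocks:
        state = compress(state, block)
    return state  # the output IS the whole internal state (narrow pipe)

def find_block_collision(state: bytes):
    # Birthday search for two distinct 4-byte blocks that map `state` to the same next state.
    seen = {}
    for i in itertools.count():
        block = i.to_bytes(4, "big")
        nxt = compress(state, block)
        if nxt in seen:
            return seen[nxt], block, nxt
        seen[nxt] = block

def joux_multicollision(t: int):
    # t sequential collision searches, each starting from the previous collided state.
    pairs, state = [], IV
    for _ in range(t):
        b0, b1, state = find_block_collision(state)
        pairs.append((b0, b1))
    # Picking either block at every step gives 2**t distinct messages, all ending in `state`.
    return [list(choice) for choice in itertools.product(*pairs)], state
```

With `t = 8` this performs eight small searches and yields 256 distinct messages that all hash to the same value: linear work, exponentially many collisions. Against a hash whose internal state is wider than its output, each step would instead have to collide that wider state.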

But for sha3-256 or blake2, the internal hash state is larger than the
final hash, so now you need to generate collisions not in the 256-bit
output, but in the much larger internal hash state, if you want to
generate those exponentially many collisions.
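A back-of-the-envelope version of the cost comparison, using only the generic birthday bound rather than any specific cryptanalytic attack. The symbols are notation introduced here: n is the output size, w the width of the internal chaining state, and t the number of chained collision steps. For sha3-256 the w that matters is the 512-bit capacity rather than the full 1600-bit permutation state.

```latex
\begin{aligned}
C_{\mathrm{multi}}(2^t) &\approx t \cdot 2^{w/2}
  && \text{($t$ chained internal-state collisions, birthday bound each)}\\
w = n = 256 \ \text{(sha-256)}:\quad C_{\mathrm{multi}}(2^t) &\approx t \cdot 2^{128}\\
w > n = 256 \ \text{(wide pipe)}:\quad C_{\mathrm{multi}}(2^t) &\approx t \cdot 2^{w/2} \gg t \cdot 2^{128}
\end{aligned}
```

A weakness that speeds up collision finding lowers the per-step cost, but in the wide-pipe case every step must still collide the full w-bit state, not just the 256-bit output.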

So *if* the new object format uses a git header line like

    "blob <size> <sha1>\0"

then it would inherently contain that mapping from 256-bit hash to the
SHA1, but it would actually also protect against attacks on the new
hash. In fact, in particular for objects with internal format that
differs between the two hashing models (ie trees and commits, which to
some degree are higher-value targets), it would make attacks really
quite complicated, I suspect.
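A minimal sketch of the proposed header layout, assuming sha3-256 as the new hash (the message leaves sha3-256 vs blake2 open) and covering only blobs, whose content is the same under both hashing models. Trees and commits would embed different object names and are not handled here. `sha1_object_name` and `new_object` are hypothetical helpers, not git functions.

```python
import hashlib

def sha1_object_name(obj_type: str, content: bytes) -> str:
    # Current git object name: SHA-1 over "<type> <size>\0" followed by the content.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

def new_object(obj_type: str, content: bytes, new_hash=hashlib.sha3_256):
    # Hypothetical new format: "<type> <size> <sha1>\0" followed by the content.
    # The SHA-1 name is part of the bytes fed to the new hash.
    sha1 = sha1_object_name(obj_type, content)
    raw = f"{obj_type} {len(content)} {sha1}\0".encode() + content
    return new_hash(raw).hexdigest(), raw
```

For the empty blob, `raw` starts with `blob 0 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\0`, so the SHA-1 mapping is carried by the object itself. If readers verify that embedded SHA-1 against the content, a collision on the new hash would also have to keep that field consistent.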

And you wouldn't need those "hash" or "nohash" things at all. The old
SHA1 would simply always be there, and cheap to look up (ie you
wouldn't have to unpack the whole object).
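A sketch of the "cheap to look up" point, assuming objects are stored zlib-deflated with the header first, the way git's loose objects are today. Packfiles and deltas are ignored. `read_header_sha1` is a hypothetical helper: it inflates in small chunks and stops at the NUL, so only the first few dozen bytes of a possibly huge object are ever decompressed.

```python
import zlib

def read_header_sha1(compressed: bytes, chunk: int = 64) -> str:
    # Inflate incrementally until the NUL that ends "<type> <size> <sha1>\0".
    d = zlib.decompressobj()
    header = b""
    pending = compressed
    while b"\0" not in header:
        out = d.decompress(pending, chunk)  # at most `chunk` bytes of output
        if not out:
            raise ValueError("no header terminator found")
        header += out
        pending = d.unconsumed_tail  # input not yet consumed because of the limit
    fields = header.split(b"\0", 1)[0].split(b" ")
    if len(fields) != 3:
        raise ValueError("expected '<type> <size> <sha1>' header")
    return fields[2].decode()
```

Calling it on a zlib-compressed one-megabyte blob returns the embedded SHA-1 after inflating roughly one chunk, never the body.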

Hmm?

                   Linus