Re: Hash algorithm analysis

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Mon, 11 Jun 2018 17:45:07 -0700

On Mon, Jun 11, 2018 at 4:27 PM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
> >
> > And no, I'm not a cryptographer. But honestly, length extension
> > attacks were how both md5 and sha1 were broken in practice, so I'm
> > just going "why would we go with a crypto choice that has that known
> > weakness? That's just crazy".
>
> What do you think about Johannes's summary of this being a non-issue for
> Git in
> https://public-inbox.org/git/alpine.DEB.2.21.1.1706151122180.4200@virtualbox/
> ?

I agree that the fact that git internal data is structured and all
meaningful (and doesn't really have ignored state) makes it *much*
harder to attack the basic git objects, since you not only have to
generate a good hash, the end result has to also *parse* and there is
not really any hidden non-parsed data that you can use to hide the
attack.

And *if* you are using git for source code, the same is pretty much
true even for the blob objects - an attacking object will stand out
like a sore thumb in "diff" etc.

So I don't disagree with Johannes in that sense: I think git does
fundamentally tend to have some extra validation in place, and there's
a reason why the examples for both the md5 and the sha1 attack were
pdf files.

That said, even if git internal ("metadata") objects like trees and
commits tend to not have opaque parts to them and are thus pretty hard
to attack, the blob objects are still an attack vector for projects
that use git for non-source-code (and even source projects do embed
binary files - including pdf files - even though they might not be "as
interesting" to attack). So you do want to protect those too.

And hey, protecting the metadata objects is good just to protect
against annoyances. Sure, you should always sanity check the object at
receive time anyway, but even so, if somebody is able to generate a
blob object that hashes to the same hash as a metadata object (ie tree
or commit), that really could be pretty damn annoying.

And the whole "intermediate hashed state is same size as final hash
state" just _fundamentally_ means that if you find a weakness in the
hash, you can now attack that weakness without having to worry about
the attack being fundamentally more expensive.

That's essentially what SHAttered relied on. It didn't rely on a
secret and a hash and length extension, but it *did* rely on the same
mechanism that a length extension attack relies on, where you can
basically attack the state in the middle with no extra cost.

Maybe some people don't consider it a length extension attack for that
reason, but it boils down to much the same basic situation where you
can attack the internal hash state and cause a state collision. And
you can try to find the patterns that then cause that state collision
when you've found a weakness in the hash.

With SHA3 or k12, you can obviously _also_ try to attack the hash
state and cause a collision, but because the intermediate state is
much bigger than the final hash, you're just making things *way*
harder for yourself if you try that.

              Linus