Re: SHA1 collisions found

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 23 Feb 2017 10:29:09 -0800

On Thu, Feb 23, 2017 at 10:10 AM, Joey Hess <id@xxxxxxxxxx> wrote:
>
> It would cost 6500 CPU years + 100 GPU years to generate valid colliding
> git objects using the methods of the paper's authors. That might be cost
> effective if it helped get a backdoor into eg, the kernel.

I still think it also needs to be interesting enough data, not just
random noise that is then trivial to find with automated tools.

Because for the kernel, it's not just that an attacker needs to do the
CPU time. Yes, first he needs the technical resources to just do just
the attack and create the situation you described.

But then he *also* needs to build up the social capital to get the end
result pulled into the tree (ie if he depends on the hidden spaces, he
needs somebody to actually do a git pull, not just apply a patch).

.. and if we then have a tool that then finds the problem trivially
(ie "git fsck"), he's not only wasted all those technical resources,
he's also burned his identity.

>>  (b) we can probably easily add some extra sanity checks to the opaque
>> data we do have, to make it much harder to do the hiding of random
>> data that these attacks pretty much always depend on.
>
> For example, git fsck does warn about a commit message with opaque
> data hidden after a NUL. But, git show/merge/pull give no indication
> that something funky is going on when working with such commits.

I do agree that we might want to do some of the fsck checks
particularly at fetch time. That's when doing checks is both relevant
and cheap.

So we could do the opaque data checks, but we could/should probably
also add the attack pattern ("disturbance vectors") checks.

And the thing is, adding those checks is really cheap, and basically
makes the whole attack vector pointless against git.

Because unlike some "signing a pdf" attack, git doesn't fundamentally
depend on the SHA1 as some kind of absolute security.  If we have the
minimal machinery in git to just notice the attack, the attack
essentially goes away. Attackers can waste infinite amounts of CPU
time, and if it's cheap for us to notice, it completely disarms all
that attack work.

Again, I'm not arguing that people shouldn't work on extending git to
a new (and bigger) hash. I think that's a no-brainer, and we do want
to have a path to eventually move towards SHA3-256 or whatever.

But I'm very definitely arguing that the current attack doesn't
actually sound like it really even _matters_, because it should be so
easy to mitigate against.

                   Linus