Re: SHA1 collisions found

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 23 Feb 2017 11:09:32 -0800

On Thu, Feb 23, 2017 at 10:46 AM, Jeff King <peff@xxxxxxxx> wrote:
>>
>> So I agree with you that we need to make git check for the opaque
>> data. I think I was the one who brought that whole argument up.
>
> We do already.

I'm aware of the fsck checks, but I have to admit I wasn't aware of
'transfer.fsckobjects'. I should turn that on myself.

Or maybe git should just turn it on by default? At least the
per-object fsck costs should be essentially free compared to the
network costs when you just apply them to the incoming objects.

I also do think that it would be good to check for the disturbance
vectors at receive time (and fsck). Not necessarily interesting during
normal operations.

And in particular, while the *kernel* doesn't generally have critical
opaque blobs, other projects do. Things like firmware images etc are
open to attack, and crazy people put ISO images in repositories etc.

So I don't think this discussion should focus exclusively on the git metadata.

It is likely much easier to replace a binary blob than it is to
replace a commit or tree (or a source file that has to go through a
compiler). And for many projects, that would be a bad thing.

> It's not an identical prefix, but I think collision attacks generally
> are along the lines of selecting two prefixes followed by garbage, and
> then mutating the garbage on both sides. That would "work" in this case
> (modulo the fact that git would complain about the NUL).

I think this particular attack depended on an actual identical prefix,
but I didn't go back to the paper and check.

But the attacks tend to very much depend on particular input bit
patterns that have very particular effects on the resulting
intermediate hash, and those bit patterns are specific to the hash and
known.

So a very powerful defense is to just look for those bit patterns in
the objects, and just warn about them. Those patterns don't tend to
exist in normal inputs anyway, but particularly if you just warn, it's
a heads-ups that "ok, something iffy is going on"

And as mentioned, a cheap "something iffy is going on" thing is
basically a death sentence to SCM attacks.

The whole _point_ of an SCM is that it isn't about a one-time event,
but about continuous history. That also fundamentally means that a
successful attack needs to work over time, and not be detectable.

In contrast, many other uses of hashes are "one-time" events.  If you
use a hash to validate a single piece of data from a source that you
wouldn't otherwise trust, it's a one-time "all or nothing" trust
situation.

And the attack surface is very different for those "one-time" vs
"trust over time" cases. If you can get a bank to trust a session one
time, you can empty a bank account and live on a paradise island for
the rest of your life. It doesn't matter if it gets detected or not
after-the-fact.

But if you can fool a SCM one time, insert your code, and it gets
detected next week, you didn't actually do anything useful. You only
burned yourself.

See the difference? One-time vs having a continual interaction makes a
*fundamntal* difference in game theory.

                Linus