On Thu, Feb 23, 2017 at 10:46 AM, Jeff King <peff@xxxxxxxx> wrote:
>>
>> So I agree with you that we need to make git check for the opaque
>> data. I think I was the one who brought that whole argument up.
>
> We do already.

I'm aware of the fsck checks, but I have to admit I wasn't aware of
'transfer.fsckobjects'. I should turn that on myself.

Or maybe git should just turn it on by default? At least the
per-object fsck costs should be essentially free compared to the
network costs when you just apply them to the incoming objects.

I also do think that it would be good to check for the disturbance
vectors at receive time (and fsck). Not necessarily interesting during
normal operations.

And in particular, while the *kernel* doesn't generally have critical
opaque blobs, other projects do. Things like firmware images etc are
open to attack, and crazy people put ISO images in repositories etc.

So I don't think this discussion should focus exclusively on the git
metadata. It is likely much easier to replace a binary blob than it is
to replace a commit or tree (or a source file that has to go through a
compiler). And for many projects, that would be a bad thing.

> It's not an identical prefix, but I think collision attacks generally
> are along the lines of selecting two prefixes followed by garbage, and
> then mutating the garbage on both sides. That would "work" in this case
> (modulo the fact that git would complain about the NUL).

I think this particular attack depended on an actual identical prefix,
but I didn't go back to the paper and check.

But the attacks tend to very much depend on particular input bit
patterns that have very particular effects on the resulting
intermediate hash, and those bit patterns are specific to the hash and
known.

So a very powerful defense is to just look for those bit patterns in
the objects, and just warn about them.
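
As an aside on the 'transfer.fsckobjects' setting mentioned earlier in
this message: a minimal sketch of turning it on, assuming you want it
in your per-user ~/.gitconfig (the choice of file is an assumption,
any git config file works the same way):

```ini
# Check incoming and outgoing objects during transfers.
# Off by default.
[transfer]
	fsckObjects = true
```

This value is only the fallback: if 'fetch.fsckObjects' or
'receive.fsckObjects' is set, that one is used instead for its
direction.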

Those patterns don't tend to exist in normal inputs anyway, but
particularly if you just warn, it's a heads-up that "ok, something
iffy is going on".

And as mentioned, a cheap "something iffy is going on" thing is
basically a death sentence to SCM attacks. The whole _point_ of an SCM
is that it isn't about a one-time event, but about continuous history.
That also fundamentally means that a successful attack needs to work
over time, and not be detectable.

In contrast, many other uses of hashes are "one-time" events. If you
use a hash to validate a single piece of data from a source that you
wouldn't otherwise trust, it's a one-time "all or nothing" trust
situation.

And the attack surface is very different for those "one-time" vs
"trust over time" cases. If you can get a bank to trust a session one
time, you can empty a bank account and live on a paradise island for
the rest of your life. It doesn't matter if it gets detected or not
after-the-fact.

But if you can fool an SCM one time, insert your code, and it gets
detected next week, you didn't actually do anything useful. You only
burned yourself.

See the difference? One-time vs having a continual interaction makes a
*fundamental* difference in game theory.

               Linus