On Mon, 28 Aug 2006, David Lang wrote:
>
> just to double check.
>
> if you already have a file A in git with hash X is there any condition where a
> remote file with hash X (but different contents) would overwrite the local
> version?

Nope. If it has the same SHA1, it means that when we receive the object
from the other end, we will _not_ overwrite the object we already have.

So what happens is that if we ever see a collision, the "earlier" object
in any particular repository will always end up overriding. But note that
"earlier" is obviously per-repository, in the sense that the git object
network generates a DAG that is not fully ordered, so while different
repositories will agree about what is "earlier" in the case of direct
ancestry, if the object came through separate and not directly related
branches, two different repos may obviously have gotten the two objects in
different order.

However, the "earlier will override" is very much what you want from a
security standpoint: remember that the git model is that you should
primarily trust only your _own_ repository. So if you do a "git pull", the
new incoming objects are by definition less trustworthy than the objects
you already have, and as such it would be wrong to allow a new object to
replace an old one.

So you have two cases of collision:

 - the inadvertent kind, where you somehow are very very unlucky, and two
   files end up having the same SHA1. At that point, what happens is that
   when you commit that file (or do a "git-update-index" to move it into
   the index, but not committed yet), the SHA1 of the new contents will be
   computed, but since it matches an old object, a new object won't be
   created, and the commit-or-index ends up pointing to the _old_ object.
   You won't notice immediately (since the index will match the old object
   SHA1, and that means that something like "git diff" will use the
   checked-out copy), but if you ever do a tree-level diff (or you do a
   clone or pull, or force a checkout) you'll suddenly notice that that
   file has changed to something _completely_ different than what you
   expected. So you would generally notice this kind of collision fairly
   quickly.

   In related news, the question is what to do about the inadvertent
   collision. First off, let me remind people that the inadvertent kind of
   collision is really really _really_ damn unlikely, so we'll quite likely
   never ever see it in the full history of the universe.

   But _if_ it happens, it's not the end of the world: what you'd most
   likely have to do is just change the file that collided slightly, and
   just force a new commit with the changed contents (add a comment saying
   "/* This line added to avoid collision */") and then teach git about the
   magic SHA1 that has been shown to be dangerous.

   So over a couple of million years, maybe we'll have to add one or two
   "poisoned" SHA1 values to git. It's very unlikely to be a maintenance
   problem ;)

 - The attacker kind of collision because somebody broke (or brute-forced)
   SHA1. This one is clearly a _lot_ more likely than the inadvertent kind,
   but by definition it's always a "remote" repository. If the attacker had
   access to the local repository, he'd have much easier ways to screw you
   up.

   So in this case, the collision is entirely a non-issue: you'll get a
   "bad" repository that is different from what the attacker intended, but
   since you'll never actually use his colliding object, it's _literally_
   no different from the attacker just not having found a collision at all,
   but just using the object you already had (ie it's 100% equivalent to
   the "trivial" collision of the identical file generating the same SHA1).
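
To make the "earlier object wins" rule concrete, here is a rough sketch in
Python. It is not git's actual C code: the class ToyObjectStore and its
methods add() and receive() are hypothetical names made up for illustration.
The only git-specific fact it relies on is that a blob's id is the SHA1 of
the header "blob <size>\0" followed by the contents. Since nobody can
produce a real SHA1 collision on demand, receive() takes the claimed id as
an argument to simulate one.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """SHA1 of a git blob: header 'blob <size>\\0' followed by the contents."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory object store; not git's real implementation."""

    def __init__(self):
        self.objects = {}  # hex SHA1 -> contents

    def add(self, data: bytes) -> str:
        """Local commit or index update: hash the contents, store only if new."""
        return self.receive(blob_id(data), data)

    def receive(self, sha: str, data: bytes) -> str:
        """Incoming object (e.g. from a pull) arriving under a claimed id.

        If the id is already present, the existing object is kept and the
        incoming contents are dropped, whatever they are.
        """
        self.objects.setdefault(sha, data)
        return sha


store = ToyObjectStore()
local = store.add(b"hello\n")

# Simulated attacker: different contents arriving under the same id.
store.receive(local, b"evil\n")

print(local)
print(store.objects[local])  # still the object we had first
```

The second call leaves the store untouched, which is the same outcome as
the attacker never having found a collision at all. The same code path also
shows the inadvertent case: add() on colliding contents would hand back the
old id and silently keep the old object.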

> what would happen if you ended up with two packs that both contained a file
> with hash X but with different contents and then did a repack on them? (either
> packs from different sources, or packs downloaded through some mechanism other
> than the git protocol are two ways this could happen that I can think of)

See above. The only _dangerous_ kind of collision is the inadvertent kind,
but that's obviously also the very very unlikely kind.

		Linus