On Tue, Nov 29, 2011 at 01:56:28PM -0800, Bill Zaumen wrote: > The additional CRC (easily changed to whatever message digest one might > prefer) makes a malicious attack far more difficult: the modified file > has to have both the same SHA-1 hash (including the Git header) and > the same CRC (not including the Git header). Only if the attack actually involves creating a collision on both. But I think the important attacks bypass your CRC anyway. Consider this attack scenario: 1. Linus signs a tag (or a commit) and pushes it to kernel.org. 2. kernel.org gets hacked, and the attacker replaces an object with an evil colliding version[1]. 3. I clone from kernel.org, and run "git tag --verify". Git says it's OK, because the signature checks out, but I have a bogus object. How does your CRC help? If I understand your scheme correctly, kernel.org will have told me the CRC of all of the objects during the clone. But that isn't part of what Linus signed, so the attacker in step 2 could just as easily have overwritten kernel.org's crc file, and the signature will remain valid. [1] This is an over-simplification, of course. Because the only even remotely feasible attacks on sha1 are birthday attacks, not pre-image attacks, there is a step 0 in which the attacker generates a colliding pair, convinces Linus to commit it, and then waits. Which is probably really hard, but for the purposes of this discussion, we assume the attacker is capable of inserting a colliding object maliciously into a repo you will fetch from. Otherwise, the integrity of sha1 isn't an issue at all. > An efficient algorithm to do both simultaneously does not yet exist. > So, if we could generate a SHA-1 collision in one second, it would > presumably take billions of seconds (many decades of continuous > computation) to generate a SHA-1 hash with the same CRC, and well > before a year has elapsed, the original object should have been in all > the repositories, preventing a forged object from being inserted. Of > course, eventually you might need a real message digest. This is wrong, for two reasons. 1. The method for generating an object that collides in both sha-1 and CRC is not necessarily to generate a colliding sha-1 and then do a pre-image attack on the CRC. It is to do a birthday attack on the sha-1 and the CRC together. Which halves the bit-strength of the CRC to 16 bits (just as we can generally find collisions in 160-bit sha1s in 2^80). 16 bits isn't a lot to add when you are trying to fix a broken cryptosystem (it's not broken yet, obviously, but when it does get broken, will it be because computing reaches the 2^57 or so that sha1 is broken at, or will it be because a new weakness is found that drops sha1's bit-strength to something much lower?). This assumes that you can combine the two in a birthday attack. Certainly this analysis works against brute-force 2^80 sha1 collision attacks. But I haven't actually read the details of the sha1 attacks, so maybe some of the tweaking they do to get those results makes it harder. On the other hand, attacking CRC is far from hard, so I certainly wouldn't stake money that sha1 reseachers couldn't tweak their attacks in a way that also allows finding CRC collisions. You say that an algorithm to do both simultaneously does not yet exist. But is that because it's hard, or simply because nobody has bothered trying? Anyway, all of that is just reiterating that CRC should not be used as a security function. It can easily be replaced in your scheme by sha-256, which does have the desired properties. 2. Your attack seems to be "find the sha-1 collision, publish one of your colliding objects (i.e., the innocent-looking half), then try to break the CRC". And then you claim that by the time you find the CRC, everybody will already have the object. But wouldn't a smarter attack be to first find the collision, including the CRC, and only _then_ start the attack? Then nobody will have the object. Moreover, it's not true that after a year everyone will have the object. People still run "git clone" against kernel.org. Those repos do not have the object. > The weakness of a CRC as an integrity check is not an issue since it > is never used alone: it's use is more analogous to the few extra bits > added to a data stream when error-detecting codes are used. I used a > CRC in the initial implementation rather than a message digest because > it is faster, and because the initial goal was to get things to work > correctly. In any case, the patch does not eliminate any code in > which Git already does a byte-by-byte comparison. In cases where Git > currently assumes that two objects are the same because the SHA-1 > hashes are the same, the patch compares CRCs as an additional test. Right. I don't claim that your scheme makes git any weaker. I just claim that it fails to solve the problems people are actually concerned about, and it adds a lot of complexity while doing so. > Regarding your [Jeff's] second concern, "how does this alternative > digest have any authority?" there are two things to keep in mind. > First, it is a supplement to the existing digest. Right, but we are assuming that sha1 is broken. That's the whole security problem. So the existing digest is not worth much. > Second, any value of the CRC that is stored permanently (baring bugs, > in my implementation, of course) is computed locally - when a loose > object is created or when a pack file's index is created. At no point > is a CRC that was obtained from another repository trusted. While the > patch modifies Git so that it can send CRCs when using the git > protocol, these CRCs are never stored, but are instead used only for > cross checks. If one side or the other "lies", you get an error. But if I don't already have the object, then I have nothing to compare against. So when I get it from kernel.org, I have to simply accept that the object I'm getting is good, and write it into my object db. > BTW, regarding your [Jeff's] discussion about putting an additional > header in commit messages - I tried that. The existing versions of > Git didn't like it: barring a bug in my test code, it seems that Git > expects headers in commit messages to be in a particular order and > treats deviations from that to be an error. Yes, the header has to go at the end of the existing headers. But I don't see any reason that would be a problem for the scheme I described. > I even tried appending blank lines at the end of a commit, with spaces > and tabs encoding an additional CRC, and that didn't work either - at > least it never got through all the test programs, failing in places > like the tests involving notes. Yes, git will helpfully trim whitespace in commit messages. With the current code, you can hide arbitrary bytes in a commit message after a NUL, but don't do that. It's not guaranteed to stay that way, and the appropriate place to add new information is in a header. > In any case, you'd have to phase in such a change gradually, first > putting in the code to read the new header if it is there, and > subsequently (after ample time so that everyone is running a > sufficiently new version) enabling the code to create the new header. Current git should ignore headers that it doesn't understand. I haven't tested this, but Junio recently has been experimenting with gpg-signature lines in commits, and I'm pretty sure he checked that older gits properly ignore them. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html