[I will reply later to Jeff's comment from last night with some clarifications and explanations.]

> What I'm thinking is whether it's possible to decouple two sha-1 roles
> in git, as object identifier and digest, separately. Each sha-1
> identifies an object and an extra set of digests on the "same" object.
> Object database is extended to store all these new digests and mapping
> between sha-1 and them. When we need to verify an object, given an
> sha-1, we rehash that object and check the result digest with the ones
> linked to the sha-1.

The patch I created (at least, a reasonable chunk of the code) more or less does that. I used a CRC primarily because I had the impression that people were very concerned about speed, but it is easy to replace the CRC with the message digest of your choice. In any case, it might be a good starting point if you want to try something in a different direction.

Basically, when you create a loose object, in addition to getting a SHA-1 ID, you get a message digest that is stored as well (in a separate file). When you index a pack file, you get an IDX file containing the SHA-1 ID plus a corresponding MDS file containing the message digest. Index-pack calculates the SHA-1 value from the object stored in the pack file, and the additional message digest is computed at the same time from the same data. Commands like verify-pack check both the IDX file and the MDS file for consistency with the matching pack file.

The new message digest (the CRC in the patch) is used only when a repository is being altered (e.g., a loose object or pack file is being created, or a fetch, push, or pull operation is running) or when some explicit verification operation is running (e.g., git verify-pack).
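
To make the idea concrete, here is a minimal sketch in Python (not the code from the patch) of computing the SHA-1 ID and an extra digest in one pass over the same data, and then verifying an object against both. The function names are made up for illustration, and I am assuming here that the extra digest covers the Git object header as well as the contents; the digest could just as well cover the contents only.

```python
import hashlib
import zlib


def hash_object(data: bytes, obj_type: str = "blob"):
    """Return (sha1_hex, crc32) computed over the same object bytes.

    Hypothetical helper: the SHA-1 is the usual Git object ID, i.e. the
    hash of "<type> <size>\\0" followed by the contents. Feeding the CRC
    the same bytes is an assumption made for this sketch.
    """
    header = f"{obj_type} {len(data)}\0".encode()
    sha1 = hashlib.sha1()
    crc = 0
    # Feed both digests the same chunks, so the data is read only once.
    for chunk in (header, data):
        sha1.update(chunk)
        crc = zlib.crc32(chunk, crc)
    return sha1.hexdigest(), crc


def verify_object(data: bytes, sha1_hex: str, stored_crc: int) -> bool:
    """Rehash the object and compare against the ID and the stored digest."""
    new_sha1, new_crc = hash_object(data)
    return new_sha1 == sha1_hex and new_crc == stored_crc


if __name__ == "__main__":
    object_id, extra = hash_object(b"hello\n")
    print(object_id, format(extra, "08x"))
    print(verify_object(b"hello\n", object_id, extra))
```

Swapping the CRC for a real message digest would only mean replacing the `zlib.crc32` call with a second `hashlib` object that is updated in the same loop.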
Adding an additional header to the commit message is a good idea. I had actually tried that, but something went wrong, although one of you suggested what the problem might have been. I can try again if there is some interest in pursuing that.

It might be worth pointing out that you can use the SHA-1 hash of the contents of objects (i.e., without the Git object header) as an additional digest. I tried a test using two 128-byte files with the same MD5 hash, differing past the 20th byte, and deleted the first four bytes of each. With those bytes deleted, the hash collision went away. I doubt that there is a known efficient algorithm that can generate a hash collision for two files and also for the two files obtained by deleting N bytes from both.
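
As a rough illustration of the header-less digest, the sketch below (Python, with made-up function names) computes both values for the same contents. It does not reproduce the MD5 collision test, since that needs the colliding files. It only shows that the two digests are taken over byte strings that differ by the header prefix, which is why a collision in one need not carry over to the other.

```python
import hashlib


def git_object_id(data: bytes, obj_type: str = "blob") -> str:
    """SHA-1 over the Git object header plus the contents (the object ID)."""
    header = f"{obj_type} {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


def content_digest(data: bytes) -> str:
    """SHA-1 over the contents only (hypothetical additional digest)."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    contents = b"hello\n"
    print("object ID:      ", git_object_id(contents))
    print("content digest: ", content_digest(contents))
```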