On Thu, Mar 2, 2017 at 5:50 PM, Mike Hommey <mh@xxxxxxxxxxxx> wrote: > > What if the "object version" is a hash of the content (as opposed to > header + content like the normal git hash)? It doesn't actually matter for that attack. The concept of the attack is actually fairly simple: generate a lot of collisions in the first hash (and they outline how you only need to generate 't' serial collisions and turn them into 2**t collisions), and then just check those collisions against the second hash. If you have enough collisions in the first one, having a collision in the second one is inevitable just from the birthday rule. Now, *In*practice* that attack is not very easy to do. Each collision is still hard to generate. And because of the git object rules (the size has to match), it limits you a bit in what collisions you generate. But the fact that git requires the right header can be considered just a variation on the initial state to SHA1, and then the additional requirement might be as easy as just saying that your collision generation function just always needs to generate a fixed-size block (which could be just 64 bytes - the SHA1 blocking size). So assuming you can arbitrarily generate collisions (brute force: 2**80 operations) you could make the rule be that you generate something that starts with one 64-byte block that matches the git rules: "blob 6454\0"..pad with repeating NUL bytes.. and then you generate 100 pairs of 64-byte SHA1 collisions (where the first starts with the initial value of that fixed blob prefix, the next with the state after the first block, etc etc). Now you can generate 2**100 different sequences that all are exactly 6464 bytes (101 64-byte blocks) and all have the same SHA1 - all all share that first fixed 64-byte block. You needed "only" on the order of 100 * 2**80 SHA1 operations to do that in theory. An since you have 2**100 objects, you know that you will have a likely birthday paradox even if your secondary hash is 200 bits long. So all in all, you generated a collision in on the order of 2**100 operations. So instead of getting the security of "160+200" bits, you only got 200 bits worth of real collision resistance. NOTE!! All of the above is very much assuming "brute force". If you can brute-force the hash, you can completely break any hash. The input block to SHA1 is 64 bytes, so by definition you have 512 bits of data to play with, and you're generating a 160-bit output: there is no question what-so-ever that you couldn't generate any hash you want if you brute-force things. The place where things like having a fixed object header can help is when the attack in question requires some repeated patterns. For example, if you're not brute-forcing things, your attack on the hash will likely involve using very particular patterns to change a number of bits in certain ways, and then combining those particular patterns to get the hash collision you wanted. And *that* is when you may need to add some particular data to the middle to make the end result be a particular match. But a brute-force attack definitely doesn't need size changes. You can make the size be pretty much anything you want (modulo really small inputs, of course - a one-byte input only has 256 different possible hashes ;) if you have the resources to just go and try every possible combination until you get the hash you wanted. I may have overly simplified the paper, but that's the basic idea. Linus