On Sun, 2014-01-12 at 16:29 -0500, Mark Lee wrote: > On Sun, 2014-01-12 at 11:29 -0700, Taylor Hornby wrote: > > On 01/12/2014 10:11 AM, Mark Lee wrote: > > > Perhaps I'm not strong enough in mathematics but I'd like to know how > > > possible md5 collisions can be weaponized. From what I see, the idea > > > would be to modify a binary such that it contains malicious code > > > (without changing the md5sum). Since most security packages contain a > > > number of compilation tests and md5 hashes vary significantly with > > > slight modifications, I'd like to know how these collisions can be used > > > to hijack a system. If one has to build a binary that doesn't even > > > encompass the functionality of the binary it's trying to mimic, wouldn't > > > that severely decrease the effectiveness of a hash collision? > > > > The problem with MD5 is that it's possible to find two different > > messages A and B such that MD5(A) = MD5(B). The birthday attack [1] > > makes it possible to find collisions for 128-bit hashes (like MD5) in > > about 2^64 operations, but there are MD5-specific attacks that make it > > even easier for MD5. So... it takes a bit of time to find a collision, > > but not too much time. It can be done in about a day with PS3's [2]. > > > > For my explanation, I'll ignore the MD5-specific attacks and just use > > the birthday attack since it's easier to understand. > > > > The attacker starts out with two files: > > > > 1. The official copy of TrueCrypt. > > > > 2. A malicious copy of TrueCrypt. > > > > The malicious change would be something simple. For example, they could > > find where the master encryption key is copied into a buffer: > > > > ... > > MOV [master_key + ECX], EAX // EAX holds master key bits > > ... > > > > And change it so that it's always zero: > > > > ... > > MOV [master_key + ECX], 0 // Now the master key is set to zero > > ... > > > > Only one instruction is different, but now TrueCrypt encrypts everything > > with a null key. > > > > From these two copies copies of TrueCrypt, they generate 2^64 variants > > of each. They can do this by re-ordering instructions, changing which > > registers are used for which variables, re-ordering ELF segments, or > > just appending random strings to the file (you can append stuff to an > > ELF binary and it will still run fine). > > > > Now the attacker has: > > > > 1. 2^64 files that are different, but behave exactly like the > > non-malicious official copy of TrueCrypt. > > > > 2. 2^64 files that are different, but behave exactly like the malicious > > copy of TrueCrypt (that uses zero keys or whatever). > > > > If MD5 was a good hash function, each of the files would hash to a > > random 128-bit value. You should now be able to see that at least one > > file from the "non-malicious" set of files probably has the same MD5 > > hash as one of the files from the "malicious" set of files. > > > > There are 2^64 * 2^64 = 2^128 ways of pairing a file from the > > non-malicious pile with one from the malicious pile. The chance of any > > one pair colliding is 1 in 2^128, so there's a good chance that at least > > one of the pairs collide (it's not exactly 100% since the pairs are not > > mutually independent). > > > > To find the collision quickly, the attacker can just sort one of the > > sets of files by hash value, then for each file in the other set, use a > > binary search to see if any file has the same hash. Ignoring log(n) > > factors, that would take on the order of 2^64 operations. > > > > So... Now the attacker has two files with the same MD5 hash. One that > > behaves like the official copy of TrueCrypt, and another that uses null > > keys. > > > > So, they give the *good* copy to the package maintainer, who laboriously > > checks that it's not backdoored (as if that ever happens). The package > > maintainer thinks everything is fine, since it behaves exactly like > > TrueCrypt and there's nothing fishy about the file (they probably won't > > notice some instructions being reordered). The file might even be the > > same as the one from the TrueCrypt website, if the attacker is > > collaborating with the TrueCrypt developers. > > > > The package maintainer thinks everything is good, so they hard-code the > > MD5 hash of their good copy into the PKGBUILD. > > > > Now, when most users install TrueCrypt, they will download the *good* > > copy, check the MD5 hash, it will match, and everything will be good. > > However, when the adversary sees that his target (e.g. a terrorist, or > > activist/journalist living in a non-free country) is downloading > > TrueCrypt, they will silently replace the file with with the *bad* copy. > > When the MD5 gets checked, it will match, and the user will be > > encrypting their files with null keys. > > > > In total, it took the attacker on the order of 2^66 operations to find > > the colliding files. You'd need a good budget (like the NSA's, or a > > large company's) to do that, but if you take advantage of the > > MD5-specific attacks too, you can do it for much cheaper. > > > > Of course, this doesn't just apply to TrueCrypt, it could be applied to > > any other software that's checked only with an MD5 hash. To backdoor a > > web server, for example, you could change a constant used in array index > > bounds checking to introduce a buffer overflow vulnerability. > > > > [1] https://en.wikipedia.org/wiki/Birthday_attack > > > > [2] http://www.win.tue.nl/hashclash/rogue-ca/ > > > > Salutations! > > Excellent explanation but I still don't see the guarantee that md5 hash > can be cracked. I don't see any application of the pidgeon hole > principle which would show two files (one malicious with the null key > and one normal) would have to share the same md5 hash. Can you elaborate > any further? > > Regarding, switching out the md5 hashes, I agree with the switch to > sha256 for PKGBUILDS. > > Regards, > Mark Salutations! I'd just like the point out that with the birthday attack, we'd have 100% confidence when we have 2^128+1 different hashes (using the pidgeon hole theorem). Anyone care to calculate a # pair vs. probability of collision for md5 hash? Regards, Mark -- Mark Lee <mark@xxxxxxxxxxxx>
Attachment:
signature.asc
Description: This is a digitally signed message part