On Sun, 2014-01-12 at 16:37 -0500, Mark Lee wrote: > On Sun, 2014-01-12 at 16:29 -0500, Mark Lee wrote: > > On Sun, 2014-01-12 at 11:29 -0700, Taylor Hornby wrote: > > > On 01/12/2014 10:11 AM, Mark Lee wrote: > > > > Perhaps I'm not strong enough in mathematics but I'd like to know how > > > > possible md5 collisions can be weaponized. From what I see, the idea > > > > would be to modify a binary such that it contains malicious code > > > > (without changing the md5sum). Since most security packages contain a > > > > number of compilation tests and md5 hashes vary significantly with > > > > slight modifications, I'd like to know how these collisions can be used > > > > to hijack a system. If one has to build a binary that doesn't even > > > > encompass the functionality of the binary it's trying to mimic, wouldn't > > > > that severely decrease the effectiveness of a hash collision? > > > > > > The problem with MD5 is that it's possible to find two different > > > messages A and B such that MD5(A) = MD5(B). The birthday attack [1] > > > makes it possible to find collisions for 128-bit hashes (like MD5) in > > > about 2^64 operations, but there are MD5-specific attacks that make it > > > even easier for MD5. So... it takes a bit of time to find a collision, > > > but not too much time. It can be done in about a day with PS3's [2]. > > > > > > For my explanation, I'll ignore the MD5-specific attacks and just use > > > the birthday attack since it's easier to understand. > > > > > > The attacker starts out with two files: > > > > > > 1. The official copy of TrueCrypt. > > > > > > 2. A malicious copy of TrueCrypt. > > > > > > The malicious change would be something simple. For example, they could > > > find where the master encryption key is copied into a buffer: > > > > > > ... > > > MOV [master_key + ECX], EAX // EAX holds master key bits > > > ... > > > > > > And change it so that it's always zero: > > > > > > ... > > > MOV [master_key + ECX], 0 // Now the master key is set to zero > > > ... > > > > > > Only one instruction is different, but now TrueCrypt encrypts everything > > > with a null key. > > > > > > From these two copies copies of TrueCrypt, they generate 2^64 variants > > > of each. They can do this by re-ordering instructions, changing which > > > registers are used for which variables, re-ordering ELF segments, or > > > just appending random strings to the file (you can append stuff to an > > > ELF binary and it will still run fine). > > > > > > Now the attacker has: > > > > > > 1. 2^64 files that are different, but behave exactly like the > > > non-malicious official copy of TrueCrypt. > > > > > > 2. 2^64 files that are different, but behave exactly like the malicious > > > copy of TrueCrypt (that uses zero keys or whatever). > > > > > > If MD5 was a good hash function, each of the files would hash to a > > > random 128-bit value. You should now be able to see that at least one > > > file from the "non-malicious" set of files probably has the same MD5 > > > hash as one of the files from the "malicious" set of files. > > > > > > There are 2^64 * 2^64 = 2^128 ways of pairing a file from the > > > non-malicious pile with one from the malicious pile. The chance of any > > > one pair colliding is 1 in 2^128, so there's a good chance that at least > > > one of the pairs collide (it's not exactly 100% since the pairs are not > > > mutually independent). > > > > > > To find the collision quickly, the attacker can just sort one of the > > > sets of files by hash value, then for each file in the other set, use a > > > binary search to see if any file has the same hash. Ignoring log(n) > > > factors, that would take on the order of 2^64 operations. > > > > > > So... Now the attacker has two files with the same MD5 hash. One that > > > behaves like the official copy of TrueCrypt, and another that uses null > > > keys. > > > > > > So, they give the *good* copy to the package maintainer, who laboriously > > > checks that it's not backdoored (as if that ever happens). The package > > > maintainer thinks everything is fine, since it behaves exactly like > > > TrueCrypt and there's nothing fishy about the file (they probably won't > > > notice some instructions being reordered). The file might even be the > > > same as the one from the TrueCrypt website, if the attacker is > > > collaborating with the TrueCrypt developers. > > > > > > The package maintainer thinks everything is good, so they hard-code the > > > MD5 hash of their good copy into the PKGBUILD. > > > > > > Now, when most users install TrueCrypt, they will download the *good* > > > copy, check the MD5 hash, it will match, and everything will be good. > > > However, when the adversary sees that his target (e.g. a terrorist, or > > > activist/journalist living in a non-free country) is downloading > > > TrueCrypt, they will silently replace the file with with the *bad* copy. > > > When the MD5 gets checked, it will match, and the user will be > > > encrypting their files with null keys. > > > > > > In total, it took the attacker on the order of 2^66 operations to find > > > the colliding files. You'd need a good budget (like the NSA's, or a > > > large company's) to do that, but if you take advantage of the > > > MD5-specific attacks too, you can do it for much cheaper. > > > > > > Of course, this doesn't just apply to TrueCrypt, it could be applied to > > > any other software that's checked only with an MD5 hash. To backdoor a > > > web server, for example, you could change a constant used in array index > > > bounds checking to introduce a buffer overflow vulnerability. > > > > > > [1] https://en.wikipedia.org/wiki/Birthday_attack > > > > > > [2] http://www.win.tue.nl/hashclash/rogue-ca/ > > > > > > > Salutations! > > > > Excellent explanation but I still don't see the guarantee that md5 hash > > can be cracked. I don't see any application of the pidgeon hole > > principle which would show two files (one malicious with the null key > > and one normal) would have to share the same md5 hash. Can you elaborate > > any further? > > > > Regarding, switching out the md5 hashes, I agree with the switch to > > sha256 for PKGBUILDS. > > > > Regards, > > Mark > > Salutations! > > I'd just like the point out that with the birthday attack, we'd have > 100% confidence when we have 2^128+1 different hashes (using the pidgeon > hole theorem). Anyone care to calculate a # pair vs. probability of > collision for md5 hash? > > Regards, > Mark Salutations! In addition, this birthday attack assumes coercion from source code developers. It seems that the method you demonstrated of reducing the very high number needed to be 100% sure (from 2^128+1 to 129) one has a collision is dependent on source code developers being complacent (they are letting the code be refactored) to coercion. Regards, Mark -- Mark Lee <mark@xxxxxxxxxxxx>
Attachment:
signature.asc
Description: This is a digitally signed message part