Re: Packages Verified with MD5

Mark Lee <mark@xxxxxxxxxxxx> · Sun, 12 Jan 2014 16:37:20 -0500

On Sun, 2014-01-12 at 16:29 -0500, Mark Lee wrote:
> On Sun, 2014-01-12 at 11:29 -0700, Taylor Hornby wrote:
> > On 01/12/2014 10:11 AM, Mark Lee wrote:
> > > Perhaps I'm not strong enough in mathematics but I'd like to know how
> > > possible md5 collisions can be weaponized. From what I see, the idea
> > > would be to modify a binary such that it contains malicious code
> > > (without changing the md5sum). Since most security packages contain a
> > > number of compilation tests and md5 hashes vary significantly with
> > > slight modifications, I'd like to know how these collisions can be used
> > > to hijack a system. If one has to build a binary that doesn't even
> > > encompass the functionality of the binary it's trying to mimic, wouldn't
> > > that severely decrease the effectiveness of a hash collision?
> > 
> > The problem with MD5 is that it's possible to find two different
> > messages A and B such that MD5(A) = MD5(B). The birthday attack [1]
> > makes it possible to find collisions for 128-bit hashes (like MD5) in
> > about 2^64 operations, but there are MD5-specific attacks that make it
> > even easier for MD5. So... it takes a bit of time to find a collision,
> > but not too much time. It can be done in about a day with PS3's [2].
> > 
> > For my explanation, I'll ignore the MD5-specific attacks and just use
> > the birthday attack since it's easier to understand.
> > 
> > The attacker starts out with two files:
> > 
> > 1. The official copy of TrueCrypt.
> > 
> > 2. A malicious copy of TrueCrypt.
> > 
> > The malicious change would be something simple. For example, they could
> > find where the master encryption key is copied into a buffer:
> > 
> > ...
> > MOV	[master_key + ECX], EAX     // EAX holds master key bits
> > ...
> > 
> > And change it so that it's always zero:
> > 
> > ...
> > MOV	[master_key + ECX], 0       // Now the master key is set to zero
> > ...
> > 
> > Only one instruction is different, but now TrueCrypt encrypts everything
> > with a null key.
> > 
> > From these two copies copies of TrueCrypt, they generate 2^64 variants
> > of each. They can do this by re-ordering instructions, changing which
> > registers are used for which variables, re-ordering ELF segments, or
> > just appending random strings to the file (you can append stuff to an
> > ELF binary and it will still run fine).
> > 
> > Now the attacker has:
> > 
> > 1. 2^64 files that are different, but behave exactly like the
> > non-malicious official copy of TrueCrypt.
> > 
> > 2. 2^64 files that are different, but behave exactly like the malicious
> > copy of TrueCrypt (that uses zero keys or whatever).
> > 
> > If MD5 was a good hash function, each of the files would hash to a
> > random 128-bit value. You should now be able to see that at least one
> > file from the "non-malicious" set of files probably has the same MD5
> > hash as one of the files from the "malicious" set of files.
> > 
> > There are 2^64 * 2^64 = 2^128 ways of pairing a file from the
> > non-malicious pile with one from the malicious pile. The chance of any
> > one pair colliding is 1 in 2^128, so there's a good chance that at least
> > one of the pairs collide (it's not exactly 100% since the pairs are not
> > mutually independent).
> > 
> > To find the collision quickly, the attacker can just sort one of the
> > sets of files by hash value, then for each file in the other set, use a
> > binary search to see if any file has the same hash. Ignoring log(n)
> > factors, that would take on the order of 2^64 operations.
> > 
> > So... Now the attacker has two files with the same MD5 hash. One that
> > behaves like the official copy of TrueCrypt, and another that uses null
> > keys.
> > 
> > So, they give the *good* copy to the package maintainer, who laboriously
> > checks that it's not backdoored (as if that ever happens). The package
> > maintainer thinks everything is fine, since it behaves exactly like
> > TrueCrypt and there's nothing fishy about the file (they probably won't
> > notice some instructions being reordered). The file might even be the
> > same as the one from the TrueCrypt website, if the attacker is
> > collaborating with the TrueCrypt developers.
> > 
> > The package maintainer thinks everything is good, so they hard-code the
> > MD5 hash of their good copy into the PKGBUILD.
> > 
> > Now, when most users install TrueCrypt, they will download the *good*
> > copy, check the MD5 hash, it will match, and everything will be good.
> > However, when the adversary sees that his target (e.g. a terrorist, or
> > activist/journalist living in a non-free country) is downloading
> > TrueCrypt, they will silently replace the file with with the *bad* copy.
> > When the MD5 gets checked, it will match, and the user will be
> > encrypting their files with null keys.
> > 
> > In total, it took the attacker on the order of 2^66 operations to find
> > the colliding files. You'd need a good budget (like the NSA's, or a
> > large company's) to do that, but if you take advantage of the
> > MD5-specific attacks too, you can do it for much cheaper.
> > 
> > Of course, this doesn't just apply to TrueCrypt, it could be applied to
> > any other software that's checked only with an MD5 hash. To backdoor a
> > web server, for example, you could change a constant used in array index
> > bounds checking to introduce a buffer overflow vulnerability.
> > 
> > [1] https://en.wikipedia.org/wiki/Birthday_attack
> > 
> > [2] http://www.win.tue.nl/hashclash/rogue-ca/
> > 
> 
> Salutations!
> 
> Excellent explanation but I still don't see the guarantee that md5 hash
> can be cracked. I don't see any application of the pidgeon hole
> principle which would show two files (one malicious with the null key
> and one normal) would have to share the same md5 hash. Can you elaborate
> any further?
> 
> Regarding, switching out the md5 hashes, I agree with the switch to
> sha256 for PKGBUILDS.
> 
> Regards,
> Mark

Salutations!

I'd just like the point out that with the birthday attack, we'd have
100% confidence when we have 2^128+1 different hashes (using the pidgeon
hole theorem). Anyone care to calculate a # pair vs. probability of
collision for md5 hash?

Regards,
Mark
-- 
Mark Lee <mark@xxxxxxxxxxxx>
Attachment:
signature.asc

Description: This is a digitally signed message part