[Mac_crypto] Apple should use SHA! (or stronger) to authenticate software releases

Arnold G. Reinhold mac_crypto@vmeng.com
Mon, 5 Apr 2004 18:43:07 -0400

At 4:51 PM +0100 4/5/04, Nicko van Someren wrote:
>While I agree that it is somewhat lax of Apple to be using MD5 for 
>checking its updates it's far from clear to me that an attack of the 
>sort described above would ever be practical.  The problem is that 
>the while there are methods for finding has collisions by brute 
>force, not only do they require a work factor around the size of the 
>square root of the output space, and a large amount of storage, but 
>the work factor is multiplied by a factor linear in the size of the 
>data that follows the part of the message that you can manipulate. 
>When all you are doing is trying to find a single collision on the 
>hash function (or perhaps trying to forge a matching pair of small 
>key exchange messages) this can be made quite small but when it 
>comes to breaking a package distribution system it is much harder.
>Consider that the real package is represented by M, the subtly 
>modified version is M1 and the bogus package is represented by M2. 
>Each of these has some high resolution TIFF image T that we can get 
>away with tweaking into T1 and T2.  We probably don't have control 
>where this image comes out in the package because the file ordering 
>is deterministic.

The file ordering may be deterministic, but someone who is well 
versed in the configuration control and release engineering process 
might well be able to have a chosen file placed at the end of of the 
package. The method for getting stuff at the end needn't be perfect. 
The attacker can keep trying until he succeeds.

>The package can be broken down into three parts: the bit before the 
>tweakable part, the tweakable image and the bit after that:
>	M  = M1a | T  | M1b
>	M1 = M1a | T1 | M1b
>	M2 = M2a | T2 | M2b
>For the purposes of finding a collision we can pre-compute the hash 
>blocks of M1a and M2a but we can't do the same for M1b and M2b. 
>This means if L(x) is the function for the number of hash blocks in 
>message x then the cost of finding a hash collision in our packages 
>is L(T1|M1b) + L(T2|M2b) times harder than just finding a simple 
>collision by brute force.  In MD5 the blocks are 64 bytes, so for 
>every 1K in M1b the process of finding a collision gets 16 times 
>harder.  Finding a collision in SHA1 is about 2^16 times harder than 
>finding a collision in MD5, so unless you find something that you 
>can alter without it being noticed in the last 2MB of the package 
>then this sort of forgery is actually going to be harder than 
>finding a simple hash collision in SHA1!  In practice you'll 
>probably find something that you can alter in the last few hundred 
>KB but still the raw processing cost will be a few orders of 
>magnitude harder than a simple hash collision problem.  Furthermore, 
>since you have to process the "tail" of the message any specialised 
>hardware for doing this will have a hard time doing it's processing 
>in a usefully pipelined fashion and will need access to large 
>amounts of very fast RAM to store the tail of the message, and this 
>will also cost you another order of magnitude or so.

Having a tail 2 MB or longer may make the processing time comparable 
to finding an SHA1 collision, but it is still a 64-bit problem and 
thus requires far less memory than finding an SHA1 collision.

I am not saying that my attack is easy, but that it is feasible for a 
large organization and very dangerous and stealthy if it succeeds. 
History has shown, over and over and over again, the folly of 
ignoring cryptographic attacks that are theoretically possible but 
seem too hard to implement. On the other hand, defending against my 
attack certainly is easy, just publish an SHA1 (or stronger) hash 
alongside the MD5 hash.

Arnold Reinhold