[Mac_crypto] Apple should use SHA! (or stronger) to authenticate software releases

Nicko van Someren mac_crypto@vmeng.com
Mon, 5 Apr 2004 16:51:56 +0100

On 4 Apr 2004, at 12:17, Arnold G. Reinhold wrote:
> A safer attack would be for the agent to insert an apparently innocent 
> modification to the package, with the modification selected so that 
> the MD5 hash of the package with the malicious code matches the hash 
> of the officially released package. Since the attacker (or whomever he 
> is working for) controls the malicious code, calculating the value of 
> this modification is subject to a meet-in-the-middle attack and 
> presents presents a 64-bit problem. Solving such a problem is within 
> the means of a well-funded attacker today* and will become easier in 
> the future.
> The modification could be designed to get past code reviews in a 
> number of ways. For example, 64 low order bits in a JPEG icon might be 
> altered. The agent would have to make the last modification to the 
> software package prior to release and perhaps send a final pre-release 
> version of the package to someone on the outside who does the 
> collision calculation, but those are hardly insurmountable hurdles.  
> In situations where new releases are relatively frequent, it may 
> suffice for this attack to succeed only occasionally, allowing 
> periodic entry into selected systems to recover private keys, for 
> example.  The attacker merely submits modifications late in the 
> release cycle and if his happens to be last then the full attack is 
> mounted.

While I agree that it is somewhat lax of Apple to be using MD5 for 
checking its updates it's far from clear to me that an attack of the 
sort described above would ever be practical.  The problem is that the 
while there are methods for finding has collisions by brute force, not 
only do they require a work factor around the size of the square root 
of the output space, and a large amount of storage, but the work factor 
is multiplied by a factor linear in the size of the data that follows 
the part of the message that you can manipulate.  When all you are 
doing is trying to find a single collision on the hash function (or 
perhaps trying to forge a matching pair of small key exchange messages) 
this can be made quite small but when it comes to breaking a package 
distribution system it is much harder.

Consider that the real package is represented by M, the subtly modified 
version is M1 and the bogus package is represented by M2.  Each of 
these has some high resolution TIFF image T that we can get away with 
tweaking into T1 and T2.  We probably don't have control where this 
image comes out in the package because the file ordering is 
deterministic.  The package can be broken down into three parts: the 
bit before the tweakable part, the tweakable image and the bit after 
	M  = M1a | T  | M1b
	M1 = M1a | T1 | M1b
	M2 = M2a | T2 | M2b
For the purposes of finding a collision we can pre-compute the hash 
blocks of M1a and M2a but we can't do the same for M1b and M2b.  This 
means if L(x) is the function for the number of hash blocks in message 
x then the cost of finding a hash collision in our packages is 
L(T1|M1b) + L(T2|M2b) times harder than just finding a simple collision 
by brute force.  In MD5 the blocks are 64 bytes, so for every 1K in M1b 
the process of finding a collision gets 16 times harder.  Finding a 
collision in SHA1 is about 2^16 times harder than finding a collision 
in MD5, so unless you find something that you can alter without it 
being noticed in the last 2MB of the package then this sort of forgery 
is actually going to be harder than finding a simple hash collision in 
SHA1!  In practice you'll probably find something that you can alter in 
the last few hundred KB but still the raw processing cost will be a few 
orders of magnitude harder than a simple hash collision problem.  
Furthermore, since you have to process the "tail" of the message any 
specialised hardware for doing this will have a hard time doing it's 
processing in a usefully pipelined fashion and will need access to 
large amounts of very fast RAM to store the tail of the message, and 
this will also cost you another order of magnitude or so.

Of course none of this is an argument for not using SHA1.  All of these 
complexities apply just as well to SHA1, meaning that forging packages 
checked with SHA1 will be four or five orders of magnitude harder than 
finding a single simple SHA1 collision.  On the other hand, it tells 
you that the prospect of any successful hash collision attack on 
Apple's packages are still some way away!