mailing list archives
MD5 To Be Considered Harmful Someday
From: Dan Kaminsky <dan () doxpara com>
Date: Mon, 06 Dec 2004 15:29:34 -0800
I've been doing some analysis on MD5 collision announced by Wang et al.
Short version: Yes, Virginia, there is no such thing as a safe hash
collision -- at least in a function that's specified to be
cryptographically secure. The full details may be acquired at the
A tool, Stripwire, has been assembled to demonstrate some of the attacks
described in the paper. It may be acquired at the following address:
Incidentally, the expectations management is by no means accidental --
the paper's titled "MD5 To Be Considered Harmful Someday" for a reason.
Some people have said there's no applied implications to Joux and Wang's
research. They're wrong; arbitrary payloads can be successfully
integrated into a hash collision. But the attacks are not wildly
practical, and in most cases exposure remains thankfully limited, for
now. But the risks are real enough that responsible engineers should
take note: This is not merely an academic threat, systems designed with
MD5 now need to take far more care than they would if they were
employing an unbroken hashing algorithm, and the problems are only going
to get worse.
Some highlights from the paper:
* The attack itself is pretty limited -- essentially, we can create
"doppelganger" blocks (my term) anywhere inside a file that may be
swapped out, one for another, without altering the final MD5 hash. This
lets us create any number of binary-inequal files with the same md5sum.
* MD5 uses an appendable cascade construction -- in other words, if you
happen to find yourself with two files that MD5 to the same hash, an
arbitrary payload can be applied to both files and they'll still have
the same hash. This leads to...
* Attacks are possible using only the proof of concept test vectors
released by Wang -- the actual attack is not necessary.
* Stripwire emits two binary packages. They both contain an arbitrary
payload, but the payload is encrypted with AES. Only one of the
packages ("Fire") is decryptable and thus dangerous; the other ("Ice")
shields its data behind AES. Both files share the same MD5 hash.
* Digital Signature systems are vulnerable, as they almost always sign a
hashed representation of data rather than the data itself.
* This is an excellent vector for malicious developers to get unsafe
code past a group of auditors, perhaps to acquire a required third party
signature. Alternatively, build tools themselves could be compromised
to embed safe versions of dangerous payloads in each build. At some
later point, the embedded payload could be safely "activated", without
the MD5 changing. This has implications for Tripwire, DRM, and several
package management architectures.
* HMAC's invulnerability has been slightly overstated. It's definitely
possible, given the key, to create two datasets with the same HMAC.
Attacker possession of the key violates MAC presumptions, so the impact
of this is particularly questionable.
* Very interesting possibilities open up once the full attack is made
available -- among other things, we can create self-decrypting
executables (fire.exe and ice.exe) that exhibit differential behavior
based on their internal colliding payloads. They'll still have the same
* Several doppelgangers may (relatively quickly, as per Joux) be
computed within a single multicollision-friendly block. As such, the
particular selection of doppelganger sets within a file can itself be
made to represent data. It's relatively straightforward to embed a 128
bit signature inside an arbitrary file, in such a way that no matter the
value of the signature, a constant MD5 hash is maintained. This is
* Many popular P2P networks (and innumerable distributed content
databases) use MD5 hashes as both a reliable search handle and a
mechanism to ensure file integrity. This makes them blind to any
signature embedded within MD5 collisions. We can use this blindness to
track MP3 audio data as it propagates from a custom P2P node.
"Strikeback" capacity against executable trafficking is even more
pronounced -- it's possible to create application installers that
self-modify with host identifying characteristics but still successfully
retransmit on P2P networks under the global search hash.
I hope this paper proves useful to the security community at large, and
I welcome feedback.
dan () doxpara com
- MD5 To Be Considered Harmful Someday Dan Kaminsky (Dec 07)