Re: [RFC] adding support for md5

On Thu, 17 Aug 2006, David Rientjes wrote:
>
> I'd like to solicit some comments about implementing support for md5 as a 
> hash function that could be determined at runtime by the user during a 
> project init-db.

I would _strongly_ advise against this. At least not with md5. 

I can see the point of configurable hashes, but it would be for a stronger 
hash than sha1, not for a (much) weaker one.

md5 is not only shorter, it's known to be broken, and there are attacks 
out there that generate documents with the same md5 checksum quickly and 
undetectably (ie depending on what the "document format" is, you might 
actually not _see_ the corruption).

There's a real-life example of this (just google for "same md5") with a 
postscript file, which when printed out still looks "valid".

In contrast, sha1 is still considered "hard", in that while you can 
obviously always brute-force _any_ hash, the sha1 brute-forcing attack is 
considered to be impractical, and at least nobody has shown any realistic 
version of the above postscript kind of hack.
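
To make the "you can always brute-force any hash" point concrete, here is a 
rough sketch that brute-forces a collision on a deliberately truncated sha1. 
The function name and the 3-byte truncation are made up for illustration. 
Cutting the hash to 24 bits makes the search trivial, while the same loop 
against all 160 bits would be hopeless.

```python
import hashlib


def find_truncated_collision(nbytes=3):
    """Brute-force two different inputs whose sha1 digests share a prefix.

    Hypothetical helper for illustration only: it tries the decimal strings
    "0", "1", "2", ... and remembers every prefix seen so far.
    """
    seen = {}  # digest prefix -> first message that produced it
    i = 0
    while True:
        msg = str(i).encode()
        prefix = hashlib.sha1(msg).digest()[:nbytes]
        if prefix in seen:
            # Two distinct messages, same truncated hash.
            return seen[prefix], msg, i + 1
        seen[prefix] = msg
        i += 1


if __name__ == "__main__":
    a, b, tries = find_truncated_collision(3)
    print(a, b, "collide on the first 3 bytes after", tries, "hashes")
```

The number of hashes needed grows roughly with the square root of the number 
of possible outputs, which is why every extra byte of hash matters.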

In my fairly limited performance analysis, I've actually been surprised by 
the fact that the hashing has never really shown up as a major issue in 
any of my profiles. All the _real_ performance issues have been related to 
memory usage, and things like the hash lookup (ie "memcmp()" was pretty 
high on the list - just from comparing object names during lookup).
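
As a rough illustration of why the comparison shows up in profiles, here is 
a sketch of looking up an object name in a sorted table. This is not git's 
actual lookup code; the class and function names are invented, and Python's 
bytes comparison stands in for memcmp() (both compare byte by byte).

```python
import bisect
import hashlib


def object_name(data):
    """Return a 20-byte sha1 digest to use as the object name."""
    return hashlib.sha1(data).digest()


class SortedNameTable:
    """Hypothetical sorted table of raw object names."""

    def __init__(self, names):
        self._names = sorted(names)

    def contains(self, name):
        # Binary search: every probe is one byte-wise comparison of two
        # names, which is the part that memcmp() does in C.
        i = bisect.bisect_left(self._names, name)
        return i < len(self._names) and self._names[i] == name


if __name__ == "__main__":
    table = SortedNameTable(object_name(b"object %d" % n) for n in range(1000))
    print(table.contains(object_name(b"object 42")))
    print(table.contains(object_name(b"no such object")))
```

Each lookup hashes nothing at all; it only compares names, so a workload 
with many lookups spends its time in the comparison, not in sha1.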

We've also had compression issues (initial check-in) and obviously the 
delta selection used to be a _huge_ time-waster until the pack info reuse 
code went in. But I don't think we've ever had a load that was really 
hashing-limited.

So considering that md5 isn't _that_ much faster to compute (let's say 
that it's ~30% faster), the biggest advantage of md5 would likely be just 
the fact that 16 bytes is smaller than 20 bytes, and thus commit objects 
and tree objects in particular could be smaller. But you'd be better off 
just using the first 16 bytes of the sha1 than the md5 hash, if that was 
the main goal.
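
A small sketch of the size argument, assuming Python's hashlib (the helper 
name sha1_truncated is made up here): a sha1 cut down to its first 16 bytes 
is exactly as compact as an md5, without depending on md5 at all.

```python
import hashlib


def sha1_truncated(data, nbytes=16):
    """Return the first nbytes of the sha1 digest (hypothetical helper)."""
    return hashlib.sha1(data).digest()[:nbytes]


if __name__ == "__main__":
    data = b"hello\n"
    md5_name = hashlib.md5(data).digest()
    sha1_name = hashlib.sha1(data).digest()
    short_name = sha1_truncated(data)
    # Sizes in bytes: md5, full sha1, truncated sha1.
    print(len(md5_name), len(sha1_name), len(short_name))  # 16 20 16
```

Either way each stored name shrinks by 4 bytes, so the space saving is the 
same and only the choice of underlying hash differs.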

So yes, maybe we'll want to make the hash choice a setup-time option, but 
if we ever do, I don't think we should make md5 even a choice. It's just 
not a very good hash, and no new program should start using it. 

			Linus