On Tue, Jan 5, 2010 at 10:42 PM, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote: > While looking at the DHT code, I noticed that it's using a 10-round > Davies-Meyer construction to generate the hashes used for file > placement. A little surprised, by this, I ran it by a couple of friends > who are experts in both cryptography and distributed data storage. The > consensus seems to be that the hash used for this purpose needs to be > collision resistant but not cryptographically strong. One theorized > that the choice made in DHT is probably based on prior examples (e.g. > Freenet and Mojo Nation) where cryptographically strong hashes were > chosen, but that the requirements driving those decisions probably don't > apply to GlusterFS. This is a non-trivial issue because these hashes > are used quite frequently and the current one is quite computationally > expensive. I note that Hsieh's SuperFastHash is already implemented in > GlusterFS and is used for other purposes. It's about 3x as fast as the > DM hash, and has better collision resistance as well. MurmurHash > (http://murmurhash.googlepages.com/) is even faster and more collision > resistant. For future releases, I suggest dropping the DM hash and > switching to one of these others. > Before we chose the DM hash we did benchmark many hashing algorithms both for performance and distribution. Among the lot DM gave the best hash distribution. Computational time of any of these hashing algorithms is too insignificant to cause any measurable performance gain. Even if we bring in a new algorithm which is 10x or even 100x faster, the overall gain is still a very tiny negligible fraction. Besides the computational savings, it is worth noting that the hash is computed just once per inode and the location is cached for all further access. Avati