On Wed, 23 Jan 2008, Andreas Ericsson wrote: > > Insofar as hashes go, it's not that shabby for hashing filenames. Hashing filenames is pretty easy. You can do a reasonable job with any "multiply by an odd number, add in value". Picking the odd number is a random choice, some are better than others (and it depends on whether you end up always having a power-of-two bucket size etc), but it generally won't be horrible. And considering that we generally won't have tons and tons of pathnames (ie we'd generally have thousands, not millions), and that the underlying hash not only resizes itself but actually uses the ful 32 bits as the lookup key, I don't worry too much. I suspect my random choice is fine. So no, I didn't think my hash would _suck_, although I also didn't research which odd numbers to pick. No, it's a joke because it doesn't really give an example of how to do the _expected_ hash collissions. Here's another hash that is actually going to collide *much* more (and on purpose!), but is actually showing an example of something that actually hashes UTF strings so that upper-case and lower-case (and normalization) will all still hash to the same value: static unsigned int hash_name(const char *name, int namelen) { unsigned int hash = 0x123; do { unsigned char c = *name++; if (c & 0x80) c = 0; c &= ~0x20; hash = hash*101 + c; } while (--namelen); return hash; } but the above does so by making the hash much much worse (although probably still acceptable for "normal source code name distributions" that don't have very many same-name-in-different-cases and high bit characters anyway). The above is still fairly fast, but obviously at a serious cost in hash goodness, to the point of being totally unusable for anybody who uses Asian characters in their filenames. To actually be really useful, you'd have to teach it about the character system and do a lookup into a case/normalization table. So *that* is mainly why it's a joke. But it should work fine for the case it is used for now (exact match). Picking a better-researched constant might still be a good idea. Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html