On Sat, Jan 26, 2008 at 10:51:18PM -0800, Linus Torvalds wrote: > > > On Sat, 26 Jan 2008, Marko Kreen wrote: > > > > Here you misunderstood me, I was proposing following: > > > > int hash_folded(const char *str, int len) > > { > > char buf[512]; > > do_folding(buf, str, len); > > return do_hash(buf, len); > > } > > > > That is - the folded string should stay internal to hash function. > > If it's internal, it's much better, but you still missed the performance > angle. > > The fact is, hashing can take shortcuts that folding cannot do! > > Case folding, by definition, has to be "exact" (since the whole point is > what you're going to use the same folding function to do the compare, so > if you play games with folding, the compares will be wrong). Let's rename do_folding as something else, because it is not a real folding, but a preparation step for hash calculation. Keeping these steps separately simplifies the code, and allows further optimization, for instance, you do not need this do_folding step on a case-sensitive filesystem. Though it is certainly possible to mix both steps together, it bloats the code and makes it less readable. Of course, the idea to avoid a temporary buffer and do everything at once is very appealing, so I gave it a try -- and here is a 32-bit version of name_hash(), but I am not very happy with the result: #define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k)))) #define mix(a,b,c) \ { \ a -= c; a ^= rot(c, 4); c += b; \ b -= a; b ^= rot(a, 6); a += c; \ c -= b; c ^= rot(b, 8); b += a; \ a -= c; a ^= rot(c,16); c += b; \ b -= a; b ^= rot(a,19); a += c; \ c -= b; c ^= rot(b, 4); b += a; \ } #define final(a,b,c) \ { \ c ^= b; c -= rot(b,14); \ a ^= c; a -= rot(c,11); \ b ^= a; b -= rot(a,25); \ c ^= b; c -= rot(b,16); \ a ^= c; a -= rot(c,4); \ b ^= a; b -= rot(a,14); \ c ^= b; c -= rot(b,24); \ } #define hash_value(x) \ hs[hp] += (x); \ if (++hp == 3) { \ mix (hs[0], hs[1], hs[2]); \ hp = 0; \ } unsigned int name_hash(const char *name, unsigned size) { unsigned hp = 0; unsigned hs[3]; hs[0] = hs[1] = hs[2] = 0xdeadbeef + size; do { unsigned char c; if (size >= sizeof(unsigned)) { unsigned val = get_unaligned_uint(name); if (!(val & 0x80808080)) { val &= ~0x20202020; hash_value(val); name += sizeof(val); size -= sizeof(val); continue; } } while (!((c = *name) & 0x80)) { hash_value(c & ~0x20); name++; if (!--size) goto done: } do { // TODO: add denormalization for Mac unsigned val = towupper (utf8_to_wchar(&name, &size)); hash_value(val); } while (size && (*name & 0x80)); } while (size); done: if (hp) final(a,b,c); return hs[2]; } Dmitry - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html