Re: I'm a total push-over..

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jan 26, 2008 at 10:51:18PM -0800, Linus Torvalds wrote:
> 
> 
> On Sat, 26 Jan 2008, Marko Kreen wrote:
> > 
> > Here you misunderstood me, I was proposing following:
> > 
> > int hash_folded(const char *str, int len)
> > {
> >    char buf[512];
> >    do_folding(buf, str, len);
> >    return do_hash(buf, len);
> > }
> > 
> > That is - the folded string should stay internal to hash function.
> 
> If it's internal, it's much better, but you still missed the performance 
> angle.
> 
> The fact is, hashing can take shortcuts that folding cannot do!
> 
> Case folding, by definition, has to be "exact" (since the whole point is 
> what you're going to use the same folding function to do the compare, so 
> if you play games with folding, the compares will be wrong).

Let's rename do_folding as something else, because it is not a real
folding, but a preparation step for hash calculation. Keeping these
steps separately simplifies the code, and allows further optimization,
for instance, you do not need this do_folding step on a case-sensitive
filesystem. Though it is certainly possible to mix both steps together,
it bloats the code and makes it less readable. Of course, the idea to
avoid a temporary buffer and do everything at once is very appealing,
so I gave it a try -- and here is a 32-bit version of name_hash(), but
I am not very happy with the result:


#define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))

#define mix(a,b,c) \
{ \
	a -= c;  a ^= rot(c, 4);  c += b; \
	b -= a;  b ^= rot(a, 6);  a += c; \
	c -= b;  c ^= rot(b, 8);  b += a; \
	a -= c;  a ^= rot(c,16);  c += b; \
	b -= a;  b ^= rot(a,19);  a += c; \
	c -= b;  c ^= rot(b, 4);  b += a; \
}
#define final(a,b,c) \
{ \
	c ^= b; c -= rot(b,14); \
	a ^= c; a -= rot(c,11); \
	b ^= a; b -= rot(a,25); \
	c ^= b; c -= rot(b,16); \
	a ^= c; a -= rot(c,4);  \
	b ^= a; b -= rot(a,14); \
	c ^= b; c -= rot(b,24); \
}

#define hash_value(x) \
	hs[hp] += (x); \
	if (++hp == 3) { \
		mix (hs[0], hs[1], hs[2]); \
		hp = 0; \
	}
unsigned int name_hash(const char *name, unsigned size)
{
	unsigned hp = 0;
	unsigned hs[3];
	hs[0] = hs[1] = hs[2] = 0xdeadbeef + size;

	do {
		unsigned char c;
		if (size >= sizeof(unsigned)) {
			unsigned val = get_unaligned_uint(name);
			if (!(val & 0x80808080)) {
				val &= ~0x20202020;
				hash_value(val);
				name += sizeof(val);
				size -= sizeof(val);
				continue;
			}
		}

		while (!((c = *name) & 0x80)) {
			hash_value(c & ~0x20);
			name++;
			if (!--size)
				goto done:
		}

		do {
			// TODO: add denormalization for Mac
			unsigned val = towupper (utf8_to_wchar(&name, &size));
			hash_value(val);
		} while (size && (*name & 0x80));

	} while (size);
done:
	if (hp)
		final(a,b,c);
	return hs[2];
}


Dmitry
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux