Re: I'm a total push-over..

On Sat, 26 Jan 2008, Marko Kreen wrote:
> 
> Here you misunderstood me, I was proposing following:
> 
> int hash_folded(const char *str, int len)
> {
>    char buf[512];
>    do_folding(buf, str, len);
>    return do_hash(buf, len);
> }
> 
> That is - the folded string should stay internal to hash function.

If it's internal, it's much better, but you still missed the performance 
angle.

The fact is, hashing can take shortcuts that folding cannot do!

Case folding, by definition, has to be "exact" (since the whole point is 
that you're going to use the same folding function to do the compare, so 
if you play games with folding, the compares will be wrong).
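
To make the "exact" requirement concrete, here is a minimal sketch of a 
compare built on a folding function. It assumes ASCII-only folding, and 
the names fold_char() and names_equal_folded() are hypothetical, not 
anything from the git source:

```c
#include <stddef.h>

/*
 * Hypothetical ASCII-only fold: maps 'A'..'Z' to 'a'..'z' and leaves
 * every other byte untouched. It must be exact, because the compare
 * below relies on it.
 */
unsigned char fold_char(unsigned char c)
{
	if (c >= 'A' && c <= 'Z')
		return (unsigned char)(c + ('a' - 'A'));
	return c;
}

/* Returns 1 if the two names are equal after folding, 0 otherwise. */
int names_equal_folded(const char *a, size_t alen,
		       const char *b, size_t blen)
{
	size_t i;

	if (alen != blen)
		return 0;
	for (i = 0; i < alen; i++) {
		if (fold_char((unsigned char)a[i]) !=
		    fold_char((unsigned char)b[i]))
			return 0;
	}
	return 1;
}
```

With this, "Makefile" and "MAKEFILE" compare equal, while "a{" and "a[" 
do not, which is what a compare has to guarantee.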

But hashing doesn't have to be exact. It's ok to hash '{' and '[' as if 
they were different cases of the same character, if that gives you a 
faster hash function. Especially as those characters are rather rare in 
filenames.

So if you do hashing as a function of its own, you can simply do a better 
job at it.
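
As a rough sketch of that kind of shortcut: the hash below simply masks 
off the ASCII case bit (0x20) of every byte instead of folding properly. 
The name hash_name_sloppy() is hypothetical, FNV-1a is only a stand-in 
for the mixing step, and the whole thing assumes ASCII-only case 
insensitivity:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical case-insensitive hash. Clearing bit 0x20 makes 'a' and
 * 'A' hash alike, and as a side effect also '{' and '[', '|' and '\\',
 * and so on. Those extra collisions are harmless for a hash, as long
 * as the final compare uses an exact folding function.
 */
uint32_t hash_name_sloppy(const char *name, size_t len)
{
	uint32_t hash = 2166136261u;	/* FNV-1a 32-bit offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		hash ^= (uint32_t)(c & ~0x20u);	/* drop the case bit */
		hash *= 16777619u;		/* FNV-1a 32-bit prime */
	}
	return hash;
}
```

Here "README" and "readme" land in the same bucket, and so do "a{" and 
"a[". The second pair is a false positive that the exact compare then 
rejects, so the only cost is an occasional extra compare.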

I do agree that the functions that create a folded set of characters from 
a _complex_ UTF-8 character should be shared between folding and hashing, 
since that code is too complex and there are no simple shortcuts for doing 
a faster hash that still retains all the properties we want. 

			Linus