On Sat, 26 Jan 2008, Marko Kreen wrote:
>
> Here you misunderstood me, I was proposing following:
>
> int hash_folded(const char *str, int len)
> {
> 	char buf[512];
> 	do_folding(buf, str, len);
> 	return do_hash(buf, len);
> }
>
> That is - the folded string should stay internal to hash function.

If it's internal, it's much better, but you still missed the performance
angle.

The fact is, hashing can take shortcuts that folding cannot do!

Case folding, by definition, has to be "exact" (since the whole point is
that you're going to use the same folding function to do the compare, so
if you play games with folding, the compares will be wrong).

But hashing doesn't have to be exact. It's ok to hash '{' and '[' as if
they were different cases of the same character, if that gives you a
faster hash function. Especially as those characters are rather rare in
filenames.

So if you do hashing as a function of its own, you can simply do a better
job at it.

I do agree that the functions that create a folded set of characters from
a _complex_ UTF-8 character should be shared between folding and hashing,
since that code is too complex and there are no simple shortcuts for doing
a faster hash that still retains all the properties we want.

		Linus
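To make the "hashing can be sloppier than folding" point concrete, here is a minimal sketch, not code from this thread. It assumes ASCII-only names, and the function names (fold_ascii, name_equal_icase, hash_name_icase) and the multiplier 101 are hypothetical choices for illustration. The compare folds exactly: only 'A'..'Z' change. The hash just clears bit 0x20 on every ASCII byte, which also lumps '{' (0x7B) with '[' (0x5B). The only property that matters is that names which compare equal after exact folding always hash the same; extra collisions merely cost an additional compare.

```c
#include <stddef.h>

/* Exact ASCII case folding: only 'A'..'Z' are changed (hypothetical helper). */
static unsigned char fold_ascii(unsigned char c)
{
	if (c >= 'A' && c <= 'Z')
		return (unsigned char)(c + ('a' - 'A'));
	return c;
}

/* Exact compare: returns 1 iff the names match byte-for-byte after folding. */
static int name_equal_icase(const char *a, const char *b, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (fold_ascii((unsigned char)a[i]) != fold_ascii((unsigned char)b[i]))
			return 0;
	return 1;
}

/*
 * Sloppy case-insensitive hash (assumption: ASCII-only folding).
 * Clearing bit 0x20 maps 'a' and 'A' together, as required, but also
 * '{' and '[', '@' and '`', and so on. That is harmless: a collision
 * only means name_equal_icase() has to reject the candidate.
 * Bytes >= 0x80 are hashed unchanged; this sketch does not fold UTF-8.
 */
static unsigned int hash_name_icase(const char *name, size_t len)
{
	unsigned int hash = 0x123;

	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];
		if (c < 0x80)
			c = (unsigned char)(c & ~0x20);
		hash = hash * 101 + c;	/* simple multiplicative mixing */
	}
	return hash;
}
```

With this, "Makefile" and "MAKEFILE" hash the same and compare equal. "x{" and "x[" also hash the same, but name_equal_icase() rejects them, so the shortcut never produces a wrong match. Complex UTF-8 sequences would still need the shared, exact folding code described above.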