Re: I'm a total push-over..

On Fri, 25 Jan 2008, Marko Kreen wrote:
> 
> Well, although this is very clever approach, I suggest against it.
> You'll end up with complex code that gives out substandard results.

Actually, *your* operation is the one that gives substandard results.

> I think its better to have separate case-folding function (or several),
> that copies string to temp buffer and then run proper optimized hash
> function on that buffer.

I'm sorry, but you just cannot do that efficiently and portably.

I can write a hash function that reliably does 8 bytes at a time for the 
common case on a 64-bit architecture, exactly because it's easy to do 
"test high bits in parallel" with a simple bitwise 'and', and we can do 
the same with "approximate lower-to-uppercase 8 bytes at a time" for a 
hash by just clearing bit 5.
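
To make the "8 bytes at a time" idea concrete, here is a minimal sketch in C. It is not the code from git: the function name, the out-parameter, and the FNV-style mixing constants are assumptions chosen for illustration. It loads each chunk with memcpy, tests all eight high bits with one 'and', and clears bit 5 of every byte with another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HIGH_BITS 0x8080808080808080ULL /* bit 7 of each byte: non-ASCII */
#define CASE_BITS 0x2020202020202020ULL /* bit 5 of each byte: ASCII case */

/* Mix one 64-bit word into the hash (FNV-1a constants, used per word
 * here rather than per byte; any reasonable mixer would do). */
static uint64_t mix_word(uint64_t hash, uint64_t w)
{
	return (hash ^ w) * 0x100000001b3ULL;
}

/*
 * Hypothetical approximate case-insensitive hash.  If non_ascii is not
 * NULL, it is set to 1 when any byte had its high bit set, so a caller
 * could fall back to a slower, exact path for such names.
 */
static uint64_t approx_icase_hash(const char *s, size_t len, int *non_ascii)
{
	uint64_t hash = 0xcbf29ce484222325ULL;
	int high = 0;

	assert(s != NULL || len == 0);

	while (len > 0) {
		size_t n = len < 8 ? len : 8;
		uint64_t w = 0;          /* zero-pads a short final chunk */

		memcpy(&w, s, n);        /* alignment-safe 8-byte load */
		if (w & HIGH_BITS)       /* test 8 high bits in parallel */
			high = 1;
		hash = mix_word(hash, w & ~CASE_BITS); /* fold 8 bytes at once */
		s += n;
		len -= n;
	}
	if (non_ascii)
		*non_ascii = high;
	return hash;
}
```

Clearing bit 5 also folds pairs such as '[' and '{' together, which is harmless for a hash: it can only add collisions, never make two names that differ only in ASCII case hash differently. The resulting value depends on byte order, so it is only meaningful within one machine.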

In contrast, trying to do the same thing in half-way portable C, but being 
limited to having to get the case-folding *exactly* right (which you need 
for the comparison function) is much much harder. It's basically 
impossible in portable C (it's doable with architecture-specific features, 
ie vector extensions that have per-byte compares etc).

And hashing is performance-critical, much more so than the compares (ie 
you're likely to have to hash tens of thousands of files, while you will 
only compare a couple). So it really is worth optimizing for.
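
A rough sketch of that split, with the cheap approximate hash run on every name and the exact compare run only when hashes match. All names here are hypothetical, the table is a plain array scanned linearly rather than a real hash table, and the compare is ASCII-only via toupper().

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct name_entry {
	const char *name;
	size_t len;
	uint64_t hash;
};

/* Approximate hash: clears bit 5 of every byte (byte-wise for brevity). */
static uint64_t approx_hash(const char *s, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t i;

	for (i = 0; i < len; i++)
		h = (h ^ ((unsigned char)s[i] & 0xDFu)) * 0x100000001b3ULL;
	return h;
}

/* Exact (ASCII-only) case-insensitive equality; this has to be right. */
static int same_name_icase(const char *a, size_t alen,
			   const char *b, size_t blen)
{
	size_t i;

	if (alen != blen)
		return 0;
	for (i = 0; i < alen; i++)
		if (toupper((unsigned char)a[i]) != toupper((unsigned char)b[i]))
			return 0;
	return 1;
}

static struct name_entry make_entry(const char *name)
{
	struct name_entry e;

	e.name = name;
	e.len = strlen(name);
	e.hash = approx_hash(name, e.len);
	return e;
}

/* The exact compare runs only for entries whose hash already matches. */
static const struct name_entry *find_icase(const struct name_entry *tab,
					   size_t n, const char *name,
					   size_t len)
{
	uint64_t h = approx_hash(name, len);
	size_t i;

	for (i = 0; i < n; i++)
		if (tab[i].hash == h &&
		    same_name_icase(tab[i].name, tab[i].len, name, len))
			return &tab[i];
	return NULL;
}
```

With entries for "Makefile" and "[x]", looking up "MAKEFILE" finds the first entry, while "{x}" hashes the same as "[x]" but is rejected by the exact compare.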

And the thing is, "performance" isn't a secondary feature. It's also not 
something you can add later by optimizing. 

It's also a mindset issue. Quite frankly, people who do this by "convert 
to some folded/normalized form, then do the operation" will generally make 
much more fundamental mistakes. Once you get into the mindset of "let's 
pass a corrupted string around", you are in trouble. You start thinking 
that the corrupted string isn't really "corrupt", it's in an "optimized 
format". 

And it's all downhill from there. Don't do it.

			Linus