Re: I'm a total push-over..

Hi,

On Wed, 23 Jan 2008, Linus Torvalds wrote:

> On Wed, 23 Jan 2008, Johannes Schindelin wrote:
> > 
> > I fully expect it to be noticeable with that UTF-8 "normalisation".  
> > But then, the infrastructure is there, and whoever has an itch to 
> > scratch...
> 
> Actually, it's going to be totally invisible even with UTF-8 
> normalization, because we're going to do it sanely.
> 
> And by "sanely" I mean just having the code test the high bit, and using 
> US-ASCII as-is (possibly with that " & ~0x20 " thing to ignore case in 
> it).
> 
> End result: practically all projects will never notice anything at all for 
> 99.9% of all files. One extra well-predicted branch, and a few more hash 
> collisions for cases where you have both "Makefile" and "makefile" etc.

Well, that's the point, to avoid having both "Makefile" and "makefile" in 
your repository when you are on case-challenged filesystems, right?
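
To make the fast path concrete, here is a rough sketch of the kind of hash being described: test the high bit, and fold US-ASCII with "& ~0x20". The function name hash_name_icase and the FNV-1a-style mixing are assumptions of mine for illustration, not anything from git. Bytes with the high bit set are hashed unchanged here; a real version would branch to the slower UTF-8 path at that point.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not git code: a case-ignoring hash of a name.
 * Bytes without the high bit are folded with & ~0x20; all other bytes
 * are mixed in as-is (a stand-in for the slow UTF-8 path).
 * The mixing step is FNV-1a style, chosen only for brevity.
 */
static unsigned int hash_name_icase(const char *name)
{
	unsigned int hash = 2166136261u;
	unsigned char c;

	while ((c = (unsigned char)*name++) != 0) {
		if (!(c & 0x80))
			c &= (unsigned char)~0x20;	/* one well-predicted branch */
		hash = (hash ^ c) * 16777619u;
	}
	return hash;
}
```

With this, hash_name_icase("Makefile") and hash_name_icase("makefile") land in the same bucket. Note that "& ~0x20" also folds some non-letters together (for instance '`' and '@'), which is harmless for a hash because it only adds collisions; the comparison that follows a hash hit still has to decide equality properly.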

> Doing names with *lots* of UTF-8 characters will be rather slower. It's 
> still not horrible to do if you do it the smart way, though. In fact, 
> it's pretty simple, just a few table lookups (one to find the NFD form, 
> one to do the upcasing).
> 
> And yes, for hashing, it makes sense to turn things into NFD because 
> it's generally simpler, but the point is that you really don't actually 
> modify the name itself at all, you just hash things (or compare things) 
> character by expanded character.
> 
> IOW, only a total *moron* does Unicode name comparisons with
> 
> 	strcmp(convert_to_nfd(a), convert_to_nfd(b));
> 
> which is essentially what Apple does.

Heh, indeed that is what I would have done as an initial step (out of 
laziness).

> It's quite possible to do
> 
> 	utf8_nfd_strcmp(a,b)
> 
> and (a) do it tons and tons faster and (b) never have to modify the 
> strings themselves. Same goes (even more) for hashing.

Okay.  Point taken.
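
For what it is worth, a minimal sketch of how I understand the idea: walk both strings and compare them one expanded character at a time, so neither string is converted or copied. Only the name utf8_nfd_strcmp comes from the quoted mail; the iterator, the helper names and the three-entry decomposition table are illustrative assumptions. It assumes well-formed UTF-8, does no case folding, and ignores canonical reordering of multiple combining marks.

```c
#include <stddef.h>

/* Illustrative table only; a real one would cover all of Unicode. */
struct decomp { unsigned int cp, base, mark; };
static const struct decomp decomp_table[] = {
	{ 0x00e4, 'a', 0x0308 },	/* a with diaeresis */
	{ 0x00f6, 'o', 0x0308 },	/* o with diaeresis */
	{ 0x00fc, 'u', 0x0308 },	/* u with diaeresis */
};

/* Cursor that yields decomposed code points without touching the string. */
struct nfd_iter {
	const unsigned char *p;
	unsigned int pending;	/* combining mark still to be returned, or 0 */
};

/* Decode one code point; assumes well-formed UTF-8. */
static unsigned int decode_utf8(const unsigned char **pp)
{
	const unsigned char *p = *pp;
	unsigned int c = *p++;
	int extra;

	if (c < 0x80) extra = 0;
	else if (c < 0xe0) { c &= 0x1f; extra = 1; }
	else if (c < 0xf0) { c &= 0x0f; extra = 2; }
	else { c &= 0x07; extra = 3; }
	while (extra-- > 0 && *p)	/* stop early on a truncated sequence */
		c = (c << 6) | (*p++ & 0x3f);
	*pp = p;
	return c;
}

/* Return the next expanded code point, or 0 at the end of the string. */
static unsigned int nfd_next(struct nfd_iter *it)
{
	unsigned int c;
	size_t i;

	if (it->pending) {
		c = it->pending;
		it->pending = 0;
		return c;
	}
	if (!*it->p)
		return 0;
	if (!(*it->p & 0x80))
		return *it->p++;	/* US-ASCII is used as-is */
	c = decode_utf8(&it->p);
	for (i = 0; i < sizeof(decomp_table) / sizeof(decomp_table[0]); i++) {
		if (decomp_table[i].cp == c) {
			it->pending = decomp_table[i].mark;
			return decomp_table[i].base;
		}
	}
	return c;
}

/* Compare character by expanded character; never modifies a or b. */
static int utf8_nfd_strcmp(const char *a, const char *b)
{
	struct nfd_iter ia = { (const unsigned char *)a, 0 };
	struct nfd_iter ib = { (const unsigned char *)b, 0 };

	for (;;) {
		unsigned int ca = nfd_next(&ia), cb = nfd_next(&ib);

		if (ca != cb)
			return ca < cb ? -1 : 1;
		if (!ca)
			return 0;
	}
}
```

With this sketch, the precomposed "\xc3\xa4" and the decomposed "a\xcc\x88" compare equal, and no temporary NFD copy of either name is ever allocated.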

But I really hope that you are not proposing to use the case-ignoring 
hash when we are _not_ on a case-challenged filesystem...

Ciao,
Dscho
