Hi,

On Wed, 23 Jan 2008, Linus Torvalds wrote:

> On Wed, 23 Jan 2008, Johannes Schindelin wrote:
> >
> > I fully expect it to be noticeable with that UTF-8 "normalisation".
> > But then, the infrastructure is there, and whoever has an itch to
> > scratch...
>
> Actually, it's going to be totally invisible even with UTF-8
> normalization, because we're going to do it sanely.
>
> And by "sanely" I mean just having the code test the high bit, and using
> US-ASCII as-is (possibly with that " & ~0x20 " thing to ignore case in
> it).
>
> End result: practically all projects will never notice anything at all for
> 99.9% of all files. One extra well-predicted branch, and a few more hash
> collisions for cases where you have both "Makefile" and "makefile" etc.

Well, that's the point, to avoid having both "Makefile" and "makefile" in
your repository when you are on case-challenged filesystems, right?

> Doing names with *lots* of UTF-8 characters will be rather slower. It's
> still not horrible to do if you do it the smart way, though. In fact,
> it's pretty simple, just a few table lookups (one to find the NFD form,
> one to do the upcasing).
>
> And yes, for hashing, it makes sense to turn things into NFD because
> it's generally simpler, but the point is that you really don't actually
> modify the name itself at all, you just hash things (or compare things)
> character by expanded character.
>
> IOW, only a total *moron* does Unicode name comparisons with
>
>	strcmp(convert_to_nfd(a), convert_to_nfd(b));
>
> which is essentially what Apple does.

Heh, indeed that is what I would have done as an initial step (out of
laziness).

> It's quite possible to do
>
>	utf8_nfd_strcmp(a,b)
>
> and (a) do it tons and tons faster and (b) never have to modify the
> strings themselves. Same goes (even more) for hashing.

Okay.  Point taken.

But I really hope that you are not proposing to use the case-ignoring hash
when we are _not_ on a case-challenged filesystem...
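
To make sure I understand the ASCII fast path, here is a rough sketch in C. It is only an illustration under my own assumptions, not existing git code: the names icase_name_hash(), icase_name_cmp() and fold_ascii() are made up, the ignore_case parameter is a hypothetical switch for the case-challenged-filesystem question above, and the FNV-1a style mixing step is just a stand-in for whatever hash would really be used. Bytes with the high bit set are passed through untouched, so the NFD and upcasing table lookups for non-ASCII names are not sketched at all.

```c
/*
 * Hypothetical helper for comparison: fold one byte.
 * Only ASCII letters are upcased; bytes with the high bit set
 * (parts of a multi-byte UTF-8 sequence) pass through unchanged.
 * A real version would decode the sequence and do the NFD and
 * upcasing table lookups at this point.
 */
static unsigned char fold_ascii(unsigned char c)
{
	if (c & 0x80)
		return c;                          /* non-ASCII: not sketched */
	if (c >= 'a' && c <= 'z')
		return (unsigned char)(c & ~0x20); /* 'a' -> 'A' */
	return c;
}

/*
 * Hash a name without modifying or copying it.
 * With ignore_case set, every ASCII byte gets the blunt "& ~0x20"
 * treatment. That also merges pairs like '[' and '{', which only
 * costs a few extra collisions, never a wrong match.
 * The mixing step is FNV-1a (32-bit), used purely as a placeholder.
 */
static unsigned int icase_name_hash(const char *name, int ignore_case)
{
	unsigned int hash = 2166136261u;
	unsigned char c;

	while ((c = (unsigned char)*name++) != 0) {
		if (ignore_case && !(c & 0x80))    /* one well-predicted branch */
			c = (unsigned char)(c & ~0x20);
		hash = (hash ^ c) * 16777619u;
	}
	return hash;
}

/*
 * Compare two names byte by folded byte, again without building
 * converted copies of either string. Returns <0, 0 or >0 like strcmp().
 */
static int icase_name_cmp(const char *a, const char *b)
{
	for (;; a++, b++) {
		unsigned char ca = fold_ascii((unsigned char)*a);
		unsigned char cb = fold_ascii((unsigned char)*b);

		if (ca != cb)
			return ca < cb ? -1 : 1;
		if (!ca)
			return 0;
	}
}
```

With this, icase_name_hash("Makefile", 1) and icase_name_hash("makefile", 1) land in the same bucket and icase_name_cmp() treats the two names as equal, while passing ignore_case = 0 keeps the hash case-sensitive. Note that the comparison has to be stricter than the hash: it folds only letters, so "a[" and "a{" share a hash value but still compare as different.
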
Ciao,
Dscho