On Sat, Jan 19, 2008 at 05:04:09PM -0800, Linus Torvalds wrote:
> On Sun, 20 Jan 2008, Wincent Colaiuta wrote:
> >
> > For what it's worth, their choice wasn't entirely "insane" ie. it did have an
> > element of rationality: that decomposed forms are a little bit simpler to
> > sort.
>
> No they are *not*.
>
> In many languages, 'ä' does *not* sort like 'a' at all, and if you think
> it does, you'll sort at least Finnish and Swedish totally wrong (åäö are
> real letters, and they sort at the *end* of the alphabet, they have
> nothing what-so-ever to do with the letters 'a' or 'o').

But there is no way to know whether an 'ä' in a document is the Finnish 'ä'
or an 'ä' from, say, German, which sorts after 'a'.

> The fact that in *some* languages the decomposed forms sort as the base
> letter is immaterial. It's only true in some cases.
>
> So no, sort order is not it. To sort right, you need to use a real
> Unicode sort (and the decomposed form is *not* going to help you one bit,
> quite the reverse).

A Unicode sort is not enough: there is no language indicator in a Unicode
document. That is why Unicode, while solving a bunch of problems, has
problems of its own, cf. the infamous CJK problem.

But that's all very OT.

Mike
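
To make the point about language-dependent ordering concrete, here is a
minimal sketch in Python. The two "tailorings" are hypothetical,
hand-rolled simplifications written for this illustration (a real
implementation would use proper collation data); the word list and the
names swedish_key, german_key and nfd_codepoint_key are likewise made up.
It only shows that the same strings need a different order depending on
a language that the text itself does not record.

```python
import unicodedata

# Illustrative word list; "\u00e4" is 'ä' and "\u00f6" is 'ö' (precomposed).
WORDS = ["zebra", "\u00e4pple", "apple", "\u00f6l", "orm"]

# Assumption: a simplified Swedish alphabet where å, ä, ö come after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"


def nfd_codepoint_key(s):
    """Naive sort key: raw code points of the decomposed (NFD) form."""
    return unicodedata.normalize("NFD", s)


def swedish_key(s):
    """Simplified Swedish-style key: å, ä, ö are separate letters after z.

    Raises ValueError for characters outside SWEDISH_ALPHABET.
    """
    composed = unicodedata.normalize("NFC", s.lower())
    return [SWEDISH_ALPHABET.index(ch) for ch in composed]


def german_key(s):
    """Simplified German-dictionary-style key: an umlaut sorts as its base
    letter, and the accented spelling only breaks ties."""
    nfd = unicodedata.normalize("NFD", s.lower())
    base = "".join(ch for ch in nfd if not unicodedata.combining(ch))
    return (base, nfd)


if __name__ == "__main__":
    print("NFD code points:", sorted(WORDS, key=nfd_codepoint_key))
    print("Swedish-style:  ", sorted(WORDS, key=swedish_key))
    print("German-style:   ", sorted(WORDS, key=german_key))
```

The three orderings differ for the same input, and nothing in the
strings says which one is wanted; that choice has to come from outside
the text.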