Re: git on MacOSX and files with decomposed utf-8 file names

On Sat, Jan 19, 2008 at 05:04:09PM -0800, Linus Torvalds wrote:
> 
> 
> On Sun, 20 Jan 2008, Wincent Colaiuta wrote:
> > 
> > For what it's worth, their choice wasn't entirely "insane" ie. it did have an
> > element of rationality: that decomposed forms are a little bit simpler to
> > sort.
> 
> No they are *not*.
> 
> In many languages, 'ä' does *not* sort like 'a' at all, and if you think 
> it does, you'll sort at least Finnish and Swedish totally wrong (åäö are 
> real letters, and they sort at the *end* of the alphabet, they have 
> nothing what-so-ever to do with the letters 'a' or 'o').

But there is no way to know whether an 'ä' in a document is the Finnish 'ä'
or an 'ä' from, say, German, which sorts right after 'a'.
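
To make that concrete, here is a rough Python sketch, not a real collation
implementation. The two key functions and the word list are made up for
illustration: swedish_key() ranks letters by their position in the Swedish
alphabet, and german_key() assumes the dictionary convention of treating
'ä' as a variant of 'a'. The same input comes out in a different order
depending on which language you assume:

```python
import unicodedata

# Swedish alphabet order: å, ä, ö come after z (illustrative, lowercase only).
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(word):
    """Hypothetical key: rank each precomposed letter by alphabet position."""
    composed = unicodedata.normalize("NFC", word.lower())
    # Unknown characters sort after every known letter.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in composed]

def german_key(word):
    """Hypothetical key: compare base letters first, accents only break ties."""
    lowered = word.lower()
    decomposed = unicodedata.normalize("NFD", lowered)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, unicodedata.normalize("NFC", lowered))

words = ["zebra", "\u00e4pfel", "apfel"]  # "\u00e4" is precomposed 'ä'

swedish_order = sorted(words, key=swedish_key)
german_order = sorted(words, key=german_key)

print(swedish_order)  # 'ä' word last
print(german_order)   # 'ä' word right after its 'a' counterpart
```

Nothing in the strings themselves says which of the two keys is the right
one; that knowledge has to come from outside the document.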

> The fact that in *some* languages the decomposed forms sort as the base 
> letter is immaterial. It's only true in some cases.
> 
> So no, sort order is not it. To sort right, you need to use a real 
> Unicode sort (and the decomposed form is *not* going to help you one bit, 
> quite the reverse).
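
A small Python sketch of that point, using only the standard unicodedata
module. The helper codepoint_sort() and the word list are made up for
illustration; it sorts by raw code points after normalizing to a given
form, which is roughly what a naive sort on decomposed names would do:

```python
import unicodedata

def codepoint_sort(words, form):
    """Normalize to `form`, sort by raw code points, return NFC for display."""
    normalized = sorted(unicodedata.normalize(form, w) for w in words)
    return [unicodedata.normalize("NFC", w) for w in normalized]

words = ["zon", "\u00e4b", "az"]  # "\u00e4" is precomposed 'ä'

# Precomposed: 'ä' is the single code point U+00E4, which is above 'z'.
nfc_order = codepoint_sort(words, "NFC")

# Decomposed: 'ä' becomes 'a' + U+0308, so the word starts with 'a', but
# the combining mark (U+0308) compares higher than any ASCII letter.
nfd_order = codepoint_sort(words, "NFD")

print(nfc_order)
print(nfd_order)
```

With the decomposed form, "äb" lands after "az" but before "zon". That is
not the Swedish/Finnish order ('ä' after 'z'), and it is not the order you
get by treating 'ä' as 'a' either (which would put "äb" before "az").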

A Unicode sort is not enough: there is no language indicator in a Unicode
document. That is why Unicode, while solving a bunch of problems, has
problems of its own, cf. the infamous CJK problem.

But that's all very OT.

Mike