Re: git on MacOSX and files with decomposed utf-8 file names

On Thursday, 17 January 2008, Linus Torvalds wrote:
> And yes, I also realize that it's not going to be realistic. We're 
> probably *closer* to that than we used to be, but I don't think you can 
> even make Windows think FAT is UTF-8.
It's UTF-16 (for long file names, when needed). I think it's all in the
Linux kernel source for you to see.

> I don't know how NTFS works (I know it is Unicode-aware, and I think it 
> encodes filenames in UCS-2 or possibly UTF-16, but there is an obvious 1:1 
UTF-16 (was UCS-2 until MS did a s/UCS-2/UTF-16/ on the documentation).

> translation to UTF-8, and since we use C strings, I'd assume/hope Windows 
> actually uses that unambiguous translation for any filenames).

It uses the local 8-bit code page, which is not UTF-8. Often it is some
Latin-inspired thingy, but in Asia multi-byte encodings are used. In western
Europe it is Windows-1252, which is almost, but not exactly, ISO-8859-1. Oh,
and then we have the cmd prompt, which uses yet another encoding in 8-bit mode.
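
To make the code page differences concrete, here is a minimal Python sketch.
It uses Python's own codecs, not any Windows API. The file name is made up,
and cp850 is only an assumption for the cmd prompt's 8-bit encoding (the
actual one depends on the locale).

```python
# Hypothetical file name containing one non-ASCII character.
name = "Märchen.txt"

ansi_bytes = name.encode("cp1252")   # Windows-1252, the western European code page
oem_bytes = name.encode("cp850")     # ASSUMPTION: cp850 stands in for the cmd prompt
utf8_bytes = name.encode("utf-8")    # what a UTF-8 based tool would expect

# Three different byte sequences for the same name.
print(ansi_bytes)   # b'M\xe4rchen.txt'
print(oem_bytes)    # b'M\x84rchen.txt'
print(utf8_bytes)   # b'M\xc3\xa4rchen.txt'

# Bytes produced under one code page and read back under another give mojibake.
misread = oem_bytes.decode("cp1252")
print(misread)      # M„rchen.txt

# "Almost, but not exactly": Windows-1252 has a euro sign at 0x80,
# while ISO-8859-1 cannot encode the euro sign at all.
euro_cp1252 = "€".encode("cp1252")
try:
    "€".encode("latin-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
print(euro_cp1252, euro_in_latin1)   # b'\x80' False
```

None of the three byte strings are equal, so a tool that stores the raw
8-bit name gets different results depending on which code page was active.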

I think there is a Cygwin patch that converts to and from UTF-8. An application
can choose to use the "A" or "W" interfaces. The W APIs are the real ones and
the others are just wrappers that convert to and from UTF-16 before anything
happens (i.e. CreateFileA is slower than CreateFileW and so on).
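
The wrapper relationship can be sketched roughly like this in Python. The
functions create_file_a and create_file_w are hypothetical stand-ins, not the
real Win32 calls: they only show the conversion step and touch no files. The
cp1252 default is an assumption for a western European system.

```python
def create_file_w(name_utf16: bytes) -> str:
    """Hypothetical stand-in for a "W" entry point: takes a UTF-16 name."""
    # A real implementation would open the file; here we just return the name.
    return name_utf16.decode("utf-16-le")


def create_file_a(name_bytes: bytes, codepage: str = "cp1252") -> str:
    """Hypothetical stand-in for an "A" wrapper.

    It converts the code-page bytes to UTF-16 and then delegates to the
    wide version, which is the extra work the "A" call has to do.
    """
    wide = name_bytes.decode(codepage).encode("utf-16-le")
    return create_file_w(wide)


# A name that fits in the code page passes through the wrapper unchanged.
print(create_file_a("Märchen.txt".encode("cp1252")))   # Märchen.txt

# A name outside the code page cannot even be expressed as 8-bit input.
# errors="replace" substitutes '?', which illustrates the loss; how Windows
# itself handles such characters is not modelled here.
lossy = "日本語.txt".encode("cp1252", errors="replace")
print(lossy)                                            # b'???.txt'

# The wide interface has no such problem, and UTF-16 converts to UTF-8
# and back without loss.
wide_name = "日本語.txt".encode("utf-16-le")
as_utf8 = wide_name.decode("utf-16-le").encode("utf-8")
print(as_utf8.decode("utf-8") == create_file_w(wide_name))   # True
```

The point of the sketch is that only the wide path can carry every name,
which is why a tool that wants UTF-8 names would convert between UTF-8 and
UTF-16 and call the W interfaces directly.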

-- robin