Re: git on MacOSX and files with decomposed utf-8 file names

On Fri, 18 Jan 2008, Robin Rosenberg wrote:

> On Thursday 17 January 2008, Linus Torvalds wrote:
> > And yes, I also realize that it's not going to be realistic. We're 
> > probably *closer* to that than we used to be, but I don't think you can 
> > even make Windows think FAT is UTF-8.
>
> It's UTF-16 (when needed). I think it's all in the Linux kernel for you
> to see.

.. well, FAT certainly wasn't. But yes, VFAT probably is.  Not that I want 
to look at it ;)

> > translation to UTF-8, and since we use C strings, I'd assume/hope Windows 
> > actually uses that unambiguous translation for any filenames).
> 
> It uses the local 8-bit codepage, which is not UTF-8, often some latin-inspired
> thingy, but in Asia multi-byte encodings are used. In western Europe it is
> Windows-1252, which is almost, but not exactly iso-8859-1. Oh, and then we
> have the cmd prompt which has another encoding in 8-bit mode.

Well, if it uses an 8-bit codepage, then that means that as far as the 
POSIX filename interface is concerned, it has nothing whatsoever to do 
with Unicode (i.e. Unicode is just a totally invisible internal encoding 
issue, not externally visible).

I assume you have to use some insane Windows-only UCS-2 filename function 
to actually see any Unicode behaviour.

Sad. Because there really is no reason to use a local 8-bit codepage when 
you could just use UTF-8.
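
To illustrate the codepage problem, here is a minimal Python sketch. The
filenames are made up for illustration; only the codec names (cp1252,
iso-8859-1, utf-8) correspond to the encodings mentioned above. It shows that
the same name yields different bytes per encoding, and that some names cannot
be expressed in an 8-bit codepage at all.

```python
# Hypothetical filename containing a-umlaut, o-umlaut, a-ring and a euro sign.
name = "r\u00e4ksm\u00f6rg\u00e5s\u20ac.txt"

utf8 = name.encode("utf-8")
cp1252 = name.encode("cp1252")

# Same name, different bytes on disk depending on the encoding in use.
differs = utf8 != cp1252

# Windows-1252 is "almost, but not exactly" ISO-8859-1: the euro sign
# exists in cp1252 but not in ISO-8859-1, so this encode fails.
try:
    name.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# A name outside the codepage's repertoire cannot be represented at all,
# while UTF-8 handles it without trouble.
cjk = "\u65e5\u672c\u8a9e.txt"
cjk_utf8 = cjk.encode("utf-8")
try:
    cjk.encode("cp1252")
    cjk_ok = True
except UnicodeEncodeError:
    cjk_ok = False
```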

> I think there is a cygwin patch that converts to and from UTF-8. An application
> can choose to use the "A" or "W" interfaces. The W APIs are the real ones and 
> the others are just wrappers that convert to and from UTF-16 before anything
> happens (i.e. CreateFileA is slower than CreateFileW and so on). 

So the CreateFileW() is the "native UTF-16 interface", and CreateFileA() 
is the 8-bit codepage one that has nothing to do with Unicode and is 
purely some local thing.

But for a UNIX interface layer, the most logical thing would probably be 
to map "open()" and friends not to CreateFileA(), but to 
CreateFileW(utf8_to_utf16(filename)). 
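
A rough sketch of what such a conversion might look like, in Python for
brevity. The name utf8_to_utf16 is just the placeholder used above, and
utf16_to_utf8 is an equally hypothetical inverse; neither is an existing
cygwin or mingw function. On Windows itself the usual route would be
MultiByteToWideChar with CP_UTF8. The sketch only does the conversion, since
CreateFileW() is Windows-only.

```python
def utf8_to_utf16(filename: bytes) -> bytes:
    """Convert UTF-8 bytes to NUL-terminated UTF-16LE (hypothetical helper)."""
    # Strict decoding rejects malformed UTF-8 instead of guessing.
    text = filename.decode("utf-8", errors="strict")
    return text.encode("utf-16-le") + b"\x00\x00"


def utf16_to_utf8(wide: bytes) -> bytes:
    """Inverse of utf8_to_utf16: drop the terminator and re-encode."""
    if wide[-2:] != b"\x00\x00":
        raise ValueError("missing UTF-16 NUL terminator")
    return wide[:-2].decode("utf-16-le", errors="strict").encode("utf-8")


# A name with a Latin-1 letter and a character outside the BMP, which
# UTF-16 has to represent as a surrogate pair.
original = "\u00e5\U0001F600.txt".encode("utf-8")
wide = utf8_to_utf16(original)
round_trip = utf16_to_utf8(wide)
```

For valid UTF-8 input the round trip is lossless, which is the point: no
codepage tables are involved, and nothing is silently replaced.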

Once you do that, then it sounds like Windows would basically be Unicode, 
and hopefully without any crazy normalization (but presumably all the 
crazy case-insensitivity cannot be fixed ;^).
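
For anyone unsure what normalization means here, a small Python sketch using
the standard unicodedata module. The choice of the letter is arbitrary. The
composed and decomposed forms render identically but are different byte
strings, which is the "decomposed utf-8 file names" issue in the thread
subject.

```python
import unicodedata

composed = "\u00e4"                                  # a-umlaut, one code point
decomposed = unicodedata.normalize("NFD", composed)  # 'a' + combining diaeresis

composed_bytes = composed.encode("utf-8")      # b"\xc3\xa4"
decomposed_bytes = decomposed.encode("utf-8")  # b"a\xcc\x88"

# A byte-wise comparison, as git does on filenames, sees two different names.
same_bytes = composed_bytes == decomposed_bytes

# Normalizing both sides to one form makes them compare equal again.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```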

So it probably really only depends on whether you choose to use the insane 
8-bit code page translation or whether you just use a sane and trivial 
UTF8<->UTF16 conversion.

Anybody know which one cygwin/mingw does?

			Linus