Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Peter Krefting schrieb:
> When opening a file through open() or fopen(), the path passed is
> UTF-8 encoded.

I don't think that this assumption is valid. Whenever the Windows API has
to convert between Unicode strings and char* strings, it uses the current
"ANSI code page". As far as I know, the UTF-8 codepage (65001) cannot be
used as the "current ANSI code page". Users will always have some code
page set that is not UTF-8.

For example, if the user specifies a file name on the command line, than
it will not enter git in UTF-8, but in the current "ANSI" or "OEM code
page" encoding. If git prints a file name under the assumption that it is
UTF-8 encoded, then it will be displayed incorrectly because the system
uses a different encoding.

> Since there is no real file system abstraction beyond using stdio
> (AFAIK), I need to hack it by replacing fopen (and open). Probably
> opendir/readdir as well (might be trickier), and possibly even hack
> around main() to parse the wchar_t command-line instead of the char copy.

I think you are grossly underestimating the venture that you want to
undertake here.

Please come up with a plan how you are going to deal with the various
issues. File names enter and leave the system through different channels:

- the command line and terminal window
- object database (tree objects)
- opendir/readdir; opening files or directories for reading or writing

And there is probably some more... How do you treat encodings in these
channels? What if the file names are not valid UTF-8? Etc.

The biggest obstacle will be that git does not have a notion of "file name
encoding" - it simply treats a file name as a stream of bytes. There is no
place to write an encoding. If the byte streams are regarded as having an
encoding, then you can have ambiguities, mixed encodings, or invalid
characters. You would have to deal with this in some way.

> This will lose all chances of Windows 9x compatibility, but I don't know
> if there are any attempts of supporting it anyway?

Windows 9x is already out of the loop. We use GetFileInformationByHandle()
that is only available since Windows 2000.

-- Hannes

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux