Re: [RFC PATCH] Windows: Assume all file names to be UTF-8 encoded.

===Re:===
You cannot expect users to switch the locale. For example, I have to test
our software with Japanese settings: I *cannot* switch to UTF-8 just
because of git.

Can you set the local codepage per program? (I don't know.) It might help
here, but it doesn't help in all cases, particularly in certain pipelines:
===end===

Yes, you can.  The code page can be set per thread.  The function call
is:

	SetThreadLocale (lcid);

where lcid is just 65001 for UTF-8.  (The other fields in the LCID are
high-order bits and all zero for no sublanguage and default sort order).

When a thread is created, it starts with the system default thread
locale.  So call SetThreadLocale on every thread you create.  In
particular, realize that the new thread does not inherit this from the
creating thread.
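
To illustrate the per-thread point, here is a minimal sketch of a worker
thread that sets its own locale before doing anything else.  The names
worker, worker_args and run_worker_with_locale are made up for this
example, and the LCID is whatever the caller passes in; the sketch does
not check that any particular value selects UTF-8.  On a non-Windows
build the function is a stub that returns -1.

```c
#ifdef _WIN32
#include <windows.h>

/* Hypothetical argument block handed to the new thread. */
struct worker_args {
    LCID lcid;        /* locale the thread should switch to */
    BOOL locale_set;  /* result of SetThreadLocale, reported back */
};

/* The thread does not inherit the creator's thread locale,
   so it sets its own as the very first step. */
static DWORD WINAPI worker(LPVOID param)
{
    struct worker_args *args = (struct worker_args *)param;
    args->locale_set = SetThreadLocale(args->lcid);
    /* ... the thread's real work would go here ... */
    return 0;
}
#endif

/* Returns 1 if the new thread set its locale, 0 on failure,
   -1 when not built for Windows. */
int run_worker_with_locale(unsigned long lcid)
{
#ifdef _WIN32
    struct worker_args args;
    HANDLE h;

    args.lcid = (LCID)lcid;
    args.locale_set = FALSE;

    h = CreateThread(NULL, 0, worker, &args, 0, NULL);
    if (h == NULL)
        return 0;
    WaitForSingleObject(h, INFINITE);  /* args must outlive the thread */
    CloseHandle(h);
    return args.locale_set ? 1 : 0;
#else
    (void)lcid;
    return -1;
#endif
}
```

The same first-line call would be needed in every thread procedure, since
each new thread starts over with the default.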

Meanwhile... the file I/O functions don't use the same code page.  The
encoding of file names on a floppy disk or whatnot was historically done
using the "OEM code page", and when a different code page is used for
text editing, that shouldn't break compatibility.  So, all functions
exported from Kernel32.dll that accept or return file names use a
separate setting, and setting the locale as shown above will not affect
it.  This might be the source of confusion to those experimenting with
it.
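
To see why the choice of code page matters for 8-bit file names, the
sketch below shows the same character, U+00E9 (e with acute accent), in
three encodings.  Code pages 437 and 1252 are picked here only as
examples of an OEM and an ANSI code page; other systems use others.  The
function name utf8_encode_bmp is made up for this example.

```c
#include <stddef.h>

/* U+00E9 as a single byte in two example code pages (assumed here as
   typical OEM and ANSI code pages on a US / Western European system). */
enum {
    E_ACUTE_OEM_CP437   = 0x82,
    E_ACUTE_ANSI_CP1252 = 0xE9
};

/* Encode one code point below U+10000 as UTF-8.
   Writes 1 to 3 bytes into out and returns the count, or 0 if the
   code point is out of range.  Surrogates are not rejected. */
size_t utf8_encode_bmp(unsigned int cp, unsigned char out[3])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;
}
```

For U+00E9 this gives the two bytes 0xC3 0xA9, against the single byte
0x82 or 0xE9 above, so a name written under one assumption and read under
another comes out as a different string.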

So, also make a call to
	
	SetFileApisToANSI();

This affects the entire process, not just the thread.
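
A minimal sketch of making that call once at startup and confirming it
took effect with AreFileApisANSI.  The wrapper name
switch_file_apis_to_ansi is made up for this example; on a non-Windows
build it is a stub that returns -1.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Switch the process's file APIs to the ANSI code page.
   Returns 1 if the file APIs now report ANSI, 0 if they still report
   OEM, -1 when not built for Windows. */
int switch_file_apis_to_ansi(void)
{
#ifdef _WIN32
    SetFileApisToANSI();               /* process-wide, not per thread */
    return AreFileApisANSI() ? 1 : 0;
#else
    return -1;
#endif
}
```

Since the setting is process-wide, one call early in main should be
enough; there is no need to repeat it per thread.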

So much for specifying UTF-8 file names in Windows.  A related issue is
the console input and output of same.  I don't know if the sh program
that is part of msys or Cygwin does anything to the console window it is
using, but each console window can have its own code page as well.  The
default for 8-bit API (char*'s) is also the OEM character set, not the
so-called ANSI character set that is specified with SetThreadLocale.
I've not experimented with setting this (and restoring it) within a
program invoked in that console.  But if you use the 16-bit (wide-character) API for
console I/O, it is not a problem and works regardless of how the user
chose to set it.  To make it even more confusing, the console doesn't
respect the UTF-8 setting if the font is not set properly too.
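
A minimal sketch of the wide-character route, which writes through
WriteConsoleW and so does not depend on the console's code page.  The
wrapper name write_console_wide is made up for this example.  Note that
WriteConsoleW fails when standard output is redirected to a file or pipe,
in which case the sketch simply returns -1; it also returns -1 on a
non-Windows build.

```c
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Write a wide string to the console.
   Returns the number of characters written, or -1 on failure
   (including redirected output and non-Windows builds). */
int write_console_wide(const wchar_t *text)
{
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written = 0;

    if (out == INVALID_HANDLE_VALUE || out == NULL)
        return -1;
    if (!WriteConsoleW(out, text, (DWORD)wcslen(text), &written, NULL))
        return -1;
    return (int)written;
#else
    (void)text;
    return -1;
#endif
}
```

Whether the characters then display correctly still depends on the
console font.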

--John

