Re: non-US-ASCII file names (e.g. Hiragana) on Windows

On 1 December 2009 18:24, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> Thomas Singer <thomas.singer@xxxxxxxxxxx> writes:
>
>> Johannes Sixt wrote:
>>> Thomas Singer schrieb:
>>>>
>>>> Is it a German Windows limitation, that far-east characters are not
>>>> supported on it (but work fine on a Japanese Windows), are there different
>>>> (msys)Git versions available, or is this a configuration issue?
>>>
>>> It is a matter of configuration.
>>>
>>> Since 8 bits are not sufficient to support the Japanese alphabet in addition
>>> to the German alphabet, programs that are not Unicode aware -- such as git
>>> -- have to make a decision which alphabet they support. The decision is
>>> made by picking a "codepage".
>>>
>>> On German Windows, you are in codepage 850 (in the console). The filenames
>>> (that actually are in Unicode) are converted to bytes according to
>>> codepage 850 *before* git sees them. If your filenames contain Hiragana,
>>> they are substituted by the "unknown character" marker because there is no
>>> place for them in codepage 850.
> [...]
>
>>> Corollary: Stick to ASCII file names.
>>>
>>> There have been suggestions to switch the console to codepage 65001
>>> (UTF-8), but I have never heard of success reports. I'm not saying it does
>>> not work, though.
>>
>> Thanks for the detailed explanation. I know the differences between bytes
>> and characters and the needed *encoding* to convert from one to another, but
>> I did not know how Git handles it. I'm quite surprised that -- as I
>> understand you -- msys-Git (or Git at all?) is not able to handle all
>> characters (aka Unicode) at the same time. I expected it would be better
>> than older tools, e.g. SVN.
>
> The problem is not with Git, as Git is (currently) agnostic with
> respect to filename encoding; for Git filenames are opaque NUL ('\0')
> terminated binary data.  There is some infrastructure to convert
> between filename encodings and other filename quirks (like
> case-insensitivity), though...

"You can use whatever encoding you want. So long as it looks like a
standard UNIX filename."
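
To make the codepage point above concrete, here is a minimal sketch in
Python (not anything Git itself does) of what happens when a Unicode
filename is squeezed through codepage 850 before a byte-oriented program
sees it. The helper name to_codepage and the sample filenames are made up
for illustration, and Python's "replace" error handler merely stands in
for whatever substitution Windows actually performs.

```python
def to_codepage(name: str, codepage: str = "cp850") -> bytes:
    """Convert a Unicode filename to bytes, substituting '?' for
    characters the codepage cannot represent (hypothetical helper)."""
    return name.encode(codepage, errors="replace")


# Sample filenames, chosen only for illustration.
german = "Grüße.txt"
hiragana = "ひらがな.txt"

german_bytes = to_codepage(german)
hiragana_bytes = to_codepage(hiragana)

# German characters have a slot in codepage 850, so they round-trip.
print(german_bytes.decode("cp850"))

# Hiragana have no slot, so every character collapses to the same marker
# and the original name cannot be recovered from the bytes.
print(hiragana_bytes)

# A byte-oriented program would cope with the UTF-8 form, since it is
# just another NUL-free byte string.
print(hiragana.encode("utf-8"))
```

Once the substitution has happened, the program only ever sees the
marker bytes, so no setting on its side can bring the original
characters back.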

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"