Re: non-US-ASCII file names (e.g. Hiragana) on Windows

Johannes Sixt wrote:
> Thomas Singer wrote:
>> Is it a limitation of German Windows that Far Eastern characters are not
>> supported on it (while they work fine on a Japanese Windows)? Are there
>> different (msys)Git versions available, or is this a configuration issue?
> 
> It is a matter of configuration.
> 
> Since 8 bits are not sufficient to represent the Japanese alphabet in
> addition to the German alphabet, programs that are not Unicode aware --
> such as git -- have to decide which alphabet they support. The decision
> is made by picking a "codepage".
> 
> On German Windows, you are in codepage 850 (in the console). The filenames
> (which actually are in Unicode) are converted to bytes according to
> codepage 850 *before* git sees them. If your filenames contain Hiragana,
> those characters are replaced by the "unknown character" marker because
> there is no place for them in codepage 850.
> 
> However, you can install Japanese language support on German Windows. Then
> you can change your console to codepage 932:
> 
>   chcp 932
> 
> When you run git from *this* console, Hiragana in the filenames are
> converted to cp932 before git sees them. The resulting byte sequence is
> different from the one in cp850, but git will be able to see that the file
> exists and was modified, and you can 'git add' it.
> 
> But if you have files with umlauts, they will not be recognized anymore
> because umlauts have no place in cp932.
> 
> In neither case can you exchange the repository with Linux if you have
> your locale set to UTF-8 on Linux, because neither byte sequence (umlauts
> from cp850 or Hiragana from cp932) is a valid UTF-8 sequence, let alone
> one that results in the expected glyphs.
> 
> Corollary: Stick to ASCII file names.
> 
> There have been suggestions to switch the console to codepage 65001
> (UTF-8), but I have not heard of any success reports. I'm not saying it
> does not work, though.
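
For reference, the byte-level effects described above can be reproduced
with a small Python sketch. This is only an illustration under stated
assumptions: Python's "cp850" and "cp932" codecs stand in for the
conversion Windows performs, "?" stands in for the "unknown character"
marker, and the file names and the is_valid_utf8 helper are made up for
the example. It does not show what git itself does.

```python
# Illustration only: Python codecs stand in for the Windows codepage
# conversion; the file names and helper below are hypothetical.

hiragana_name = "\u3072\u3089\u304c\u306a.txt"  # "hiragana" written in Hiragana
umlaut_name = "\u00fcber.txt"                   # "ueber" written with u-umlaut


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# cp850 has no slot for Hiragana, so each character becomes "?".
hiragana_cp850 = hiragana_name.encode("cp850", errors="replace")

# cp932 can represent Hiragana, and the bytes round-trip back to the name.
hiragana_cp932 = hiragana_name.encode("cp932")

# cp932 has no slot for the umlaut, so it is lost in the same way.
umlaut_cp932 = umlaut_name.encode("cp932", errors="replace")

# cp850 can represent the umlaut as a single byte.
umlaut_cp850 = umlaut_name.encode("cp850")

print(hiragana_cp850)
print(hiragana_cp932.decode("cp932") == hiragana_name)
print(umlaut_cp932)

# Neither codepage's byte sequence is valid UTF-8, which is why a
# UTF-8 locale on Linux cannot interpret these names.
print(is_valid_utf8(hiragana_cp932), is_valid_utf8(umlaut_cp850))
```

The last line is the point about exchanging the repository with Linux:
the bytes produced under either codepage are rejected by a UTF-8 decoder.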

Thanks for the detailed explanation. I know the difference between bytes
and characters and the *encoding* needed to convert from one to the other,
but I did not know how Git handles it. I'm quite surprised that -- as I
understand you -- msys-Git (or Git in general?) is not able to handle all
characters (i.e. Unicode) at the same time. I expected it to be better
than older tools, e.g. SVN.

BTW, we are invoking the Git executable from Java. Is there automatically a
console "around" Git? Should we invoke a shell-script (which sets the
console's code page) instead of the Git executable directly?

-- 
Tom