Johannes Sixt wrote:
> Thomas Singer wrote:
>> Is it a German Windows limitation that far-east characters are not
>> supported on it (but work fine on a Japanese Windows), are there different
>> (msys)Git versions available, or is this a configuration issue?
>
> It is a matter of configuration.
>
> Since 8 bits are not sufficient to support the Japanese alphabet in addition
> to the German alphabet, programs that are not Unicode aware, such as git,
> have to decide which alphabet they support. The decision is made by
> picking a "codepage".
>
> On German Windows, you are in codepage 850 (in the console). The filenames
> (which actually are in Unicode) are converted to bytes according to
> codepage 850 *before* git sees them. If your filenames contain Hiragana,
> they are replaced by the "unknown character" marker because there is no
> place for them in codepage 850.
>
> However, you can install Japanese language support on German Windows. Then
> you can change your console to codepage 932:
>
>    chcp 932
>
> When you run git from *this* console, Hiragana characters in the filenames
> are converted to cp932 before git sees them. The resulting byte sequence is
> different from the one in cp850, but git will be able to see that the file
> exists and was modified, and you can 'git add' it.
>
> But if you have files with umlauts, they will not be recognized anymore
> because umlauts have no place in cp932.
>
> In neither case can you exchange the repository with Linux if your locale
> is set to UTF-8 on Linux, because neither byte sequence (umlauts from cp850
> or Hiragana from cp932) is a valid UTF-8 sequence, let alone one that
> produces the expected glyphs.
>
> Corollary: Stick to ASCII file names.
>
> There have been suggestions to switch the console to codepage 65001
> (UTF-8), but I have never heard any success reports. I'm not saying it does
> not work, though.

Thanks for the detailed explanation.
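To check that I read this right, here is a small Java sketch (we use Java anyway) that mimics the conversion you describe. It encodes an umlaut (U+00E4) and a Hiragana character (U+3042) with the JDK charsets "IBM850" and "windows-31j", which I am assuming behave like console codepages 850 and 932, plus UTF-8 for the Linux side, and then checks whether each resulting byte sequence is valid UTF-8. The class and method names are made up for illustration; the two code page charsets live in the JDK's jdk.charsets module, which is assumed to be present; and Windows' own conversion may differ in details (it may, for example, apply a best-fit mapping instead of a plain replacement byte).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: approximates how a file name turns into bytes
// under a given code page, using the JDK's charset converters as a stand-in
// for the Windows console conversion.
public class CodepageDemo {

    // Encode a file name into the given charset. String.getBytes(Charset)
    // replaces characters the charset cannot represent with the charset's
    // default replacement byte(s), similar to the "unknown character" marker.
    static byte[] encode(String fileName, String charsetName) {
        return fileName.getBytes(Charset.forName(charsetName));
    }

    // Render bytes as space-separated upper-case hex, e.g. "82 A0".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if the bytes form a well-formed UTF-8 sequence, i.e. what a
    // Linux system with a UTF-8 locale could decode.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // U+00E4 (a-umlaut) and U+3042 (Hiragana letter A)
        String[] names = {"\u00e4", "\u3042"};
        String[] codepages = {"IBM850", "windows-31j", "UTF-8"};
        for (String name : names) {
            for (String cp : codepages) {
                byte[] b = encode(name, cp);
                System.out.printf("U+%04X in %-11s -> %-8s valid UTF-8: %b%n",
                        (int) name.charAt(0), cp, hex(b), isValidUtf8(b));
            }
        }
    }
}
```

If I understand you correctly, the output should show exactly your point: "ä" becomes 0x84 in cp850 and "あ" becomes 0x82 0xA0 in cp932, and neither sequence decodes as UTF-8, while a character the code page cannot hold collapses into a single replacement byte that is valid UTF-8 but no longer the original character.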
I know the difference between bytes and characters and the *encoding* needed to convert from one to the other, but I did not know how Git handles it. I'm quite surprised that, as I understand you, msysGit (or Git in general?) cannot handle all characters (i.e. Unicode) at the same time. I expected it to be better than older tools, e.g. SVN.

BTW, we are invoking the Git executable from Java. Is there automatically a console "around" Git? Should we invoke a shell script (which sets the console's code page) instead of the Git executable directly?

-- 
Tom