Re: [msysGit] Re: Re: File path not escaped in warning message

Hi Janusz,

It seems you're mixing up a few completely unrelated concepts here.

The core.quotepath setting enables quoting and escaping of special 
characters in file names. This has nothing to do with the character set 
encoding of file names (i.e. Cp1250/ISO-8859-2/UTF-8). AFAIK, apart from 
git-svn, git currently doesn't support character set re-coding of file 
names at all, so core.quotepath and encoding are independent of each other.
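
To illustrate what quoting means here, below is a rough Python sketch of 
C-style path quoting. It is a simplified approximation, not git's actual 
code: the function name quote_path is made up for this example, and real 
git uses short escapes such as \t and \n for some control characters. The 
point is only that quoting works on bytes and never changes the encoding.

```python
def quote_path(name: str) -> str:
    """Simplified approximation of C-style path quoting (not git's code)."""
    raw = name.encode("utf-8")  # assumption: the name is stored as UTF-8
    parts = []
    quoted = False
    for byte in raw:
        if byte in (0x22, 0x5C):  # double quote or backslash
            parts.append("\\" + chr(byte))
            quoted = True
        elif byte < 0x20 or byte >= 0x7F:  # control or non-ASCII byte
            parts.append("\\%03o" % byte)  # three-digit octal escape
            quoted = True
        else:
            parts.append(chr(byte))
    text = "".join(parts)
    # Names without special bytes are printed as-is, without quotes.
    return '"' + text + '"' if quoted else text


print(quote_path("1ą.txt"))     # "1\304\205.txt"
print(quote_path("plain.txt"))  # plain.txt
```

The two octal escapes are simply the UTF-8 bytes C4 85 of the letter ą.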

Regarding git-log / git-diff output, there are basically three different 
character set encodings involved:
1. commit log messages: re-coded to i18n.logoutputencoding (usually UTF-8)
2. file content: printed verbatim (no re-coding); gui tools such as gitk 
may decode this based on gui.encoding or .gitattributes settings
3. everything else (file names, diff headers, error / warning messages): 
always UTF-8 (at least in Git for Windows)

Gui tools such as gitk decode this output line by line using the 
appropriate encoding.
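
A minimal Python sketch of such line-by-line decoding, under stated 
assumptions: the function name decode_diff_line is hypothetical, file 
content is assumed to be Cp1250, and content lines are recognized only by 
their leading '+', '-' or ' ' (a removed line that itself starts with 
"-- " would be misclassified as a header). It is not how gitk is 
implemented, just the idea.

```python
def decode_diff_line(raw: bytes, content_encoding: str = "cp1250") -> str:
    """Decode one line of diff output with the encoding that fits its kind."""
    # File header lines carry file names, which are UTF-8.
    is_header = raw.startswith((b"+++ ", b"--- "))
    # Added, removed and context lines carry verbatim file content.
    is_content = raw[:1] in (b"+", b"-", b" ") and not is_header
    encoding = content_encoding if is_content else "utf-8"
    return raw.decode(encoding, errors="replace")


sample = [
    "diff --git a/1ą.txt b/1ą.txt".encode("utf-8"),
    "--- a/1ą.txt".encode("utf-8"),
    "+++ b/1ą.txt".encode("utf-8"),
    "+zażółć".encode("cp1250"),  # file content in Cp1250
]
for line in sample:
    print(decode_diff_line(line))
```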


<jbialobr@xxxxx> wrote on 06.08.2012 08:53:17:
> File name is 1ą.txt its content is encoded in windows-1250

File name encoding and file content encoding are completely unrelated. 
File name encoding in current Git for Windows is *always* UTF-8, file 
content encoding can be anything.

> Output of git diff after reencoding to windows1250 is:
>
> warning: LF will be replaced by CRLF in 1Ä….txt.
> The file will have its original line endings in your working directory.

This looks like the file name is UTF-8, but reinterpreted (not reencoded) 
as if it were Cp1250. However, as stated above, you cannot simply 
interpret the entire git-log / git-diff output as being in one particular 
encoding, as the encoding may vary on a line-by-line basis.
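
The reinterpretation can be reproduced in a few lines of Python (a small 
sketch; the variable names are only for illustration):

```python
name = "1ą.txt"

# The bytes git prints for the file name: ą is C4 85 in UTF-8.
utf8_bytes = name.encode("utf-8")

# Reading those same bytes as Cp1250 yields two unrelated characters.
misread = utf8_bytes.decode("cp1250")
print(misread)  # 1Ä….txt

# Nothing was lost: undoing the wrong decoding restores the name.
recovered = misread.encode("cp1250").decode("utf-8")
print(recovered == name)  # True
```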

> Here is output from linux:
>
> [janusz@mikrus JavaCommon]$ git config --add core.quotepath false
> [janusz@mikrus JavaCommon]$ git diff  --unified=3 -- "1ą.txt"
> warning: LF will be replaced by CRLF in 1<B1>.txt.
> The file will have its original line endings in your working directory.

"<B1>" looks like less's escaping with missing LESSCHARSET setting.

Additionally, your Linux box seems to be set up with ISO-8859-2 system 
encoding. Git repositories created on this system will not be portable, 
i.e. using the same repository on other Linux systems, Git for Windows, 
Cygwin-git, or JGit/EGit will result in completely broken file names. The 
quasi-standard file name encoding in git repositories is UTF-8.
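
As a small Python illustration of both points (a sketch, assuming the file 
name was stored as ISO-8859-2 bytes): ą is the single byte B1 in 
ISO-8859-2, which matches the "<B1>" above, and that byte sequence is not 
valid UTF-8, so tools expecting UTF-8 cannot read the name back.

```python
name = "1ą.txt"

# On an ISO-8859-2 system the name ends up as these bytes.
latin2_bytes = name.encode("iso-8859-2")
print(latin2_bytes)  # b'1\xb1.txt'

# A tool that expects UTF-8 file names cannot decode them.
try:
    latin2_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```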

> There is nothing said in the manual, that core.quotepath affects 
> only header. But it is not the point. You don't know which part of 
> git output will be consumed by machine. Warning message is addressed
> to human, but it can be consumed by program in the same way as all 
> other messages and output data.

Error / warning messages may be localized, so they are particularly 
unsuitable for consumption by other programs. That's why many git commands 
have special switches to make their output machine readable (e.g. -z). 
Incidentally, 'git-log -z' also disables core.quotepath. So if you write a 
program that parses git output, and you're using the proper 'machine 
readable' version, you should never have to worry about quoted paths, 
irrespective of the core.quotepath setting.
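
A minimal Python sketch of consuming such machine-readable output, using 
'git diff --name-only -z'. The helper names split_nul_paths and 
changed_paths are made up for this example, and decoding the paths as 
UTF-8 is an assumption that holds for Git for Windows but not necessarily 
for every repository.

```python
import subprocess
from typing import List


def split_nul_paths(raw: bytes) -> List[str]:
    """Split NUL-terminated path output into a list of path strings."""
    # With -z, paths are unquoted and each one is terminated by a NUL byte.
    return [part.decode("utf-8") for part in raw.split(b"\0") if part]


def changed_paths(repo_dir: str = ".") -> List[str]:
    """Return the paths reported by 'git diff --name-only -z'."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "-z"],
        cwd=repo_dir,
        capture_output=True,
        check=True,
    )
    # Only stdout is parsed; warnings go to stderr and are ignored.
    return split_nul_paths(result.stdout)
```

Calling changed_paths() inside a work tree then returns plain path 
strings, whatever core.quotepath is set to.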

> Imho, since warning comes from git, path should be quoted to
> make git behaviour consistent. 
> From git-log help:
> > Note that we deliberately chose not to re-code the commit log 
> > message when a commit is made to force UTF-8 at the commit object 
> > level, because re-coding to UTF-8 is not necessarily a reversible 
> > operation.
> 
> If re-coding from one encoding to another is not necessarily a 
> reversible operation, and you can set logoutputencoding to any 
> encoding you wish, you may lose some characters while recoding file
> path in warning message. Quoting it would be desired then.
> 

The i18n.commitencoding and i18n.logoutputencoding settings only affect 
commit log messages. They are completely unrelated to error / warning 
messages, file names, or file name quoting.

Hope that helps,
Karsten
