On 2017-03-31 21:44, Jakub Narębski wrote:
> On 31.03.2017 at 14:38, Torsten Bögershausen wrote:
>> On 30.03.17 21:35, Jakub Narębski wrote:
>>> Hello,
>>>
>>> Recently I had to work on a project which uses legacy 8-bit encoding
>>> (namely cp1250 encoding) instead of utf-8 for text files (LaTeX
>>> documents). My terminal, that is Git Bash from Git for Windows, is set
>>> up for utf-8.
>>>
>>> I wanted "git diff" and friends to return something sane on said
>>> utf-8 terminal, instead of mojibake. There is the 'encoding'
>>> gitattribute... but it works only for GUI ('git gui', that is).
>>>
>>> Therefore I have (ab)used the textconv facility to convert from the cp1250
>>> file encoding to the utf-8 encoding of the console.
>>>
>>> I have set the following in the .gitattributes file:
>>>
>>>   ## LaTeX documents in cp1250 encoding
>>>   *.tex text diff=mylatex
>>>
>>> The 'mylatex' driver is defined as:
>>>
>>>   [diff "mylatex"]
>>>       xfuncname = "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$"
>>>       wordRegex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
>>>       textconv = \"C:/Program Files/Git/usr/bin/iconv.exe\" -f cp1250 -t utf-8
>>>       cachetextconv = true
>>>
>>> And everything would be all right... if not for the fact that Git appends
>>> a spurious ^M to added lines in the `git diff` output. Files use the CRLF
>>> end-of-line convention (the native MS Windows one).
>>>
>>>   $ git diff test.tex
>>>   diff --git a/test.tex b/test.tex
>>>   index 029646e..250ab16 100644
>>>   --- a/test.tex
>>>   +++ b/test.tex
>>>   @@ -1,4 +1,4 @@
>>>   -\documentclass{article}
>>>   +\documentclass{mwart}^M
>>>
>>>    \usepackage[cp1250]{inputenc}
>>>    \usepackage{polski}
>>>
>>> What gives? Why is this ^M tacked on the end of added lines,
>>> while it is not present in deleted lines, nor in context lines?
>>>
>>> Puzzled.
>>>
>>> P.S. Git has `i18n.commitEncoding` and `i18n.logOutputEncoding`; pity
>>> that it doesn't support the `encoding` attribute in core, together with
>>> having `i18n.outputEncoding`.
>>
>> Is there a chance to give us a recipe how to reproduce it?
>> A complete test script or ?
>> (I don't want to speculate, if the invocation of iconv is the problem,
>> where stdout is not in "binary mode", or however this is called under Windows)
>
> I'm sorry, I thought I posted the whole recipe, but I missed some details
> in the above description of the case.
>
> First, files are stored on the filesystem using CRLF eol (DOS end-of-line
> convention). Due to `core.autocrlf` they are converted to LF in blobs,
> that is in the index and in the repository.
>
> Second, a textconv with a filter preserving end-of-line needs to be
> configured. I have used `iconv`, but I suspect that the problem would
> also happen for `cat`.
>
> In the .gitattributes file, or .git/info/attributes, add for example:
>
>   *.tex text diff=myconv
>
> In the .git/config configure the textconv filter, for example:
>
>   [diff "myconv"]
>       textconv = iconv.exe -f cp1250 -t utf-8
>
> Create a file whose filename matches the attribute line, and which
> uses the CRLF end-of-line convention, and add it to Git (adding it to
> the index):
>
>   $ printf "foo\r\n" >foo.tex
>   $ git add foo.tex
>
> Modify the file (also with CRLF):
>
>   $ printf "bar\r\n" >foo.tex
>
> Check the difference:
>
>   $ git diff foo.tex
>
> HTH
>

There seems to be a bug in Git when it comes to "git diff".
Before we feed the content of the working tree into the diff machinery,
a call to convert_to_git() should be made.
But it seems that something is missing: the expected "+fox" becomes "+foxQ".

#!/bin/sh
test_description='CRLF with diff filter'
. ./test-lib.sh

test_expect_success 'setup' '
	git config core.autocrlf input &&
	printf "foo\r\n" >foo.tex &&
	git add foo.tex &&
	echo >.gitattributes &&
	git checkout -b master &&
	git add .gitattributes &&
	git commit -m "Add foo.tex" &&
	cat >.git/config <<-\EOF
	[diff "myconv"]
	textconv = sed -e "s/f/g/"
	EOF
'

test_expect_success 'check EOL in diff' '
	printf "fox\r\n" >foo.tex &&
	cat >expect <<-\EOF &&
	diff --git a/foo.tex b/foo.tex
	index 257cc56..88c2893 100644
	--- a/foo.tex
	+++ b/foo.tex
	@@ -1 +1 @@
	-foo
	+fox
	EOF
	git diff foo.tex | tr "\015" Q >actual &&
	test_cmp expect actual
'

test_done
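
A side note on the `tr "\015" Q` step used in the test: it maps every carriage return (octal 015) to a literal Q, so a stray CR at the end of a line shows up in the test_cmp output instead of being invisible. The following is only a minimal sketch of that mapping, independent of Git; it assumes nothing beyond a POSIX shell with printf and tr, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Map CR (octal 015) to "Q" so that it becomes visible in text output.
# A line ending in CRLF gains a trailing Q; an LF-only line is unchanged.
crlf_line=$(printf "+fox\r\n" | tr "\015" Q)
lf_line=$(printf "+fox\n" | tr "\015" Q)

echo "CRLF input: $crlf_line"
echo "LF input:   $lf_line"
```

With CRLF input the first variable holds "+foxQ", which is the form the diff line takes in the failing comparison; with LF-only input it stays "+fox".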