Re: [PATCH/RFC v5 7/7] Careful with CRLF when using e.g. UTF-16 for working-tree-encoding

Lars Schneider <larsxschneider@xxxxxxxxx> · Tue, 30 Jan 2018 16:14:03 +0100

> On 30 Jan 2018, at 15:40, Torsten Bögershausen <tboegi@xxxxxx> wrote:
> 
> On Tue, Jan 30, 2018 at 12:23:47PM +0100, Lars Schneider wrote:
>> 
>>> On 29 Jan 2018, at 21:19, tboegi@xxxxxx wrote:
>>> 
>>> From: Torsten Bögershausen <tboegi@xxxxxx>
>>> 
>>> UTF-16 encoded files are treated as "binary" by Git, and no CRLF
>>> conversion is done.
>>> When the UTF-16 encoded files are converted into UF-8 using the new
>> s/UF-8/UTF-8/
>> 
>> 
>>> "working-tree-encoding", the CRLF are converted if core.autocrlf is true.
>>> 
>>> This may lead to confusion:
>>> A tool writes a UTF-16 encoded file with CRLF.
>>> The file is committed with core.autocrlf=true, the CRLF are converted into LF.
>>> The repo is pushed somewhere and cloned by a different user, who has
>>> decided to use core.autocrlf=false.
>>> He uses the same tool, and now the CRLF are not there as expected, but LF,
>>> making the file useless for the tool.
>>> 
>>> Avoid this (possible) confusion by ignoring core.autocrlf for all files
>>> which have "working-tree-encoding" defined.
>> 
>> Maybe I don't understand your use case but I think this will generate even 
>> more confusion because that's not what I would expect as a user. I think Git 
>> should behave consistently independent of the used encoding. Here are my arguments:
> 
> To start with: I have probably seen too many repos with CRLF messed up.
> 
>> 
>>  (1) Legacy users are *not* affected. If you don't use the "working-tree-encoding"
>>      attribute then nothing changes for you.
> 
> People who don't use "working-tree-encoding" are not affected,
> I never meant to state that.
> 
> I am thinking about people who use "working-tree-encoding" without thinking
> about line endings.
> Or the ones that have in mind that core.autocrlf=true will leave the
> line endings for UTF-16 encoded files as is, but that changes as soon as they
> are converted into UTF-8 and the "auto" check is now done
> -after- the conversion. I would find that confusing.
> 
>> 
>>  (2) If you use the "working-tree-encoding" attribute *and* you want to ensure 
>>      your file keeps CRLF then you can define that in the attributes too. E.g.:
>> 
>>      *.proj text working-tree-encoding=UTF-16 eol=crlf
> 
> That is a good one.
> If you ever plan a re-roll (I don't at the moment) the *.proj extension
> makes much more sense in Documentation/gitattributes than *.txt.
> There may be text files encoded in UTF-16 which are called xxx.txt, but those
> are non-ideal examples. *.proj makes good sense as an example.

OK, I'll do that. Would that fix the problem which this patch tries to address for you?
(I would also explicitly add a paragraph to discuss line endings)
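
For anyone following along, here is a minimal Python sketch of the effect
discussed above. It is an illustration only, not Git's implementation: the
helper name has_crlf_bytes and the sample text are made up, and UTF-16LE
without a BOM is assumed. It shows that a byte-level check finds no CR LF
pair in UTF-16 data, while the same text converted to UTF-8 does contain
one, so a CRLF-to-LF normalization applied after the conversion changes
what ends up back in the working tree.

```python
# Illustration only: this is not how Git implements its conversion.
# has_crlf_bytes is a hypothetical helper, and UTF-16LE is an assumption.

def has_crlf_bytes(data: bytes) -> bool:
    """Return True if the raw byte pair CR LF occurs in data."""
    return b"\r\n" in data

text = "line one\r\nline two\r\n"

utf16 = text.encode("utf-16-le")  # two bytes per character, no BOM
utf8 = text.encode("utf-8")

# In UTF-16 each CR and LF is followed by a NUL byte, so the byte pair
# CR LF never appears; in UTF-8 it does.
print(has_crlf_bytes(utf16))
print(has_crlf_bytes(utf8))

# Simulate a CRLF -> LF normalization applied to the UTF-8 form,
# followed by re-encoding to UTF-16 for the working tree.
normalized = utf8.replace(b"\r\n", b"\n")
checked_out = normalized.decode("utf-8").encode("utf-16-le")

# The re-encoded file now has LF-only line endings.
print("\r" in checked_out.decode("utf-16-le"))
```

A user who wants CRLF preserved in such files would say so in the
attributes, as in the *.proj example quoted above.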

- Lars