Re: Files with \r\n\n line endings can result in needing to renormalize twice, after deleting checked out file and restoring from repo

On 03/06/2022 14:14, Philip Oakley wrote:
> On 01/06/2022 11:07, Philip, Bevan wrote:
>> Hey Philip,
>>
>> Thanks for the response!
>>
>>> ... however, if I remember the design discussion correctly, normalisation was decided to be just the conversion of the Windows style EOL = `\r\n` to the Linux/*nix style EOL =`\n`, and any other characters
>>> (utf8 / ascii bytes) were to be unchanged, including random '\r'
>>> characters. So in that respect I think it is working as initially designed.
>> This makes sense.
>>
>>> Do you have any information on how the mixed EOL styles (extra \r etc) came about?
>> I wish I knew how this file came about, but the people that put these files in our VCS have long left. I suspect some broken generation tool.
> I vaguely remember tales that early Macs used \r as their EOL character,
> so it may have been that.
>>> Should those extra \r characters also be separate EOLs? (and how to
>>> decide..?)
>> Most tooling I use seems to do this, but I agree that this is an ambiguous topic.
> maybe an extra `sed` invocation changing all the \r to \n in such cases!

It looks like StackOverflow has an answer
https://stackoverflow.com/a/42914886/717355

It suggests

    $ sed -i 's/\r/\n/g; s/\n$//'

as an all-at-once conversion filter using sed (with explanation!). I believe
it's idempotent (great word to know ;-)
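
A quick way to check the idempotency claim is to run the sed filter twice on a
throwaway file and compare the results. This is only a sketch: it assumes GNU
sed (the \r and \n escapes in s/// are GNU extensions), it uses the filter
form without -i so nothing is edited in place, and the sample text and file
names (sample.txt, pass1.txt, pass2.txt) are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed, which understands \r and \n in s/// expressions.
tmp=$(mktemp -d)

# Hypothetical sample mixing \r\n, a lone \r, and \r\r\n line endings.
printf 'one\r\ntwo\rthree\r\r\n' > "$tmp/sample.txt"

# Filter form of the command (no -i), so the input file is left untouched.
sed 's/\r/\n/g; s/\n$//' "$tmp/sample.txt" > "$tmp/pass1.txt"
sed 's/\r/\n/g; s/\n$//' "$tmp/pass1.txt" > "$tmp/pass2.txt"

# After the first pass no carriage returns should remain ...
if grep -q "$(printf '\r')" "$tmp/pass1.txt"; then
    echo "CR still present"
else
    echo "no CR left"
fi

# ... and a second pass should change nothing.
if cmp -s "$tmp/pass1.txt" "$tmp/pass2.txt"; then
    echo "second pass made no changes"
fi

# Dump the bytes to see exactly what was produced.
od -c "$tmp/pass1.txt"
```

One thing to watch for in the byte dump: because every \r becomes a line break
and only one trailing break per line is trimmed, a \r\r\n ending comes out as
two \n characters, i.e. an extra empty line. Whether that is wanted depends on
whether the stray \r was meant to be a line ending in the first place.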

>>> Are the docs missing anything that would have helped clarify the issue earlier?
>> A brief note on the limitations of renormalization might have proven helpful
> I'll maybe add that to my list of todo's (though it's a bit long and
> aspirational;-)
>
>>  - in particular, the bit that tripped me up was the requirement to remove and restore the files from the Git repository itself.
> I think it's just a checkout and then an `add` of the renormalised files
> `git add --renormalize . ` (not forgetting the all important `dot`),
> though some may have termed the checkout as the files being 'removed'.
>
> I did notice (when cross checking a few points) that there is also a
> `merge.renormalize` config option that will then make sure that when
> branches are merged you get the required re-normalisation (check the man
> pages ..).
>
>>  It wasn't obvious to me that this would have any impact on renormalization. Additionally, a note about the restriction on converting only \r\n to \n might also have proven useful.
> OK.
>
> PS, in-line replies preferred on the list.
>> Thanks,
>> Bevan
>>
>>
>> -----Original Message-----
>> From: Philip Oakley <philipoakley@iee.email>
>> Sent: 31 May 2022 22:12
>> To: Philip, Bevan <Bevan.Philip@xxxxxxxxxxxxxx>; git@xxxxxxxxxxxxxxx
>> Subject: Re: Files with \r\n\n line endings can result in needing to renormalize twice, after deleting checked out file and restoring from repo
>>
>> On 31/05/2022 15:24, Philip, Bevan wrote:
>>> Hello all,
>>>
>>> I've experienced an odd bug/limitation with `git add --renormalize`, requiring me to run the command twice on a specific file. Here is a bug report.
>>>
>>> What did you do before the bug happened? (Steps to reproduce your
>>> issue)
>>>
>>> #!/bin/bash -x
>>> printf "Test\\r\\r\\nTest Another Line\\r\\r\\nFinal Line\\r\\r\\n\\r\\r\\n" > git.bdf
>>> printf "* text=auto\\n*.bdf text" > .gitattributes
>>> mkdir test1
>>> cd test1
>>> git init
>>> cp ../git.bdf .
>>> git add .
>>> git status
>>> git commit -m "Add file git.bdf"
>>> cp ../.gitattributes .
>>> git add .gitattributes
>>> git add --renormalize .
>>> git status
>>> git commit -m "Renormalize git.bdf"
>>> git add --renormalize .
>>> git status
>>> rm git.bdf
>>> git restore .
>>> git add --renormalize .
>>> git status
>>>
>>> What did you expect to happen? (Expected behavior)
>>> Only needing to renormalize the file once.
>> That sounds like an obvious expectation, ...
>>> What happened instead? (Actual behavior)
>>> Renormalize the file once, then renormalize again after deleting the file that is checked out on disk and restoring it from the object stored within the Git repo.
>>>
>>> What's different between what you expected and what actually happened?
>>> Needed to run the renormalize step again, after deleting the file checked out on disk and restoring the file from the object stored within the Git repo.
>>>
>>> Anything else you want to add:
>>> This only occurs for files with \r\r\n line endings (and possibly also
>>> ending the file with \r\r\n\r\n)
>> ... however, if I remember the design discussion correctly, normalisation was decided to be just the conversion of the Windows style EOL = `\r\n` to the Linux/*nix style EOL =`\n`, and any other characters
>> (utf8 / ascii bytes) were to be unchanged, including random '\r'
>> characters. So in that respect I think it is working as initially designed.
>>
>>> The file is in three states:
>>> - Initial state: \r\r\n line endings within Git object
>>> - Initial renormalization state: \r\n line endings within Git object
>>> - Second renormalization state: \n line endings within Git object
>>>
>>> Happens on both Windows and Linux (replicated on a fresh install of Git for Windows within Windows Sandbox). Additionally, tested with `next` trunk on Linux.
>>> System info is for a Windows build where it does happen.
>>>
>>> Directory, and file names should be irrelevant.
>>>
>>> We encountered this naturally, with some files within a SVN repo we're migrating.
>> Do you have any information on how the mixed EOL styles (extra \r etc) came about?
>> Should those extra \r characters also be separate EOLs? (and how to
>> decide..?)
>> Are the docs missing anything that would have helped clarify the issue earlier?
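
The three states listed in the report above (\r\r\n, then \r\n, then \n) can
be reproduced without Git by applying the "\r\n becomes \n" rule one pass at a
time. This is a simulation under stated assumptions, not Git's own code: it
assumes GNU sed, and the helper name strip_one_cr and the state0..state3 file
names are invented for the example.

```shell
#!/bin/sh
# Simulation only, assuming GNU sed: drop a single CR that sits directly
# before each LF. This mimics the \r\n -> \n rule; it does not call Git.
strip_one_cr() { sed 's/\r$//'; }

tmp=$(mktemp -d)

# Initial state: \r\r\n line endings, as in the reported file.
printf 'Test\r\r\nFinal Line\r\r\n' > "$tmp/state0"

strip_one_cr < "$tmp/state0" > "$tmp/state1"   # first pass:  \r\r\n -> \r\n
strip_one_cr < "$tmp/state1" > "$tmp/state2"   # second pass: \r\n   -> \n
strip_one_cr < "$tmp/state2" > "$tmp/state3"   # third pass:  nothing left to strip

# Dump each state byte by byte for inspection.
for f in state0 state1 state2 state3; do
    echo "$f:"
    od -An -c "$tmp/$f"
done
```

Each pass only ever removes the one \r that is immediately followed by \n, so
content with doubled carriage returns needs two passes before it stops
changing, which matches the sequence of states described in the report.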


