RE: git clone corrupts file.

Brian,  

Thanks for your interest in this issue. We have determined that two factors are involved.

1.  The corrupted files are in Unicode (2-byte encoding with a BOM, as the binary view further down shows). The .h file mentioned does not have to be Unicode and could be ANSI, but we have other files that must be Unicode. We use Unicode in quite a number of our text files.
2.  Git appears to corrupt the file by changing its line endings.
          a.   GitHub has the correct file. It displays correctly there. When downloaded from GitHub in a browser, as binary or as text, it is not corrupted.
          b.   Git seems to change line endings as if the file were ANSI or another single-byte encoding, not Unicode.
          c.   core.autocrlf is set to false (git config core.autocrlf false), but apparently it is not being observed.
          d.   The .gitconfig file has a [core] section containing the entry autocrlf = false.
          e.   There is a .gitattributes file in the repo.
          f.   The .gitattributes file has per-extension entries that cover the affected files:
                        *.h     text eol=crlf
                        *.ini   text eol=crlf

If you look at the 1st line of the binary view of the original file, it looks like this:

FF FE 2F 00 2F 00 7B 00   7B 00 4E 00 4F 00 5F 00
44 00 45 00 50 00 45 00  4E 00 44 00 45 00 4E 00 
43 00 49 00 45 00 53 00  7D 00 7D 00 0D 00 0A 00   	Note - Unicode CR LF  0D 00 0A 00   

2nd line 
2F 00 2F 00 20 00 4D 00  69 00 63 00 72 00 6F 00  etc.   
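
As a cross-check of the binary view above, here is a minimal Python sketch that decodes those exact bytes. It assumes the file is UTF-16 little-endian with a BOM, which is what the leading FF FE suggests; the variable names are only illustrative.

```python
# Bytes copied from the binary view of the original file (first line).
first_line = bytes.fromhex(
    "FF FE 2F 00 2F 00 7B 00 7B 00 4E 00 4F 00 5F 00 "
    "44 00 45 00 50 00 45 00 4E 00 44 00 45 00 4E 00 "
    "43 00 49 00 45 00 53 00 7D 00 7D 00 0D 00 0A 00"
)
# Start of the second line, as far as it is shown above.
second_line = bytes.fromhex("2F 00 2F 00 20 00 4D 00 69 00 63 00 72 00 6F 00")

# The "utf-16" codec consumes the BOM (FF FE) and picks little-endian from it.
decoded_first = first_line.decode("utf-16")
# The second fragment has no BOM, so the byte order is given explicitly.
decoded_second = second_line.decode("utf-16-le")

print(repr(decoded_first))   # '//{{NO_DEPENDENCIES}}\r\n'
print(repr(decoded_second))  # '// Micro'
```

Each character, including CR and LF, occupies two bytes, so the first line is exactly 48 bytes long.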

If you look at the file as checked out by git, it looks very similar.
However, git has put a non-Unicode CR LF at the end of the line, plus an extra NULL.
This extra NULL throws the 2-byte Unicode encoding off by one byte and corrupts the following line. On the line after that, the next extra NULL brings the 2-byte encoding back into alignment, so that line appears uncorrupted.
You can see that in my original email below: every other line is unreadable.
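
The every-other-line pattern can be reproduced with a short Python sketch. It is a simplified model of a byte-wise LF to CRLF conversion applied to 2-byte Unicode text, not git's actual code; the function name naive_lf_to_crlf and the three-line sample text are assumptions made for illustration.

```python
BOM = b"\xff\xfe"

def naive_lf_to_crlf(data: bytes) -> bytes:
    """Insert a 0x0D byte before every 0x0A byte not already preceded by 0x0D.

    This treats the data as single-byte text, which is the suspected mistake.
    """
    out = bytearray()
    prev = None
    for b in data:
        if b == 0x0A and prev != 0x0D:
            out.append(0x0D)
        out.append(b)
        prev = b
    return bytes(out)

# Hypothetical three-line sample modelled on the header file in this thread.
text = "//{{NO_DEPENDENCIES}}\r\n// Microsoft\r\n//\r\n"
original = BOM + text.encode("utf-16-le")
converted = naive_lf_to_crlf(original)

print(original[44:48].hex(" "))   # 0d 00 0a 00
print(converted[44:49].hex(" "))  # 0d 00 0d 0a 00

# Offset at which each line starts in the converted data (2 = just after the BOM).
starts = [2] + [i + 2 for i in range(len(converted)) if converted[i] == 0x0A]
# A line decodes correctly only if it starts on an even byte offset.
aligned = [s % 2 == 0 for s in starts[:-1]]
print(aligned)                    # [True, False, True]
```

In this model every line ending gains one byte, so the line starts alternate between even and odd offsets, which matches the alternating readable and unreadable lines.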

FF FE 2F 00 2F 00 7B 00   7B 00 4E 00 4F 00 5F 00
44 00 45 00 50 00 45 00  4E 00 44 00 45 00 4E 00 
43 00 49 00 45 00 53 00  7D 00 7D 00 0D 00 0D 0A   	Note - non-Unicode CR LF  0D 0A  after the 0D 00

2nd line 
00 2F 00 2F 00 20 00 4D 00  69 00 63 00 72 00 6F  etc.   

I would like git to honor the autocrlf = false setting as directed.

It's important that we retain the 2-byte Unicode encoding in many of our files, and that git not insert single-byte CR LF sequences into those files.
We can't convert the files to another encoding for the convenience of git.

Thanks, 

Scott Russell
Staff SW Engineer 
NCR Corporation 
Phone: +17706237512
Scott.Russell2@xxxxxxx  |  ncr.com
       

-----Original Message-----
From: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> 
Sent: Friday, August 13, 2021 6:30 PM
To: Russell, Scott <Scott.Russell2@xxxxxxx>
Cc: git@xxxxxxxxxxxxxxx
Subject: Re: git clone corrupts file.

On 2021-08-13 at 18:54:43, Russell, Scott wrote:
> File from git.
> 
> ਍⼀⼀ 䴀椀挀爀漀猀漀昀琀 嘀椀猀甀愀氀 䌀⬀⬀ 最攀渀攀爀愀琀攀搀 椀渀挀氀甀搀攀 昀椀氀攀⸀ഀഀ
> // Used by CamTest.rc
> ਍⼀⼀ഀഀ
> #define IDM_ABOUTBOX                    0x0010
> ਍⌀搀攀昀椀渀攀 䤀䐀䐀开䄀䈀伀唀吀䈀伀堀                    ㄀  ഀഀ
> 
> File in github.
> 
> //{{NO_DEPENDENCIES}}
> // Microsoft Visual C++ generated include file.
> // Used by CamTest.rc
> //

We're probably going to need a little more information about this.  My guess as to what's happening here is that the editor you're using to view the file is set to read files as UTF-16, but the repository has them stored in UTF-8, or (less likely) vice versa.

Can you tell us what editor or other tool you're using to view the file and what settings it's using for text encoding?  Can you tell us about any working-tree-encoding declarations in your .gitattributes?  You can use "git check-attr -a PATH" to see more information about that.

What code page are you using on your system?  Are you using PowerShell, CMD, or Git Bash?  If you're using Git Bash, what are your locale settings?
--
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA



