Re: An interaction with ce_match_stat_basic() and autocrlf

> At this point, the index records a blob with LF line ending,
> while the work tree file has the same content with CRLF line
> ending.

I think this needs more than just sleeping on.

There are two separate problems related to crlf treatment in git that manifest themselves in the quirks you see in the current implementation:

(1) The fact that the index may be misaligned with the work tree. Junio's example demonstrates this well. I have resorted to

$ rm -rf *
$ git reset --hard

in the past to get a work tree that passes

$ git status

without false positives after changing the value of autocrlf.

(2) The fact that repository content may be mangled in an indeterminate way because of the current work tree <-> repository transformation algorithm. While criticism in the past has mainly been levelled at not knowing whether a truly binary file will be correctly determined as such, content can be lost in the round trip work tree -> repository -> work tree much more simply:

$ git init
$ git config core.autocrlf true
$ echo ab | tr ab \\r\\n >a.txt
$ od -t a a.txt
0000000  cr  nl  nl
0000003
$ git add a.txt
$ git commit
$ rm a.txt
$ git reset --hard
$ od -t a a.txt
0000000  cr  nl  cr  nl
0000004
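
The loss can be reproduced without git in a simplified model of the round trip. This is only a sketch: it assumes that adding a file turns every CRLF into LF and that checkout turns every LF into CRLF. The function names are made up for illustration, and git's real conversion code has extra checks that are ignored here.

```python
def to_repository(data: bytes) -> bytes:
    # Assumed model of "git add" with autocrlf=true: every CRLF becomes LF.
    return data.replace(b"\r\n", b"\n")


def to_work_tree(data: bytes) -> bytes:
    # Assumed model of checkout with autocrlf=true: every LF becomes CRLF.
    return data.replace(b"\n", b"\r\n")


original = b"\r\n\n"               # cr nl nl, the content of a.txt
stored = to_repository(original)   # what the blob would hold in this model
restored = to_work_tree(stored)    # what the work tree gets back

print(original, stored, restored)
print("round trip preserved content:", restored == original)
```

In this model the lone LF cannot be told apart from a converted CRLF once the content is stored, which is why the file grows from three bytes to four.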

In summary, it irks me that autocrlf true mode is a second cousin of autocrlf false and I think that there *should* be an acceptable deterministic solution to this.

The solution to (2) seems easier than (1): could the transformation algorithm be made deterministic and changed to something like "convert all crlf pairs to lf if and only if no singleton cr or lf exists in the file before conversion"? If a binary file gets transformed in error, getting the original back with standard tools would be easy, because the transformation is exactly reversible. If an otherwise text file has mixed lf and crlf endings, or additional cr or lf sprinkled randomly through it, the file is not transformed.
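
A minimal sketch of that rule, to make the proposal concrete. The names `has_singleton` and `to_repository` are hypothetical and this is not how git implements the conversion; it only illustrates the "all or nothing" condition described above.

```python
import re

# A "singleton" is a CR not followed by LF, or an LF not preceded by CR.
_SINGLETON = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def has_singleton(data: bytes) -> bool:
    return _SINGLETON.search(data) is not None


def to_repository(data: bytes) -> bytes:
    # Convert only when every CR and every LF is part of a CRLF pair.
    # Anything else (mixed endings, stray CR or LF) is stored untouched.
    if has_singleton(data):
        return data
    return data.replace(b"\r\n", b"\n")


print(to_repository(b"one\r\ntwo\r\n"))  # consistent CRLF endings: converted
print(to_repository(b"\r\n\n"))          # cr nl nl: left as it is
```

Because a converted file contained no lone LF before conversion, replacing every LF with CRLF restores it byte for byte, provided something remembers that the conversion took place.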

Given a deterministic transformation algorithm, the solution to (1) boils down to recording for each file in the work tree whether the transformation algorithm was used or not in arriving at the file's current contents, together with a way of telling git to force the use of the transformation algorithm or not for a particular file. It seems to me the place that this information *should* be recorded is the index, given that both .git/config and .gitattributes can be changed independently of the work tree. Recording the information in the index would mean that both autocrlf true and autocrlf false clones of the same repository would produce equally valid work trees with no loss of information. I am however not well versed enough in git internals at the moment to know whether this is an acceptable solution or not.
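
As a rough illustration of what recording this in the index could look like: the sketch below invents an `IndexEntry` with a `converted` flag. Both are assumptions made for the example; the real index entry has no such field that I know of, and `checkout` and `is_modified` are hypothetical helpers, not git functions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IndexEntry:
    blob: bytes       # content as stored in the repository
    converted: bool   # hypothetical flag: was LF -> CRLF applied on checkout?


def checkout(blob: bytes, autocrlf: bool) -> Tuple[bytes, IndexEntry]:
    # Convert only if requested and the blob has no CR at all,
    # so that the conversion can be undone exactly.
    converted = autocrlf and b"\n" in blob and b"\r" not in blob
    work_tree = blob.replace(b"\n", b"\r\n") if converted else blob
    return work_tree, IndexEntry(blob, converted)


def is_modified(entry: IndexEntry, work_tree: bytes) -> bool:
    # Rebuild the expected file from the recorded flag, not from the
    # current config, so a later config change gives no false positive.
    expected = entry.blob.replace(b"\n", b"\r\n") if entry.converted else entry.blob
    return work_tree != expected


blob = b"one\ntwo\n"
crlf_tree, crlf_entry = checkout(blob, autocrlf=True)
lf_tree, lf_entry = checkout(blob, autocrlf=False)
print(is_modified(crlf_entry, crlf_tree), is_modified(lf_entry, lf_tree))
```

Both work trees compare clean against the same blob, and because the decision travels with the entry, flipping autocrlf afterwards does not change the answer.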
