Re: [PATCH/RFC 0/3] Per-repository end-of-line normalization

Robert Buck <buck.robert.j@xxxxxxxxx> · Fri, 7 May 2010 23:31:04 -0400

On Fri, May 7, 2010 at 10:49 PM, hasen j <hasan.aljudy@xxxxxxxxx> wrote:
> On 7 May 2010 19:49, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>
>> Don't be silly.
>>
>> The whole AND ONLY point of CRLF translation is that line-endings are
>> different on different platforms.
>>
>>                        Linus

Actually, Linus, that depends. You will recognize this, but let me
state the obvious: for certain text files the platform does not
matter, and they MUST normalize to one setting on all platforms. For
instance, there are cases where text files MUST be LF-terminated on
ALL platforms. Have you considered XML as one such example? The W3C
XML spec states:

   ... [XML processors] MUST behave as if it normalized all line
   breaks in external parsed entities (including the document entity)
   on input, before parsing, by translating both the two-character
   sequence #xD #xA and any #xD that is not followed by #xA to a
   single #xA character.
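
To make the quoted rule concrete, here is a minimal Python sketch of
that translation. The function name is an assumption for illustration
and does not come from the spec; a real XML processor applies this as
part of parsing rather than as a separate pass.

```python
def normalize_xml_line_breaks(text: str) -> str:
    """Translate CRLF and any lone CR to a single LF."""
    # Replace the two-character sequence first, so that its CR is not
    # converted a second time by the lone-CR replacement.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Text that already uses LF passes through unchanged, which is why an
LF-only file is the one form that needs no translation at all.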

So here is an example of a text file that by convention MUST be
LF-based, yes, even on Windows. And for the record, Visual Studio
project files (the ones a solution refers to) have been an XML format
for seven years now. So in any one workspace it is entirely reasonable
that some text files MUST have LF, while others SHOULD have CR/LF. There
are also cases where some text files MUST have CR/LF (some scripting
languages barf on Windows otherwise).

[snip ...]

> The way git handles crlf is just confusing; in fact it's so confusing
> that it's often better to just turn it off. I'm not the only person
> who thinks that. It's specifically confusing because git thinks "if
> you're on windows then ALL your files should be CRLF", which is
> clearly what you think.

Hasen makes a good point here. It is simply this: the LF issue does
not boil down to a single boolean switch. People who think of the
LF/CRLF issue as a boolean switch are not dealing with all the facts.
There's a lot of grey, not simply black and white.

Commercial systems, decent ones that is, have had this right for years
(12+ years as I recall). We wouldn't be asking Git to do the right
thing if we weren't sold on Git already. Git is otherwise fantastic
(with using it on Windows being the apparent exception, hence this
conversation).

[snip ...]

> When that happens, it's most likely the case that these files are
> platform-dependent anyway, and so converting them back and forth
> between LF and CRLF is just a waste of time.

I disagree on this one, actually; this comment is not spot on. Again,
it depends. I'd generally say:

* perform conversions, or no conversions as the case may be, on the
obvious file types
* when conversions occur, normalize internally to only one convention
* otherwise perform no conversions
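
A rough Python sketch of that round trip follows. The function names
and the `convert` flag are hypothetical, and this is not how Git
itself implements it: text is normalized to LF on the way in and
converted to the requested ending on the way out, while anything not
marked for conversion passes through untouched.

```python
import os


def to_repository(data: bytes, convert: bool) -> bytes:
    """Normalize to LF, the single internal convention, on check-in."""
    if not convert:
        return data  # no conversion: bytes are stored exactly as given
    return data.replace(b"\r\n", b"\n")


def to_working_tree(data: bytes, convert: bool,
                    eol: bytes = os.linesep.encode()) -> bytes:
    """Convert stored LF endings to `eol` on check-out."""
    if not convert:
        return data
    # Normalize first so content that already has CRLF is not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", eol)
```

Passing `eol` explicitly, rather than always using the platform
default, leaves room for files that must keep one ending everywhere.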

> The whole idea behind my suggestion is to minimize confusion.

Confusion, yes. The Git documentation is very confusing on this
point... Linus and Junio may want to lift a page from the Perforce
book ;)

I would hope that people do agree there is a problem here, that Git
SHOULD have a good answer to the issue of line feeds. I am no expert
on Git, and I will not pretend to be. At Iron Mountain we are looking
at adopting Git, and this is one of two questions that I have.
Having worked with complete pleasure for years with Perforce,
line-feeds had NEVER been an issue, but the documentation about
line-feed support in Git seems a bit "odd". Mind you, as much as I
love Perforce, I also love Git, perhaps more (except for Git on
Windows). But I digress, so back to the point...

By the way, Linus and Junio, have you read this yet:

*   http://kb.perforce.com/?article=063

It would seem to me there are some text files that by convention MUST
have LF regardless of the platform, and there are examples of text
files that MAY have CRLF depending upon the platform.

So long as an SCM has a provision to permit, whether by prescription
and/or by convention, various line-feed types, files will naturally
fall into one of the following three categories:

* normalization to LF on input, preserving otherwise; e.g. XML
* automatic conversions to platform line feeds for files otherwise
considered ordinary text
* no conversions for everything else, treated as binary
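
The three categories above can be sketched as a small lookup in
Python. The patterns, policy names, and function name here are
assumptions made up for illustration; they are not taken from Git or
Perforce.

```python
from fnmatch import fnmatchcase

# Hypothetical rules for illustration only; the first match wins.
RULES = [
    ("*.xml", "lf"),      # normalize to LF on input, preserve otherwise
    ("*.txt", "native"),  # ordinary text: platform line endings
    ("*.c", "native"),
]


def policy_for(path: str) -> str:
    """Return 'lf', 'native', or 'binary' for a repository path."""
    # Match on the lowercased file name so the result does not depend
    # on the directory or on the case of the extension.
    name = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    for pattern, policy in RULES:
        if fnmatchcase(name, pattern):
            return policy
    return "binary"  # everything else: no conversions
```

Defaulting to "binary" means an unrecognized file is never modified,
which is the safe failure mode for the third category.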

Classic examples of files that MUST have conversions to platform
line-feeds are scripts (but not all types of scripts mind you) that
otherwise would not parse properly. I'm sure we've all seen cases of
this, especially when copying files from one system type to another
over a mount. XML-based build environments are particularly
troublesome in this regard (e.g. Ant).

- Bob