Re: [PATCH] Respect crlf attribute even if core.autocrlf has not been set

Steffen Prohaska <prohaska@xxxxxx> · Wed, 30 Jul 2008 07:35:19 +0200

On Jul 29, 2008, at 11:17 PM, Eyvind Bernhardsen wrote:

As you say, the reason I want the setting to be per-repository is that
I don't think the cost is worthwhile for every repository.

Side note: Personally, I am not very concerned about this cost, but some
people are...

Yeah :)

I think the real penalty is that with autocrlf enabled, Git no longer
stores exactly what I committed.

Git *never* stores exactly what you committed.  Git compresses
your data and combines many of your individual files into a
single pack.

What matters is that git gives you back exactly what you committed.  It
does so with core.autocrlf=true, unless you check out with a different
setting for autocrlf.  There is a small chance that git decides that
a file is text even though it should be binary and that the content of
this file does not allow for reversible CRLF-conversion.  In this case
git warns about the irreversible conversion and the user gets a chance
to correct git's choice.
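
The reversibility condition can be sketched in a few lines of Python.
This is only a simplified model, not git's actual conversion code: the
function names are made up for illustration, and git's text/binary
detection is left out entirely.

```python
def to_repo(data: bytes) -> bytes:
    """Commit-side conversion in this model: every CRLF becomes LF."""
    return data.replace(b"\r\n", b"\n")


def to_worktree(data: bytes) -> bytes:
    """Checkout-side conversion in this model: every LF becomes CRLF."""
    return data.replace(b"\n", b"\r\n")


def round_trips(data: bytes) -> bool:
    """True if commit followed by checkout reproduces the input exactly."""
    return to_worktree(to_repo(data)) == data


# A file with consistent CRLF line-endings survives the round trip.
print(round_trips(b"one\r\ntwo\r\n"))
# A file mixing LF and CRLF does not: the lone LF comes back as CRLF.
print(round_trips(b"one\ntwo\r\n"))
```

In this model, a binary file that was misdetected as text and happens to
contain a lone LF byte is exactly the irreversible case described above.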

We accept this slight chance of irreversible conversion because we do
want to handle line-endings of text files for cross-platform use.  For
this, the goal of "giving you *exactly* back what you committed" is
modified.  Instead, we want to give you back exactly what you committed,
except for line-endings (in text files), which should be converted to
the platform-dependent line-endings (LF or CRLF), depending on the
user's setting.

Because of a design choice we made, CRLF must be converted on Windows.
We decided that the token that git uses *internally* to represent
a line-ending in a text file is LF.  We made this choice because git
originally supported only Unix and so we chose the Unix line-ending for
representing line-endings internally.  Now, Windows uses CRLF to
indicate line-endings but git internally uses LF, so we must convert
them.  Note that if we had users that completely ignored their native
Windows environment and only used well-selected tools, all configured to
*never* write native Windows line-endings, for these users we could set
autocrlf=false and the repository would nonetheless only contain LFs.
Those exceptional super-expert users could manually modify their
settings.  The average user (including me) will not be able to guarantee
that he will never create CRLF in text files on Windows.  Those users
simply accept that they work on Windows and use the native line-endings
(CRLF), and because we care about these average users we set
autocrlf=true.

In contrast, setting autocrlf=input on Unix is only a safety valve.  The
average user who is only working on Unix will most likely *never* create
CRLF line-endings.  In a Unix-only environment it is actually very hard
to create CRLF line-endings.  Thus, the current default (autocrlf unset)
assumes that all text files on Unix contain only LF, and git wants LF
internally, which means we do not need to convert the line-endings.  In
cross-platform environments however, our assumption that all files on
Unix contain only LFs probably no longer holds.  In a cross-platform
environment you can easily copy files from Windows to Unix and thus
*easily* create files on Unix that contain CRLF.  In this case
autocrlf=input can save you, by correcting the line-endings for you.  In
this case, git *does not* give you back exactly what you committed, but
gives you back the very same text you committed, only with the native
LF line-endings.
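
The difference between the settings discussed above can be modelled
roughly as follows.  This is a sketch under simplifying assumptions: the
functions commit() and checkout() are hypothetical, they treat every file
as text, and they do not reproduce the details of git's real conversion
logic.

```python
def commit(data: bytes, autocrlf: str) -> bytes:
    """Bytes stored in the repository for a text file (simplified)."""
    if autocrlf in ("true", "input"):
        # Both settings normalize CRLF to the internal LF on commit.
        return data.replace(b"\r\n", b"\n")
    return data  # "false": stored as-is


def checkout(data: bytes, autocrlf: str) -> bytes:
    """Bytes written to the working tree for a text file (simplified)."""
    if autocrlf == "true":
        # Normalize first so an existing CRLF is not turned into CR CR LF.
        return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return data  # "input" and "false": no conversion on checkout


# A file copied from Windows to Unix, committed with autocrlf=input:
copied = b"line one\r\nline two\r\n"
stored = commit(copied, "input")
print(stored)                     # the repository holds only LF
print(checkout(stored, "input"))  # same text, LF line-endings
```

With "input" the working tree copy differs from what was committed only
in its line-endings, which is the safety-valve behaviour described above;
with "false" the CRLFs would have gone into the repository unchanged.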

Personally I believe that our assumption that it is virtually impossible
to unintentionally create CRLF line-endings on Unix is wrong; but the
prevailing opinion on the list is different.  Personally, I believe that
autocrlf=input should be the default on Unix to shield the repository
from CRLFs.  I have been using autocrlf=input for some time now and it has
already saved me several times.  Note that I am not working in a
Unix-only environment, but in a mixed Unix/Mac/Windows environment, so
unintentionally creating CRLFs is quite easy.

Another valid concern is speed.  But the timings that Dmitry presented
indicate that the overhead of autocrlf is so small that it is hard to
measure in practice.  I think we should stop raising this concern unless
someone comes up with timings that indicate a larger overhead than
measured by Dmitry.

	Steffen