Re: Git EOL Normalization

Jakub Narebski <jnareb@xxxxxxxxx> · Wed, 25 May 2011 23:02:35 -0700 (PDT)

Dmitry Potapov <dpotapov@xxxxxxxxx> writes:

> On Wed, May 25, 2011 at 7:20 PM, Stephen Bash <bash@xxxxxxxxxxx> wrote:
> >
> > The open questions for me are:
> >  1) what is the actual text file detection algorithm?
> >  2) what is the autocrlf LF/CRLF detection algorithm?
> >  3) how does autocrlf handle mixed line endings? (either in the working copy or repo)
> 
> Git looks at the text attribute of a file. If it is set or unset then it
> treats the file as text or binary accordingly. If the text attribute is
> 'auto', or it is unspecified but core.autocrlf is true, then git uses
> heuristics to detect text files.
> 
> Currently, the following heuristics are used:
> 
> A file is considered text if it contains no NUL ('\0') and no bare CR, and
> the number of non-printable characters is less than 1 in 128.
> 
> Non-printable characters are DEL (127) and anything less than 32 except
> CR, LF, BS, HT, ESC and FF.

I think git examines only the first block of a file or so.  The heuristic
used to detect whether a file is binary is, as I have heard, the same as
or similar to the one that GNU diff uses.

See also `perldoc -f -X`, specifically the description of the "-T" and
"-B" file tests, though Perl's detection rules and thresholds may differ
somewhat.
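For readers who prefer code, here is a rough Python sketch of the rules Dmitry describes above. It is not git's implementation (that is C code inside git itself); the names `TextAttr`, `looks_like_text` and `eol_conversion_applies` are hypothetical, the sketch scans the whole buffer rather than only a first block, only the true/false values of core.autocrlf are modelled, and reading "1 in 128" as a fraction of all bytes is an assumption.

```python
from enum import Enum
from typing import Optional

# Control bytes that the quoted heuristic does NOT count as non-printable:
# BS, HT, LF, FF, CR, ESC.
ALLOWED_CONTROLS = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}


class TextAttr(Enum):
    SET = "text"         # `text` in .gitattributes
    UNSET = "-text"      # `-text`
    AUTO = "text=auto"   # `text=auto`


def is_nonprintable(byte: int) -> bool:
    # DEL (127) and anything below 32, except the allowed control bytes.
    return byte == 0x7F or (byte < 32 and byte not in ALLOWED_CONTROLS)


def looks_like_text(data: bytes) -> bool:
    if b"\0" in data:
        return False
    # A bare CR is a CR that is not immediately followed by LF
    # (a CR at the very end of the data counts as bare).
    for i, byte in enumerate(data):
        if byte == 0x0D and data[i + 1:i + 2] != b"\n":
            return False
    nonprintable = sum(1 for b in data if is_nonprintable(b))
    # "Less than 1 in 128", read here as a fraction of all bytes (assumption).
    return nonprintable == 0 or nonprintable * 128 < len(data)


def eol_conversion_applies(attr: Optional[TextAttr], autocrlf: bool,
                           data: bytes) -> bool:
    if attr is TextAttr.SET:
        return True
    if attr is TextAttr.UNSET:
        return False
    if attr is TextAttr.AUTO or autocrlf:
        return looks_like_text(data)
    # Attribute unspecified and core.autocrlf off: no detection, no conversion.
    return False
```

Passing `None` as the attribute stands for "text attribute unspecified"; with `autocrlf=True` the decision then falls through to the heuristic, as the quoted text describes.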

> Also, to avoid problems with autocrlf=true when someone has already
> committed a text file with CRLF line endings, CRLF->LF conversion happens
> only if the tracked file in the index does not contain any CR.
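A minimal Python sketch of that check-in rule, assuming core.autocrlf=true and a file that has already been classified as text. `to_index` is a hypothetical name, not a git function, and `None` stands for "not yet tracked".

```python
from typing import Optional


def to_index(worktree: bytes, in_index: Optional[bytes]) -> bytes:
    """Check-in with core.autocrlf=true, for a file already judged to be text."""
    if in_index is not None and b"\r" in in_index:
        # The tracked version already contains CR: keep the CRLFs rather than
        # silently rewriting every line of a file someone committed with CRLF.
        return worktree
    # Otherwise normalize CRLF to LF on the way into the repository.
    return worktree.replace(b"\r\n", b"\n")
```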

See also the documentation of the `core.safecrlf` config variable (it
defaults to true, IIRC).
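As documented, `core.safecrlf` makes git check whether the end-of-line conversion is reversible. A minimal sketch of that idea for core.autocrlf=true, ignoring the index-has-CR exception quoted above; `autocrlf_round_trip_is_safe` is a hypothetical name and this is not git's code. It also bears on Stephen's question 3: a file with mixed line endings does not survive the round trip.

```python
def autocrlf_round_trip_is_safe(worktree: bytes) -> bool:
    """Would check-in followed by checkout (autocrlf=true) reproduce this file?"""
    committed = worktree.replace(b"\r\n", b"\n")     # check-in: CRLF -> LF
    checked_out = committed.replace(b"\n", b"\r\n")  # checkout: LF -> CRLF
    return checked_out == worktree
```

An all-CRLF file round-trips unchanged; a mixed or LF-only file comes back with every line ending in CRLF, which is the kind of change the safecrlf check is meant to flag.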

-- 
Jakub Narebski
Poland
ShadeHawk on #git