Dmitry Potapov <dpotapov@xxxxxxxxx> writes:

> On Wed, May 25, 2011 at 7:20 PM, Stephen Bash <bash@xxxxxxxxxxx> wrote:
> >
> > The open questions for me are:
> >  1) what is the actual text file detection algorithm?
> >  2) what is the autocrlf LF/CRLF detection algorithm?
> >  3) how does autocrlf handle mixed line endings? (either in the working copy or repo)
>
> Git looks at the text attribute of a file. If it is set or unset then it
> treats the file as text or binary accordingly. If the text attribute is
> 'auto', or it is unspecified but core.autocrlf is true, then git uses
> heuristics to detect text files.
>
> Currently, the following heuristics are used:
>
> A file is considered as text if it does not have '\0' or a bare CR, and
> the number of non-printable characters is less than 1 in 128.
>
> Non-printable characters are DEL (127) and anything less than 32 except
> CR, LF, BS, HT, ESC and FF.

I think git examines only the first block of a file or so.

The heuristic to detect binary-ness of a file is, as I have heard, the
same as or similar to the one that GNU diff uses. See also `perldoc -f -X`,
the description of the "-T" and "-B" switches, though those might differ
somewhat in detection and thresholds.

> Also, to avoid problems with autocrlf=true when someone has already put
> a text file with CRLF, CRLF->LF conversion happens only if the tracked
> file in the index does not have any CR.

See also the documentation of the `core.safecrlf` config variable (defaults
to true IIRC).

-- 
Jakub Narebski
Poland
ShadeHawk on #git
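
For illustration, the heuristics quoted above can be sketched in Python.
This is only a rough sketch based on the description in this message, not
git's actual code. The names `looks_like_text`, `crlf_to_lf_allowed` and
`BLOCK_SIZE`, the 8000-byte block size, and the exact reading of the
"1 in 128" threshold are all assumptions made for the example.

```python
# Sketch of the heuristics described in this message; NOT git's real code.

# Control bytes that still count as printable: BS, HT, LF, FF, CR, ESC.
ALLOWED_CONTROL = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}

# Assumption: only the first block of the file is examined; the size
# used here is a guess for illustration.
BLOCK_SIZE = 8000


def looks_like_text(data: bytes) -> bool:
    """Return True if the first block of `data` passes the text heuristic."""
    block = data[:BLOCK_SIZE]

    # Any NUL byte means binary.
    if b"\0" in block:
        return False

    # A bare CR is a CR not immediately followed by LF. If every CR is part
    # of a CRLF pair, the two counts match. (A CR that happens to be the
    # last byte of the block counts as bare in this sketch.)
    if block.count(b"\r") != block.count(b"\r\n"):
        return False

    # Non-printable: DEL (127) and anything below 32 except the allowed set.
    nonprintable = sum(
        1 for b in block
        if b == 0x7F or (b < 0x20 and b not in ALLOWED_CONTROL)
    )
    printable = len(block) - nonprintable

    # Assumption: "less than 1 in 128" is read as at most one non-printable
    # byte per 128 printable bytes.
    return nonprintable <= printable // 128


def crlf_to_lf_allowed(index_content: bytes) -> bool:
    """CRLF->LF conversion is skipped if the version in the index has any CR."""
    return b"\r" not in index_content
```

With this sketch, `looks_like_text(b"a\r\nb\r\n")` is true while
`looks_like_text(b"a\rb")` is false because of the bare CR, and
`crlf_to_lf_allowed(b"a\r\nb\r\n")` is false because the tracked content
already contains a CR.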