Re: mingw, windows, crlf/lf, and git

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 14 Feb 2007 07:51:00 -0800 (PST)

On Wed, 14 Feb 2007, Mark Levedahl wrote:
> 
> To recap, we have the following various suggestions to determine textness:
> 
> 1) ratio of ascii to non-ascii characters, possibly weighting some chars more
> than others
> 2) line length
> 3) existence of a null (\0)
> 4) file name globbing
> 5) roundtrip ( lf(crlf(file) ) == file

Actually, my patch already had one that you didn't mention: 
 6) CR never shows up alone.

So the patch I sent out basically had the following rules:
 - no more than ~10% of all characters being other than regular printable 
   ASCII (where any control character except for newline/cr/tab was deemed 
   nonprintable)
 - any "lonely" CR automatically means it's binary, and I would refuse 
   to convert that to a LF (the test in the code is that CRLF count must 
   match CR count)

but the "roundtrip" rule is much too strict (it's actually perfectly 
possible for an editor to add CRLF characters only to new _lines_, leaving 
old lines with just LF - or the other way around. In fact, the editor I 
use under Linux does exactly that in reverse - if I add new lines, it will 
add those without CR, but will leave old lines with CRLF alone).
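
As a rough illustration only (the actual patch is C code inside git and is
not reproduced here), the two rules and the rejected roundtrip check could be
sketched in Python as below. The function names are hypothetical, the 10%
threshold is taken literally, and reading the roundtrip check as "strip the
CRs, add them back, compare with the original" is an assumption.

```python
def looks_like_text(data: bytes) -> bool:
    """Hypothetical helper applying the two rules described above."""
    cr = data.count(b"\r")
    crlf = data.count(b"\r\n")
    # A "lonely" CR (one not followed by LF) means binary:
    # the CRLF count must match the CR count.
    if cr != crlf:
        return False
    # Printable ASCII is 32..126; tab, newline and CR are also accepted.
    nonprintable = sum(
        1 for b in data
        if not (32 <= b <= 126 or b in (0x09, 0x0A, 0x0D))
    )
    # No more than ~10% of all characters may be non-printable.
    return nonprintable * 10 <= len(data)


def to_lf(data: bytes) -> bytes:
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    # Normalize first so that existing CRLF pairs are not doubled.
    return to_lf(data).replace(b"\n", b"\r\n")


def roundtrips(data: bytes) -> bool:
    """Assumed reading of the roundtrip rule: convert and convert back."""
    return to_crlf(to_lf(data)) == data


# An old CRLF line followed by a newly added LF-only line.
mixed = b"old line\r\nnew line\n"
print(looks_like_text(mixed))  # True
print(roundtrips(mixed))       # False
```

The mixed file passes the two rules from the patch, yet the roundtrip check
rejects it, which is the sense in which that check is too strict.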

I think that to help Asian languages (or strange text-files in utf8 or 
Latin1 too, for that matter: text-files with _just_ special characters), I 
should probably make the rule be that only the 0-31 range is special.
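
A minimal sketch of what that change would mean, again in Python rather than
the C of the real patch. All names here are hypothetical, and keeping tab,
newline and CR exempt under the new rule is an assumption carried over from
the original rule.

```python
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}  # tab, newline, CR


def suspicious_old(b: int) -> bool:
    # Original rule: anything outside printable ASCII (32..126),
    # other than tab/newline/CR, counts against the file.
    return not (32 <= b <= 126 or b in ALLOWED_CONTROL)


def suspicious_new(b: int) -> bool:
    # Proposed rule: only the 0-31 range is special
    # (assumed: still minus tab/newline/CR).
    return b < 32 and b not in ALLOWED_CONTROL


def mostly_text(data: bytes, suspicious) -> bool:
    # Same ~10% threshold as before, with a pluggable per-byte test.
    bad = sum(1 for b in data if suspicious(b))
    return bad * 10 <= len(data)


# A short line consisting only of non-ASCII characters, encoded as UTF-8.
sample = "日本語\n".encode("utf-8")
print(mostly_text(sample, suspicious_old))  # False
print(mostly_text(sample, suspicious_new))  # True
```

Every byte of a multi-byte UTF-8 sequence is 128 or above, so the original
rule counts nearly the whole sample as non-printable, while the 0-31 rule
counts none of it. Real binary data with NULs and other low control bytes is
still rejected by both.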

			Linus