Johannes Schindelin wrote:
> Last time I checked, the text files never had lines longer than 200
> characters (I chose an intentionally large limit). So, it might be a
> good heuristic to check the maximal line length, and refuse to believe
> that it's text once a certain (configurable) threshold is reached.
>
> Ciao,
> Dscho

Unfortunately, on my program we have folks using text files with single
lines over 60,000 characters long; these are data files. Think, for
example, of a comma- or tab-separated data file saved from a spreadsheet.
In this case the files are pure ASCII. So line length could be another
factor to take into account, but it is not decisive by itself.
To recap, we have the following suggestions for determining "textness":
1) ratio of ASCII to non-ASCII characters, possibly weighting some
   characters more than others
2) line length
3) presence of a NUL byte (\0)
4) file name globbing
5) roundtrip: lf(crlf(file)) == file
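
For concreteness, here is a rough single-pass sketch (plain C, not
actual git code) of how checks 1-3 could be applied to a file's
contents; the 25% non-ASCII ratio and the 60,000-character line cap are
made-up values. The roundtrip test (5) needs an actual conversion pass
and globbing (4) looks at the path rather than the contents, so they
are left out here:

	#include <stddef.h>

	static int looks_like_text(const char *buf, size_t len)
	{
		size_t i, non_ascii = 0, line_len = 0, max_line_len = 0;

		for (i = 0; i < len; i++) {
			unsigned char c = buf[i];

			if (c == '\0')
				return 0;	/* 3) NUL byte => binary */
			if (c == '\n') {
				if (line_len > max_line_len)
					max_line_len = line_len;
				line_len = 0;
				continue;
			}
			line_len++;
			if (c > 0x7f)
				non_ascii++;	/* 1) count non-ASCII characters */
		}
		if (line_len > max_line_len)
			max_line_len = line_len;

		if (max_line_len > 60000)	/* 2) maximal line length */
			return 0;
		if (len && non_ascii * 100 / len > 25)	/* 1) too much non-ASCII */
			return 0;
		return 1;
	}
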
I don't think any one suggestion is completely adequate for all uses;
all of them need to be available and somehow configurable. This suggests
to me a core.AutoCRLFstrategy variable containing a comma-separated list
of methods to use (set, of course, to a reasonable default that does not
cause runtime headaches on Unix): a file would be deemed binary unless
all listed methods declare it to be text, with an empty list disabling
AutoCRLF detection.
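
To illustrate what I mean by "all listed methods must agree", here is a
sketch of how such a core.AutoCRLFstrategy value could be parsed and
applied; the method names, the lookup table, and the check function are
invented for illustration and are not existing git code:

	#include <stddef.h>
	#include <string.h>

	static int check_no_nul(const char *buf, size_t len)
	{
		return memchr(buf, '\0', len) == NULL;	/* 3) NUL byte check */
	}

	struct method {
		const char *name;
		int (*fn)(const char *buf, size_t len);
	};

	static const struct method methods[] = {
		{ "nul", check_no_nul },
		/* "ratio", "linelen", "roundtrip", ... would be listed here */
		{ NULL,  NULL }
	};

	/* strategy: the value of core.AutoCRLFstrategy, a comma-separated list */
	static int is_text(const char *strategy, const char *buf, size_t len)
	{
		char copy[256], *tok, *save = NULL;
		int checked = 0;

		if (!strategy || !*strategy)
			return 0;	/* empty list disables detection */

		strncpy(copy, strategy, sizeof(copy) - 1);
		copy[sizeof(copy) - 1] = '\0';

		for (tok = strtok_r(copy, ",", &save); tok;
		     tok = strtok_r(NULL, ",", &save)) {
			const struct method *m;

			for (m = methods; m->name && strcmp(m->name, tok); m++)
				;	/* look up the named method */
			if (!m->name || !m->fn(buf, len))
				return 0;	/* unknown or failing method => binary */
			checked = 1;
		}
		return checked;	/* text only if every listed method agreed */
	}
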
Mark