Johannes Schindelin wrote:
> Last time I checked, the text files never had lines longer than 200
> characters (I chose an intentionally large limit). So, it might be a
> good heuristic to check the maximal line length, and refuse to believe
> that it's text once a certain (configurable) threshold is reached.
>
> Ciao,
> Dscho

Unfortunately, on my program we have folks using text files with single
lines over 60,000 characters long; these are data files. Think, for
example, of a comma- or tab-separated data file saved from a spreadsheet.
In this case the files are pure ASCII. So line length could be another
factor to take into account, but it is not decisive by itself.
To recap, we have the following suggestions for determining "textness":
1) ratio of ASCII to non-ASCII characters, possibly weighting some
   characters more than others
2) line length
3) presence of a NUL byte (\0)
4) file name globbing
5) roundtrip: lf(crlf(file)) == file
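
For concreteness, here is a rough single-pass sketch (plain C, not
actual git code) of how checks 1-3 could be applied to a file's
contents; the 25% non-ASCII ratio and the 60,000-character line cap are
made-up values. The roundtrip test (5) needs an actual conversion pass
and globbing (4) looks at the path rather than the contents, so they
are left out here:

	#include <stddef.h>

	static int looks_like_text(const char *buf, size_t len)
	{
		size_t i, non_ascii = 0, line_len = 0, max_line_len = 0;

		for (i = 0; i < len; i++) {
			unsigned char c = buf[i];

			if (c == '\0')
				return 0;	/* 3) NUL byte => binary */
			if (c == '\n') {
				if (line_len > max_line_len)
					max_line_len = line_len;
				line_len = 0;
				continue;
			}
			line_len++;
			if (c > 0x7f)
				non_ascii++;	/* 1) count non-ASCII characters */
		}
		if (line_len > max_line_len)
			max_line_len = line_len;

		if (max_line_len > 60000)	/* 2) maximal line length */
			return 0;
		if (len && non_ascii * 100 / len > 25)	/* 1) too much non-ASCII */
			return 0;
		return 1;
	}
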
I don't think any one suggestion is completely adequate for all uses;
all of them need to be available and somehow configurable. This suggests
to me a core.AutoCRLFstrategy variable containing a comma-separated list
of methods to use (set, of course, to a reasonable default that does not
cause runtime headaches on Unix): a file would be deemed binary unless
all listed methods declare it to be text, with an empty list disabling
AutoCRLF detection.
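
To illustrate what I mean by "all listed methods must agree", here is a
sketch of how such a core.AutoCRLFstrategy value could be parsed and
applied; the method names, the lookup table, and the check function are
invented for illustration and are not existing git code:

	#include <stddef.h>
	#include <string.h>

	static int check_no_nul(const char *buf, size_t len)
	{
		return memchr(buf, '\0', len) == NULL;	/* 3) NUL byte check */
	}

	struct method {
		const char *name;
		int (*fn)(const char *buf, size_t len);
	};

	static const struct method methods[] = {
		{ "nul", check_no_nul },
		/* "ratio", "linelen", "roundtrip", ... would be listed here */
		{ NULL,  NULL }
	};

	/* strategy: the value of core.AutoCRLFstrategy, a comma-separated list */
	static int is_text(const char *strategy, const char *buf, size_t len)
	{
		char copy[256], *tok, *save = NULL;
		int checked = 0;

		if (!strategy || !*strategy)
			return 0;	/* empty list disables detection */

		strncpy(copy, strategy, sizeof(copy) - 1);
		copy[sizeof(copy) - 1] = '\0';

		for (tok = strtok_r(copy, ",", &save); tok;
		     tok = strtok_r(NULL, ",", &save)) {
			const struct method *m;

			for (m = methods; m->name && strcmp(m->name, tok); m++)
				;	/* look up the named method */
			if (!m->name || !m->fn(buf, len))
				return 0;	/* unknown or failing method => binary */
			checked = 1;
		}
		return checked;	/* text only if every listed method agreed */
	}
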
Mark