On Wed, 14 Feb 2007, Mark Levedahl wrote:
>
> To recap, we have the following various suggestions to determine textness:
>
> 1) ratio of ascii to non-ascii characters, possibly weighting some chars more
> than others
> 2) line length
> 3) existence of a null (\0)
> 4) file name globbing
> 5) roundtrip ( lf(crlf(file)) == file )

Actually, my patch already had one that you didn't mention:

 6) CR never shows up alone.

So the patch I sent out basically had the following rules:

 - no more than ~10% of all characters being other than regular printable
   ASCII (where any control character except for newline/CR/tab was deemed
   nonprintable)

 - any "lonely" CR automatically means it's binary, and I would refuse to
   convert that to a LF (the test in the code is that the CRLF count must
   match the CR count)

But the "roundtrip" rule is much too strict. It's perfectly possible for an
editor to add CRLF line endings only to new _lines_, leaving old lines with
just LF, or the other way around. In fact, the editor I use under Linux does
exactly that in reverse: when I add new lines, it adds them without CR, but
leaves old lines with CRLF alone.

I think that to help Asian languages (or, for that matter, odd text files in
UTF-8 or Latin-1 that consist of _just_ special characters), I should probably
make the rule that only the 0-31 range is special, i.e. only control
characters count as nonprintable.

		Linus
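The two rules described above (the ~10% nonprintable limit and the "lonely CR" test) can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name `looks_like_text` and the `only_control_is_special` flag (which models the proposal to treat only the 0-31 range as special) are hypothetical.

```python
def looks_like_text(data: bytes, only_control_is_special: bool = False) -> bool:
    """Hypothetical 'is this text?' heuristic.

    Rule 1: at most ~10% of the bytes may be nonprintable.
    Rule 2: no "lonely" CR; every CR must be part of a CRLF pair.
    """
    nonprintable = 0
    for b in data:
        if b in (0x09, 0x0A, 0x0D):  # tab, LF and CR are always allowed
            continue
        if b < 32:
            # Control character in the 0-31 range: always nonprintable.
            nonprintable += 1
        elif not only_control_is_special and b > 126:
            # Strict mode: DEL and bytes >= 128 are not "regular printable
            # ASCII", so UTF-8 or Latin-1 text counts against the limit.
            nonprintable += 1

    # "No more than ~10%": nonprintable / len(data) <= 0.10, in integers.
    if nonprintable * 10 > len(data):
        return False

    # A lonely CR means binary: the CRLF count must match the CR count.
    return data.count(b"\r") == data.count(b"\r\n")
```

Note that a file with mixed line endings, such as `b"old line\r\nnew line\n"`, still passes, since its only CR belongs to a CRLF pair. That is the case where a strict roundtrip rule would be too strict. With `only_control_is_special=True`, UTF-8 text made entirely of non-ASCII characters is also accepted.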