Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> Actually, my patch already had one that you didn't mention:
>
> 6) CR never shows up alone.

Older Macs ;-)?

> So the patch I sent out basically had the following rules:
>
>  - no more than ~10% of all characters being other than regular printable
>    ASCII (where any control character except for newline/cr/tab was deemed
>    nonprintable)
>
>  - any "lonely" CR automatically means it's binary, and I would refuse
>    to convert that to a LF (the test in the code is that CRLF count must
>    match CR count)
> ...
> I think that to help asian languages (or strange text-files in utf8 or
> Latin1 too, for that matter: test-files with _just_ special characters), I
> should probably make the rule be that only the 0-31 range is special.

I would agree.  0-31 except HT, CR, LF and ESC would be a good idea
(ESC has to stay allowed because ISO 2022 text is built on escape
sequences); that would not harm text in UTF-8, in the various EUC-based
locales, or in ISO 2022.

Patch is relative to 'pu'.

-- >8 --
diff --git a/convert.c b/convert.c
index ebcf717..b6b7c66 100644
--- a/convert.c
+++ b/convert.c
@@ -13,7 +13,7 @@ struct text_stat {
 	unsigned cr, lf, crlf;

 	/* These are just approximations! */
-	unsigned printable, nonprintable, nul;
+	unsigned printable, nonprintable;
 };

 static void gather_stats(const char *buf, unsigned long size, struct text_stat *stats)
@@ -34,13 +34,11 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *
 			stats->lf++;
 			continue;
 		}
-		if (c == '\t' || (c >= 32 && c < 127)) {
-			stats->printable++;
+		if ((c < 32) && (c != '\t' && c != '\033')) {
+			stats->nonprintable++;
 			continue;
 		}
-		if (!c)
-			stats->nul++;
-		stats->nonprintable++;
+		stats->printable++;
 	}
 }

@@ -50,7 +48,7 @@ static void gather_stats(const char *buf, unsigned long size, struct text_stat *

 static int is_binary(unsigned long size, struct text_stat *stats)
 {
-	if (stats->nul)
+	if (stats->nonprintable)
 		return 1;
 	/*
 	 * Other heuristics? Average line length might be relevant,
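For illustration only, here is a standalone sketch of the classification proposed above: only bytes 0-31 are special, HT and ESC are allowed, CR and LF are counted separately, and everything from 127 up counts as printable. The `sketch_*` names and `struct sketch_stat` are hypothetical and are not git's; the per-byte test mirrors the patch. One assumption goes beyond the patch: the sketch folds the "lonely CR" rule (CR count must equal CRLF count) into the binary check, whereas in Linus's patch that test guards the conversion itself.

```c
#include <string.h>

/*
 * Hypothetical, standalone counterpart of convert.c's struct text_stat;
 * the names are illustrative, not git's.
 */
struct sketch_stat {
	unsigned cr, lf, crlf;
	unsigned printable, nonprintable;
};

static void sketch_gather_stats(const char *buf, unsigned long size,
				struct sketch_stat *stats)
{
	unsigned long i;

	memset(stats, 0, sizeof(*stats));
	for (i = 0; i < size; i++) {
		unsigned char c = buf[i];

		if (c == '\r') {
			stats->cr++;
			/* count CR immediately followed by LF as a CRLF pair */
			if (i + 1 < size && buf[i + 1] == '\n')
				stats->crlf++;
			continue;
		}
		if (c == '\n') {
			stats->lf++;
			continue;
		}
		/*
		 * Only 0-31 is special, minus HT and ESC (CR and LF were
		 * handled above). Bytes 127-255 count as printable, so
		 * UTF-8, EUC and Latin-1 text is not flagged.
		 */
		if (c < 32 && c != '\t' && c != '\033') {
			stats->nonprintable++;
			continue;
		}
		stats->printable++;
	}
}

static int sketch_is_binary(const struct sketch_stat *stats)
{
	/* any remaining control character (NUL included) means binary */
	if (stats->nonprintable)
		return 1;
	/*
	 * Assumption of this sketch: a "lonely" CR, i.e. one that is not
	 * part of a CRLF pair, also marks the buffer as binary.
	 */
	if (stats->cr != stats->crlf)
		return 1;
	return 0;
}

/* Hypothetical helper: classify a buffer of known length. */
static int sketch_buffer_is_binary(const char *buf, unsigned long size)
{
	struct sketch_stat stats;

	sketch_gather_stats(buf, size, &stats);
	return sketch_is_binary(&stats);
}
```

Under these rules, CRLF text, UTF-8 such as "caf\xc3\xa9", and ISO 2022 escape sequences all classify as text, while a buffer with a lonely CR or an embedded NUL classifies as binary. Passing the length explicitly (rather than relying on strlen) is what lets the NUL case be detected at all.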