On 13 May 2010, at 11:58, Robert Buck wrote:

> Quick question here, while people would be in the convert.c functions
> when making the above changes. This question is related to detecting
> whether a file is text, but the question could be spun off to a
> different thread if you so wish...
>
> Have you considered skipping the UTF8 BOM and provided that the
> remaining content is considered text allow auto conversions? The check
> is simple, and would cover at least 50% of latin-derived languages.
> Since you have the buffer at hand, and are in the same file
> (convert.c), simply check for an initial EF BB BF. This would fix some
> text files created on Windows (someone had mentioned Notepad I
> believe). Out of the box experience for eol and text detection for
> Windows users would be improved.

I just did a quick test with a plain text file; it was detected as text
both with and without a UTF-8 BOM. Looking at the code, characters >= 128
are considered printable, and since every byte of the BOM (EF, BB, BF) is
>= 128, the BOM shouldn't make any difference at all.

Do you have an example UTF-8 text file that is misdetected as binary?

-- 
Eyvind
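To illustrate why the BOM is harmless under a rule where bytes >= 128 count as printable, here is a minimal C sketch of a printable/non-printable counting heuristic in that spirit. It is not the actual convert.c code: the names `struct byte_stats`, `count_bytes()` and `looks_binary()`, and the exact thresholds, are hypothetical assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counters for this sketch; not git's actual struct. */
struct byte_stats {
	unsigned nul, printable, nonprintable;
};

static void count_bytes(const unsigned char *buf, size_t size,
			struct byte_stats *st)
{
	size_t i;

	memset(st, 0, sizeof(*st));
	for (i = 0; i < size; i++) {
		unsigned char c = buf[i];

		/* CR and LF are neither printable nor non-printable here. */
		if (c == '\n' || c == '\r')
			continue;
		if (c == 127) {
			st->nonprintable++;          /* DEL */
		} else if (c < 32) {
			switch (c) {
			case '\b': case '\t': case '\033': case '\f':
				st->printable++;     /* BS, HT, ESC, FF count as text */
				break;
			case 0:
				st->nul++;
				/* fall through: NUL is also non-printable */
			default:
				st->nonprintable++;
			}
		} else {
			/* Everything from 32 upward, including every byte
			 * >= 128 such as the BOM bytes EF BB BF. */
			st->printable++;
		}
	}
}

/*
 * Assumed rule: binary if there is any NUL byte, or if the number of
 * non-printable bytes exceeds printable / 128 (integer division).
 */
static int looks_binary(const unsigned char *buf, size_t size)
{
	struct byte_stats st;

	count_bytes(buf, size, &st);
	if (st.nul)
		return 1;
	return (st.printable >> 7) < st.nonprintable;
}
```

For example, `looks_binary((const unsigned char *)"\xEF\xBB\xBFhi\n", 6)` returns 0, the same as for "hi\n" without the BOM, because the three BOM bytes only add to the printable count.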