On Thu, May 13, 2010 at 7:47 AM, Eyvind Bernhardsen
<eyvind.bernhardsen@xxxxxxxxx> wrote:
> On 13. mai 2010, at 11.58, Robert Buck wrote:
>
>> Quick question here, while people would be in the convert.c functions
>> when making the above changes. This question is related to detecting
>> whether a file is text, but the question could be spun off to a
>> different thread if you so wish...
>>
>> Have you considered skipping the UTF8 BOM and provided that the
>> remaining content is considered text allow auto conversions? The check
>> is simple, and would cover at least 50% of latin-derived languages.
>> Since you have the buffer at hand, and are in the same file
>> (convert.c), simply check for an initial EF BB BF. This would fix some
>> text files created on Windows (someone had mentioned Notepad I
>> believe). Out of the box experience for eol and text detection for
>> Windows users would be improved.
>
> I just did a quick test with a plain text file; it was detected as text
> both with and without a utf8 BOM. Looking at the code, characters >= 128
> are considered printable so the BOM shouldn't make any difference at
> all. Do you have an example utf8 text file that is misdetected as
> binary?

Sorry, my bad. I misread a line in convert.c. It handles UTF-8 beautifully.
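
For anyone following along, the point made in the quoted reply (bytes >= 128 count as printable, so a leading EF BB BF does not change the outcome) can be illustrated with a minimal sketch. This is not the code from convert.c. The function names has_utf8_bom and looks_like_text, the set of tolerated control characters, and the ratio threshold are all assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Illustrative sketch only; names and thresholds are hypothetical
 * and are not taken from git's convert.c.
 */

/* Return 1 if buf starts with the UTF-8 BOM bytes EF BB BF, else 0. */
static int has_utf8_bom(const unsigned char *buf, size_t len)
{
	return len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}

/*
 * Return 1 if buf looks like text, 0 if it looks binary.
 * Bytes >= 128 are counted as printable, so UTF-8 sequences
 * (including a BOM) never push a file toward "binary".
 */
static int looks_like_text(const unsigned char *buf, size_t len)
{
	size_t printable = 0, nonprintable = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];

		if (c == 0)
			return 0;		/* a NUL byte: treat as binary */
		if (c >= 128)
			printable++;		/* high bytes, e.g. UTF-8 */
		else if (c >= 32 && c != 127)
			printable++;		/* plain ASCII text */
		else if (c == '\n' || c == '\r' || c == '\t')
			continue;		/* ordinary whitespace: ignored */
		else
			nonprintable++;		/* other control characters */
	}

	/* Assumed threshold: tolerate about one odd byte per 128 printable. */
	return (printable >> 7) >= nonprintable;
}
```

With a heuristic shaped like this, no explicit BOM skip is needed for detection: looks_like_text() gives the same answer whether or not has_utf8_bom() is true for the buffer.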