Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"

Eyvind Bernhardsen <eyvind.bernhardsen@xxxxxxxxx> · Thu, 13 May 2010 13:47:45 +0200



On 13. mai 2010, at 11.58, Robert Buck wrote:

> Quick question here, while people would be in the convert.c functions
> when making the above changes. This question is related to detecting
> whether a file is text, but the question could be spun off to a
> different thread if you so wish...
> 
> Have you considered skipping the UTF8 BOM and provided that the
> remaining content is considered text allow auto conversions? The check
> is simple, and would cover at least 50% of latin-derived languages.
> Since you have the buffer at hand, and are in the same file
> (convert.c), simply check for an initial EF BB BF. This would fix some
> text files created on Windows (someone had mentioned Notepad I
> believe). Out of the box experience for eol and text detection for
> Windows users would be improved.

I just did a quick test with a plain text file; it was detected as text both with and without a UTF-8 BOM.  Looking at the code, bytes >= 128 are counted as printable, so the BOM shouldn't make any difference at all.  Do you have an example UTF-8 text file that is misdetected as binary?
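
To illustrate what I mean, here is a rough sketch of that kind of heuristic.  It is not the actual code from convert.c: the function name looks_binary, the set of control characters treated as printable, and the 1-in-128 threshold are simplifications I am assuming for the example.  The point is only that the three BOM bytes (EF BB BF) fall in the ">= 128" branch and so count as printable.

```c
#include <stddef.h>

/*
 * Hypothetical helper for illustration only, not git's real function.
 * Returns 1 if the buffer looks binary, 0 if it looks like text.
 */
int looks_binary(const unsigned char *buf, size_t size)
{
	size_t printable = 0, nonprintable = 0;
	size_t i;

	for (i = 0; i < size; i++) {
		unsigned char c = buf[i];

		if (c == 0)
			return 1;	/* any NUL byte: treat as binary */
		if (c == '\r' || c == '\n')
			continue;	/* line endings count as neither */
		if (c == 127) {
			nonprintable++;	/* DEL */
		} else if (c < 32) {
			/* assume BS, HT, ESC and FF are acceptable in text */
			if (c == '\b' || c == '\t' || c == '\033' || c == '\f')
				printable++;
			else
				nonprintable++;
		} else {
			/* 32..126 and every byte >= 128, so EF BB BF lands here */
			printable++;
		}
	}

	/* assumed threshold: more than 1 nonprintable per 128 printable */
	return (printable >> 7) < nonprintable;
}
```

With a check shaped like this, prepending EF BB BF to a text buffer only adds three to the printable count, so it cannot push the file towards "binary".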
-- 
Eyvind
