Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"

Robert Buck <buck.robert.j@xxxxxxxxx> · Thu, 13 May 2010 09:19:15 -0400



On Thu, May 13, 2010 at 7:47 AM, Eyvind Bernhardsen
<eyvind.bernhardsen@xxxxxxxxx> wrote:
> On 13. mai 2010, at 11.58, Robert Buck wrote:
>
>> Quick question here, while people would be in the convert.c functions
>> when making the above changes. This question is related to detecting
>> whether a file is text, but the question could be spun off to a
>> different thread if you so wish...
>>
>> Have you considered skipping the UTF-8 BOM and, provided that the
>> remaining content is considered text, allowing auto conversion? The
>> check is simple, and would cover at least 50% of Latin-derived
>> languages.
>> Since you have the buffer at hand, and are in the same file
>> (convert.c), simply check for an initial EF BB BF. This would fix some
>> text files created on Windows (someone had mentioned Notepad I
>> believe). Out of the box experience for eol and text detection for
>> Windows users would be improved.
>
> I just did a quick test with a plain text file; it was detected as
> text both with and without a UTF-8 BOM.  Looking at the code,
> characters >= 128 are considered printable, so the BOM shouldn't make
> any difference at all.  Do you have an example UTF-8 text file that is
> misdetected as binary?

Sorry, my bad. I misread a line in convert.c. It handles UTF-8 beautifully.
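
For anyone following along, here is a rough Python sketch of why the BOM
is harmless under a "bytes >= 128 count as printable" heuristic. It is
not the code in convert.c. The function names, the set of allowed
control characters, and the 1/128 threshold are all assumptions made for
illustration.

```python
UTF8_BOM = b"\xef\xbb\xbf"

# Control bytes treated as ordinary text in this sketch (an assumption):
# backspace, tab, LF, form feed, CR, escape.
ALLOWED_CONTROL = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def has_utf8_bom(buf: bytes) -> bool:
    """Return True if the buffer starts with the UTF-8 BOM (EF BB BF)."""
    return buf.startswith(UTF8_BOM)


def looks_like_text(buf: bytes) -> bool:
    """Hypothetical text heuristic: bytes >= 128 count as printable."""
    printable = 0
    nonprintable = 0
    for byte in buf:
        if byte == 0:
            # Any NUL byte marks the buffer as binary in this sketch.
            return False
        if byte in ALLOWED_CONTROL:
            printable += 1
        elif byte < 32 or byte == 127:
            nonprintable += 1
        else:
            # Covers ASCII 32..126 and every byte >= 128,
            # which includes the three BOM bytes.
            printable += 1
    # Assumed threshold: binary if more than 1 in 128 bytes is non-printable.
    return nonprintable <= printable // 128
```

Since EF, BB and BF are all >= 128, a leading BOM only adds three
"printable" bytes, so looks_like_text() gives the same answer with or
without it, and no special-casing via has_utf8_bom() is needed.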