Re: [RFC/PATCH v3 4/5] Rename "crlf" attribute as "eolconv"

On 13. mai 2010, at 11.58, Robert Buck wrote:

> Quick question here, since people will be in the convert.c functions
> when making the above changes. This question is related to detecting
> whether a file is text, but the question could be spun off to a
> different thread if you so wish...
> 
> Have you considered skipping the UTF-8 BOM and, provided that the
> remaining content is considered text, allowing auto conversions? The
> check is simple, and would cover at least 50% of latin-derived
> languages. Since you have the buffer at hand, and are in the same file
> (convert.c), simply check for an initial EF BB BF. This would fix some
> text files created on Windows (someone had mentioned Notepad, I
> believe). The out-of-the-box experience for eol and text detection for
> Windows users would be improved.
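
The check proposed in the quoted text amounts to comparing the first three bytes of the buffer against the UTF-8 byte order mark EF BB BF. A minimal sketch in C, assuming the caller then runs its text detection on the remaining bytes; `utf8_bom_length` is a hypothetical helper, not a function in git's convert.c.

```c
#include <stddef.h>

/* Hypothetical helper (not part of git): returns 3 if buf starts with
 * the UTF-8 byte order mark EF BB BF, otherwise 0. */
static size_t utf8_bom_length(const unsigned char *buf, size_t size)
{
	if (size >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
		return 3;
	return 0;
}
```

A caller would compute `skip = utf8_bom_length(buf, size)` and pass `buf + skip` and `size - skip` to whatever text check it already uses.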

I just did a quick test with a plain text file; it was detected as text both with and without a UTF-8 BOM.  Looking at the code, bytes >= 128 are considered printable, so the BOM (EF BB BF) shouldn't make any difference at all.  Do you have an example UTF-8 text file that is misdetected as binary?
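
To illustrate why the BOM is harmless: if every byte >= 128 counts as printable, the three BOM bytes are treated like any other text. The sketch below is a simplified, hypothetical model of such a printable/non-printable heuristic. The names `gather_text_stats` and `looks_binary`, the set of tolerated control characters, and the roughly 1-in-128 threshold are assumptions for illustration, not a copy of convert.c.

```c
#include <stddef.h>

/* Counters for a hypothetical text/binary heuristic. */
struct text_stats {
	size_t nul, printable, nonprintable;
};

static void gather_text_stats(const unsigned char *buf, size_t size,
			      struct text_stats *stats)
{
	size_t i;

	stats->nul = stats->printable = stats->nonprintable = 0;
	for (i = 0; i < size; i++) {
		unsigned char c = buf[i];

		if (c == 127) {
			/* DEL is treated as a control character */
			stats->nonprintable++;
		} else if (c < 32) {
			switch (c) {
			case '\b': case '\t': case '\n': case '\r':
			case '\033': case '\014':
				/* common whitespace/escape controls count as text */
				stats->printable++;
				break;
			case 0:
				stats->nul++;
				/* fall through: NUL is also non-printable */
			default:
				stats->nonprintable++;
			}
		} else {
			/* Every other byte, including all bytes >= 128 (and so
			 * the UTF-8 BOM EF BB BF), counts as printable. */
			stats->printable++;
		}
	}
}

/* Returns 1 if the buffer looks binary: any NUL byte, or more
 * non-printable bytes than printable/128 (assumed threshold). */
static int looks_binary(const unsigned char *buf, size_t size)
{
	struct text_stats stats;

	gather_text_stats(buf, size, &stats);
	if (stats.nul)
		return 1;
	return (stats.printable >> 7) < stats.nonprintable;
}
```

Under this model a leading BOM can only increase the printable count, so it never pushes a text file toward being classified as binary.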
-- 
Eyvind

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]