On Thu, May 13, 2010 at 01:47:45PM +0200, Eyvind Bernhardsen wrote:
> I just did a quick test with a plain text file; it was detected as
> text both with and without a utf8 BOM. Looking at the code,
> characters >= 128 are considered printable so the BOM shouldn't make
> any difference at all. Do you have an example utf8 text file that is
> misdetected as binary?

Though the UTF-8 BOM does not present any problem for the automatic text
detector, it is another piece from Microsoft that creates
interoperability issues when you work with non-ASCII text files. In
short:

1. Microsoft editors and tools like to add a UTF-8 BOM to files, and
   you cannot turn this behavior off.

2. Many tools (such as the Microsoft compiler) are incapable of
   recognizing UTF-8 files without a BOM, so they mangle all non-ASCII
   characters.

#1 is a problem because it creates changes consisting solely of adding
a UTF-8 BOM. Moreover, users of non-Windows platforms are not exactly
thrilled to find a UTF-8 BOM at the beginning of every text file.

The ability to automatically add a UTF-8 BOM on Windows to text files
that are marked as "unicode" could be helpful, but that is just one
part of the larger problem of how to deal with text files in "legacy"
encodings, which are still widely used on Windows.

Dmitry
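Both points can be illustrated with a small sketch. This is Python for brevity (git itself is C), and `looks_binary` is only a rough stand-in for the printable-character heuristic Eyvind describes, not git's actual code: the three BOM bytes EF BB BF are all >= 128, so a heuristic that treats high bytes as printable never flags them, while a clean-filter-style `strip_bom` shows how BOM-only diffs could be avoided.

```python
UTF8_BOM = b"\xef\xbb\xbf"

def looks_binary(data: bytes) -> bool:
    # Rough stand-in for the heuristic described above: bytes >= 128
    # count as printable; data is "binary" if it contains a NUL or
    # too many non-whitespace control bytes.
    if b"\x00" in data:
        return True
    nonprintable = sum(1 for b in data
                       if b < 32 and b not in (9, 10, 13))
    return nonprintable * 128 > len(data)

def strip_bom(data: bytes) -> bytes:
    # Drop a leading UTF-8 BOM (as a hypothetical clean filter might),
    # so files edited on Windows do not differ solely in their first
    # three bytes.
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data

sample = UTF8_BOM + "héllo\n".encode("utf-8")
assert not looks_binary(sample)   # BOM bytes are all >= 128: still "text"
assert strip_bom(sample) == "héllo\n".encode("utf-8")
```

The reverse direction (re-adding the BOM on checkout for tools that require it) would be the corresponding smudge-style operation.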