Re: utf8 BOM

Dmitry Potapov <dpotapov@xxxxxxxxx> · Sun, 16 May 2010 09:19:27 +0400

On Sat, May 15, 2010 at 10:23:52PM +0200, Eyvind Bernhardsen wrote:
> On 14. mai 2010, at 12.16, Dmitry Potapov wrote:
> 
> > Probably, the ability to automatically add a UTF-8 BOM on Windows to
> > text files (which are marked as "unicode") can be helpful, but it is
> > just one part of the problem of how to deal with text files in "legacy"
> > encodings, which are still widely used on Windows.
>
> Sounds like something a clean/smudge filter should be able to do.

Yes, it should, if you have only a handful of files that need such
conversion. However, if you want it for every text file, running filters
is slow (especially on Windows), and filters are not capable of
autodetecting text.
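For the handful-of-files case, a clean/smudge pair could look roughly
like the sketch below. The `filter.<driver>.clean`/`.smudge` config keys
and the `filter=` attribute are Git's; the driver name `utf8bom` and the
script name `bom_filter.py` are hypothetical. With plain clean/smudge
commands Git starts one filter process per file, which is where the cost
on Windows comes from, and the filter is chosen by path pattern in
.gitattributes, not by looking at the content.

```python
#!/usr/bin/env python3
"""Hypothetical clean/smudge filter: strip the UTF-8 BOM when a file is
added to the repository, put it back when the file is checked out.

Assumed setup (driver name "utf8bom" and script path are illustrative):
    git config filter.utf8bom.clean  "python3 bom_filter.py clean"
    git config filter.utf8bom.smudge "python3 bom_filter.py smudge"
    echo '*.txt filter=utf8bom' >> .gitattributes
"""
import codecs
import sys

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def strip_bom(data: bytes) -> bytes:
    """Clean: remove a leading UTF-8 BOM so the repository stores plain UTF-8."""
    return data[len(BOM):] if data.startswith(BOM) else data


def add_bom(data: bytes) -> bytes:
    """Smudge: prepend a UTF-8 BOM unless one is already present."""
    return data if data.startswith(BOM) else BOM + data


def main() -> int:
    if len(sys.argv) != 2 or sys.argv[1] not in ("clean", "smudge"):
        sys.stderr.write("usage: bom_filter.py clean|smudge\n")
        return 2
    data = sys.stdin.buffer.read()  # Git feeds the file content on stdin
    out = strip_bom(data) if sys.argv[1] == "clean" else add_bom(data)
    sys.stdout.buffer.write(out)  # and reads the converted content from stdout
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because the filter cannot tell which files were originally saved with a
BOM, smudge adds one to every matched file, including empty ones.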

> (which hopefully works no matter what your code
> page is?  I don't know much about Windows i18n).

Yes, it does. I am not an expert on Windows either, but as far as I
know, a BOM is used to mark Unicode files, which could be either UTF-8
or UTF-16. BTW, UTF-16 files are currently treated by Git as "binary",
which is not always convenient, because it makes "merge" and "diff"
impossible.
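The BOM byte sequences are fixed: EF BB BF for UTF-8, and FF FE or FE FF
for little- or big-endian UTF-16. The sketch below detects them and also
shows why UTF-16 ends up as "binary": Git's default check looks for a
NUL byte near the start of the content (roughly the first 8000 bytes),
and UTF-16 text that is mostly ASCII has a NUL in every other byte. The
function names are illustrative, and `looks_binary` is a simplified
approximation of Git's heuristic, not its code.

```python
import codecs
from typing import Optional

# Only UTF-8 and UTF-16 BOMs, as discussed above. UTF-32 is ignored here;
# note that its little-endian BOM (FF FE 00 00) starts with the UTF-16 LE one.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

FIRST_FEW_BYTES = 8000  # how much of the buffer the approximation inspects


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding announced by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def looks_binary(data: bytes) -> bool:
    """Approximate Git's default test: a NUL byte near the start means binary."""
    return b"\0" in data[:FIRST_FEW_BYTES]


if __name__ == "__main__":
    text = "diff me\n"
    for enc in ("utf-8-sig", "utf-16"):
        blob = text.encode(enc)  # both codecs write a BOM
        print(enc, detect_bom(blob), looks_binary(blob))
```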

> Adding this to convert.c would be more difficult, at least
> politically, since I assume it would be Windows-specific code.

I don't think it needs any Windows-specific code. We already have some
functions to convert text between different charsets, which could be
used. But this feature should be developed and tested by people who work
on Windows regularly and need it, because there is no substitute for
testing and real experience of how well it works in practice. Currently,
I rarely use Windows and can get by with clean/smudge filters.
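A minimal sketch of such a conversion, in Python rather than Git's C:
normalize a file to UTF-8 for storage, honouring a BOM if there is one
and otherwise falling back to a legacy code page. The function name
`to_utf8` and the default `cp1251` are hypothetical choices; picking the
right legacy encoding is exactly the part that needs testing by people
who use it on Windows.

```python
import codecs

# Hypothetical default: the legacy ANSI code page assumed for files without a BOM.
LEGACY_ENCODING = "cp1251"


def to_utf8(data: bytes, legacy: str = LEGACY_ENCODING) -> bytes:
    """Normalize file content to UTF-8 without a BOM.

    A leading BOM decides the source encoding; otherwise the content is
    taken to be UTF-8 if it decodes cleanly, else the legacy code page.
    Raises UnicodeDecodeError if the bytes are invalid in that code page too.
    """
    if data.startswith(codecs.BOM_UTF8):
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    elif data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        text = data.decode("utf-16")  # the "utf-16" codec consumes the BOM
    else:
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            text = data.decode(legacy)
    return text.encode("utf-8")
```

Trying UTF-8 before the legacy code page is only a heuristic: a short
cp1251 text can, in principle, also be valid UTF-8 by accident.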

Dmitry