Re: Should the IETF be condoning, even promoting, BOM pollution?

Julian Reschke <julian.reschke@xxxxxx> · Tue, 19 Sep 2017 19:28:06 +0200

On 2017-09-19 19:17, John C Klensin wrote:

--On Tuesday, September 19, 2017 7:05 PM +0200 Julian Reschke
<julian.reschke@xxxxxx> wrote:

...
(2) I note that Dave's tests applied to Microsoft bundled
applications.   If they are the main problem, then Microsoft
should be ashamed of themselves for updating those
applications to handle non-ASCII codes and then violating the
clear rules for UTF-8 (if they allow UTF-8 at all -- if they
decided to not do that and only allow, e.g., UTF-16, that
would be a different matter).  While I hope bug reports have
been filed, the IETF (or RFC Editor) setting out to break
those applications is just not what we do.

Microsoft's support for non-ASCII characters predates Unicode
(AFAIU). Notepad has been dealing with non-ASCII characters
for ages.

Understood, my opinions about how well that worked, especially
for non-Latin scripts, notwithstanding.  But, again, using BOM
as a substitute for charset=UTF-8 is, at least IMO, not the
brightest of ideas even though I'm also aware that we had a
difficult transition when the web went from a default of Latin-1
to Unicode.
...

Where is "charset=..." supposed to come from when you open a file resource?

Not *defaulting* to UTF-8 is not a bug. It may not be what our
preference is nowadays, but that's it.

See above.  Slightly different discussion.  But I note that it
isn't hard to distinguish between Latin-1 and UTF-8 without
relying on BOM -- the hard problem there involves distinguishing
between the various species of 8859 and assorted code pages.
...

I agree that Notepad *could* be (heuristically) sniffing for UTF-8, and 
it would be interesting to hear why Microsoft doesn't do that.
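
For illustration only, here is a minimal sketch of what such heuristic sniffing could look like. It is not a description of what Notepad or any other Microsoft application does. The function name sniff_encoding and the choice of "latin-1" as the fallback label are assumptions made for this example; a real application would more likely fall back to the system's legacy code page, which is the hard case mentioned above.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess an encoding label for raw file bytes (hypothetical helper).

    Returns one of "utf-8-sig", "ascii", "utf-8", or "latin-1".
    "latin-1" is only a stand-in for "some legacy 8-bit encoding".
    """
    # An explicit UTF-8 BOM (EF BB BF) settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"

    # Without a BOM, try a strict UTF-8 decode. Text in a legacy 8-bit
    # encoding that contains non-ASCII bytes rarely forms valid UTF-8
    # sequences, so a failure here is strong evidence against UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "latin-1"

    # Pure ASCII is valid in both families; report it separately so the
    # caller can apply whatever default it prefers.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    return "utf-8"


if __name__ == "__main__":
    print(sniff_encoding("Grüße".encode("utf-8")))
    print(sniff_encoding("Grüße".encode("latin-1")))
```

The check can give a false positive on short legacy-encoded input whose bytes happen to form valid UTF-8 sequences, which is why it is a heuristic rather than a guarantee.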

...

Best regards, Julian