On 2017-09-19 19:17, John C Klensin wrote:
--On Tuesday, September 19, 2017 7:05 PM +0200 Julian Reschke
<julian.reschke@xxxxxx> wrote:
...
(2) I note that Dave's tests applied to Microsoft bundled
applications. If they are the main problem, then Microsoft
should be ashamed of themselves for updating those
applications to handle non-ASCII codes and then violating the
clear rules for UTF-8 (if they allow UTF-8 at all -- if they
decided to not do that and only allow, e.g., UTF-16, that
would be a different matter). While I hope bug reports have
been filed, the IETF (or RFC Editor) setting out to break
those applications is just not what we do.
Microsoft's support for non-ASCII characters predates Unicode
(AFAIU). Notepad has been dealing with non-ASCII characters
for ages.
Understood, my opinions about how well that worked, especially
for non-Latin scripts, notwithstanding. But, again, using BOM
as a substitute for charset=UTF-8 is, at least IMO, not the
brightest of ideas even though I'm also aware that we had a
difficult transition when the web went from a default of Latin-1
to Unicode.
...
Where is "charset=..." supposed to come from when you open a file resource?
Not *defaulting* to UTF-8 is not a bug. It may not be what our
preference is nowadays, but that's it.
See above. Slightly different discussion. But I note that it
isn't hard to distinguish between Latin-1 and UTF-8 without
relying on BOM -- the hard problem there involves distinguishing
between the various species of 8859 and assorted code pages.
...
I agree that Notepad *could* be (heuristically) sniffing for UTF-8, and
it would be interesting to hear why Microsoft doesn't do that.
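
To make the kind of sniffing discussed here concrete, a rough sketch in Python follows. It is only an illustration under stated assumptions, not a description of what Notepad or any other application does: the function name sniff_encoding is made up, and falling back to "latin-1" is an arbitrary choice, since (as noted above) telling the 8859 variants and code pages apart is the genuinely hard part.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess an encoding label for raw file bytes (hypothetical helper)."""
    # A UTF-8 BOM, if present, settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Pure ASCII is valid in UTF-8 and Latin-1 alike, so report it as such.
    if data.isascii():
        return "ascii"
    # Non-ASCII bytes in legacy 8-bit text rarely form valid UTF-8
    # sequences, so a strict decode is a strong heuristic.
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Assumption: treat anything else as Latin-1. This does NOT
        # distinguish between 8859 variants or Windows code pages.
        return "latin-1"
    return "utf-8"
```

For example, sniff_encoding("é".encode("utf-8")) reports "utf-8", while the same character encoded as Latin-1 (the single byte 0xE9) fails the strict decode and falls through to "latin-1". The check is a heuristic: short Latin-1 inputs can, in rare cases, happen to be valid UTF-8.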
...
Best regards, Julian