Re: Should the IETF be condoning, even promoting, BOM pollution?

Julian Reschke <julian.reschke@xxxxxx> · Tue, 19 Sep 2017 19:05:04 +0200

On 2017-09-19 17:49, John C Klensin wrote:
> ...
> (1) As far as I know, we still believe in running code,
> especially widely-deployed running code.  I hate the idea of
> BOMs in UTF-8 and have seen the harm, but, if there are widely
> deployed and heavily used applications out there that depend on
> it, our breaking those applications is just not what we should
> be doing.

+1

> (2) I note that Dave's tests applied to Microsoft bundled
> applications.   If they are the main problem, then Microsoft
> should be ashamed of themselves for updating those applications
> to handle non-ASCII codes and then violating the clear rules for
> UTF-8 (if they allow UTF-8 at all -- if they decided to not do
> that and only allow, e.g., UTF-16, that would be a different
> matter).  While I hope bug reports have been filed, the IETF (or
> RFC Editor) setting out to break those applications is just not
> what we do.

Microsoft's support for non-ASCII characters predates Unicode (AFAIU). 
Notepad has been dealing with non-ASCII characters for ages.

Not *defaulting* to UTF-8 is not a bug. It may not match our 
preference nowadays, but that's all it is.
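For illustration only: a program that doesn't default to UTF-8 has to decide the encoding per file, and one common heuristic is to look for a BOM and otherwise fall back to a legacy code page. The Python sketch below shows that heuristic. It is not a description of what Notepad actually does, and the function names and the cp1252 fallback are assumptions.

```python
import codecs

# Known BOMs, checked longest-first so that UTF-32LE (FF FE 00 00)
# is not mistaken for UTF-16LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(data: bytes, fallback: str = "cp1252") -> tuple[str, int]:
    """Return (encoding, bom_length); with no BOM, use the (assumed) fallback."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return fallback, 0

def decode(data: bytes, fallback: str = "cp1252") -> str:
    """Decode using the sniffed encoding, skipping the BOM itself.

    May raise UnicodeDecodeError if the bytes don't fit the chosen encoding.
    """
    encoding, skip = sniff_encoding(data, fallback)
    return data[skip:].decode(encoding)
```

With no BOM present, the same bytes `caf\xe9` decode as "café" under cp1252, which is exactly why a missing BOM forces a guess.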

> ...
> (4) At the same time, if the complaint is about terrible
> typography, that is a complaint about plain-text files without
> any formatting controls and markup, not about ASCII.  If someone
> dislikes plain-text files, they should, IMO, be looking for a
> way to do something else (e.g., PDF or HTML), not trying to
> "fix" plain-text files.
> ...

Nobody is doing that, as far as I can tell. And yes, we'll have official 
HTML variants with better typography. In the meantime, people can look 
at unofficial ones.

> ...
> (6) If any of the new norms and tools result in plain-text files
> with only ASCII characters in them starting with a BOM because
> ASCII is just a subset of UTF-8, I'd consider that seriously
> broken, a violation of the ASCII standard, and a few other
> things.  I hope tools and test suites would check for that case
> and complain if it is encountered.

I'd consider that a feature, far better than adding the BOM on a 
case-by-case basis. And no, it's not a violation of the ASCII standard: 
once the file is declared to be UTF-8, that standard no longer applies.
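Whichever way one comes down on the policy, the check John asks for (a UTF-8 BOM in front of otherwise pure-ASCII text) is easy for a tool or test suite to perform. A hedged Python sketch; the function names and the exit-code convention are assumptions, not part of any existing IETF tool, and whether a hit is an error or intended is left to whoever runs it.

```python
import codecs
import sys

def bom_on_ascii(data: bytes) -> bool:
    """True if data starts with a UTF-8 BOM and every byte after it is ASCII."""
    if not data.startswith(codecs.BOM_UTF8):
        return False
    return data[len(codecs.BOM_UTF8):].isascii()

def main(paths: list[str]) -> int:
    """Report each matching file; return 1 if any matched, else 0."""
    status = 0
    for path in paths:
        with open(path, "rb") as f:
            if bom_on_ascii(f.read()):
                print(f"{path}: UTF-8 BOM in front of pure-ASCII content")
                status = 1
    return status

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```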

> And, yeah, I think some (perhaps many) of us are going to need
> to have simple BOM adding and removing tools around, just as we
> have had tools that convert from LF-only to CRLF formats handy
> and get to use them often (I note, e.g., that the online
> ...

dos2unix does this very well.
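For anyone without dos2unix at hand, the operation itself is trivial, since the UTF-8 BOM is just the three bytes EF BB BF. A minimal Python sketch; the function names are made up for illustration, and dos2unix's own options are not reproduced here.

```python
import codecs
from collections.abc import Callable
from pathlib import Path

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF

def add_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless one is already there (idempotent)."""
    return data if data.startswith(BOM) else BOM + data

def remove_bom(data: bytes) -> bytes:
    """Drop a single leading UTF-8 BOM, if present."""
    return data[len(BOM):] if data.startswith(BOM) else data

def rewrite_in_place(path: str, transform: Callable[[bytes], bytes]) -> None:
    """Apply transform to a file's raw bytes; line endings are left untouched."""
    p = Path(path)
    p.write_bytes(transform(p.read_bytes()))
```

E.g. `rewrite_in_place("draft.txt", remove_bom)`. Working on raw bytes rather than decoded text means CRLF/LF conventions are preserved. On the reading side, Python's built-in `utf-8-sig` codec already skips a leading BOM when decoding.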

...

Best regards, Julian