On 2017-09-19 17:49, John C Klensin wrote:
... (1) As far as I know, we still believe in running code, especially widely-deployed running code. I hate the idea of BOMs in UTF-8 and have seen the harm, but, if there are widely deployed and heavily used applications out there that depend on it, our breaking those applications is just not what we should be doing.
+1
(2) I note that Dave's tests applied to Microsoft bundled applications. If they are the main problem, then Microsoft should be ashamed of themselves for updating those applications to handle non-ASCII codes and then violating the clear rules for UTF-8 (if they allow UTF-8 at all -- if they decided to not do that and only allow, e.g., UTF-16, that would be a different matter). While I hope bug reports have been filed, the IETF (or RFC Editor) setting out to break those applications is just not what we do.
Microsoft's support for non-ASCII characters predates Unicode (AFAIU). Notepad has been dealing with non-ASCII characters for ages.
Not *defaulting* to UTF-8 is not a bug. It may not be what our preference is nowadays, but that's it.
... (4) At the same time, if the complaint is about terrible typography, that is a complaint about plain-text files without any formatting controls and markup, not about ASCII. If someone dislikes plain-text files, they should, IMO, be looking for a way to do something else (e.g., PDF or HTML), not trying to "fix" plain-text files. ...
Nobody is doing that, as far as I can tell. And yes, we'll have official HTML variants with better typography. In the meantime, people can look at unofficial ones.
... (6) If any of the new norms and tools result in plain-text files with only ASCII characters in them starting with a BOM because ASCII is just a subset of UTF-8, I'd consider that seriously broken, a violation of the ASCII standard, and a few other things. I hope tools and test suites would check for that case and complain if it is encountered.
I'd consider that a feature, far better than adding it on a case-by-case basis. And no, it's not a violation of the ASCII standard, as that standard wouldn't apply anymore.
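For anyone who does want a tool or test suite to flag the case described in (6), here is a minimal Python sketch of such a check. The function name is hypothetical, and treating a file that contains only a BOM as "ASCII-only" is an assumption of this sketch, not something stated in the thread.

```python
import codecs


def is_ascii_with_bom(data: bytes) -> bool:
    """Return True if data starts with a UTF-8 BOM and every byte after it is ASCII.

    Hypothetical helper for illustration only. A file consisting of nothing
    but the BOM also returns True (an assumption made here).
    """
    if not data.startswith(codecs.BOM_UTF8):
        return False
    rest = data[len(codecs.BOM_UTF8):]
    # ASCII bytes are exactly those below 0x80.
    return all(byte < 0x80 for byte in rest)
```

Whether a positive result should be reported as an error or accepted as a feature is exactly the point of disagreement above; the sketch only detects the case.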
And, yeah, I think some (perhaps many) of us are going to need to have simple BOM adding and removing tools around, just as we have had tools that convert from LF-only to CRLF formats handy and get to use them often (I note, e.g., that the online
> ... dos2unix does this very well.
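As a rough illustration of what a "simple BOM adding and removing tool" has to do, here is a minimal Python sketch operating on raw bytes. The function names are hypothetical, and this is not a description of how dos2unix itself is implemented.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def add_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless the data already starts with one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data
```

Both functions are idempotent, so running either one twice on the same file gives the same result as running it once.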
...
Best regards, Julian