--On Tuesday, September 19, 2017 7:05 PM +0200 Julian Reschke
<julian.reschke@xxxxxx> wrote:

>...
>> (2) I note that Dave's tests applied to Microsoft bundled
>> applications.  If they are the main problem, then Microsoft
>> should be ashamed of themselves for updating those
>> applications to handle non-ASCII codes and then violating the
>> clear rules for UTF-8 (if they allow UTF-8 at all -- if they
>> decided to not do that and only allow, e.g., UTF-16, that
>> would be a different matter).  While I hope bug reports have
>> been filed, the IETF (or RFC Editor) setting out to break
>> those applications is just not what we do.
>
> Microsoft's support for non-ASCII characters predates Unicode
> (AFAIU). Notepad has been dealing with non-ASCII characters
> for ages.

Understood, my opinions about how well that worked, especially
for non-Latin scripts, notwithstanding.  But, again, using a BOM
as a substitute for charset=UTF-8 is, at least IMO, not the
brightest of ideas, even though I'm also aware that we had a
difficult transition when the web went from a default of Latin-1
to Unicode.

> Not *defaulting* to UTF-8 is not a bug. It may not be what our
> preference is nowadays, but that's it.

See above.  Slightly different discussion.  But I note that it
isn't hard to distinguish between Latin-1 and UTF-8 without
relying on a BOM -- the hard problem there involves
distinguishing among the various species of 8859 and assorted
code pages.

>> ...
>> (4) At the same time, if the complaint is about terrible
>> typography, that is a complaint about plain-text files without
>> any formatting controls and markup, not about ASCII.  If
>> someone dislikes plain-text files, they should, IMO, be
>> looking for a way to do something else (e.g., PDF or HTML),
>> not trying to "fix" plain-text files.
>> ...
>
> Nobody is doing that, as far as I can tell. And yes, we'll
> have official HTML variants with better typography. In the
> meantime, people can look at unofficial ones.

Exactly.

>> ...
>> (6) If any of the new norms and tools result in plain-text
>> files with only ASCII characters in them starting with a BOM
>> because ASCII is just a subset of UTF-8, I'd consider that
>> seriously broken, a violation of the ASCII standard, and a
>> few other things.  I hope tools and test suites would check
>> for that case and complain if it is encountered.
>
> I'd consider that a feature, far better than adding it on a
> case-by-case basis. And no, it's not a violation of the ASCII
> standard, as that standard wouldn't apply anymore.

But I believe there has been strong community consensus for
preserving the ASCII format and coding for plain-text files (or
at least for one type of plain-text file that does not contain
(substantive) non-ASCII characters).  It won't bother me much
(my plain-text tools work well with and without a BOM), but I'd
expect that, if the IETF made a decision to dump ASCII entirely,
some people would look for other places to get work done.  I
don't believe that would be in the IETF's, or the Internet's,
best interests.  YMMV.

>> And, yeah, I think some (perhaps many) of us are going to need
>> to have simple BOM adding and removing tools around, just as
>> we have had tools that convert from LF-only to CRLF formats
>> handy and get to use them often (I note, e.g., that the online
>
> ...
>
> dos2unix does this very well.

Indeed it does.  And that is more or less what I was trying to
say.  I keep dos2unix around; if I needed it, I'd keep some
sort of BOM-remover around too.  And I wouldn't expect to invest
a lot of energy complaining about having to use the latter, just
as I don't invest a lot of energy complaining about having to
use dos2unix.

>> ...

best,
   john
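[The thread's claim that Latin-1 and UTF-8 are easy to tell apart without a BOM can be illustrated with a minimal sketch -- this is my own illustration, not anything from the thread, and it assumes the only question asked is whether the bytes form valid UTF-8:]

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic charset check: bytes that decode cleanly as UTF-8 are
    almost certainly UTF-8.  Latin-1 text with accented characters is
    very unlikely to form valid UTF-8 multi-byte sequences by accident,
    because each non-ASCII Latin-1 byte would need specific
    continuation bytes (0x80-0xBF) immediately after it."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Latin-1 "café" encodes é as the lone byte 0xE9, which is an invalid
# UTF-8 sequence start with no continuation bytes following it:
assert not looks_like_utf8("café".encode("latin-1"))
# UTF-8 "café" (é = 0xC3 0xA9) decodes cleanly:
assert looks_like_utf8("café".encode("utf-8"))
```

[Note that, exactly as the message says, this trick cannot distinguish among Latin-1, the other ISO 8859 parts, and vendor code pages: they all use the same byte range, so a failed UTF-8 decode tells you nothing about which 8-bit encoding you actually have.]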
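[The "simple BOM adding and removing tools" mentioned above really are tiny; a sketch of both directions in Python, again mine rather than any existing tool, with hypothetical function names:]

```python
BOM = b"\xef\xbb\xbf"  # the three-byte UTF-8 encoding of U+FEFF

def strip_bom(data: bytes) -> bytes:
    """Remove one leading UTF-8 BOM, if present; leave all other
    bytes (including any later U+FEFF characters) untouched."""
    return data[len(BOM):] if data.startswith(BOM) else data

def add_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless one is already there."""
    return data if data.startswith(BOM) else BOM + data
```

[Wired up to stdin/stdout, either function becomes a dos2unix-style filter, e.g. `python bomstrip.py < in.txt > out.txt`.]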