Sorry for the delay -- catching up on this thread after
temporarily giving up on it.

--On Wednesday, September 27, 2017 08:38 +1300 Brian E Carpenter
<brian.e.carpenter@xxxxxxxxx> wrote:

> So why don't we, the Internet standards people who believe in
> rough consensus and running code, request the RFC Editor (a
> friend of ours) to supply two text versions of each RFC, like
>
> https://www.rfc-editor.org/rfc/rfc8187.txt as today, with
> BOM if relevant
> https://www.rfc-editor.org/rfc/rfc8187.ut8
> containing pure UTF-8 with no BOM ever

If one were really going to do that, one would need three
representations (pick your own three-character suffixes for the
first two):

  rfc8187.utf8 (standard/normal Unicode in UTF-8, no BOM)

  rfc8187.utf8-with-BOM (as above, but...)

  rfc8187.txt (ASCII, with characters outside the ASCII
  repertoire expressed as \u'[N[N]]NNNN' (see RFC 5137) or
  another escaping system of the RFC Editor's choice.  Note
  that there is no good reason to assume that a text file
  that contains octets outside the ASCII range is UTF-8,
  especially if the creation date is unknown.  Historically,
  it could as easily be encoded as specified in one of the
  ISO/IEC 8859-X standards, some proprietary code page, etc.)

Because "\u'2639'" requires significantly more horizontal space
than "☹", the txt form with escapes would require some
reformatting, but the native XML idea will solve all those
problems, right?

    john
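
For anyone who wants to see the three representations above side by side, here is a rough Python sketch. It assumes the escaped form uses the \u'[N[N]]NNNN' syntax from RFC 5137 with uppercase hex digits; the function names (escape_rfc5137, three_forms) and the sample string are made up for illustration and are not anything the RFC Editor uses. The last few lines illustrate the point about octets outside the ASCII range: the same three octets decode without complaint as either UTF-8 or ISO/IEC 8859-1, so the octets alone do not tell you which was intended.

```python
import codecs


def escape_rfc5137(text: str) -> str:
    # Replace every code point outside ASCII with a backslash-u escape
    # delimited by apostrophes, using 4 to 6 hex digits (RFC 5137 style).
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append("\\u'{:04X}'".format(ord(ch)))
    return "".join(out)


def three_forms(text: str) -> dict:
    # Hypothetical helper: build the three byte-level representations.
    utf8 = text.encode("utf-8")
    return {
        "utf8": utf8,                               # UTF-8, no BOM
        "utf8-with-BOM": codecs.BOM_UTF8 + utf8,    # same, BOM prepended
        "txt": escape_rfc5137(text).encode("ascii"),  # pure ASCII
    }


sample = "sad face: \u2639"   # U+2639 WHITE FROWNING FACE
forms = three_forms(sample)
for name, data in forms.items():
    print(name, len(data), data)

# The same octets are valid in more than one encoding.
raw = "\u2639".encode("utf-8")          # three octets: E2 98 B9
as_utf8 = raw.decode("utf-8")           # one character
as_latin1 = raw.decode("iso-8859-1")    # three unrelated characters
print(len(as_utf8), len(as_latin1))
```

The escaped form of the single character takes eight columns, which is the line-length problem mentioned above: any line near the right margin would need to be refilled after escaping.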