Re: So do both [was Re: Should the IETF be condoning, even promoting, BOM pollution?]

Sorry for the delay -- catching up on this thread after
temporarily giving up on it.

--On Wednesday, September 27, 2017 08:38 +1300 Brian E Carpenter
<brian.e.carpenter@xxxxxxxxx> wrote:

> So why don't we, the Internet standards people who believe in
> rough consensus and running code, request the RFC Editor (a
> friend of ours) to supply two text versions of each RFC, like
> 
> https://www.rfc-editor.org/rfc/rfc8187.txt   as today, with
> BOM if relevant 
> https://www.rfc-editor.org/rfc/rfc8187.ut8
> containing pure UTF-8 with no BOM ever

If one were really going to do that, one would need three
representations (pick your own three-character suffixes for the
first two):

	rfc8176.utf8   (standard/normal Unicode in UTF-8, no BOM)
	rfc8176.utf8-with-BOM (as above, but...)
	rfc8176.txt    (ASCII, with characters outside the ASCII
	    repertoire expressed as \u'[N[N]]NNNN' (see RFC 5137) or
	    another escaping system of the RFC Editor's choice.
	    Note that there is no good reason to assume that a text
	    file that contains octets outside the ASCII range is
	    UTF-8, especially if the creation date is unknown.
	    Historically, it could as easily be encoded as specified
	    in one of the ISO/IEC 8859-X standards, some proprietary
	    code page, etc.)
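
For concreteness, a minimal Python sketch of how the three forms
could be derived from one source string. The function names and the
dictionary keys are hypothetical, chosen only to mirror the suffixes
above; the escape follows the \u'[N[N]]NNNN' shape, and the sketch
assumes the input contains no literal backslashes that would need
escaping themselves. It is not a description of any RFC Editor
tooling.

```python
import codecs


def rfc5137_escape(text: str) -> str:
    """Replace each non-ASCII character with \\u'NNNN' (4 to 6 hex digits)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                    # ASCII passes through unchanged
        else:
            out.append("\\u'%04X'" % cp)      # e.g. U+2639 -> \u'2639'
    return "".join(out)


def three_representations(text: str) -> dict:
    """Return the three byte-level forms, keyed by a hypothetical suffix."""
    utf8 = text.encode("utf-8")
    return {
        "utf8": utf8,                                  # UTF-8, no BOM
        "utf8-with-BOM": codecs.BOM_UTF8 + utf8,       # same, BOM prepended
        "txt": rfc5137_escape(text).encode("ascii"),   # pure ASCII
    }


forms = three_representations("frown: \u2639")
for suffix, data in forms.items():
    print(suffix, data)
```

The only difference between the first two forms is the three-octet
prefix EF BB BF; the third contains no octet above 0x7F.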

Because "\u'2639'" requires significantly more horizontal space
than "☹", the txt form with escapes would require some
reformatting, but the native XML idea will solve all those
problems, right?
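
To put a number on the space problem, a small Python sketch. The
72-column limit and the helper name are assumptions for
illustration, and it counts one column per character, which is only
an approximation of display width.

```python
import textwrap

WIDTH = 72  # assumed line-length limit, for illustration only


def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a \\u'NNNN' escape."""
    return "".join(
        ch if ord(ch) < 0x80 else "\\u'%04X'" % ord(ch) for ch in text
    )


# A line that exactly fills the assumed limit: 68 + 1 + 3 characters.
line = "x" * 68 + " " + "\u2639" * 3
escaped = escape_non_ascii(line)

# Each escape is 8 characters where the original was 1.
print(len(line), len(escaped))

# The escaped line no longer fits and has to be re-wrapped.
rewrapped = textwrap.wrap(escaped, width=WIDTH)
print(len(rewrapped))
```

Three such characters push a full line 21 columns over the limit, so
any paragraph containing them has to be refilled rather than patched
in place.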

     john




