Re: RFC Series publishes first RFC with non-ASCII characters

On 16/09/2017 20:42, Carsten Bormann wrote:
> On Sep 15, 2017, at 22:50, Denis Ovsienko <denis@xxxxxxxxxxxxx> wrote:
>>
>> The presence of BOM in a UTF-8 file exactly follows Section 2 of RFC 7994 (Requirements for Plain-Text RFCs).
> 
> Yes, I missed that sentence buried in that RFC at the time.
> 
> Still FAIL, still utterly disappointing.
> 
> STD0063
> https://tools.ietf.org/html/rfc3629#section-6
> (Look for all the sentences starting “A protocol SHOULD forbid use of U+FEFF as a signature...”.)
> 
> "Use of a BOM is neither required nor recommended for UTF-8":
> http://www.unicode.org/versions/Unicode10.0.0/ch02.pdf
> 
> RFC 7994 is a massive regression here.

As far as I can tell, both RFC 3629 and RFC 5198 refer to byte streams transmitted
in PDUs. RFC 7994 defines a file format, not a protocol byte stream format.
That's quite explicit in Section 2:
"Plain-text files for RFCs will use the UTF-8 [RFC3629] character
encoding...
The plain-text file will include a Byte Order Mark (BOM)..."

Now if we were defining a protocol for the transmission of RFCs, or
more generally a protocol for the transmission of UTF-8 documents,
it would be a different story. We could say:

If the document starts with a Byte Order Mark (BOM), it MUST be
removed prior to transmission.

(Quite what FTP or rsync should do in such a case is an interesting
discussion.)
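For concreteness, here is a minimal Python sketch of what that hypothetical "remove the BOM before transmission" rule would mean at the byte level. It assumes the document is handled as raw bytes and that only a single leading BOM (U+FEFF encoded in UTF-8 as EF BB BF) counts as a signature. The function name strip_utf8_bom is illustrative only and does not come from any RFC. The stored file keeps its BOM, and only the transmitted copy loses it.

```python
import codecs

# codecs.BOM_UTF8 is b"\xef\xbb\xbf", i.e. U+FEFF encoded as UTF-8.
UTF8_BOM = codecs.BOM_UTF8


def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical pre-transmission step: drop one leading UTF-8 BOM.

    Only a BOM at the very start is treated as a signature; a U+FEFF
    appearing later in the text is left untouched.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


if __name__ == "__main__":
    # The stored plain-text file carries a BOM, as the file format requires.
    stored_file = UTF8_BOM + "Plain text\n".encode("utf-8")
    # The copy put on the wire would not.
    payload = strip_utf8_bom(stored_file)
    print(payload)  # b'Plain text\n'
```

Working on bytes leaves the rest of the payload byte-for-byte identical. A receiver that decodes text could get a similar effect with Python's "utf-8-sig" codec, which skips a leading BOM if one is present.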

   Brian
