On 16/09/2017 20:42, Carsten Bormann wrote:
> On Sep 15, 2017, at 22:50, Denis Ovsienko <denis@xxxxxxxxxxxxx> wrote:
>>
>> The presence of BOM in a UTF-8 file exactly follows Section 2 of RFC 7994 (Requirements for Plain-Text RFCs).
>
> Yes, I missed that sentence buried in that RFC at the time.
>
> Still FAIL, still utterly disappointing.
>
> STD0063
> https://tools.ietf.org/html/rfc3629#section-6
> (Look for all the sentences starting “A protocol SHOULD forbid use of U+FEFF as a signature...“.)
>
> "Use of a BOM is neither required nor recommended for UTF-8":
> http://www.unicode.org/versions/Unicode10.0.0/ch02.pdf
>
> RFC 7994 is a massive regression here.

As far as I can tell, both RFC 3629 and RFC 5198 refer to byte streams
transmitted in PDUs. RFC 7994 defines a file format, not a protocol byte
stream format. That's quite explicit in section 2:

"Plain-text files for RFCs will use the UTF-8 [RFC3629] character
encoding... The plain-text file will include a Byte Order Mark (BOM)..."

Now if we were defining a protocol for the transmission of RFCs, or more
generally a protocol for the transmission of UTF-8 documents, it would be
a different story. We could say:

If the document starts with a Byte Order Mark (BOM) this MUST be removed
prior to transmission.

(Quite what FTP or RSYNC should do in such a case is an interesting
discussion.)

   Brian
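
For illustration only, the hypothetical rule above (remove a leading BOM before transmission) might look something like the following Python sketch. The function name strip_utf8_bom and the sample text are assumptions made for this example; nothing here is taken from RFC 7994, RFC 3629, or any existing protocol.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper: only a BOM at the very start is removed, and only
    one of them. Any later U+FEFF is left untouched.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Assumed sample: a stored file in the "BOM + UTF-8 text" layout.
stored = codecs.BOM_UTF8 + "Network Working Group".encode("utf-8")

# What a sender following the hypothetical rule would put on the wire.
payload = strip_utf8_bom(stored)
```

Input that has no BOM passes through unchanged, so a sender could apply the same step to every document without first checking how the file was produced.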