Re: Should the IETF be condoning, even promoting, BOM pollution?

On 2017-09-26 13:06, Carsten Bormann wrote:
On Sep 26, 2017, at 12:55, Julian Reschke <julian.reschke@xxxxxx> wrote:

Please cite *specifically* what you think is relevant with respect to the use of BOMs in plain text files.

That’s all already been said in the thread, but to repeat, with links:

STD0063 section 6:
https://tools.ietf.org/html/rfc3629#section-6

Like:

   o  A protocol SHOULD NOT forbid use of U+FEFF as a signature for
      those textual protocol elements for which the protocol does not
      provide character encoding identification mechanisms, when a ban
      would be unenforceable, or when it is expected that
      implementations of the protocol will not be in a position to
      always use the mechanisms properly.  The latter two cases are
      likely to occur with larger protocol elements such as MIME
      entities, especially when implementations of the protocol will
      obtain such entities from file systems, from protocols that do not
      have encoding identification mechanisms for payloads (such as FTP)
      or from other protocols that do not guarantee proper
      identification of character encoding (such as HTTP).

...which is *exactly* what we're discussing here?


"Use of a BOM is neither required nor recommended for UTF-8":
http://www.unicode.org/versions/Unicode10.0.0/ch02.pdf

That talks about whether a BOM is or is not useful to distinguish between Unicode encoding schemes. But that's not really relevant here, unless all plain text files were indeed already in one of the Unicode encoding schemes. They are not, and that's the problem.

And RFC 5198, section 2, item 5:
https://tools.ietf.org/html/rfc5198#section-2
...

That has the same problem - it assumes a world that is already fully Unicode, in which case it's correct to say that the BOM is not needed.

However, plain text files are something that predates all of this, and the tools that the consumers of plain text RFCs use deal with this mixed encoding world in several ways.

I agree that if the goal was to promote an all-unicode world, the answer would be different. But the goal of the RFC Editor is to deliver documents that people will be able to read properly with the tools they have. The tests we did showed that adding the BOM is beneficial for this.
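To illustrate why a signature helps a consumer in a mixed-encoding world, here is a minimal sketch of a reader that treats a leading U+FEFF as evidence of UTF-8 and otherwise falls back to a legacy encoding. The function name decode_plain_text and the Latin-1 fallback are assumptions made for this illustration; this is not a description of the tests mentioned above or of any particular tool.

```python
import codecs


def decode_plain_text(data: bytes, fallback: str = "latin-1") -> str:
    """Decode bytes from a plain text file of unknown encoding.

    Hypothetical helper: a UTF-8 BOM is taken as a signature for UTF-8;
    without it, the reader guesses a legacy encoding (assumed Latin-1 here).
    """
    # codecs.BOM_UTF8 is the three bytes EF BB BF (U+FEFF encoded as UTF-8).
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(fallback)


payload = "M\u00fcller".encode("utf-8")  # "Müller" as UTF-8 bytes

with_bom = decode_plain_text(codecs.BOM_UTF8 + payload)
without_bom = decode_plain_text(payload)
```

With the signature present the name decodes correctly; without it, this reader misinterprets the two UTF-8 bytes of the umlaut as two Latin-1 characters. A reader that defaults to UTF-8 would not need the signature, which is the all-Unicode case the cited documents assume.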

(That said: this is an in-between period - once the transition to the format is finished, the preferred consumption format will be HTML anyway.)

Of course, BOM-pollution apologists will find enough rope in these documents to hang themselves.
That is really the problem here: the tendency to weasel around decisions in standards.
(Or to make them in the first place.  UCS-2-BE vs. UCS-2-LE all over again.)

My point being: none of the things you apparently refer to applies to what we are discussing here.

Best regards, Julian

BTW: if you believe that the text *I* quoted from RFC3629 is bad, you might want to submit an erratum and/or start a discussion on updating the document.



