On 2017-09-26 13:06, Carsten Bormann wrote:
On Sep 26, 2017, at 12:55, Julian Reschke <julian.reschke@xxxxxx> wrote:
Please cite *specifically* what you think is relevant with respect to the use of BOMs in plain text files.
That’s all already been said in the thread, but to repeat, with links:
STD0063 section 6:
https://tools.ietf.org/html/rfc3629#section-6
Like:
o A protocol SHOULD NOT forbid use of U+FEFF as a signature for
those textual protocol elements for which the protocol does not
provide character encoding identification mechanisms, when a ban
would be unenforceable, or when it is expected that
implementations of the protocol will not be in a position to
always use the mechanisms properly. The latter two cases are
likely to occur with larger protocol elements such as MIME
entities, especially when implementations of the protocol will
obtain such entities from file systems, from protocols that do not
have encoding identification mechanisms for payloads (such as FTP)
or from other protocols that do not guarantee proper
identification of character encoding (such as HTTP).
...which is *exactly* what we're discussing here?
"Use of a BOM is neither required nor recommended for UTF-8": http://www.unicode.org/versions/Unicode10.0.0/ch02.pdf
That talks about whether a BOM is or is not useful to distinguish
between Unicode encoding schemes. But that's not really relevant here,
unless all plain text files were indeed already in one of the Unicode
encoding schemes. They are not, and that's the problem.
And RFC 5198, section 2, item 5:
https://tools.ietf.org/html/rfc5198#section-2
...
That has the same problem - it assumes a world that is already fully
Unicode, in which case it's correct to say that the BOM is not needed.
However, plain text files are something that predates all of this, and
the tools that the consumers of plain text RFCs use deal with this mixed
encoding world in several ways.
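
To illustrate the kind of behaviour meant here, the following is a minimal sketch of a consumer that assumes a legacy encoding unless the file starts with the UTF-8 signature. The function name decode_plain_text and the cp1252 default are assumptions made for this example; it does not describe any specific tool, and real tools differ in their heuristics.

```python
import codecs


def decode_plain_text(data: bytes, legacy_encoding: str = "cp1252") -> str:
    """Decode a plain text file the way a BOM-sniffing tool might.

    Hypothetical heuristic: a leading UTF-8 signature (EF BB BF) selects
    UTF-8; anything else is treated as the locale's legacy encoding.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Strip the signature, then decode the remainder as UTF-8.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # No signature: fall back to the assumed legacy encoding.
    return data.decode(legacy_encoding, errors="replace")


utf8_bytes = "é".encode("utf-8")

# With the signature the text is decoded as intended.
print(decode_plain_text(codecs.BOM_UTF8 + utf8_bytes))

# Without it, the same bytes are misread as two cp1252 characters.
print(decode_plain_text(utf8_bytes))
```

Under this assumed heuristic, pure ASCII files are unaffected either way; only files with non-ASCII characters depend on the signature being present.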
I agree that if the goal were to promote an all-Unicode world, the answer
would be different. But the goal of the RFC Editor is to deliver
documents that people will be able to read properly with the tools they
have. The tests we did showed that adding the BOM is beneficial for this.
(That said: this is an in-between period - once the transition to the
format is finished, the preferred consumption format will be HTML anyway.)
Of course, BOM-pollution apologists will find enough rope in these documents to hang themselves.
That is really the problem here: the tendency to weasel around decisions in standards.
(Or to make them in the first place. UCS-2-BE vs. UCS-2-LE all over again.)
My point being: none of the things you apparently refer to applies to
what we are discussing here.
Best regards, Julian
BTW: if you believe that the text *I* quoted from RFC3629 is bad, you
might want to submit an erratum and/or start a discussion on updating
the document.