Re: So do both [was Re: Should the IETF be condoning, even promoting, BOM pollution?]

Yoav Nir <ynir.ietf@xxxxxxxxx> · Tue, 10 Oct 2017 21:48:57 +0300

On 10 Oct 2017, at 5:29, John C Klensin <john-ietf@xxxxxxx> wrote:

--On Monday, October 9, 2017 16:36 -0700 "Heather Flanagan (RFC
Series Editor)" <rse@xxxxxxxxxxxxxx> wrote:

On 10/9/17 10:14 AM, John C Klensin wrote:
--On Wednesday, September 27, 2017 08:38 +1300 Brian E
Carpenter <brian.e.carpenter@xxxxxxxxx> wrote:

So why don't we, the Internet standards people who believe in
rough consensus and running code, request the RFC Editor (a
friend of ours) to supply two text versions of each RFC, like

https://www.rfc-editor.org/rfc/rfc8187.txt
    as today, with BOM if relevant
https://www.rfc-editor.org/rfc/rfc8187.ut8
    containing pure UTF-8 with no BOM ever

If one were really going to do that, one would need three
representations (pick your own three-character suffixes for
the first two):

	rfc8176.utf8   (standard/normal Unicode in UTF-8, no BOM)
	rfc8176.utf8-with-BOM (as above, but...)
	rfc8176.txt    (ASCII, with characters outside the ASCII
repertoire expressed as \u'[N[N]]NNNN' (see RFC 5137) or
another escaping system of the RFC Editor's choice)
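
For illustration only, the ASCII variant described above could be derived from the UTF-8 text roughly as follows. This is a minimal sketch assuming the \u'[N[N]]NNNN' form from RFC 5137 with uppercase hex digits; the function name escape_non_ascii is hypothetical, and it does not represent any tool the RFC Editor actually runs.

```python
def escape_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character written as \\u'NNNN'.

    Assumption: the \\u'[N[N]]NNNN' notation of RFC 5137, using four hex
    digits for most code points and five or six only when required.
    """
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            # Plain ASCII passes through untouched.
            pieces.append(ch)
        else:
            # {:04X} pads to at least four hex digits and grows as needed.
            pieces.append("\\u'{:04X}'".format(code_point))
    return "".join(pieces)
```

The mapping is one-way as written: a document that already contains a literal \u'...' sequence would need some additional convention to stay unambiguous.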

A few points to consider. First, the RFC Editor will review,
at least to some extent, every file we produce, and our tools
will need to be modified to create the additional formats;
that complexity would then need to be maintained going
forward. The more files added, the more resources it will take
to produce. This has implications for either the time it takes
to publish or the cost it takes to publish. Second, there have
also been some discussions about creating separate files for
paginated versus unpaginated text files. That would take us up
to six files just for the plain-text outputs (noting the RFC
Editor also has the PDF/A-3 and HTML to review).

Alternatively, the IETF community that prefers plain text can
develop tools that take the one file created by the RFC
Editor and strip the BOM, add pagination, or run it through a
translation tool to get it in their native language--these
will not be produced or reviewed by the RFC Editor, but will
perhaps meet the individual desires here. Given the number of
options, opinions, and resources involved, I think this makes
the most sense.
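
As a rough idea of how small such a community tool could be, here is a sketch that strips a leading UTF-8 BOM from the bytes of a published file. The function name strip_utf8_bom is hypothetical and the sketch assumes the file is UTF-8; pagination and translation are not attempted.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # No BOM at the start: leave the content exactly as it was.
    return data
```

When decoding to text rather than handling raw bytes, Python's built-in "utf-8-sig" codec has the same effect on input: it drops a leading BOM if one is there and otherwise behaves like "utf-8".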

Up to a point, yes.  On the other hand, unless the RFC Editor
intends to make a rule requiring either that sections (or
subsections) not extend over circa a page, or numbering lines,
or doing something else that facilitates references into a
document, I think you'd best retain a canonical / distributed
version with page numbers, headers, and footers.  

In that case we’d all have to look up that version whenever we received a reference to something in RFC xxxx page 7. So even if it’s more comfortable for us to read the RFC in a browser or on a phone, we’d need access to this canonical version.

IMO it’s far easier to reference section and paragraph number, as in “the formula in RFC 6962, section 2.1.2, paragraph 3”. This works with any format, paginated or not.

This gets clunky if people have 4-page-long paragraphs or 50-paragraph sections, but that kind of badness can and should be caught by working groups, shepherd reviews, or, if all else fails, gen-art reviews.  This one is not up to the RFC Editor to make rules against.

Yoav