Re: XML2RFC submission (was Re: ASCII art)

Jeffrey Hutzelman <jhutz@xxxxxxx> · Fri, 02 Dec 2005 22:48:48 -0500

On Monday, November 28, 2005 12:00:44 PM -0800 Bob Braden <braden@xxxxxxx> 
wrote:

> The RFC Editor has experimented with using xml2rfc for this purpose,
> and found it awkward and inefficient for producing properly formatted
> ASCII text.  But the two issues of primary concern to the IETF should
> be the acceptable input formats (currently ASCII text and/or RFC 2629
> XML) and the desired publication format(s).

And, perhaps, the interchange formats used between authors and the RFC 
Editor during AUTH48.

But I think you've made an important point here, and it surprises me 
somewhat that it needs to be said aloud among people who spend so much time 
designing network protocols.

The formats of documents...
- submitted to the I-D repository
- discussed in working groups
- in last call
- submitted for publication
- exchanged during AUTH48
- published as RFCs

... are all _interchange_ formats.  They are network protocol elements, of 
a sort, and it is appropriate to require for them a particular, well-defined 
format.  It's appropriate to require them to use a particular natural 
language, a particular file format, and particular conventions for document 
structure, layout, etc.

We could use different sets of requirements for each stage of the process, 
or the same set for all.  Currently, we're not at either extreme.  We have 
a fairly loose set of requirements for things submitted to the I-D 
repository, and somewhat more stringent requirements by the time a document 
is sent to the RFC Editor.  And, the _output_ of the RFC Editor process is 
yet another format which is slightly different and a lot more stringent.

The formats of documents being edited by authors and editors, or by the RFC 
Editor, are _not_ interchange formats.  They are what in protocol design we 
call "implementation details", and they are best left up to the individuals 
involved, not dictated centrally.

We don't tell implementors what language or compiler to use.
We don't tell people what MUA they must use to participate in IETF lists.
We shouldn't tell authors and editors what tools they must use, either.

So, we need to ask ourselves some questions:

- What interchange format do we want to use at each stage of the process?
- Are there stages at which we want more than one interchange format,
 and, if so, which one is authoritative?

Personally, I think the following are reasonable requirements:

- For documents at every stage up to and including RFC-Editor input:
 - Documents MUST be plain English text, encoded in US-ASCII.
 - Aside from the required header at the top of the document, no
   particular formatting is required.
 - Headers, footers, and page breaks are not required (if people really
   want them, so be it; I find them of marginal use and a source of much
   pain in computing diffs, even with good tools).
 - We can argue about line length limits; I happen to find them
   convenient, but they probably don't matter much any more.
 - Documents MAY include references to diagrams, etc., using one of
   the popular image formats of the day (JPEG? PNG?).  However, it MUST
   be possible to correctly understand the document and to comply with
   its requirements without referring to the image (this can perhaps
   be waived early in the process, but I'd certainly want to see it
   met by last call).
 - Documents SHOULD include copies of whatever source form the editor
   is using, to facilitate transfer to a new editor if necessary.  The
   preferred form is WHATEVER THE AUTHOR IS ACTUALLY USING; the idea is
   to avoid information loss by using something as close as possible to
   the source.
 - The document, images, and source are published as a group.  IMHO it
   should be possible to retrieve the whole set or just the text.

- For documents published by the RFC Editor:
 - Plain English text, UTF-8, formatted in some reasonable fashion
 - Associated images published alongside the text
 - Presentation form (PDF?) published alongside the text, with images
 - Structured source in a standard format, suitable for use as a starting
   point in creating new versions.
 - Author's original source should be archived and available on request.
 - Embedded code, MIBs, ASN.1 modules (anything that today would have
   to compile to get past the IESG) available as separate files.
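As an illustration of how mechanical the pre-RFC-Editor requirements above could be, here is a minimal Python sketch. The function name check_submission and the 72-character default limit are hypothetical choices for this example (the line-length question is deliberately left open above). It only checks that the bytes are US-ASCII and, optionally, that no line exceeds the limit; it says nothing about content, images, or the required header.

```python
from typing import List, Optional


def check_submission(text_bytes: bytes, max_line_length: Optional[int] = 72) -> List[str]:
    """Return a list of problems found; an empty list means the text passes.

    Hypothetical checker: US-ASCII encoding is required, and the line-length
    limit (72 here, purely as an assumption) can be disabled by passing None.
    """
    problems: List[str] = []
    try:
        # The "ascii" codec rejects any byte above 0x7F.
        text = text_bytes.decode("ascii")
    except UnicodeDecodeError as err:
        bad_byte = text_bytes[err.start]
        problems.append(f"not US-ASCII: byte {bad_byte:#04x} at offset {err.start}")
        return problems

    if max_line_length is not None:
        for lineno, line in enumerate(text.splitlines(), start=1):
            if len(line) > max_line_length:
                problems.append(
                    f"line {lineno}: {len(line)} characters (limit {max_line_length})"
                )
    return problems
```

Reading a draft in binary mode and passing the bytes in (for example `check_submission(open(path, "rb").read())`) keeps the encoding check honest, since decoding happens inside the function rather than with whatever default the platform picks.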

Both during last call and when an RFC is published, the authoritative 
version should be the plain text version, for several reasons:

- It is the most future-proof
- Many people will review that version (not the source) anyway
- Different people reviewing the document might see something different
 due to differences in their tools or environment.  This inconsistency
 is problematic when we are trying to get a large number of people to
 agree on the _same_ text.
- It is imperative that the published authoritative version have the same
 content as the version that was reviewed.  This seems most easily
 achieved by using formats which are similar enough that it is possible
 to verify the content is the same(*).
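A minimal sketch of the kind of check the last point has in mind, assuming the only differences between the reviewed and published text are line breaks, spacing, and page breaks: reduce both versions to their sequence of words and diff those with Python's standard difflib module. The helper names are hypothetical. Running headers and footers would still show up as differences, which is one more reason they make comparisons painful.

```python
import difflib
from typing import List


def words(text: str) -> List[str]:
    """Reduce a document to its words, ignoring line breaks, form feeds and spacing."""
    return text.split()


def content_diff(reviewed: str, published: str) -> List[str]:
    """Return a word-level unified diff; an empty list means the words match."""
    return list(
        difflib.unified_diff(
            words(reviewed),
            words(published),
            fromfile="reviewed",
            tofile="published",
            lineterm="",
        )
    )
```

Each word becomes one "line" of the diff, so a changed "MUST" shows up as a single removed/added pair rather than as a reflowed paragraph.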

I believe that diagrams should be handled similarly.  While they should 
always be considered non-normative, diagrams will often be consulted by the 
users of our specifications, and thus should be subject to the same review 
as the text itself.

Both the reviewed and published "authoritative" versions of diagrams should 
be images, not the structured data used to create them (though that should 
be made available as well, where feasible).  This allows someone to look at 
the published diagram and ensure it is the same as what was reviewed.
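If the reviewed and published diagrams are meant to be the very same image file, the comparison can also be made mechanical by comparing cryptographic digests of the files. This is a stricter, byte-level substitute for looking at the images, assumed here for illustration; the function names are hypothetical. Re-exporting a visually identical image usually changes its bytes, so a mismatch means "look again", not necessarily "the diagram changed".

```python
import hashlib


def fingerprint(image_bytes: bytes) -> str:
    """SHA-256 digest of an image file's exact bytes, as a hex string."""
    return hashlib.sha256(image_bytes).hexdigest()


def same_diagram(reviewed: bytes, published: bytes) -> bool:
    """True only if the two image files are byte-for-byte identical."""
    return fingerprint(reviewed) == fingerprint(published)
```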

-- Jeff

(*) Note that this verification _does_ happen prior to publication, during
   AUTH48.  So, the authors need to see what the final published version
   will look like.  They don't so much need to see the RFC Editor's
   internal representation, because the authors don't edit that directly.

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf