Re: RFC archival format, was: Re: More liberal draft formatting standards required

Douglas Otis <dotis@xxxxxxxxxxxxxx> · Mon, 13 Jul 2009 12:56:43 -0700

On Jul 12, 2009, at 4:42 PM, Doug Ewell wrote:

> This thread has been headed down the wrong path from the outset, as
> soon as Tony Hain wrote on July 1:
>
>> An alternative would be for some xml expert to fix xml2rfc to parse
>> through the xml output of Word. If that happened, then the
>> configuration options described in RFC 3285 would allow for wysiwyg
>> editing, and I would update 3285 to reflect the xml output process.
>> I realize that is a vendor specific option, but it happens to be a
>> widely available one.
>
> I modified that, along the course of the thread, to suggest that a
> separate "word2rfc" tool might be a more sensible option.
>
> To the extent the .doc format is "highly flexible" -- which isn't
> really true anyway; it's been rather stable since 1997, and the new
> XML-based format is called .docx -- I can see that as an obstacle
> for someone writing such a conversion tool.  But I challenge anyone
> to find the slightest suggestion in this thread that we should
> publish IETF documents directly in Word format. Let's at least argue
> the same point, folks.

These concerns took your concept to a logical conclusion.  Notice the  
definition for "sttbListNames" in:
http://www.microsoft.com/interop/docs/OfficeBinaryFormats.mspx

Logically, rather than modifying TCL xml2rfc code to interpret xml2rfc
structures embedded within Word structures, Visual Basic would be the
more likely tool, since the Word application already supports it.  To
see this support, double-click a control in Design Mode and watch Word
open a Visual Basic editor.  Visual Basic provides access to ActiveX
routines; in 2007, additional content-based routines were added, along
with custom XML storage for its binary format.  Although placing
controls directly into a Word document is not the norm (a control
prints as a graphic), these controls can generate RFC-compliant
output, and even bibliographic XML fragments to assist in generating
the bibliographic sections.  No TCL code would be needed.  A less
risky alternative to Word might be to use Java with OpenOffice.

From the IETF perspective, in addition to the ASCII text files used
as the archived form, xml2rfc files are retained to generate
alternative presentations and as input to the generation process.  The
concern with the Word input format, which changed in 97, 00, 02, 03,
and 07, and is likely to change again in 10, remains that of security.
Changes are not always apparent, and even the format documentation
cannot be relied upon when details related to active components are
ill-defined.  The security concern is with the embedded programming
language, especially when the program is relied upon as the means to
generate IETF-compliant output.  The Internet is not a safe place, and
embedding programs that can cause harm into what could have been
innocuous text should be considered a bad practice.  Currently,
individuals can collaborate by sharing xml2rfc input files, which are
also retained with the plain-text RFC output.  Relying on Word input
files as a replacement for xml2rfc files will invariably lead to the
bad practice of depending upon potentially harmful embedded programs.

Use of xml2rfc conversions has uncovered some odd quirks.  The tool
does not cache bibliographic database selections: either it works
on-line, or the entire database needs to be local.  Not to diminish
the service offered by Carl Malamud, but the occasionally sporadic
connections to the xml.resource.org servers can be a cause of angst
for authors who have not obtained the entire tarred XML bibliographic
database.  Lately, the xml2rfc approach has become less dependable
when dealing with cryptic entries and the beta TCL needed to generate
the I-D boilerplate language required by the nit checker.
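
A small local cache in front of the lookup would soften this.  What
follows is only a sketch in Python, not something xml2rfc does today:
the names cached_reference, cache_dir, and fetch are hypothetical, and
the caller is assumed to supply whatever function actually retrieves
an entry from the on-line database.

```python
import os
import tempfile


def cached_reference(name, cache_dir, fetch):
    """Return the bibliographic XML for `name`, fetching it at most once.

    `fetch` is a caller-supplied callable (hypothetical here) that takes a
    reference name and returns its XML text. It is only called when the
    entry is not already stored in `cache_dir`.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Keep the cache flat: path separators in the name become underscores.
    path = os.path.join(cache_dir, name.replace("/", "_"))

    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()

    text = fetch(name)

    # Write to a temporary file first, then rename, so an interrupted run
    # never leaves a half-written cache entry behind.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp, path)
    return text
```

With something like this, only the first use of a given reference needs
a working connection; later runs read the local copy.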

This makes one wonder whether there could be a better way.  A hybrid
approach might offer features similar to those found in xml2rfc with
the simpler inputs supported by 'roff.  This would not exclude the use
of Word, but would not depend upon any of Word's content automations.
Perhaps a bit of Perl could provide the pre- and post-processors to
handle something that resembles the xml2rfc front section.  While roff
is not perfect, it has been more stable than WYSIWYG word processors
and, when used in conjunction with separate pre/post processors, can
generate the desired alternative outputs.

-Doug