Re: Why the normative form of IETF Standards is ASCII

Andrew Sullivan <ajs@xxxxxxxxxxxx> · Fri, 12 Mar 2010 11:32:15 -0500

On Thu, Mar 11, 2010 at 11:37:55PM -0500, Donald Eastlake wrote:
> > PDF/A is a deliberately-limited format designed specifically for
> > archival purposes.
> 
> And is clearly a non-starter because I have no idea how to produce PDF
> so limited, no idea how to test a PDF to see if it's "PDF/A", etc. On
> the other hand, since I produced my first ID sometime in 1992, I've
> had no particular problem producing them with nroff and I've never had
> to hunt for, write, debug, or install a single piece of software. It's
> just there already, including in Mac OS X.

Wait a minute.  That argument just boils down to, "I don't know how to
do this, so it's obviously wrong."  

First, that you don't know how to do something is by no means evidence
that it can't be done, and done easily. 

Second, I'm sure it won't come as a complete surprise that many people
find nroff to be cryptic, arcane, hard to use, and designed for an era
when the primary publication mechanism was ink on paper using output
mechanisms with limited capabilities.  If people have such trouble,
then the same argument form ("I have no idea . . .") can be used by
them.  And I dare say that, in this day and age, more people have
trouble using nroff than have trouble producing PDF/A, since
OpenOffice.org includes a little button that generates such PDFs.

Third -- and this is a point since made in this thread by others more
clearly than I originally made it -- the IETF format _is not_ plain
ASCII.  It's a page layout format that happens to restrict itself also
to ASCII characters only.  So there are completely separate issues to
address here, and we shouldn't conflate them.  

There is the archival format issue.  In my view, if we really want to
have a format for archival purposes, then something other than files
made for printing on a printer (with paper not even widely available
in parts of the world) would be an improvement.  PDF/A is one
candidate format, standardized by another SDO and apparently embraced
by a community (librarians) that really knows about long-term archives
and that already has extensive experience with the pain of supporting
old computer formats.  So it seems to me to be a useful candidate for
archival purposes.  It isn't the only one.  Pointing and laughing at
an implementation of a viewer of such files because it happens to be
riddled with bugs is in no way an argument that the standard itself is
somehow dangerous, any more than noting the mess that many home
gateways make of DNS packets is an argument that we should go back to
distributing the hosts.txt file via FTP.

If we turn our attention to the utility for readers and reviewers and
those wanting to incorporate parts of text into other contexts, then
the official format that idnits permits (never mind exactly what the
RFC Editor ends up with) is _still_ inconvenient.  You can't rewrap
lines for small screens.  You can't anchor to particular sections or
diagrams (which are, anyway, hard to use because they have to be
produced as ASCII art, as though the IETF were some sort of giant
warlording fan club).  Complex equations are hard to represent and
hard to read.  And so on: the reasons why the format doesn't even work
reliably for the community actually using the documents today are
legion, oft repeated, and, when raised, often simply denied, as though
denial were an argument.

Moreover, from a process point of view, I've had at least one
contributor in DNSEXT recently refuse to update a draft because the
idnits tool checks for both form and content.  This makes the exact
formatting conventions of the page into a problem that contributors
have to worry about when trying to hammer out technical details of a
protocol.  Every contributor has to be an amateur typesetter, only
still targeting a technology that was a significant step backwards in
typesetting quality compared to things professional typesetters had been
doing for centuries.  (This is not a criticism of the idnits
maintainers, who are doing the necessary thing to support a broken
rule we have in place.  If you find you have to adjust the
boilerplate manually, and that moves things around, then suddenly you
have to start counting lines on a page in order to get everything
right.)

Finally, as someone noted in this thread, the underlying assumption
that the input format has to be perfectly aligned to exactly one
output format is wrong.  We have more than one purpose in consumption
of the final product.  Why would we insist on having exactly one
format for it?

None of the above distinctions are new.  The last time this
frustrating topic heaved into view, I recall those distinctions being
made too.  I'd find it really nice if, in future when this topic comes
up, we at least stop making demonstrably false claims that the RFC
format is "plain ASCII".  I'm not so optimistic as to imagine we'll
really address the different issues and find a way forward, but not
misrepresenting the way things are would be nice.

A

-- 
Andrew Sullivan
ajs@xxxxxxxxxxxx
Shinkuro, Inc.