Re: Why the normative form of IETF Standards is ASCII

Martin Rex <mrex@xxxxxxx> · Fri, 12 Mar 2010 19:43:39 +0100 (MET)

Julian Reschke wrote:
> 
> On 12.03.2010 16:58, Martin Rex wrote:
> > Julian Reschke wrote:
> >>
> >>>
> >>> Actually, the page breaks _are_ useful.  Like when referencing specific
> >>> parts/paragraphs of a long section in a document with a URL, e.g.
> >>>      http://tools.ietf.org/html/rfc5246#page-36
> >>> which contains the message flow of a full TLS handshake.
> >>> And that message flow is just perfect in ASCII art.
> >>
> >> That URL points to an HTML document, not a TXT document. There is
> >> (unfortunately) no fragment identifier syntax for text/plain (at least
> >> not one that UAs actually support)
> >
> > Wrong.  It points to a TXT document that is rendered as HTML.
> 
> No, it does not. It points to an HTML document that was converted from 
> the original TXT version (on the server, by Henrik's rfcmarkup script).

Whether the converted document is long-term cached or converted on
the fly is an insignificant implementation detail.  The point is
that the original document is a plain ASCII text file.  And the
existing standardization for I-D and RFC formatting enables a simple
tool to recognize references and anchors and create HTML tags from them.
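
To illustrate how little such a tool needs, here is a minimal Python
sketch that turns the form feeds of a plain ASCII RFC into HTML
anchors of the "#page-36" kind used in the URL above.  This is not
rfcmarkup's actual code; the function name and the choice of one
<pre> element per page are assumptions made for the example.

```python
import html


def add_page_anchors(text: str) -> str:
    """Wrap each form-feed-delimited page in a <pre> with a page anchor.

    Assumption: pages are separated by a form feed character and are
    numbered from 1 in document order.
    """
    blocks = []
    for number, page in enumerate(text.split("\f"), start=1):
        # Escape <, > and & so the ASCII text survives inside HTML.
        blocks.append(f'<pre id="page-{number}">{html.escape(page)}</pre>')
    return "\n".join(blocks)
```

A document that ends with a trailing form feed would get one extra,
empty page from this sketch; a real converter would have to handle
that case.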

> 
> > If you abide by certain conventions in your plain ASCII text,
> > then everyone can recognize and use them (RFC/ID ->  HTML or ->  PDF
> > converters, accessibility tools like text->speech).  And it still
> > renders just fine on pure text environments and over very low
> > bandwidth links.
> 
> So what do accessibility tools do when they encounter page breaks, with 
> the header & footer lines? What does a screen reader do with ASCII art?

Because of the page breaks and the consistent presence of these
headers and footers just before and after the page breaks, an
accessibility tool should be able to recognize them as such.
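
A rough Python sketch of that recognition follows.  It rests on
assumptions that are mine, not taken from any existing tool: pages
are separated by form feeds, the footer is the last non-blank line
of a page and ends in "[Page N]", and every page after the first
starts with a one-line running header.

```python
import re

# Assumed footer shape: any line ending in "[Page <number>]".
FOOTER = re.compile(r"\[Page \d+\]\s*$")


def strip_page_furniture(text: str) -> str:
    """Remove running headers and footers around form feeds."""
    pages = []
    for index, page in enumerate(text.split("\f")):
        lines = page.splitlines()
        # Drop trailing blank lines, then the footer if one is present.
        while lines and not lines[-1].strip():
            lines.pop()
        if lines and FOOTER.search(lines[-1]):
            lines.pop()
        # Drop leading blank lines, then the header on pages after the first.
        while lines and not lines[0].strip():
            lines.pop(0)
        if index > 0 and lines:
            lines.pop(0)
        # Trim the blank padding that surrounded header and footer.
        while lines and not lines[0].strip():
            lines.pop(0)
        while lines and not lines[-1].strip():
            lines.pop()
        if lines:
            pages.append("\n".join(lines))
    return "\n\n".join(pages)
```

The sketch joins pages with a blank line, so a paragraph that was
split across a page break comes out as two paragraphs; a screen
reader front end would probably want a smarter join.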

> 
> > I-Ds and RFCs are not "publish and forget" documents, but instead
> > they're vivid snapshots of working group discussions in constant
> > motion and under discussion, and one of the most important aspects
> > is that others can easily build a derivative work from an existing
> > document (especially for expired I-Ds).
> 
> Yes. That's an argument to require an easy to re-use *submission* 
> format. But the submission format doesn't need to be identical to the 
> publication format.
> 
> The submission tool already allows you to supply additional source 
> files, such as XML. I recommend to use that.

I'm so glad that NroffEdit will happily convert a plain ASCII RFC
or I-D into a suitable authoring format.  It would be a horror if I
wanted to suggest changes to a document and had to jump through hoops
to get hold of the XML-type authoring format of the document first
and use tools like xml2rfc in order to create an updated version
of the document myself.

> 
> > Just try NroffEdit's conversion I-D ->  authoring nroff source and
> > see how easy that is.  It's a single all-in-one tool written in Java,
> > basically wysiwyg with spell checker included and makes I-D editing
> > extremely easy.
> 
> It's probably a nice tool for people willing to use NROFF. There are 
> other nice tools, based on the RFC2629 XML format.

"willing to use nroff"?

You have likely never looked at NroffEdit.  It's an all-in-one
wysiwyg tool written in Java that uses nroff formatting commands
for authoring.  Instant(!) preview and output are formatted ASCII,
and a spell checker is built in.  You do _NOT_ need any extra tools,
and in particular you do not need to figure out how to combine
a bunch of tools from various different sources somehow into a
productive workflow, as with xml2rfc.

For me, the usage pattern and environment for Unix man pages and
RFCs are similar.  Opening 4 or 5 different documents simultaneously
next to each other on my screen is trivial.  Trying to accomplish that
with the online HTML version of the documents would be a lot of work
with a significantly worse result.  The online HTML version of a
document is fine for email discussions if you want to reference
a particular piece of an existing document.  Quoting the relevant
parts is still useful (in particular for those who are reading
mail while offline), but for those who are online, the URL into
the HTML-version is a shortcut for accessing the context around
the quoted part.

> 
> >> And guess what: if we go directly to HTML, we'd have anchors as well,
> >> but not only for section numbers, but also figures, tables, or even
> >> individual paragraphs.
> >
> > "Anchors" in plain-ASCII text that are human-comprehensible can
> > be automatically converted into real URLs and anchors with simple
> > tools.  These tools exist and work just fine with the existing
> > plain-ascii text documents.
> 
> Example?

Section headings, page breaks, normative/informative references
to sections of other RFC documents, in-document references to
other sections.

When I wrote my first I-D, I tried to make sure that the conversion
tool on tools.ietf.org correctly recognized and interpreted all my
references.  (I was actually surprised that it picked up most of
my references to sections of other documents before I even knew
that it was doing this.)

> 
> >
> > And it makes perfect sense to not only standardize on that single
> > language in spoken communication, but also in written communication.
> >
> > Anyone who enters IETF discussions, which are Email-based for a large
> > part, should provide a description of his own name with letters
> > from the US-ASCII alphabet, rather than forcing others to make
> > guesses how to do it given some kind of gibberish codepoints from
> > awkward codepages.
> > ...
> 
> I don't have any problem with that. But requiring an ASCII transcription 
> doesn't imply that the real name can't be used in *addition*. (We had a 
> discussion about this a few years ago, and that's exactly what was 
> proposed).

The content of I-Ds and RFCs should be limited to US-ASCII characters
only (with some limits for the allowed control characters).
Non-ASCII characters are going to mess up a number of visualizations
and printouts badly, and should therefore remain prohibited.
Providing a plain-ASCII URL to an IETF web server, where the
author can enter and maintain personal and contact information
in HTML & UTF-8, should be ok.

> 
> Another one are examples for I18N in specs which are incredibly hard to 
> write unless you can actually use a few example characters. See RFC 
> 3987, for example.

If something is difficult to accomplish, maybe it should not be
done in the first place.  Personally, I think that the use of
I18N at the protocol level (I18N hostnames, I18N Email addresses,
I18N URIs) is a huge mistake and will result in lots of needless
pain when used.

A few years ago a support call was forwarded to me from a Japanese
customer having problems with Single Sign-On configuration based
on Windows domain authentication.  So I asked the customer
about his current settings in the "Local Security Policy" (the
communication was translated through our support organization).
I got back a screenshot, but all the text that was shown was
completely incomprehensible (I cannot read Japanese) -- and
because of a different lexical order, counting from the top
didn't work either...

> 
> > Some people think internationalized domain names are a good idea.
> > I think they are a pretty stupid idea, because they're a significant
> 
> That's a completely orthogonal discussion.

I18N in protocols (instead of in application/presentation data)
is creating and manifesting a digital divide between cultures,
and as such it is IMHO a pretty bad idea.  Originally, the IETF
was about promoting interoperability.  The efforts around I18N
are mostly about breaking interoperability (not necessarily
between implementations, but definitely between humans
that discuss, develop, implement, configure, maintain and support
the technology).

> 
> >>> Using HTML or PDF for RFCs is about the same as moving from
> >>> English-language RFCs to Mandarin-language RFCs.  There is
> >>> a huge number of people who can read it, but there is
> >>> also a large number of current RFC and I-D consumers and
> >>> producers who cannot and do not want to use Mandarin.
> >>
> >> Sorry? Are you implying anybody is unable to display HTML?
> >
> > Yes, of course.  The majority of devices and a huge number of
> > environments is completely unable to display HTML.
> 
> And those *can* display text/plain with form feeds in a sane manner? 
> Example?

The presence of form feeds in text/plain doesn't disturb any
of the text-only environments I use.  It's still 1 byte = 1 character,
it doesn't <FONT FACE="Caibri, Verdana, Helvetica, Arial"><SPAN STYLE="'font-size:12pt'>completely&nbsp;mess&nbsp;up</SPAN></FONT> the flow of text, and it doesn't
extend lines beyond 80 columns.  Most HTML floating around the
Internet is an ecological disaster.

> 
> > I'm pretty sure there are similar ascii to pdf converters by now.
> > I would not be surprised if there even was a web service that will
> > convert an I-D into PDF in case that you can neither print formatted
> > ASCII text on your printer nor Postscript.  And if not, creating
> > such a tool is probably trivial, much much simpler than a tool
> > to make a fancy document format like HTML or PDF viewable on
> > a pure textual display.
> > ...
> 
> I have a hard time remembering when I saw a "pure textual" display the 
> last time. But anyway, even these devices can display HTML using lynx.

My NAS, my DSL router and my satellite set-top box don't have lynx,
and none of them came with a development environment.

Using rfcmarkup for those that prefer html and have the necessary
resources and tools already available is much more reasonable
than forcing a large part of the current users to get hold of tools
for environments that never had those tools, don't need those tools,
and often don't have the resources to support those tools (which is
why they don't have it in the first place).

If there is any migration to new technology that the IETF should
promote, then it is neither non-ASCII I-Ds/RFCs, nor is it I18N,
but instead it is IPv6.  HTML-based or PDF-based RFCs are not
going to make the transition IPv4->IPv6 easier or quicker,
so discussing how close to nil the benefit of new fancy
document formats is for some and just how bad it would be to
others is not a very productive use of IETF resources.

>
> > But more often than not, the screen-oriented formatting in HTML
> > results in the printouts being truncated at the right border
> > or filled with white space.  And removing parts of the page
> > (like a navigation box that becomes useless when printed)
> > before printing it is a feature not currently supported.
> 
> The people who are in favor of HTML aren't proposing to use any "screen 
> oriented formatting", at least as far as I know. They want to use HTML 
> for the features it was originally designed for, as markup language for 
> technical documents.

But it only works well for "online" documents.  RFCs still work
well when printed out or visualized in a constrained text-only
environment.  

Even for the document rfc3987, the use of human-comprehensible
references like "section 3.2" or "[RFC2237]" or
"(section 5.2 of [RFC3986])" works just fine for printouts
_and_ for rfcmarkup input.
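
As a sketch of how a post-processor might pick up exactly these
three reference styles, the following Python turns them into links.
It is an illustration under assumptions, not rfcmarkup's real
implementation: the regular expression, the function name and the
"#section-N" anchor naming are made up for the example; only the
http://tools.ietf.org/html base comes from the existing service.

```python
import html
import re

BASE = "http://tools.ietf.org/html"

# Longest form first, so "section 5.2 of [RFC3986]" becomes one link.
REFERENCE = re.compile(
    r"[Ss]ection (?P<sec>\d+(?:\.\d+)*) of \[RFC(?P<ext>\d+)\]"
    r"|\[RFC(?P<rfc>\d+)\]"
    r"|[Ss]ection (?P<local>\d+(?:\.\d+)*)"
)


def linkify(text: str) -> str:
    """Replace plain-ASCII references with HTML links."""

    def replace(match: re.Match) -> str:
        if match.group("ext"):
            # Section of another RFC (assumed anchor name: section-N).
            href = f"{BASE}/rfc{match.group('ext')}#section-{match.group('sec')}"
        elif match.group("rfc"):
            # Bare reference to another RFC.
            href = f"{BASE}/rfc{match.group('rfc')}"
        else:
            # In-document section reference.
            href = f"#section-{match.group('local')}"
        return f'<a href="{href}">{match.group(0)}</a>'

    # Escape first so the generated tags are the only markup in the output.
    return REFERENCE.sub(replace, html.escape(text))
```

The printed text stays unchanged inside each link, so the same
source line reads identically on paper and in the converted page.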

When HTML was originally designed, URLs were designed to link/cross-reference
arbitrary publications stored at arbitrary locations.  As we all know,
this never worked out.  The useful lifetime of URLs is extremely
short, and across organizations there is no useful maintenance
strategy to keep them in sync.

As a standardization body, the IETF has a different set of requirements.
The IETF can standardize exactly how documents should reference each
other in a form that is simple, plain-ASCII, human-comprehensible,
printable, and invariant over time, independent of the specific
storage organization on particular web sites.

It's already bad to have the URLs in RFCs to external resources
expire; it would be a horror if the references among RFCs within the
RFCs were hardwired by URLs and therefore hardwired to a very
specific storage organization.

And those who want HTML can use http://tools.ietf.org/html
or do what it does themselves by postprocessing the existing
ASCII documents.  And it'll work for a large number of the
existing ~5000 RFCs and a much higher number of I-Ds--no need for
several different tools nor for re-authoring of those documents.

-Martin