Re: Status of RFC 20 (was: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09)

John C Klensin <john-ietf@xxxxxxx> · Thu, 18 Dec 2014 11:09:13 -0500

--On Thursday, December 18, 2014 15:16 +0100 Julian Reschke
<julian.reschke@xxxxxx> wrote:

>... 
> So RFC 20 says it defines a "coded character set". However, in
> current specs (at least in APPS) we frequently talk about
> "character encoding schemes"
> (<http://tools.ietf.org/html/rfc6365#section-2>), in general
> mapping Unicode code points to octet sequences.
> 
> So does RFC 20 define a CES as well? If it does not, should we
> have an additional document taking care of this?

With the understanding that the fact that RFC 20 has been used
successfully for 45 years continues to be a very strong argument
that it needs neither changes nor supplemental material, and
noting that many IETF participants were not reading ANSI/USASA
standards (or much of anything else) 45 years ago,

(1) The terms "coded character set" and "[character] code for
information interchange" were in use long before Unicode and
its multiple encodings/representation forms started to redefine
them.  In this context, "long" is measured in decades, not
years.

(2) Early versions of ASCII did not specify what we would now
call "encoding" information.  They specified only the repertoire
and the associated 7-bit CCS.  Later ones, IIRC, do specify
encoding information.  That type of difference is one of the
reasons we need to be careful about version numbers or dates
when referencing other people's standards (and why stable
references are important).  For ASCII, the result was that we
ended up with at least two different ways to put those 7-bit
characters into an 8-bit "byte" and at least two different ways
to put them into a 36-bit word.
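
The two packing schemes in each case aren't named above.  As a
hedged illustration only, assuming two conventions that were
common in practice (the high-order bit of the byte forced to 0
vs. used as an even-parity bit; five 7-bit characters per 36-bit
word with one spare low-order bit, as on the PDP-10, vs. four
9-bit bytes per word), this Python sketch shows why "ASCII" by
itself does not pin down the bits.  All function names are
hypothetical.

```python
# Illustrative sketch: the packing schemes below are assumed examples
# of historical practice, not taken from any particular standard.

def check7(code: int) -> int:
    """Reject anything that is not a 7-bit code value."""
    if not 0 <= code <= 0x7F:
        raise ValueError(f"{code!r} is not a 7-bit code")
    return code

def pack_byte_high_zero(code: int) -> int:
    """8-bit byte: 7-bit code with the high-order bit forced to 0."""
    return check7(code)

def pack_byte_even_parity(code: int) -> int:
    """8-bit byte: high-order bit used as an even-parity bit."""
    parity = bin(check7(code)).count("1") % 2
    return (parity << 7) | code

def pack_word_5x7(codes: list[int]) -> int:
    """36-bit word: five 7-bit codes left-justified, low-order bit unused."""
    if len(codes) != 5:
        raise ValueError("need exactly five codes")
    word = 0
    for c in codes:
        word = (word << 7) | check7(c)
    return word << 1  # 5 * 7 = 35 bits of characters + 1 spare bit

def pack_word_4x9(codes: list[int]) -> int:
    """36-bit word: four 7-bit codes, each zero-extended to a 9-bit byte."""
    if len(codes) != 4:
        raise ValueError("need exactly four codes")
    word = 0
    for c in codes:
        word = (word << 9) | check7(c)
    return word
```

For example, "C" (0x43, three 1-bits) is the byte 0x43 under the
first byte scheme but 0xC3 under the even-parity one.  Same
character, same 7-bit code, different octets.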

(3) Partially, I assume, because the encoding issues mentioned
in (2) had made most people working on anything resembling
network applications familiar with them, RFC 20 does specify an
on-the-wire encoding for ASCII.  That is one of the things that
makes it more useful than a reference to ASCII alone: it
specifies what we started calling a "charset" in the early MIME
days, i.e., a combination of a CCS and a CES.
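
To make the CCS/CES split concrete: RFC 20 suggests standard
7-bit ASCII embedded in an 8-bit byte whose high-order bit is
always 0.  A minimal Python sketch of that "charset" as two
separate steps follows; the function names are hypothetical and
the error handling (rejecting anything outside the repertoire or
any octet with the high bit set) is an assumption for the
example.

```python
# Hypothetical sketch: a "charset" in the early-MIME sense = CCS + CES.

def ccs_code_value(ch: str) -> int:
    """CCS step: character from the ASCII repertoire -> 7-bit code value."""
    value = ord(ch)
    if value > 0x7F:
        raise ValueError(f"{ch!r} is not in the ASCII repertoire")
    return value

def ces_to_octet(value: int) -> int:
    """CES step (RFC 20 style): one code value per octet, high-order bit 0."""
    if not 0 <= value <= 0x7F:
        raise ValueError("not a 7-bit code value")
    return value  # value < 0x80, so the high-order bit is already 0

def encode_rfc20(text: str) -> bytes:
    """Apply the CCS, then the CES, to every character."""
    return bytes(ces_to_octet(ccs_code_value(ch)) for ch in text)

def decode_rfc20(data: bytes) -> str:
    """Reverse both steps, rejecting octets with the high-order bit set."""
    chars = []
    for octet in data:
        if octet & 0x80:
            raise ValueError(f"octet 0x{octet:02X} has its high-order bit set")
        chars.append(chr(octet))
    return "".join(chars)
```

For text in what is now called the Basic Latin repertoire, this
yields the same octets as Python's built-in "ascii" codec.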

So, AFAICT, nothing else is needed in this area other than
getting on with it and ceasing to embarrass ourselves by
dragging out this discussion of a 45-year-old spec for something
we've used heavily and that has never caused a problem for what
is now called the Basic Latin repertoire.

best,
     john