Re: Status of RFC 20 (was: Re: Gen-ART and OPS-Dir review of draft-ietf-json-text-sequence-09)

ned+ietf@xxxxxxxxxxxxxxxxx · Thu, 18 Dec 2014 07:22:09 -0800 (PST)

On 2014-12-07 01:19, Barry Leiba wrote:
>> PS: If Barry or anyone else wants to do this instead that's
>> fine by me.
>
> I already have; the status-change document is in Last Call Requested state.
>
> Barry

So RFC 20 says it defines a "coded character set". However, in current
specs (at least in APPS) we frequently talk about "character encoding
schemes" (<http://tools.ietf.org/html/rfc6365#section-2>), in general
mapping Unicode code points to octet sequences.

Actually, we try and talk about charsets, which are mappings from a series of
octets to a series of characters, and avoid all of the complicated ISO
bafflegab.

A CCS is a mapping from characters to integers. A CES is a mapping from one or
more sets of integers to octets. The combination of one or more CCSs with a CES
produces something that is usually (but not always) the same as a charset. The
distinction lies in the fact that a CCS/CES doesn't always fully specify the
meaning of all octet sequences, whereas a charset does.
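
To make that concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the three-character repertoire, the names CCS, ces, encode and decode_charset, and in particular the decision to reject undefined octets, which is just one way a charset might pin down their meaning.

```python
# Toy CCS: a mapping from characters to integers.
# (Hypothetical three-character repertoire, not any registered set.)
CCS = {"A": 1, "B": 2, "C": 3}


def ces(ints):
    """Toy CES: a mapping from integers to octets, one octet each."""
    return bytes(ints)


def encode(text):
    """CCS followed by CES: characters -> integers -> octets."""
    return ces([CCS[ch] for ch in text])


# A charset goes from octets to characters and has to give an answer
# for every octet sequence, including ones the CCS/CES pair above
# never produces. Assumption for this sketch: such octets are errors.
INVERSE = {value: ch for ch, value in CCS.items()}


def decode_charset(octets):
    """Charset view: a series of octets -> a series of characters."""
    out = []
    for octet in octets:
        if octet not in INVERSE:
            raise ValueError("octet 0x%02x is undefined here" % octet)
        out.append(INVERSE[octet])
    return "".join(out)
```

The CCS and CES on their own say nothing about an octet such as 0x04; the charset-style decoder is where that question has to be answered.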

The supposed utility of the complex CCS/CES approach lay in its ability to
accommodate very complex CESs. This was thought to be the right way to do things
back in the days when no universal charset existed and thus multiple CCSs had
to be allowed in a single stream of octets.

For example, in X.400 you had the generaltext body part, which used ISO 2022 as
the CES combined with an essentially arbitrary set of CCSs that were specified
both inline and out of band.

But instead we ended up with multiple CESs, which were either profiled subsets
of ISO 2022 or schemes where the high bit was essentially a CCS flag. It's much
more straightforward to handle such things as charsets, so that's what we did.
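
As a rough illustration of the "high bit as CCS flag" idea, here is a sketch in Python. The two tiny tables and the name decode_flagged are hypothetical and do not correspond to any real registered charset; the point is only that the top bit of each octet picks the CCS and the low seven bits pick the integer within it.

```python
# Two hypothetical 7-bit CCSs, written here as integer -> character.
LOW_CCS = {0x41: "A", 0x42: "B"}             # used when the high bit is clear
HIGH_CCS = {0x41: "\u03b1", 0x42: "\u03b2"}  # used when the high bit is set


def decode_flagged(octets):
    """Decode octets whose high bit selects which CCS applies."""
    out = []
    for octet in octets:
        table = HIGH_CCS if octet & 0x80 else LOW_CCS
        value = octet & 0x7F  # the integer within the selected CCS
        out.append(table[value])
    return "".join(out)
```

Treated as a charset, the same thing is just a single table from octet values to characters, with no separate notion of which CCS is in effect.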

See RFC 2978 for additional details.

So does RFC 20 define a CES as well?

No. Of course there's an obvious CES to associate with it: The mapping of the
128 integer values it defines to octets with the same value. Do that and you
essentially have the US-ASCII charset.
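
A minimal sketch of that obvious CES, in Python. The function names are made up for this example, and it leans on the fact that Python's ord() returns the same integers as the 7-bit code for characters in that range.

```python
def identity_ces(values):
    """Map each of the 128 integer values to the octet with the same value."""
    for value in values:
        if not 0 <= value <= 127:
            raise ValueError("%r is outside the 128 defined values" % (value,))
    return bytes(values)


def us_ascii_encode(text):
    """CCS step (character -> integer) followed by the identity CES."""
    return identity_ces([ord(ch) for ch in text])
```

For any text within the 128-character repertoire, us_ascii_encode(text) yields the same octets as Python's built-in text.encode("ascii").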

If it does not, should we have an
additional document taking care of this?

I don't see why. The utility of RFC 20 lies in its specification of the meaning
of various characters. If you're using it as a specification for the US-ASCII
charset, you're using it incorrectly because, like it or not, it doesn't specify
that. RFC 2046 does that by referencing ANSI X3.4-1986. No doubt it would
have been better to reference RFC 20 for the CCS part, but it wasn't online
at the time.

				Ned