On 2014-12-07 01:19, Barry Leiba wrote:
>> PS: If Barry or anyone else wants to do this instead that's
>> fine by me.
>
> I already have; the status-change document is in Last Call Requested state.
>
> Barry
So RFC 20 says it defines a "coded character set". However, in current specs (at least in APPS) we frequently talk about "character encoding schemes" (<http://tools.ietf.org/html/rfc6365#section-2>), in general mapping Unicode code points to octet sequences.
Actually, we try to talk about charsets, which are mappings from a series of octets to a series of characters, and avoid all of the complicated ISO bafflegab. A CCS is a mapping from characters to integers. A CES is a mapping from one or more sets of integers to octets. The combination of one or more CCSs with a CES produces something that is usually (but not always) the same as a charset. The distinction lies in the fact that a CCS/CES doesn't always fully specify the meaning of all octet sequences, whereas a charset does.

The supposed utility of the complex CCS/CES approach lay in its ability to accommodate very complex CESs. This was thought to be the right way to do things back in the days when no universal charset existed and thus multiple CCSs had to be allowed in a single stream of octets. For example, in X.400 you had the generaltext body part, which used ISO 2022 as the CES combined with an essentially arbitrary set of CCSs that were specified both inline and out of band.

But instead we ended up with multiple CESs, which were either profiled subsets of ISO 2022 or schemes where the high bit was essentially a CCS flag. It's much more straightforward to handle such things as charsets, so that's what we did. See RFC 2978 for additional details.
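To make the three terms concrete, here is a rough Python sketch. It is only an illustration: Unicode stands in for the CCS, UTF-8 and UTF-16BE stand in for two different CESs, and the function names (ccs, ces_utf8, ces_utf16be, charset_decode) are made up for this example rather than taken from any RFC.

```python
def ccs(ch):
    """CCS: map one character to an integer (Unicode code point here)."""
    return ord(ch)


def ces_utf8(code_points):
    """One CES: map a sequence of integers to octets using UTF-8."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


def ces_utf16be(code_points):
    """A different CES over the same CCS: big-endian UTF-16."""
    return "".join(chr(cp) for cp in code_points).encode("utf-16-be")


def charset_decode(octets, name):
    """Charset view: go from octets to characters in a single step."""
    return octets.decode(name)


code_points = [ccs(c) for c in "é"]       # [233]
utf8_octets = ces_utf8(code_points)       # b"\xc3\xa9"
utf16_octets = ces_utf16be(code_points)   # b"\x00\xe9"

# Same CCS, different CES, therefore different octets for the same character.
print(code_points, utf8_octets, utf16_octets)
print(charset_decode(utf8_octets, "utf-8"))
```

The point of the charset view is that a label such as "utf-8" names the whole octets-to-characters mapping, so the caller never has to reason about the CCS and CES separately.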
So does RFC 20 define a CES as well?
No. Of course there's an obvious CES to associate with it: The mapping of the 128 integer values it defines to octets with the same value. Do that and you essentially have the US-ASCII charset.
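A minimal Python sketch of that "obvious CES", for illustration only: the names identity_ces and us_ascii_decode are hypothetical, and Python's chr() is used as a stand-in for the RFC 20 character table, on the assumption that the first 128 Unicode code points coincide with it.

```python
def identity_ces(values):
    """Map each of the 128 integer values to the octet with the same value."""
    values = list(values)
    for v in values:
        if not 0 <= v <= 127:
            raise ValueError(f"{v} is outside the 128 defined values")
    return bytes(values)


def us_ascii_decode(octets):
    """Charset view: octets to characters; octets above 127 have no meaning."""
    for o in octets:
        if o > 127:
            raise ValueError(f"octet {o:#04x} is not defined in US-ASCII")
    # chr() stands in for the CCS table lookup (assumption noted above).
    return "".join(chr(o) for o in octets)


octets = identity_ces([72, 105])
print(octets)                   # b'Hi'
print(us_ascii_decode(octets))  # Hi
```

Combining the two steps gives the same result as Python's built-in "ascii" codec for any input made only of octets 0 through 127.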
If it does not, should we have an additional document taking care of this?
I don't see why. The utility of RFC 20 lies in its specification of the meaning of various characters. If you're using it as a specification for the US-ASCII charset, you're using it incorrectly because, like it or not, it doesn't specify that. RFC 2046 does that by referencing ANSI X3.4-1986. No doubt it would have been better to reference RFC 20 for the CCS part, but it wasn't online at the time.

Ned