[People, please don't send me copies of list messages by mail. I'm
subscribed to the list and read it via a non-mail interface.]

"Doug Ewell" <dewell@xxxxxxxxxxxxxx> writes:

> Ben Finney <ben plus ietf at benfinney dot id dot au> wrote:
>
> > The issue remains that the informational RFC presents useful
> > mnemonics for many characters, and there doesn't appear to be such
> > a thing from Unicode or ISO. That's the point of an update to RFC
> > 1345: it serves a purpose that I can't see served comparably well
> > elsewhere.
>
> You might not find much enthusiasm in the character-encoding community
> for the mnemonics published in RFC 1345, and later as the so-called
> "repertoiremap" in ISO/IEC TR 14652. These have been widely
> criticized for their incompleteness, (real or perceived)
> arbitrariness, and lack of extensibility to scripts not already
> covered.

Thanks for this. I agree that, for *encoding* and *naming*, the
mnemonics aren't much use anymore; we have superior encodings and
Unicode names, so the properties you (correctly) ascribe to the
mnemonics in RFC 1345 are not much use for those purposes.

The "repertoiremap" of ISO/IEC TR 14652 is apparently meant to be for
character transmission and translation only. It seems more extensible
for that purpose than the mnemonic approach in RFC 1345.

There is one specific application of the RFC 1345 mnemonics for which
I've not seen a superior reference: direct character *input* at a
keyboard using an input method program. There are numerous programs
(e.g. Emacs, SCIM) that support the RFC 1345 character mnemonic table
as an input method for typing key sequences to input the corresponding
characters.
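To make the input-method use concrete, here is a minimal Python sketch of what such a program does with a mnemonic table. It assumes a hypothetical "&" prefix character to mark the start of a mnemonic and uses only a small hand-picked subset of entries; the entries are believed to follow RFC 1345's conventions, but check them against the RFC's table before relying on them. It is not how Emacs or SCIM actually implement it.

```python
# Hypothetical subset of an RFC 1345-style mnemonic table (illustrative only).
MNEMONICS = {
    "a'": "\u00e1",  # LATIN SMALL LETTER A WITH ACUTE
    "c,": "\u00e7",  # LATIN SMALL LETTER C WITH CEDILLA
    "n?": "\u00f1",  # LATIN SMALL LETTER N WITH TILDE
    "a*": "\u03b1",  # GREEK SMALL LETTER ALPHA
    "A*": "\u0391",  # GREEK CAPITAL LETTER ALPHA
    "a=": "\u0430",  # CYRILLIC SMALL LETTER A
    "A+": "\u05d0",  # HEBREW LETTER ALEF
    "a+": "\u0627",  # ARABIC LETTER ALEF
}

PREFIX = "&"  # hypothetical escape character that introduces a mnemonic


def expand(text: str) -> str:
    """Replace PREFIX + mnemonic with the mapped character, longest match first.

    A PREFIX not followed by a known mnemonic is passed through unchanged.
    """
    longest = max(len(key) for key in MNEMONICS)
    out = []
    i = 0
    while i < len(text):
        if text[i] == PREFIX:
            # Try the longest possible mnemonic first, then shorter ones.
            for n in range(longest, 0, -1):
                key = text[i + 1:i + 1 + n]
                if len(key) == n and key in MNEMONICS:
                    out.append(MNEMONICS[key])
                    i += 1 + n
                    break
            else:
                # No mnemonic matched: keep the prefix character literally.
                out.append(text[i])
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, typing "Fran&c,ais" yields "Français", and "&a*" yields a Greek alpha; the user only ever presses plain ASCII keys.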
> Most people will agree that "a plus apostrophe" makes a handy
> mnemonic for "a with acute," and "c plus comma" works well for "c
> with cedilla," but the system tends to break down rather quickly
> after that, with Greek letters identified by an asterisk, Cyrillic
> by an equal sign, Hebrew by a capital letter and plus sign, Arabic
> by a small letter and plus sign, etc.

So long as the table follows some kind of system (and the definition
of the RFC 1345 character mnemonic table does at least explain the
scheme it uses for those character sets), it is still useful as a
means of remembering short, discrete mnemonics for a large set of
characters.

> There are numerous exceptions to these guidelines, especially when
> the letters in question don't map cleanly to Basic Latin, and a
> large number of non-ideographic characters have no mnemonic at all,
> even some that were defined in ISO 10646 at the time RFC 1345 was
> published.

Yes, the system does have its limits; a mnemonic table cannot
reasonably be expected to map pure-ASCII keyboard mnemonics to
*every* set of characters in ISO 10646. But with those limits
acknowledged, the mnemonic system can be useful for those character
sets where there *is* a reasonable expectation of such a mapping.

> That is why you are unlikely to find an update to RFC 1345 that
> brings the mnemonics up to date with 10646/Unicode: the task is
> almost impossible, given the limitations of the system.

Indeed. My initial comment was merely that even the characters that
*are* covered by the mnemonic table are not in accord with the current
Unicode data. To the extent that the character mnemonic table is
useful, it is surely undermined if the data are wrong.

> The motivation for inventing these mnemonics seems to have been to
> specify characters "in a coded character set independent way," which
> was perhaps a sensible goal in 1992 when the Universal Character Set
> was quite a bit less universal.
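On the point that even the covered characters no longer agree with current Unicode data: one way to find such discrepancies, sketched here in Python, is to compare each table entry's listed character name against the name Python's unicodedata module reports for the same code point. The TABLE below is a hypothetical four-entry extract, not the real RFC 1345 data, and names are compared case-insensitively in case a table lists them in a different case.

```python
import unicodedata

# Hypothetical extract: mnemonic -> (code point, character name as the table lists it).
TABLE = {
    "a'": (0x00E1, "LATIN SMALL LETTER A WITH ACUTE"),
    "c,": (0x00E7, "LATIN SMALL LETTER C WITH CEDILLA"),
    "a*": (0x03B1, "GREEK SMALL LETTER ALPHA"),
    "a=": (0x0430, "CYRILLIC SMALL LETTER A"),
}


def mismatches(table):
    """Return (mnemonic, code point, listed name, current name) for each disagreeing entry."""
    bad = []
    for mnemonic, (cp, listed) in table.items():
        # unicodedata.name raises ValueError for unnamed code points unless given a default.
        current = unicodedata.name(chr(cp), "<unassigned>")
        if current.upper() != listed.upper():
            bad.append((mnemonic, cp, listed, current))
    return bad
```

Run over a full extract of the table, mismatches() would list the entries needing correction; for the four entries above it returns an empty list.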
I'm beginning to see the gap in understanding here; I've been
approaching this discussion caring *only* about the character mnemonic
table in RFC 1345, whereas others have (reasonably) approached the
discussion in the context of the entire RFC document and its apparent
purpose.

> Today, however, virtually all non-10646 character sets are mapped to
> 10646 code points, not to alphabetic mnemonics.

This is true for the purpose of *encoding*, but for the purpose of
*input* at a non-remapped, largely-ASCII keyboard, input method
programs certainly do map ASCII mnemonic sequences to non-ASCII
characters.

> Almost any character that can be found in a national or industry
> charset can be found in 10646. The need for a notation independent
> of 10646 has passed.

I think it's clear that keyboard character input needs brief mnemonic
ASCII sequences, not numeric ordinals or descriptive character names,
mapped to the desired characters.

Thanks very much for the discussion; it's becoming clearer now. Two
further questions:

I'd like to discuss this with the people who made the original RFC
1345 character mnemonic table. How would I get in touch with the
authors of RFC 1345?

It wasn't my intention to write a new discussion draft, but since my
purpose differs significantly from the broad purpose of RFC 1345, a
new draft aimed at the purpose I have in mind may be warranted. What
should I read (URLs please) before doing so?

-- 
 \     "If we don't believe in freedom of expression for people we    |
  `\     despise, we don't believe in it at all."  -- Noam Chomsky    |
_o__)                                                                  |
Ben Finney

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf