> Folks who haven't been involved in multi-lingual
> computing might not realise quite what a minefield this whole
> area is. Approximately speaking, choices about how to
> represent glyphs/letters and how to encode them usually have
> some inherent linguistic bias.

In addition, many applications of encoding glyphs need to consider the
direction of the mapping. It is one thing to transliterate another writing
system into ASCII for the purposes of providing a readable format. It is
another thing to have an encoding that works in the other direction, so that
some form of ASCII input can be accurately translated into a set of Unicode
glyphs.

The former (output transliteration) is not so hard to achieve, e.g. any
Russian speaker will understand "Ya ne znayu chto skazat'" and any Chinese
speaker who has learned a Roman-alphabet language should be able to
understand "Wo mi-lu le". Of course "Ja nye znaju shto skazat" would be just
as readable to most Russians, and most Chinese speakers would struggle with
long texts that lack the accents that official Hanyu Pinyin uses to mark the
four tones. I think it is conceivable to supply an official output
transliteration to ASCII for all Unicode glyphs, and much of this work has
already been done by language groups; for instance, TSCII transliterates
Tamil letters into ASCII. Translating from ASCII into Unicode is far more
complex and probably impossible without introducing inter-glyph gaps and
weird accent codes, as RFC 1345 did.

Many of the input methods that are used for typing foreign-language glyphs
into a computer are actually "Input Method Editors" that have built-in
dictionaries and help users choose one of a series of possible glyphs. For
instance, Japanese can represent the single syllable "fu" with three or more
glyphs. To choose the right one you need to know whether the entire word is
borrowed from a foreign language, and if not, there is still the choice of
whether to use the Hiragana glyph or one of the Kanji glyphs borrowed from
Chinese; there are at least two Kanji glyphs with an ON-reading of "fu". In
spite of all this complexity on input, there is a standard transliteration
for output.

> People who use languages that
> aren't optimally encoded in some representation tend not to
> be very happy with the non-optimal encodings their language
> might have been given.

That is the key to this whole thing. What is the use case for these ASCII
encodings? If an encoding is not usable by native speakers of the language
using the original glyphs, then I can't see any worthwhile use case at all.
UTF-8 works for any machine-to-machine communication. Existing input methods
work for converting from ASCII to glyphs, but these are not simple mapping
tables. And existing transliteration systems work for generating readable
ASCII text from glyphs, although they may not be fully standardised. For
instance, a Russian named "Yuri" (transliterated according to English
pronunciation rules) will have his name written as "Iouri" on his passport,
because that is the transliteration according to French pronunciation rules,
French being the former lingua franca of international diplomacy.

RFC 1345 should be deprecated because it misleads application developers
(e.g. the Lynx case), and the work on transliteration and input methods is
being done more effectively outside the IETF.

--Michael Dillon
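
To make the point about output transliteration concrete: the glyph-to-ASCII
direction really can be a simple per-glyph lookup. The sketch below uses a
rough, partial Cyrillic-to-Latin table invented for the example (not any
official romanisation standard), and the function name transliterate is
likewise just illustrative.

```python
# Sketch only: glyph-to-ASCII output transliteration as a plain mapping.
# The table is a partial, unofficial Cyrillic-to-Latin example, enough to
# cover the phrase used in the message above.
CYRILLIC_TO_ASCII = {
    "я": "ya", "н": "n", "е": "e", "з": "z", "а": "a", "ю": "yu",
    "ч": "ch", "т": "t", "о": "o", "с": "s", "к": "k", "ь": "'",
    " ": " ",
}

def transliterate(text: str) -> str:
    """Map each glyph to its ASCII rendering; pass unknown characters through."""
    return "".join(CYRILLIC_TO_ASCII.get(ch.lower(), ch) for ch in text)

print(transliterate("я не знаю что сказать"))  # -> "ya ne znayu chto skazat'"
```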
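
By contrast, the input direction is not a lookup table at all, which is why
Input Method Editors exist. A minimal sketch, using made-up candidate lists
rather than a real IME dictionary, of how an ASCII-spelled syllable such as
"fu" fans out into several possible glyphs that the user (or a dictionary)
must choose between:

```python
# Sketch only: romaji-to-glyph input is one-to-many, so an IME presents
# candidates instead of applying a fixed mapping. The candidate lists here
# are illustrative, not a complete or authoritative IME dictionary.
CANDIDATES = {
    "fu": ["ふ", "フ", "不", "府", "布", "婦"],  # hiragana, katakana, several kanji read "fu"
    "wo": ["を", "ヲ"],
}

def candidates_for(romaji: str) -> list[str]:
    """Return the possible glyphs for an ASCII-spelled syllable."""
    return CANDIDATES.get(romaji, [])

# A real IME would rank these using context and a dictionary; here we just
# list them so the user could pick one.
for glyph in candidates_for("fu"):
    print(glyph)
```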