>>>>> "Tom" == Tom Petch <sisyphus@xxxxxxxxxxxxxx> writes:

Tom> You've lost me here.  I don't understand the use of state in the
Tom> context of Unicode

Masataka was referring to the fact that the universal character set
contains combining characters, plus some characters that otherwise
alter how subsequent and/or previous characters are treated.

As an example, the sequence of the two characters:

,----
| U+0061 LATIN SMALL LETTER A
| U+030B COMBINING DOUBLE ACUTE ACCENT
`----

which is encoded in UTF-8 as:

,----
| a̋
`----

has state between the base letter and the accent.  If the a is lost,
the accent will be added to whatever was before the a.

Similarly, U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK
affect how the characters around them are displayed.

Their existence in the standard, therefore, makes the standard
stateful.

The combining accent characters can be added to the base characters in
arbitrary number and sequence.  Not all combinations are currently in
use by any written language, of course, but the set remains open
ended.  You can even have multiple instances of a single combining
character in a sequence of combining characters.  (Consider, e.g., a
stack of accents where there is a circumflex above a dieresis above a
circumflex above the base character.  Probably not used by anyone, but
it /could/ be.)

-JimC
-- 
James H. Cloos, Jr. <cloos@xxxxxxxxxxx>
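
P.S.  A rough Python sketch of the combining-character point above, in
case it helps.  It uses only the standard unicodedata module.  The
helper name clusters() is made up for this illustration, and grouping
by a non-zero canonical combining class is a simplification: real
grapheme segmentation has more rules than this.

```python
import unicodedata

base = "\u0061"     # LATIN SMALL LETTER A
accent = "\u030B"   # COMBINING DOUBLE ACUTE ACCENT

# The two-character sequence and its UTF-8 encoding (three bytes).
seq = base + accent
utf8 = seq.encode("utf-8")


def clusters(text):
    """Attach each combining mark to the character before it.

    Hypothetical helper: a character counts as combining when its
    canonical combining class is non-zero.
    """
    out = []
    for ch in text:
        if unicodedata.combining(ch) and out:
            out[-1] += ch      # mark joins the preceding cluster
        else:
            out.append(ch)     # base character starts a new cluster
    return out


# Intact text: the accent belongs to the "a".
intact = clusters("x" + base + accent)

# Same text with the "a" lost: the accent now lands on the "x".
damaged = clusters("x" + accent)

# A stack of circumflex, dieresis, circumflex on one base letter
# still forms a single cluster.
stacked = clusters("a\u0302\u0308\u0302")

print(utf8, intact, damaged, stacked)
```

Whether a decoder needs to know what came before a given code point in
order to decide what it means is exactly the "state" in question here.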