Tim Bray wrote:

>> That problem is that Unicode is stateful with complex and
>> indefinitely long term states

> Has this ever caused a real problem to a real programmer in real life?

Yes, of course. State information preserved between lines is
really annoying.

But you miss the point in my original mail:

: Unicode is not even finite state, which means some pattern
: matching and normalization problems are hard or insolvable.

That is, with Unicode, you cannot search strings in a reasonable
amount of time.

> I have written a whole bunch of mission-critical code that reads and
> generates UTF-8, and any correct implementation will have to deal with
> the fact that there is no necessary connection between the number of
> glyphs on the screen and bytes in its encoding.

You completely miss the point. It has nothing to do with the
long term state.

> It would be perfectly
> reasonable for an implementation to declare a limitation, for example
> that it will not process more than 32 trailing modifiers on any character,
> and this would not cause problems in production because sequences of
> such a length do not occur in the encoding of any known text.

I said "long term state", which, of course, is not confined to
a character with or without modifiers.

> Which is to say, Ohta's statement about statefulness is true, but the
> conclusion that this is a "problem" is erroneous. -Tim

Instead, your statement:

	"I have written a whole bunch of mission-critical code
	that reads and generates UTF-8"

is untrustworthy.

Of course, it is perfectly reasonable for an implementation to
declare a limitation, for example, that it will not process
non-ASCII characters, which may also be the assumption of your
code.

							Masataka Ohta

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf