Re: Troubles with UTF-8

Tim Bray <tbray@xxxxxxxxxxxxxx> · Sun, 01 Jan 2006 23:40:43 -0800

On Dec 28, 2005, at 5:05 AM, Masataka Ohta wrote:

That problem is that Unicode is stateful with complex and
indefinitely long term states

Has this ever caused a real problem to a real programmer in real life?

I have written a whole bunch of mission-critical code that reads and  
generates UTF-8, and any correct implementation will have to deal  
with the fact that there is no necessary connection between the
number of glyphs on the screen and the number of bytes in the text's
encoding.  It would be perfectly reasonable for an implementation to
declare a limitation, for example that it will not process more than
32 trailing modifiers on any character, and this would not cause
problems in production because sequences of such a length do not
occur in the encoding of any known text.
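
To make that concrete, here is a rough sketch in Python of the kind of
declared limitation I mean.  It is an illustration under assumptions, not
code from any real system: the function name and the constant are made
up, and "trailing modifier" is approximated as any code point whose
Unicode general category is a Mark (Mn, Mc, Me), which is cruder than
full grapheme cluster segmentation.

```python
import unicodedata

# Hypothetical limit, using the figure of 32 from the message above.
MAX_TRAILING_MARKS = 32


def count_base_characters(data: bytes, limit: int = MAX_TRAILING_MARKS) -> int:
    """Decode UTF-8 and count base characters.

    Raises ValueError if any run of consecutive combining marks is longer
    than `limit`. "Combining mark" is approximated here as general
    category M* (an assumption made for this sketch).
    """
    text = data.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    bases = 0
    run = 0  # the only state carried forward: length of the current mark run
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > limit:
                raise ValueError(
                    f"more than {limit} combining marks on one character"
                )
        else:
            bases += 1
            run = 0
    return bases


# "e" followed by U+0301 COMBINING ACUTE ACCENT: one glyph on the screen,
# two code points, three bytes in UTF-8.
sample = "e\u0301".encode("utf-8")
print(len(sample), count_base_characters(sample))
```

The state that has to be carried from one code point to the next is a
single bounded counter, and input that exceeds the bound is rejected
outright rather than processed.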

Which is to say, Ohta's statement about statefulness is true, but the  
conclusion that this is a "problem" is erroneous. -Tim

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf