Re: Troubles with UTF-8

Tim Bray wrote:

>> The problem is that Unicode is stateful, with complex and
>> indefinitely long-term states

> Has this ever caused a real problem to a real programmer in real life?

Yes, of course. State information preserved between lines is
really annoying.

But you miss the point of my original mail:

: Unicode is not even finite state, which means some pattern
: matching and normalization problems are hard or insolvable.

That is, with Unicode, you cannot search strings in a reasonable
amount of time.

> I have written a whole bunch of mission-critical code that reads and
> generates UTF-8, and any correct implementation will have to deal with
> the fact that there is no necessary connection between the number of
> glyphs on the screen and bytes in its encoding.

You completely miss the point. It has nothing to do with the
long-term state.
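
For readers following the thread, the mismatch described in the quoted
paragraph (bytes in the encoding versus glyphs on the screen) can be
illustrated with a minimal sketch. The function name `describe` is
hypothetical, and counting code points with a zero combining class is
only a rough stand-in for "glyphs on the screen", not a full grapheme
segmentation.

```python
import unicodedata


def describe(text):
    """Return (utf8_bytes, code_points, base_characters) for text.

    base_characters counts code points whose canonical combining class
    is 0. This is an approximation of visible glyphs, used here only
    for illustration.
    """
    utf8_bytes = len(text.encode("utf-8"))
    code_points = len(text)
    base_characters = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return utf8_bytes, code_points, base_characters


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one glyph on screen,
# two code points, three bytes in UTF-8.
sample = "e\u0301"
print(describe(sample))
```

The three counts differ for the sample string, which is the point of the
quoted paragraph. It says nothing about state that persists across
characters or lines.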

> It would be perfectly
> reasonable for an implementation to declare a limitation, for example
> that it will not process more than 32 trailing modifiers on any character,
> and this would not cause problems in production because sequences of
> such a length do not occur in the encoding of any known text.

I said "long-term state", which, of course, is not confined to a
character with or without modifiers.
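
The limitation proposed in the quoted text could be sketched roughly as
follows. The names `longest_mark_run` and `within_limit` are
hypothetical, the value 32 is taken from the quoted example, and
"trailing modifier" is approximated here as a code point with a nonzero
canonical combining class, which is an assumption and does not cover
every kind of mark. The check looks only at runs attached to a single
character, so it does not address state that extends beyond one
character.

```python
import unicodedata

MAX_TRAILING_MARKS = 32  # limit taken from the example in the quoted text


def longest_mark_run(text):
    """Return the length of the longest run of consecutive combining marks.

    A combining mark is approximated as any code point whose canonical
    combining class is nonzero (an assumption made for this sketch).
    """
    longest = 0
    current = 0
    for ch in text:
        if unicodedata.combining(ch) != 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a base character ends the run
    return longest


def within_limit(text, limit=MAX_TRAILING_MARKS):
    """Return True if no character carries more than `limit` marks."""
    return longest_mark_run(text) <= limit


print(within_limit("a" + "\u0301" * 32))  # exactly at the limit
print(within_limit("a" + "\u0301" * 33))  # one mark over the limit
```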

> Which is to say, Ohta's statement about statefulness is true, but the
> conclusion that this is a "problem" is erroneous. -Tim

Instead, your statement, "I have written a whole bunch of
mission-critical code that reads and generates UTF-8", is untrustworthy.

Of course, it is perfectly reasonable for an implementation to
declare a limitation, for example, that it will not process
non-ASCII characters, which may also be the assumption of your
code.

						Masataka Ohta 


