At 13:44 23/12/2005, Masataka Ohta wrote:
Tom.Petch wrote:
> Overall, my perception is that we have the political statement - UTF-8
> will be used - but have not yet worked out all the engineering
> ramifications.
Correct. Like so many results of IETF, enforcing Unicode just does
not work.
Amen. This is an architectural feature decided for political reasons
which does not scale.
But, never mind. Unicode has nothing to do with internationalization.
I beg to differ on wording. Internationalization is an IETF/Unicode
word. It is part of the equation "globalization = global environment
internationalization + local environment localization". As IBM
understands it, the aim is to reduce the language barrier between the
core and the ends it relates to. I think it is appropriate to the IETF
US-ASCII based Internet technology.
But the real world is "multinationalization" (or multilingualization,
to keep the same image): the same thing, but for every end-to-end
relation (and language). Let us consider the IETF RFC 2277 proposition:
content must be in Unicode (client system) and the protocol is in
US-ASCII (core system). A document may appear to be in one language, but
when you read its source it is in English interspersed with
Unicode-encoded text.
The internationalization (RFC 3066bis) culture is unilateral.
Networking calls for a multilateral culture architecture (RFC 4151 may help).
The only solution I see, which addresses the requirements of Tom
Petch, is to go through a common universalisation layer (not charset
dependent), accepting the existing US-ASCII environment of Masataka
Ohta as a maximum. It should then go down to hexadecimal. This would
get rid of the Unicode-based layer violations and permit a full charset
support strategy in which Unicode could fully play its role of common
reference. Obviously, two-tier policies based on langtags could not
develop as easily as planned.
jfc
> others to
> 0000-00FF, essentially Latin-1, which suits many Western languages but
> is not truly international.
The only appropriate subset of Unicode is 0000-007f, ASCII. Latin-1,
which introduced the confusions of the currency symbol and NBSP, is
already overkill.
> Unicode lacks a no-op, a meaningless octet,
The confusion of NBSP implies that spaces are not such meaningful
octets, so they may be replaced by line break characters.
So, the situation is worse than you would have considered and even
full Latin-1 is hopeless.
Just interpret UTF-8 as ASCII.
Masataka Ohta
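
For readers following the byte-level argument about the 0000-007F and
0000-00FF ranges, here is a minimal sketch in Python. It is not from
either poster; the helper name utf8_profile and the variable names are
purely illustrative. It only checks standard properties of the UTF-8
and Latin-1 encodings: code points 0000-007F encode to a single octet
identical to ASCII, code points 0080-00FF take two octets in UTF-8, and
NBSP (U+00A0) is one octet in Latin-1 but two in UTF-8.

```python
def utf8_profile(code_point):
    """Return (UTF-8 length in octets, True if the octets equal plain ASCII).

    Hypothetical helper for illustration only.
    """
    encoded = chr(code_point).encode("utf-8")
    same_as_ascii = code_point <= 0x7F and encoded == bytes([code_point])
    return len(encoded), same_as_ascii


# 0000-007F: every code point is one octet, byte-for-byte identical to ASCII.
ascii_range_ok = all(utf8_profile(cp) == (1, True) for cp in range(0x00, 0x80))

# 0080-00FF (the upper half of Latin-1): every code point needs two octets.
upper_latin1_two_octets = all(
    utf8_profile(cp) == (2, False) for cp in range(0x80, 0x100)
)

# NBSP is a single octet in Latin-1 but a two-octet sequence in UTF-8.
nbsp_latin1 = "\u00a0".encode("latin-1")
nbsp_utf8 = "\u00a0".encode("utf-8")

print(ascii_range_ok, upper_latin1_two_octets, nbsp_latin1, nbsp_utf8)
```

Under these properties, a consumer that only understands ASCII can read
a UTF-8 stream restricted to 0000-007F unchanged, whereas any Latin-1
character above 007F (NBSP included) arrives as a multi-octet sequence.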
_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf