> As for the protocol, I could have sworn that users do not type protocol data
> units directly, or at least that they haven't for roughly 25 years. (Another
> jibe, citing the fact that utf-8 is, itself, a modification to "raw" unicode
> is probably worth repeating, here.)

While it doesn't really have a bearing on the rest of your message, this is a common misperception that I'd like to take a moment to correct.

When Unicode is expressed as a series of bytes, there are a number of equally valid encoding schemes (aka serializations). UTF-8 is one of those schemes, and is no more or less a "modification", and no more or less "Unicode", than any other of these schemes. Different encoding schemes may be better for different domains, but the conversion between any of those schemes is fast and lossless.

See http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf, Sections 2.4-2.6.

(When Unicode started out 15 years ago, the architecture was different; but it has long been structured this way.)

Mark
__________________________________
http://www.macchiato.com
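
To illustrate the point about lossless conversion between encoding schemes, here is a minimal sketch using Python's built-in codecs. The sample string and the variable names are only illustrative assumptions, not anything from the thread; the string is chosen to mix ASCII, other BMP characters, and one supplementary character.

```python
# Round-trip one string through three Unicode encoding schemes.
# The sample text is an arbitrary illustration: ASCII letters, U+00EF,
# U+20AC (both in the BMP), and U+1D11E (a supplementary character).
text = "na\u00efve \u20ac \U0001d11e"

# Serialize as UTF-8, then convert step by step to UTF-16 and UTF-32.
# The "-be" codec names fix big-endian byte order and emit no byte order mark.
utf8 = text.encode("utf-8")
utf16 = utf8.decode("utf-8").encode("utf-16-be")
utf32 = utf16.decode("utf-16-be").encode("utf-32-be")

# Convert back to UTF-8 and check that nothing was lost along the way.
back = utf32.decode("utf-32-be").encode("utf-8")
assert back == utf8
assert utf32.decode("utf-32-be") == text

# The byte counts differ per scheme, but the code points are identical.
print(len(utf8), len(utf16), len(utf32))
```

The three byte sequences have different lengths, yet each decodes to the same sequence of code points, which is the sense in which no one scheme is more "Unicode" than another.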