Users suffer every day from the current character-encoding mess. Anyone who ventures beyond ASCII is faced with endless trouble.

There's no theoretical obstacle to making multiple character encodings work: make sure that there is, implicitly or explicitly, a perfectly clear character encoding for every byte stream, and make sure that every copy and comparison includes all necessary conversions. In practice, though, this is a disaster. It's too much work to specify the encodings. It's far too much work to program all the necessary conversions.

Look at IDNA. Yet another character encoding. An amazingly unclear specification of which byte streams are supposed to use that encoding. Massive redeployment of, at a minimum, every web browser in the world. And that's just for domain names! Is every worldwide identifier (mailbox names, for example) supposed to have its own massive upgrade?

UTF-8 offers a way out of this mess. We do _one_ upgrade to make sure that UTF-8 works everywhere. For example, RFC 2277, IETF Policy on Character Sets and Languages, requires UTF-8 support in all protocols. Then we convert all stored data to UTF-8. Then, finally, we can drop support for the other character encodings.

Do we want programmers in twenty years to be faced with the same mess that we have today? Or do we want them focusing on positive features for the users?

Keith Moore writes:

> The on-the-wire encoding of IDNs is irrelevant; what matters is the
> behavior experienced by users.

Everything is judged by the user experience, yes, but you are clearly incorrect in saying that the encoding is irrelevant.

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago
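
A minimal sketch of the conversion bookkeeping described above, in Python. The same_text helper and the per-stream encoding labels are hypothetical illustrations of the metadata that every byte stream would need to carry; they are not part of any protocol:

    def same_text(stream_a, encoding_a, stream_b, encoding_b):
        """Compare two labeled byte streams by decoding both sides first."""
        return stream_a.decode(encoding_a) == stream_b.decode(encoding_b)

    # The same user-visible text, stored under two different encodings:
    latin1_bytes = "café".encode("latin-1")   # b'caf\xe9'
    utf8_bytes = "café".encode("utf-8")       # b'caf\xc3\xa9'

    assert latin1_bytes != utf8_bytes         # the raw bytes disagree...
    assert same_text(latin1_bytes, "latin-1",
                     utf8_bytes, "utf-8")     # ...but the text agrees

A naive byte comparison silently reports the two streams as different; getting the right answer requires knowing, for every stream, which encoding it uses, which is exactly the bookkeeping that is too much work in practice.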
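The "yet another character encoding" point about IDNA can be seen directly in Python, whose standard library happens to ship an idna codec; the domain name below is an illustrative example:

    name = "bücher.example"
    print(name.encode("idna"))    # b'xn--bcher-kva.example'
    print(name.encode("utf-8"))   # b'b\xc3\xbccher.example'

The ASCII wire form produced for the DNS differs from the UTF-8 form of the same name, so any software that compares or displays domain names has to know which of the two encodings it is looking at.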