Heather Flanagan (RFC Series Editor) wrote:
RFC 8187, "Indicating Character Encoding and Language for HTTP Header Field Parameters", is the first RFC to be published with UTF-8 encoding and include characters not in the basic ASCII character set.
Don't do that. It is as stupid as allowing programming languages to use non-ASCII characters. At first, it seems to work. Ultimately, however, it makes maintenance of code or RFCs impossible, unless everyone maintaining the code or RFC can recognize every character in it.
This document has been, with the author's consent, patience, and support, used to test the existing tool chain to produce RFCs to see where the environment has difficulty in handling non-ASCII characters.
That's not a problem. The problem is that humans cannot recognize all the characters in the world. Internationalized code and RFCs must be written using characters that all international readers can recognize. Even if someone writes localized code for some locale, it should be written as:

    #define NBSP '\240'
    ...
    putchar(NBSP);

not

    putchar('\240');

to ease maintenance.

Masataka Ohta

PS Even C's use of the full ASCII character set is already a problem: the ASCII backslash in '\240' is displayed as the YEN sign of JIS X 0201 (the Japanese variant of ISO 646) on almost all computers in Japan, including the one I'm using to write this mail. It is not a serious problem in Japan, because everyone there who learns C is taught that the YEN sign is C's escape character. But many Japanese who can use C do not know that it is actually the ASCII backslash.