Re: RFC Series publishes first RFC with non-ASCII characters

Masataka Ohta <mohta@xxxxxxxxxxxxxxxxxxxxxxxxxx> · Fri, 15 Sep 2017 19:16:03 +0900

Heather Flanagan (RFC Series Editor) wrote:

RFC 8187, "Indicating Character Encoding and Language for HTTP Header
Field Parameters", is the first RFC to be published with UTF-8 encoding
and include characters not in the basic ASCII character set.

Don't do that.

It is as stupid as allowing programming languages to use non-ASCII
characters.

At first, it seems to work. Ultimately, however, it makes
maintenance of the code/RFC impossible unless all the people
maintaining it can recognize all the characters in it.

This document has been, with the author's consent, patience, and
support, used to test the existing tool chain to produce RFCs to see
where the environment has difficulty in handling non-ASCII characters.

That's not the problem. The problem is that no human is able to
recognize all the characters in the world.

Internationalized code/RFCs must be written using characters that
people everywhere can recognize.

Even if someone writes localized code for some locale, it should be
written as:

	#define NBSP '\240'
	...
	putchar(NBSP);

not

	putchar('\240');

to ease maintenance.
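
As a slightly fuller sketch of the same idea: the helper
join_with_nbsp() below is hypothetical and only for illustration, and
the comment on NBSP assumes the locale uses ISO 8859-1, where octal 240
(0xA0) is NO-BREAK SPACE. The source stays pure ASCII, and the one
non-ASCII byte has a name that any maintainer can read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: ISO 8859-1, where octal 240 (0xA0) is NO-BREAK SPACE.
 * The escape keeps the source file itself ASCII-only. */
#define NBSP '\240'

/*
 * Hypothetical helper, not from any library: writes `left`, one NBSP,
 * and `right` into `buf`, which has room for `size` bytes.
 * Returns the string length written, or 0 if the result does not fit.
 */
size_t join_with_nbsp(char *buf, size_t size,
                      const char *left, const char *right)
{
    assert(buf != NULL && left != NULL && right != NULL);

    size_t llen = strlen(left);
    size_t rlen = strlen(right);
    size_t total = llen + 1 + rlen;   /* left + NBSP + right */

    if (total + 1 > size)             /* +1 for the terminating NUL */
        return 0;

    memcpy(buf, left, llen);
    buf[llen] = NBSP;                 /* named, not a bare '\240' */
    memcpy(buf + llen + 1, right, rlen);
    buf[total] = '\0';
    return total;
}
```

A call such as join_with_nbsp(buf, sizeof buf, "10", "km") would then
leave a single 0xA0 byte between the two words.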

							Masataka Ohta

PS

The C language's use of full ASCII is already a problem, because the
ASCII backslash character in '\240' is displayed as the YEN sign of
JIS X 0201 (the Japanese variant of ISO 646) on almost all computers
in Japan (including the one I'm using now to write this mail). It is
not a serious problem in Japan, because all Japanese are taught that
the YEN sign is the escape character of C. But many Japanese who can
use C do not know that it is actually the ASCII backslash.
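
To make the point concrete, here is a minimal sketch. The function
glyph_name() and its jis_roman flag are invented for illustration. It
relies only on the fact that the Roman set of JIS X 0201 differs from
ASCII at two code points, 0x5C and 0x7E; the bytes in a C source file
are identical either way, and only the displayed glyph changes.

```c
/*
 * Hypothetical helper: returns the name of the character that byte `c`
 * denotes, either in ASCII (jis_roman == 0) or in the Roman set of
 * JIS X 0201 (jis_roman != 0).
 */
const char *glyph_name(unsigned char c, int jis_roman)
{
    if (c == 0x5C)   /* the C escape character */
        return jis_roman ? "YEN SIGN" : "REVERSE SOLIDUS";
    if (c == 0x7E)
        return jis_roman ? "OVERLINE" : "TILDE";
    return "same in both";
}
```

So the first byte inside '\240' is 0x5C on every machine, whether the
screen shows a backslash or a YEN sign.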