Re: RFC Series publishes first RFC with non-ASCII characters

"Heather Flanagan (RFC Series Editor)" <rse@xxxxxxxxxxxxxx> · Fri, 15 Sep 2017 09:05:26 -0700

On 9/15/17 3:16 AM, Masataka Ohta wrote:
> Heather Flanagan (RFC Series Editor) wrote:
>
>> RFC 8187, "Indicating Character Encoding and Language for HTTP Header
>> Field Parameters", is the first RFC to be published with UTF-8 encoding
>> and include characters not in the basic ASCII character set.
>
> Don't do that.
>
> It is as stupid as allowing programming languages to use non-ASCII
> characters.
>
> At first, it seems to be working. However, ultimately, it makes
> maintenance of code/rfc impossible, unless all the people
> maintaining the code/rfc can recognize all the characters
> in the code/rfc.

Non-ASCII characters are not trivial to include in a document, at least
if you want to make sure the document is broadly readable. So, yes, this
is an area fraught with peril. However, quite a bit of time was put into
determining what guidance should be applied so that we can handle those
characters. See https://www.rfc-editor.org/rfc/rfc7997.txt.

>
>> This
>> document has been, with the author's consent, patience, and support,
>> used to test the existing tool chain to produce RFCs to see where the
>> environment has difficulty in handling non-ASCII characters.
>
> That's not the problem. The problem is that humans are not able
> to recognize all the characters in the world.
>
> Internationalized code/rfc must be written using characters recognized
> by all the international people.
>
> Even if someone writes localized code for some locale, it should be
> written as:
>
>     #define NBSP '\240'
>     ...
>     putchar(NBSP);
>
> not
>
>     putchar('\240');
>
> to ease maintenance.
>
>                             Masataka Ohta

I don't think we can ignore these characters - they are in use, and we
need to be able to represent them in a more readable fashion than just
Unicode escape sequences.
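
To make "more readable than just escape sequences" concrete, here is a
minimal sketch in C. It is not part of any RFC Editor tooling: the names
decode_utf8 and annotate_utf8 are hypothetical, and the decoder does only
basic validation (it does not reject overlong forms or surrogates). It
copies a UTF-8 string and follows every non-ASCII character with its
U+XXXX code point, so a reader who cannot recognize the glyph can still
identify it.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s.
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the sequence is malformed. Minimal validation only. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t n, i;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;

    /* A terminating NUL is not a continuation byte, so this loop
     * never reads past the end of a NUL-terminated string. */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (unsigned long)(s[i] & 0x3F);
    }
    return n;
}

/* Hypothetical helper: copy the UTF-8 string `in` to `out`, appending
 * " (U+XXXX)" after each non-ASCII character.
 * Returns 0 on success, -1 on malformed input or if `out` is too small. */
int annotate_utf8(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;

    if (outsz == 0) return -1;
    out[0] = '\0';

    while (*p != '\0') {
        unsigned long cp;
        size_t n = decode_utf8(p, &cp);
        int w;

        if (n == 0) return -1;
        if (n == 1)
            w = snprintf(out + used, outsz - used, "%c", (int)*p);
        else
            w = snprintf(out + used, outsz - used, "%.*s (U+%04lX)",
                         (int)n, (const char *)p, cp);
        if (w < 0 || (size_t)w >= outsz - used) return -1;

        used += (size_t)w;
        p += n;
    }
    return 0;
}
```

For example, calling annotate_utf8 on "caf" followed by the bytes C3 A9
yields the original text with " (U+00E9)" appended. The '\240' in the
quoted example is octal for 0xA0, the no-break space in ISO 8859-1, which
this sketch would label U+00A0 when it arrives as the UTF-8 bytes C2 A0.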

-Heather

>
> PS
>
> The C language's use of full ASCII is already a problem because the ASCII
> backslash character in '\240' is displayed as the YEN sign of JIS X 0201
> (the Japanese variant of ISO 646) on almost all computers in Japan
> (including the one I'm using now to write this mail). It is not a serious
> problem in Japan because the Japanese are taught that the YEN sign
> is the escape character of C. But many Japanese who can use C do not
> know it is actually the ASCII backslash.
>