Re: RFC Series publishes first RFC with non-ASCII characters

Toerless Eckert <tte@xxxxxxxxx> · Mon, 18 Sep 2017 16:53:55 +0200

I am somewhat unclear about the process leading to this work,
so please excuse one question:

Which working group or mailing list would I have
to participate in to influence this decision-making?

Thanks
    Toerless

On Fri, Sep 15, 2017 at 09:05:26AM -0700, Heather Flanagan (RFC Series Editor) wrote:
> On 9/15/17 3:16 AM, Masataka Ohta wrote:
> > Heather Flanagan (RFC Series Editor) wrote:
> >
> >> RFC 8187, "Indicating Character Encoding and Language for HTTP Header
> >> Field Parameters", is the first RFC to be published with UTF-8 encoding
> >> and include characters not in the basic ASCII character set.
> >
> > Don't do that.
> >
> > It is as stupid as allowing programming languages to use non-ASCII
> > characters.
> >
> > At first, it seems to work. Ultimately, however, it makes
> > maintenance of the code/RFC impossible unless all the people
> > maintaining the code/RFC can recognize all the characters
> > in the code/RFC.
> 
> Non-ASCII characters are not trivial to include in a document, at least
> if you want to make sure the document is broadly readable. So, yes, this
> is an area fraught with peril. However, quite a bit of time was put into
> determining what guidance should be applied so that we can handle those
> characters. See https://www.rfc-editor.org/rfc/rfc7997.txt.
> 
> >
> >> This
> >> document has been, with the author's consent, patience, and support,
> >> used to test the existing tool chain to produce RFCs to see where the
> >> environment has difficulty in handling non-ASCII characters.
> >
> > That's not the problem. The problem is that humans are not able
> > to recognize all the characters in the world.
> >
> > Internationalized code/RFCs must be written using characters recognized
> > by all international readers.
> >
> > Even if someone writes localized code for some locale, it should be
> > written as:
> >
> >     #define NBSP '\240'
> >     ...
> >     putchar(NBSP);
> >
> > not
> >
> >     putchar('\240');
> >
> > to ease maintenance.
> >
> >                             Masataka Ohta
> 
> I don't think we can ignore these characters; they are in use, and we
> need to be able to represent them in a more readable fashion than just
> Unicode escape sequences.
> 
> -Heather
> 
> >
> > PS
> >
> > The C language's use of full ASCII is already a problem, because the
> > ASCII backslash character in '\240' is displayed as the Yen sign of
> > JIS X 0201 (the Japanese variant of ISO 646) on almost all computers
> > in Japan (including the one I'm using now to write this mail). It is
> > not a serious problem in Japan, because everyone who learns C there is
> > taught that the Yen sign is the escape character of C. But many Japanese
> > who can use C do not know that it is actually the ASCII backslash.