Re: I-D file formats and internationalization

Douglas Otis <dotis@xxxxxxxxxxxxxx> · Wed, 30 Nov 2005 17:59:08 -0800

On Nov 30, 2005, at 2:23 PM, Paul Hoffman wrote:
At 1:54 PM -0800 11/30/05, Douglas Otis wrote:

Rather than opening RFCs to text utilizing any character-set  
anywhere, as this draft suggests,

That is not what the RFC suggests at all. The character set is  
Unicode. The encoding is UTF-8. That's it.

Unicode provides a unique number for each of the roughly 97,000
characters it currently defines.  These characters include
punctuation marks, diacritics, mathematical and technical symbols,
arrows, dingbats, etc.  Displaying one of these characters requires a
character-set (synonymous with a display system's font-set or
character-repertoire), or using the Unicode vernacular, a script.  It
is not just a matter of which character is displayed or which
character-repertoire is used; there are also Middle Eastern
right-to-left issues.

 there could be alternative UTF fields for an author's name and  
reference titles, and perhaps defined characters for simple line  
and table drawing that invoke automatic translation when an ASCII  
text version is generated.

That's a possibility (if you define what an "alternative UTF field"  
is). Why is it better than simply using UTF-8 everywhere?

Such an alternative field could be an added element to the DTD or Schema  
defining the XML input document.  When the output is other than  
ASCII, the alternative field could be displayed.  To allow  
compatibility with existing tools, the ASCII version would not be  
affected.  Permitting access to _some_ extended characters could  
improve upon the quality of some line-drawing for non-ASCII outputs.
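
As a rough sketch of how an output tool might choose between the two
fields, assuming a hypothetical <altname> element alongside an ASCII
<name> (neither element name comes from any existing DTD, and the
author name is made up):

```python
import xml.etree.ElementTree as ET

# Hypothetical input fragment; element names are illustrative only.
SOURCE = """<author>
  <name>Jose Garcia</name>
  <altname>Jos\u00e9 Garc\u00eda</altname>
</author>"""

def render_name(author: ET.Element, ascii_only: bool) -> str:
    """Pick the ASCII field for ASCII output, else the alternative one."""
    alt = author.findtext("altname")
    if ascii_only or alt is None:
        # The ASCII rendering never looks at the alternative field.
        return author.findtext("name", default="")
    return alt

author = ET.fromstring(SOURCE)
print(render_name(author, ascii_only=True))
print(render_name(author, ascii_only=False))
```

Because the ASCII path ignores the extra element, existing tools that
only understand the ASCII field would be unaffected.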

To avoid the "pain-in-the-ass" issue, improved drawings could be  
generated by a simple web based drawing application, where the  
translation back into ASCII artwork would be straightforward, and  
yet provide comparable results.  Currently, improved graphics are  
limited to the generation of HTML tables.  The drawing application  
could even create the needed XML wrapper for an RFC.
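
The translation back into ASCII artwork could be as simple as a
one-to-one character mapping.  The sketch below assumes the drawing
uses the Unicode light box-drawing characters; the mapping table and
the function name to_ascii_art() are illustrative choices, not part of
any existing tool.

```python
# Map each light box-drawing character to a single ASCII character so
# that column alignment is preserved.
BOX_TO_ASCII = str.maketrans({
    "\u2500": "-",  # horizontal line
    "\u2502": "|",  # vertical line
    "\u250c": "+",  # top-left corner
    "\u2510": "+",  # top-right corner
    "\u2514": "+",  # bottom-left corner
    "\u2518": "+",  # bottom-right corner
    "\u251c": "+",  # left tee
    "\u2524": "+",  # right tee
    "\u252c": "+",  # top tee
    "\u2534": "+",  # bottom tee
    "\u253c": "+",  # cross
})

def to_ascii_art(drawing: str) -> str:
    """Replace box-drawing characters with ASCII equivalents."""
    return drawing.translate(BOX_TO_ASCII)

box = "\u250c\u2500\u2500\u2510\n\u2502hi\u2502\n\u2514\u2500\u2500\u2518"
print(to_ascii_art(box))
```

Characters outside the table pass through unchanged, so a real tool
would still need to reject or escape anything else that is not ASCII.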

Being able to review the ID as it would appear as an RFC would  
also seem to be a requirement.

That means changing the Internet Drafts process as well. Certainly  
possible, but more daunting than changing one process at a time.

As an ID becomes an RFC, expecting last-minute changes to the  
document would seem even more daunting.

  It seems problematic for protocol examples to use non-ASCII  
characters owing to there not being ubiquitously displayable  
character-sets.

Unicode is universally displayable if you have the right font(s).  
Regardless of that, however, any sane document author would not  
assume that every person reading the document could display it.  
They would put a legend or explanation near the example.

Assume such characters cannot be displayed, at least not in the
ASCII version, which excludes the extended character-set allowed by
Unicode.  An escape mechanism would be needed to accommodate
alternative text.  Displaying '?' for Unicode characters that extend
beyond ASCII would not be a very satisfactory solution, as this would
make the ASCII version less authoritative, to say the least, and
break the way many use the RFC text files.  I liked the idea that
Frank suggested: use the HTML escape sequence to declare the Unicode
character.  This allows the ASCII version to remain authoritative.
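
One way to read "HTML escape sequence" is the numeric character
reference, e.g. &#233; for U+00E9.  Under that assumption, a minimal
Python sketch of the round trip looks like this; the function names
are illustrative, while the "xmlcharrefreplace" error handler and
html.unescape() are standard library features.

```python
import html

def to_ascii_with_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#NNN; reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def from_ascii_with_refs(text: str) -> str:
    """Expand character references back into Unicode characters."""
    return html.unescape(text)

escaped = to_ascii_with_refs("caf\u00e9 \u2192 na\u00efve")
print(escaped)
print(from_ascii_with_refs(escaped))
```

The ASCII form stays pure ASCII and loses no information, unlike a
'?' substitution.  Note that html.unescape() also expands named
references such as &amp;, so text that already contains a literal
"&...;" sequence would need its ampersands escaped first.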

-Doug
