Re: I-D file formats and internationalization

On Nov 30, 2005, at 2:23 PM, Paul Hoffman wrote:
At 1:54 PM -0800 11/30/05, Douglas Otis wrote:

Rather than opening RFCs to text utilizing any character-set anywhere, as this draft suggests,

That is not what the RFC suggests at all. The character set is Unicode. The encoding is UTF-8. That's it.

Unicode provides a unique number for every possible character within a current range of about 97,000 characters. These characters include punctuation marks, diacritics, mathematical and technical symbols, arrows, dingbats, etc. Displaying one of these characters requires a character-set (synonymous with a display system's font-set or character-repertoire) or, in the Unicode vernacular, a script. It is not just a matter of which character is displayed or which character-repertoire is used; there are also Middle Eastern right-to-left issues.
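As a rough illustration of the distinction between a character's number, its UTF-8 encoding, and its directionality, here is a minimal Python sketch. The helper name describe() and the sample characters are my own choices for illustration; only the standard unicodedata module is used. Whether a given display can actually render each character is a separate font question that this code does not address.

```python
import unicodedata

def describe(ch):
    """Return (code point, UTF-8 bytes, bidirectional class) for one character."""
    return (f"U+{ord(ch):04X}", ch.encode("utf-8"), unicodedata.bidirectional(ch))

# Sample characters (chosen for illustration): a Latin letter, an accented
# letter, a Hebrew letter (right-to-left), and a box-drawing character.
for ch in ["A", "\u00e9", "\u05d0", "\u2500"]:
    print(describe(ch))
```

The Hebrew letter reports a right-to-left class ("R"), which is the kind of property a renderer has to honor in addition to having a glyph for the character.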


there could be alternative UTF fields for an author's name and reference titles, and perhaps defined characters for simple line and table drawing that invoke automatic translation when an ASCII text version is generated.

That's a possibility (if you define what an "alternative UTF field" is). Why is it better than simply using UTF-8 everywhere?

Such an alternative field could be an added element in the DTD or Schema defining the XML input document. When the output is other than ASCII, the alternative field could be displayed. To allow compatibility with existing tools, the ASCII version would not be affected. Permitting access to _some_ extended characters could improve the quality of some line-drawing for non-ASCII outputs.
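A minimal sketch of how a rendering tool might choose between the two fields, assuming a hypothetical <altname> child element and an ASCII fullname attribute on <author>. Neither the element name nor the sample author comes from any existing DTD; they are placeholders for whatever the schema would actually define.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: ASCII name in an attribute, UTF-8 alternative in a child.
SAMPLE = (
    '<author fullname="Jose Nunez">'
    "<altname>Jos\u00e9 N\u00fa\u00f1ez</altname>"
    "</author>"
)

def display_name(author, ascii_only):
    """Pick the ASCII name for ASCII output, else the alternative if present."""
    alt = author.find("altname")
    if ascii_only or alt is None or not alt.text:
        return author.get("fullname")
    return alt.text

author = ET.fromstring(SAMPLE)
print(display_name(author, ascii_only=True))
print(display_name(author, ascii_only=False))
```

A document without the alternative element falls through to the ASCII attribute, so existing input files would render exactly as before.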

To avoid the "pain-in-the-ass" issue, improved drawings could be generated by a simple web-based drawing application, where the translation back into ASCII artwork would be straightforward and yet provide comparable results. Currently, improved graphics are limited to the generation of HTML tables. The drawing application could even create the needed XML wrapper for an RFC.
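The translation back into ASCII artwork could be little more than a character-for-character table. The following Python sketch assumes the drawing uses only the light box-drawing characters listed; the table BOX_TO_ASCII and the function name are illustrative, not part of any existing tool.

```python
# Map light box-drawing characters to conventional ASCII-art equivalents.
BOX_TO_ASCII = str.maketrans({
    "\u2500": "-",  # horizontal line
    "\u2502": "|",  # vertical line
    "\u250c": "+", "\u2510": "+", "\u2514": "+", "\u2518": "+",  # corners
    "\u251c": "+", "\u2524": "+", "\u252c": "+", "\u2534": "+",  # tees
    "\u253c": "+",  # cross
})

def to_ascii_art(drawing):
    """Replace each mapped box-drawing character with its ASCII stand-in."""
    return drawing.translate(BOX_TO_ASCII)

box = "\u250c\u2500\u2500\u2510\n\u2502ab\u2502\n\u2514\u2500\u2500\u2518"
print(to_ascii_art(box))
```

Because every mapped character becomes exactly one ASCII character, column alignment in the drawing is preserved.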


Being able to review the ID as it would appear as an RFC would also seem to be a requirement.

That means changing the Internet Drafts process as well. Certainly possible, but more daunting than changing one process at a time.

As an ID becomes an RFC, expecting last-minute changes to the document seems even more daunting.


It seems problematic for protocol examples to use non-ASCII characters owing to there not being ubiquitously displayable character-sets.

Unicode is universally displayable if you have the right font(s). Regardless of that, however, any sane document author would not assume that every person reading the document could display it. They would put a legend or explanation near the example.

Assume such characters cannot be displayed, at least not in the ASCII version, which excludes the extended characters allowed by Unicode. An escape mechanism would be needed to accommodate alternative text. Displaying '?' for the Unicode characters that extend beyond ASCII would not be a very satisfactory solution, as this would make the ASCII version less authoritative, to say the least, and break the way many use the RFC text files. I liked the idea that Frank suggested: use the HTML escape sequence to declare the Unicode character. This allows the ASCII version to remain authoritative.
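A minimal sketch of that escape mechanism in Python, assuming "HTML escape sequence" means a numeric character reference such as &#233;. The function names are illustrative. The standard "xmlcharrefreplace" error handler emits decimal references, and html.unescape() reverses them.

```python
import html

def to_ascii_with_escapes(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def from_ascii_with_escapes(text):
    """Expand character references back into Unicode characters."""
    return html.unescape(text)

original = "Jos\u00e9 \u2192 \u05d0"
escaped = to_ascii_with_escapes(original)
print(escaped)
print(from_ascii_with_escapes(escaped) == original)
```

One caveat: html.unescape() also expands named references such as &amp;, so text that already contains a literal ampersand sequence would need its ampersands escaped first for the round trip to be exact.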

-Doug





