On Nov 30, 2005, at 2:23 PM, Paul Hoffman wrote:
At 1:54 PM -0800 11/30/05, Douglas Otis wrote:
Rather than opening RFCs to text utilizing any character-set
anywhere, as this draft suggests,
That is not what the RFC suggests at all. The character set is
Unicode. The encoding is UTF-8. That's it.
Unicode provides a unique number for every possible character within
a current range of about 97,000 characters. These characters include
punctuation marks, diacritics, mathematical and technical symbols,
arrows, dingbats, etc. Displaying one of these characters requires a
character-set (synonymous with a display system's font-set or
character-repertoire), or, using the Unicode vernacular, a script. It
is not just a matter of which character is displayed and which
character-repertoire is used; there are also Middle Eastern
right-to-left issues.
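A minimal sketch of the distinction drawn above between a character's Unicode number, its UTF-8 encoding, and its display direction. It uses only Python's standard library; the helper name describe and the sample characters are illustrative assumptions, not anything taken from the draft.

```python
import unicodedata

def describe(ch):
    """Return the code point, UTF-8 bytes, name, and bidi class of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",        # the unique Unicode number
        "utf8": ch.encode("utf-8").hex(),        # the bytes actually stored in a UTF-8 file
        "name": unicodedata.name(ch),            # the formal character name
        "bidi": unicodedata.bidirectional(ch),   # "L" left-to-right, "R" right-to-left, ...
    }

# Sample characters (chosen for illustration): ASCII letter, accented letter,
# arrow symbol, Hebrew letter.
for ch in ("A", "\u00e9", "\u2192", "\u05d0"):
    print(describe(ch))
```

Whether a reader actually sees the glyph still depends on the fonts installed on the display system; nothing in the number or the encoding guarantees that.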
there could be alternative UTF fields for an author's name and
reference titles, and perhaps defined characters for simple line
and table drawing that invoke automatic translation when an ASCII
text version is generated.
That's a possibility (if you define what an "alternative UTF field"
is). Why is it better than simply using UTF-8 everywhere?
Such an alternative field could be an added element to the DTD or Schema
defining the XML input document. When the output is other than
ASCII, the alternative field could be displayed. To allow
compatibility with existing tools, the ASCII version would not be
affected. Permitting access to _some_ extended characters could
improve upon the quality of some line-drawing for non-ASCII outputs.
To avoid the "pain-in-the-ass" issue, improved drawings could be
generated by a simple web-based drawing application, where the
translation back into ASCII artwork would be straightforward and
yet provide comparable results. Currently, improved graphics are
limited to the generation of HTML tables. The drawing application
could even create the needed XML wrapper for an RFC.
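A rough sketch of how such an alternative field might be consumed by a rendering tool. The element and attribute names (author, fullname, altname) and the sample name are hypothetical, chosen only for illustration; they are not taken from any existing DTD or Schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: "fullname" carries the ASCII form used today,
# "altname" is the assumed added element carrying the non-ASCII form.
SAMPLE = (
    '<author fullname="Jose Nunez">'
    "<altname>Jos\u00e9 N\u00fa\u00f1ez</altname>"
    "</author>"
)

def author_name(elem, ascii_only):
    """Pick the ASCII name for ASCII output, else the alternative field if present."""
    alt = elem.findtext("altname")
    if ascii_only or alt is None:
        return elem.get("fullname")
    return alt

author = ET.fromstring(SAMPLE)
print(author_name(author, ascii_only=True))   # ASCII output is unaffected
print(author_name(author, ascii_only=False))  # other outputs show the alternative field
```

Existing tools that do not know about the added element would keep reading the ASCII attribute, which is the compatibility property described above.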
Being able to review the ID as it would appear as an RFC would
also seem to be a requirement.
That means changing the Internet Drafts process as well. Certainly
possible, but more daunting than changing one process at a time.
As an ID becomes an RFC, it seems that expecting last-minute changes
to the document would be even more daunting.
It seems problematic for protocol examples to use non-ASCII
characters owing to there not being ubiquitously displayable
character-sets.
Unicode is universally displayable if you have the right font(s).
Regardless of that, however, any sane document author would not
assume that every person reading the document could display it.
They would put a legend or explanation near the example.
Assume such characters cannot be displayed, at least not in the
ASCII version, which excludes the extended character-set allowed by
Unicode. An escape mechanism would be needed to accommodate
alternative text. Displaying '?' for Unicode characters that extend
beyond ASCII would not be a very satisfactory solution, as this
would make the ASCII version less authoritative, to say the least,
and break the way many use the RFC text files. I liked the idea that
Frank suggested: use the HTML escape sequence to declare the Unicode
character. This allows the ASCII version to remain authoritative.
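A minimal sketch of that escape approach, assuming "HTML escape sequence" means a numeric character reference such as &#8594;. The function name to_ascii_with_refs is an illustrative assumption; the "xmlcharrefreplace" error handler and html.unescape are part of Python's standard library.

```python
import html

def to_ascii_with_refs(text):
    """Replace each non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

original = "na\u00efve \u2192 x"
ascii_version = to_ascii_with_refs(original)
print(ascii_version)                  # pure ASCII, with references in place of '?'
print(html.unescape(ascii_version))   # a Unicode-capable tool can recover the original
```

Unlike substituting '?', no information is lost in the ASCII text. Text that already contains a literal '&' followed by something resembling a reference would need its own escaping rule, which this sketch does not address.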
-Doug