Comment on draft-klensin-net-utf8-07.txt:

--------------------------

"Network Virtual Terminal (NVT)" first occurs in Appendix A. The expansion of the abbreviation should (also) be given at the first occurrence of "NVT" in the document.

--------------------------

Section 2, point 2, "Line-endings...":

  "discussion. The newer control characters IND (U+0084) and NEL ("Next Line", U+0085) might have been used to disambiguate the"

I have a hard time figuring out what IND was supposed to be used for, but I don't think it was for line endings. A chain printer "font" change is the closest I get... (http://www.freepatentsonline.com/3699884.html). NEL was originally used in EBCDIC (IIUC), and is still used there... The description "might have been used to disambiguate" is more appropriate for U+2028 and U+2029.

--------------------------

  "it, lines end in CRLF and only in CRLF. Anything that does not end in CRLF is either not a line or is severely malformed."

The sentence starting with "Anything" seems severely malformed... You don't really mean to say "Anything", I hope. "Using other line ending or line separation conventions", perhaps. And "severely malformed": I hope you did not mean that either. "is lacking in conversion to 'net-utf8'/'net-Unicode'", perhaps. Being "restrictive in what one emits and permissive/liberal in what one receives" might be applicable here.

Upon receipt, the following SHOULD be seen as at least line ending (or line separating), and in some cases more than that:

  LF, CR+LF, VT, CR+VT, FF, CR+FF, CR (not followed by NUL...), NEL, CR+NEL, LS, PS

where

  LF   U+000A
  VT   U+000B
  FF   U+000C
  CR   U+000D
  NEL  U+0085
  LS   U+2028
  PS   U+2029

Even FS, GS and RS, where

  FS   U+001C
  GS   U+001D
  RS   U+001E

should be seen as line separating (Unicode gives these bidi property B, which effectively means they are paragraph separating).

Apart from CR+LF, none of these SHOULD be emitted for net-utf8, unless that is overridden by the protocol specification (for example, one that allows FF, or CR+FF).
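Recognizing the sequences listed above and turning each one into a single CR+LF might look roughly like the sketch below. It is a minimal illustration in Python, assuming the input has already been decoded from UTF-8 into a str; the names _LINE_END and to_net_utf8_lines are hypothetical and come from neither the draft nor any library. It maps every recognized line end to exactly one CR+LF, so the "more than that" meaning of FF, PS and friends (page or paragraph breaks) is not preserved, and a protocol that permits FF would need to exempt it.

```python
import re

# Hypothetical helper sketching the receipt-side list above.
# Order matters: the two-character CR+x sequences are tried first so
# that each one counts as ONE line end rather than two.
_LINE_END = re.compile(
    r"\r[\n\x0b\x0c\x85]"                # CR+LF, CR+VT, CR+FF, CR+NEL
    r"|[\n\x0b\x0c\x85\u2028\u2029]"     # LF, VT, FF, NEL, LS, PS
    r"|[\x1c\x1d\x1e]"                   # FS, GS, RS (bidi class B)
    r"|\r(?!\x00)"                       # CR, unless followed by NUL
)


def to_net_utf8_lines(text: str) -> str:
    """Replace every recognized line end or separator with CR+LF.

    CR+NUL (the NVT way of sending a bare CR) is passed through
    unchanged. Page/paragraph semantics of FF, PS, etc. are dropped.
    """
    return _LINE_END.sub("\r\n", text)


if __name__ == "__main__":
    sample = "one\ntwo\u2028three\r\x00four\r\x0cfive"
    print(repr(to_net_utf8_lines(sample)))
    # prints 'one\r\ntwo\r\nthree\r\x00four\r\nfive'
```

Because CR+LF is matched as a unit and replaced by itself, running the function twice gives the same result as running it once.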
When faced with any of these in input **to be emitted as net-utf8**, each of them SHOULD be converted to CR+LF (unless that is overridden by the protocol in question).

--------------------------

Section 2, point 3:

You have made an exception for FF (because it occurs in RFCs?). I think FF SHOULD be avoided, just like VT, NEL, and the others listed above. Even where it is allowed, FF, and CR+FF, should be seen as line separating.

You have also (by implication) dismissed HT, U+0009. The reason for this is unclear, especially since HT is so common in plain text (often with some default tab setting). Mapping HT to SPs is often a bad idea. I don't think a default tab setting should be specified, but the effect of somewhat (not wildly) different defaults is not much worse than that of using variable-width fonts.

SP, U+0020, is nowadays not seen as a control character, not even in your own text... (same paragraph).

--------------------------

  "However, because they were optional in NVT applications and this specification is an NVT superset, they cannot be prohibited entirely."

Why not? Why must this be a strict NVT superset? I think it would be rather important to rule these strange beasts out of net-utf8. They were really ASCII (ISO 646) features, but were ruled out long before Unicode.

--------------------------

  "The most important of these rules is that CR MUST NOT appear unless it is immediately followed by LF (indicating end of line) or NUL."

I don't see how that follows (read: it does not follow).

--------------------------

  "[ISO10646] International Organization for Standardization, "Information Technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane", ISO/IEC 10646-1:2000, October 2000."

That seems a bit old...
It would be better to cite the current revision:

  ISO/IEC 10646:2003 Information technology -- Universal Multiple-Octet Coded Character Set (UCS)

with its amendments (which I don't think you should reference explicitly):

  ISO/IEC 10646:2003/Amd 1:2005 Glagolitic, Coptic, Georgian and other characters
  ISO/IEC 10646:2003/Amd 2:2006 N'Ko, Phags-pa, Phoenician and other characters

(and more amendments are in the works).

--------------------------

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf