--On Monday, 07 January, 2008 22:30 +0100 Kent Karlsson
<kent.karlsson14@xxxxxxxxx> wrote:

> Comment on draft-klensin-net-utf8-07.txt:
>
> --------------------------
>
> "Network Virtual Terminal (NVT)" occurs first in Appendix A.
> The explanation of the abbreviation should (also) be given at
> the first occurrence of "NVT" in the document.

Fixed in -09

> --------------------------
>
> Section 2, point 2, "Line-endings..."
>
> "discussion. The newer control characters IND (U+0084)
> and NEL ("Next Line", U+0085) might have been used to
> disambiguate the"
>
> I have a hard time figuring out what IND was supposed to be
> used for, but I don't think it was for line endings. Chain
> printer "font" change is the closest I get...
> (http://www.freepatentsonline.com/3699884.html).

As far as I can tell, and based on the comments that came from
those who suggested that I make that addition, it is an index
(same position on next line) function.

> NEL is used in EBCDIC originally (IIUC), and still used in
> EBCDIC...

This is just notation. Whether the functions are the same may
or may not be relevant.

> The description "might have been used to disambiguate" is more
> appropriate for U+2028 and U+2029.

That is why the next sentence says "Similar observations
apply...". These things represent, as far as I can tell,
iterative attempts to get things right.

> --------------------------
>
> "it, lines end in CRLF and only in CRLF. Anything that
> does not end in CRLF is either not a line or is
> severely malformed."
>
> The sentence starting with "Anything" seems severely
> malformed... You don't really mean to say "Anything", I hope.
> "Using other line ending or line separation conventions"
> perhaps. And "severely malformed", I hope you did not mean
> that either. "is lacking in conversion to
> 'net-utf8'/'net-Unicode'" perhaps.

Sentence has been rewritten into a conformance statement.
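
For illustration only, a strict reading of the quoted "lines end in
CRLF and only in CRLF" could be checked roughly as follows. This is
a minimal Python sketch, not text or code from the draft: the
function names are hypothetical, and it deliberately ignores the
NVT "CR NUL" case mentioned elsewhere in this thread.

```python
from typing import List


def find_bare_line_endings(text: str) -> List[int]:
    """Return offsets of CR or LF characters that are not part of a CRLF pair.

    Assumption for this sketch: any CR not immediately followed by LF,
    and any LF not immediately preceded by CR, counts as a violation.
    """
    offsets = []
    for i, ch in enumerate(text):
        if ch == "\r":
            # A CR is acceptable only when the next character is LF.
            if i + 1 >= len(text) or text[i + 1] != "\n":
                offsets.append(i)
        elif ch == "\n":
            # An LF is acceptable only when the previous character is CR.
            if i == 0 or text[i - 1] != "\r":
                offsets.append(i)
    return offsets


def uses_only_crlf(text: str) -> bool:
    """True when every CR and LF in text belongs to a CRLF sequence."""
    return not find_bare_line_endings(text)
```

Under these assumptions, uses_only_crlf("a\r\nb\r\n") holds, while a
string containing a lone LF or a lone CR is reported with the offset
of the offending character.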
> To be "restrictive in what one emits and permissive/liberal in
> what one receives" might be applicable here.
>
> Upon receipt, the following SHOULD be seen as at least line
> ending (or line separating), and in some cases more than that:
>
> LF, CR+LF, VT, CR+VT, FF, CR+FF, CR (not followed by NUL...),
> NEL, CR+NEL, LS, PS
> where
> LF U+000A
> VT U+000B
> FF U+000C
> ...

The reasons why the robustness principle should not be applied
as you are trying to apply it are an interesting philosophical
discussion that does not, IMO, help here. The bottom line is
that this is a spec for a single standard format, not a whole
series of variations that senders have the right to assume that
receivers will support.

I've elided comments below that seem to be just different ways
to pursue the theme of "why don't we support every character
that might imaginably be a line-ending as if it were one".

> --------------------------
>
> Section 2, point 3:
>
> You have made an exception for FF (because they occur in
> RFCs?). I think FF SHOULD be avoided, just like VT, NEL, and
> more (see above). Even when it is allowed, it, and CR+FF,
> should be seen as line separating.

No. See above. The question of what characters should be on
that list has been discussed endlessly and the text has been
changed repeatedly to explain why various proposals were not
adopted. If this work is to be completed, we need to stop
somewhere.

> You have also (by implication) dismissed HT, U+0009. The
> reason for this is unclear. Especially since HT is so common
> in plain texts (often with some default tab setting). Mapping
> HT to SPs is often a bad idea. I don't think a default tab
> setting should be specified, but the effect of somewhat (not
> wildly) different defaults for that is not much worse than
> using variable width fonts.

An explanation appears in -08.

> SP, U+0020, is nowadays not seen as a control character, not
> even in your own text... (same paragraph).
>
> --------------------------
>
> "However, because they were optional in NVT applications
> and this specification is an NVT superset, they cannot be
> prohibited entirely."
>
> Why not? Why must this be a strict NVT superset? I think it
> would be rather important to rule these strange beasts out
> from net-utf8. These were really ASCII (ISO 646) features, but
> have been ruled out much before Unicode.

But you have argued that some of them should be treated as line
separators, and any system that supports VT100 controls (i.e.,
U**x or almost any of its children) still requires them.

> --------------------------
> "[ISO10646]
> International Organization for Standardization,
> "Information Technology - Universal Multiple-
> Octet Coded Character Set (UCS) - Part 1:
> Architecture and Basic Multilingual Plane"",
> ISO/IEC 10646-1:2000, October 2000."
>
> That seems a bit old... Better with the current revision:
>
> ISO/IEC 10646:2003 Information technology -- Universal
> Multiple-Octet Coded Character Set (UCS)
>
> with the amendments (which I don't think you should reference
> explicitly): ISO/IEC 10646:2003/Amd 1:2005 Glagolitic,
> Coptic, Georgian and other characters ISO/IEC 10646:2003/Amd
> 2:2006 N'Ko, Phags-pa, Phoenician and other characters (and
> more amendments in the works).

Changed in -09. I hope you like the new form better.

    john