--On Thursday, 10 January, 2008 15:21 +0100 Frank Ellermann <nobody@xxxxxxxxxxxxxxxxx> wrote:

>...
> Hopefully somebody can confirm that IND is correct, or not.
> For HT and FF I hope the final version will somehow express
> that both are not really bad, and as far as they're bad FF is
> worse than HT.

I'm open to consensus about changes for either HT or FF, but the theory of "bad" that was used to construct the spec was:

(i) If a "spacing" control has the effect of setting the position of the next character, it is "bad" unless that position is unambiguous. In addition, things are "bad" unless they are necessary in running text (as distinct from faking things that are better handled in markup, followed by either device-specific output or standard page representations, neither of which is normal text).

It is unambiguous for SP. It is unambiguous for CRLF. Independent of the "what is a line-end" problem, it is somewhat ambiguous for CR or LF alone and for IND. It is ambiguous for HT. It would be ambiguous for FF except that FF is assigned fairly clear semantics in NVT -- "FF" is not a line ending (CRLF FF is needed) and, as Bob Braden noted, there is a fairly clear rule that FF is to be interpreted as "top of next page" if one knows what a page is and as "blank line" otherwise. But that rule is sufficiently often ignored to call for considerable caution about FF, and the text now contains a cautionary note for that reason.

There is an interesting demonstration of the law of unintended consequences here. If we could tell that a string was unambiguously UTF-8 (or whatever) by looking at it, even if it contains nothing but ASCII characters, then there would be no reason to try to make net-utf8 a proper superset of NVT. If we could do that, we could also do away with the entire "next line" debate by prohibiting even CRLF and requiring the use of LS (U+2028).
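To make the "cannot tell by looking at it" point concrete, here is a minimal sketch in Python. The function name and the sample strings are hypothetical, chosen only for illustration; nothing here comes from the draft. It shows that an ASCII-only line ending in CRLF has exactly the same octets whether it is labeled US-ASCII or UTF-8, while a line ending in LS (U+2028) does not.

```python
LS = "\u2028"  # LINE SEPARATOR, the Unicode character mentioned above


def same_octets_in_ascii_and_utf8(text: str) -> bool:
    """Return True if `text` encodes to identical bytes in US-ASCII and UTF-8.

    Hypothetical helper for illustration only. A receiver that sees such
    bytes cannot tell which of the two encodings the sender intended.
    """
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a non-ASCII character, so it has no ASCII form.
        return False


nvt_line = "Hello, world\r\n"      # NVT-style line, ASCII only, CRLF ending
ls_line = "Hello, world" + LS      # same text, terminated with LS instead

print(same_octets_in_ascii_and_utf8(nvt_line))  # True
print(same_octets_in_ascii_and_utf8(ls_line))   # False
print(LS.encode("utf-8"))                       # b'\xe2\x80\xa8'
```

The second result is the reason a string that required LS would be self-identifying as non-ASCII, and the first is the reason an ASCII-only string never is.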
In retrospect, there might have been considerable advantages to forcing the ASCII-UTF-8 distinction by requiring that UTF-8 strings all start with a BOM, but it is far too late for that (and probably not, on balance, a good idea despite its advantages).

So I don't see how to get there from here -- we are stuck, for historical reasons, with CRLF on the wire as what The Unicode Standard calls NLF. (Incidentally, Unicode 5.0, Section 5.8, provides significant insight into the complexity of this problem and probably should have been referenced. It would be even more helpful had Table 5-2 identified CRLF as a standard Internet "wire" form of NLF, rather than just binding that form to Windows.)

> My impression from reading the draft was exactly the opposite,
> FF not too bad, HT really bad, that's odd for protocols
> allowing WSP.

See above.

    john