John C Klensin wrote:

> It is ambiguous for HT.

Yes, but we typically don't care about this in protocols as long as
it behaves like one or more spaces. I think that's the idea of
"WSP = SP / HTAB ; white space" in RFC 4234bis, which is waiting for
its STD number.

We talked about the 4234bis issue of "trailing white space", which
could cause havoc when it is silently removed: a "really empty line"
is not the same as an "apparently empty line" (i.e. CRLF CRLF vs.
CRLF 1*WSP CRLF).

A similar robustness principle would support accepting old
"HTAB-compression" or "HTAB-beautification" (e.g. HTAB as the first
character in a folded line). In other words WSP, not only SP.

It is clear that the outcome is ambiguous, but in some protocols I
care about (headers in MIME, mail, and news) *WSP or 1*WSP are
acceptable. Admittedly it is a pain when signatures need white space
canonicalization. But replacing *WSP by *SP would only simplify this
step, not get rid of it.

[About CRLF]

> Unicode 5.0, Section 5.8, provides significant insight into
> the complexity of this problem and probably should have
> been referenced. It would be even more helpful had Table
> 5-2 included identifying CRLF as a standard Internet "wire"
> form of NLF, not just binding that form to Windows.

Indeed, this chapter offers significantly *broken* insight for our
purposes. What they found was a horrible mess, then they introduced
the wannabe-unambiguous LS + PS, and what they arrived at was messier
than before.

Claiming that CRLF is "Windows" is odd for DOS + OS/2 users, and it is
also at odds with numerous Internet standards, which is precisely the
reason why we need your draft.

The chapter talks about line and paragraph separators without
mentioning relevant ASCII controls such as RS. On the other hand it
mentions MS Word internals, which are nobody's business outside of
MS Word. It is interesting, but IMO unusable for net-utf8.

Frank
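
The difference between a really empty line and an apparently empty one, and the acceptance of HTAB at the start of a folded line, can be sketched in a few lines of Python. This is only a minimal sketch and is not taken from any RFC or implementation: the names classify_line, unfold, and canonicalize_wsp are hypothetical, and the canonicalization rule shown (collapse each WSP run to one SP, drop trailing WSP) is an assumed example, not the rule of any particular signature scheme.

```python
import re

# WSP = SP / HTAB, the ABNF core rule quoted in the message.
WSP_RUN = re.compile(r"[ \t]+")


def classify_line(line: str) -> str:
    """Classify one line, given without its terminating CRLF."""
    if line == "":
        return "empty"       # CRLF CRLF: a really empty line
    if WSP_RUN.fullmatch(line):
        return "wsp-only"    # CRLF 1*WSP CRLF: only apparently empty
    return "text"


def unfold(header: str) -> str:
    """Join folded lines: drop a CRLF that is followed by SP or HTAB.

    The WSP character itself is kept, so a continuation line that
    starts with HTAB is accepted exactly like one starting with SP.
    """
    return re.sub(r"\r\n(?=[ \t])", "", header)


def canonicalize_wsp(value: str) -> str:
    """Hypothetical canonicalization: each WSP run becomes one SP,
    and trailing WSP is removed."""
    return WSP_RUN.sub(" ", value).rstrip(" ")


if __name__ == "__main__":
    folded = "Subject: white\r\n\tspace  test \t"
    print(classify_line(""))             # really empty line
    print(classify_line(" \t"))          # apparently empty line
    print(repr(canonicalize_wsp(unfold(folded))))
```

Restricting the character class to SP alone would shorten canonicalize_wsp slightly, but the step would still be needed for runs of spaces and trailing spaces, which is the point made about replacing *WSP by *SP.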