Re: Last Call: draft-klensin-net-utf8 (Unicode Format for Network Interchange) to Proposed Standard

"Frank Ellermann" <nobody@xxxxxxxxxxxxxxxxx> · Tue, 15 Jan 2008 03:21:22 +0100

Karlsson, Kent wrote:

> Doing what you call 'old "HTAB-compression"' is a bad idea

Yes, nevertheless it is something that happens sometimes, and
protocols accepting *WSP or 1*WSP can handle the odd effects.
Starting a "folded" line (in headers) with HT isn't too bad.

 [LS and PS] 
> They were introduced in Unicode 1.1, long before the text
> for section 5.8 was drafted (originally as UTS 10).

Sure, the author(s) of this section had to (try to) make sense
out of something that existed.  If adding LS and PS to the zoo
was a bad idea it wasn't the fault of the author(s).  I wasn't
clear what "they" (5.8 or Unicode) I had in mind, sorry.

> One important point that you have missed is that LS and PS,
> and the difference between THEM, are essential to the bidi
> algorithm.

Indeed, what I've read about "BiDi" is limited.  Often I stop
reading when I see that Harald or Martin already checked it.

> What is or may be done with other NLFs is basically a hack
> (most NLF are treated as if they were PS).

That sounds like familiar format=flowed issues, with "hard"
and "soft" line ends, and it would be out of scope for the
net-utf8 draft.  The net-utf8 draft tells us how to update
or design protocols like telnet and whois, it can't go into
details like e.g. BiDi-limits for nested direction changes,
or how to quote "flowed" paragraphs in e-mail.

> So **DON'T** imply that HT should be replaced by spaces;
> such a replacement WILL have ill display effects.

Protocols using *WSP or 1*WSP in their syntax often need to
separate "white space delimited words", and maybe offer some
folding-magic for overlong lines.  They're not concerned with
displaying paragraphs in BiDi-documents correctly, they try
to get whatever it is "over the wire".  The net-utf8 draft
isn't "TUS in a nutshell", it is for lower level protocols.
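
To illustrate what "handling *WSP or 1*WSP" means at this level, here
is a minimal sketch in Python.  It assumes WSP as in the ABNF core
rules (SP or HTAB) and header-style folding (CRLF followed by WSP);
the function names unfold and wsp_words are made up for this example
and do not come from the draft.

```python
import re

# Assumption: WSP = SP / HTAB, as in the ABNF core rules.
WSP_RUN = re.compile(r"[ \t]+")          # 1*WSP
FOLD = re.compile(r"\r\n(?=[ \t])")      # CRLF directly followed by WSP

def unfold(header: str) -> str:
    """Drop each CRLF that precedes WSP; the WSP itself is kept as is."""
    return FOLD.sub("", header)

def wsp_words(line: str) -> list[str]:
    """Split a line into white space delimited words on runs of SP/HTAB."""
    return [w for w in WSP_RUN.split(line) if w]

folded = "Subject: a long\r\n\tfolded header"
unfolded = unfold(folded)
print(repr(unfolded))
print(wsp_words(unfolded))
```

The HT that starts the continuation line survives unfolding untouched
and is then only used as a delimiter, nothing here tries to replace
it by spaces or cares about how it would be displayed.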

> I do think that these four characters should not be used.

ACK, and net-utf8 supports this view.  But not mentioning RS
in the relevant TUS section is odd.

>> it mentions MS Word interna which are nobody's business
>> outside of MS Word.

> I guess you refer to VT (LINE TABULATION).

Yes...

> The real reason it is mentioned is that the C and C++
> standards give a special escape for it

...then the section should state the real reason, instead of
talking about "old" variants of a proprietary format, which
could use anything it wishes instead of VT for its purposes.

Blindly adopting what C considers as white space (incl. VT)
is a real problem, e.g. the REXX standard did, erroneously
from my POV.  Arguably irrelevant for net-utf8, but if John
feels that mentioning VT (in addition to FF) is important 
it is fine.  As near to "MUST NOT" as possible, if at all.
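
A small Python sketch of the effect, only as an illustration: the
default str.split() follows a C-like idea of white space that
includes VT and FF, while a splitter limited to SP and HTAB does not.
The helper name split_wsp is hypothetical.

```python
import re

def split_wsp(s: str) -> list[str]:
    """Split only on runs of SP or HTAB (WSP), nothing else."""
    return [w for w in re.split(r"[ \t]+", s) if w]

line = "alpha\x0bbeta gamma"     # \x0b is VT (LINE TABULATION)

c_like = line.split()            # default split: VT counts as white space
strict = split_wsp(line)         # WSP only: VT stays inside the word

print(c_like)
print(strict)
```

With the default split the VT silently becomes a word boundary; with
the WSP-only variant it stays visible in the data, where a protocol
can reject it if it wishes.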

IBTD from your opinion that net-utf8 is far from ready, my
HT vs. FF point is only a nit.  In a DKIM-SSP discussion I
already assumed that net-utf8 makes it "as is" (= HT bad),
compare <http://article.gmane.org/gmane.ietf.dkim/8900>.

 Frank
