----- Original Message -----
From: "Harald Tveit Alvestrand" <harald@xxxxxxxxxxxxx>
To: "Tom.Petch" <sisyphus@xxxxxxxxxxxxxx>; "Ned Freed" <ned.freed@xxxxxxxxxxx>
Cc: "ietf" <ietf@xxxxxxxx>
Sent: Wednesday, December 28, 2005 1:30 PM
Subject: Re: Troubles with UTF-8

> --On Wednesday, December 28, 2005 10:09:05 +0100 "Tom.Petch"
> <sisyphus@xxxxxxxxxxxxxx> wrote:
>
> > The Unicode data I am thinking of may have come from an upper layer
> > protocol and needs to be passed transparently (as with an error or hello
> > message, identity even); it may or may not already be NUL-terminated
> > (ever had that security foul-up where some userid/password are
> > entered/stored NUL-terminated and some are not?) - hence I see the need
> > to terminate the string in some other way, or to escape or in some other
> > way transfer encode (parts of) the string. I looked at existing RFC,
> > found many different approaches, all viable but none that really said to
> > me 'this is good engineering, this is best practice'. Hence, floating
> > the issue to see if there were any better ones out there. I think not,
> > which is of itself worth knowing.
>
> There are many strong opinions around "proper" treatment of XML and of
> text, and it would be a shame to ask for advice now, reach a seemingly
> reasonable conclusion, and then encounter violent objections at IETF Last
> Call.
>
> (the WG that went for illegal syntax as "terminator" SHOULD have caused
> such a reaction, IMHO; I guess the people who care missed it....)

The 'illegal syntax' is not yet an RFC, but it is in
draft-ietf-netconf-ssh-05.txt, which says

   "As the previous example illustrates, a special character sequence,
   ]]>]]>, MUST be sent by both the client and the server after each XML
   document in the NETCONF exchange.
   This character sequence cannot legally appear in an XML document, so
   it can be unambiguously used to identify the end of the current
   document, allowing resynchronization of the NETCONF exchange in the
   event of an XML syntax or parsing error."

For me, that is OK; the 'illegal syntax' is part of the transport syntax,
not part of the XML syntax, and so is not illegal, if you follow me :-)

The differing treatments of NUL that I see in UTF-8 strings, out of many
such RFCs, are

  RFC3748 [EAP]       only the portion of the field prior to the null is
                      displayed; MUST NOT be null terminated
  RFC2444 [SASL]      terminated by a NUL (0) octet
  RFC2869 [RADIUS]    MUST be able to deal with embedded nulls
  RFC3315 [DHCP]      MUST NOT be null-terminated

and, just to be different,

  RFC2595 [TLS+IMAP]  authorization identity followed by a US-ASCII NUL
                      followed by the authentication identity followed by
                      a US-ASCII NUL character followed by the clear-text
                      password [this also subsets UTF-8]
  RFC3413 [SNMP tag]  Delimiter characters are defined to be one of the
                      following characters: space character (0x20), TAB
                      character (0x09), carriage return character (0x0D),
                      line feed character (0x0A)

All doubtless have good reasons for being the way they are, but between
them they do cover the spectrum.

As I said above, I quite like 'illegal syntax' and so would happily
terminate UTF-8 (and it is UTF-8 that the IETF has mandated, not UTF-nnn)
with %xC1, but I see that that will not find favour.

Tom Petch
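
For illustration only, the "illegal octet as terminator" idea above might be sketched as follows. This is a minimal sketch, not something any of the cited RFCs or the NETCONF draft specifies: the names TERMINATOR, frame and split_frames are hypothetical, and the only fact relied on is that the octet 0xC1 cannot occur in well-formed UTF-8 (it could only begin an overlong two-octet encoding of an ASCII character), so it cannot collide with the payload even when the payload contains an embedded NUL.

```python
from typing import List

# Hypothetical names for illustration; no RFC defines this framing.
TERMINATOR = b"\xc1"  # octet that never occurs in well-formed UTF-8


def frame(text: str) -> bytes:
    """Encode text as UTF-8 and append the terminator octet."""
    return text.encode("utf-8") + TERMINATOR


def split_frames(stream: bytes) -> List[str]:
    """Split a byte stream on the terminator and decode each complete string.

    Whatever follows the last terminator is treated as an incomplete
    string and is ignored here.
    """
    *complete, _partial = stream.split(TERMINATOR)
    return [chunk.decode("utf-8") for chunk in complete]


# One string with an embedded NUL, one with non-ASCII characters.
stream = frame("user\x00name") + frame("blåbær")
print(split_frames(stream))
```

Because the terminator lies outside the UTF-8 octet repertoire, the receiver needs no escaping and no length prefix; the cost is that the framed stream as a whole is no longer valid UTF-8, which is presumably why the idea would not find favour.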