----- Original Message -----
From: "Julian Reschke" <julian.reschke@xxxxxx>
To: "Tom.Petch" <sisyphus@xxxxxxxxxxxxxx>
Cc: "ietf" <ietf@xxxxxxxx>
Sent: Wednesday, December 28, 2005 4:16 PM
Subject: Re: Troubles with UTF-8

> Tom.Petch wrote:
> > ----- Original Message -----
> > From: "Harald Tveit Alvestrand" <harald@xxxxxxxxxxxxx>
> > To: "Tom.Petch" <sisyphus@xxxxxxxxxxxxxx>; "Ned Freed" <ned.freed@xxxxxxxxxxx>
> > Cc: "ietf" <ietf@xxxxxxxx>
> > Sent: Wednesday, December 28, 2005 1:30 PM
> > Subject: Re: Troubles with UTF-8
> >
> >> --On onsdag, desember 28, 2005 10:09:05 +0100 "Tom.Petch"
> >> <sisyphus@xxxxxxxxxxxxxx> wrote:
> >>
> >>> The Unicode data I am thinking of may have come from an upper layer
> >>> protocol and needs to be passed transparently (as with an error or hello
> >>> message, identity even); it may or may not already be NUL-terminated
> >>> (ever had that security foul-up where some userid/password are
> >>> entered/stored NUL-terminated and some are not?) - hence I see the need
> >>> to terminate the string in some other way, or to escape or in some other
> >>> way transfer encode (parts of) the string. I looked at existing RFC,
> >>> found many different approaches, all viable but none that really said to
> >>> me 'this is good engineering, this is best practice'. Hence, floating
> >>> the issue to see if there were any better ones out there. I think not,
> >>> which is of itself worth knowing.
> >>
> >> There are many strong opinions around "proper" treatment of XML and of
> >> text, and it would be a shame to ask for advice now, reach a seemingly
> >> reasonable conclusion, and then encounter violent objections at IETF Last
> >> Call.
> >>
> > The 'illegal syntax' is not yet an RFC but is in draft-ietf-netconf-ssh-05.txt
> > which says
> > "As the previous example illustrates, a special character sequence,
> > ]]>]]>, MUST be sent by both the client and the server after each XML
> > document in the NETCONF exchange. This character sequence cannot
> > legally appear in an XML document, so it can be unambigiously used to
> > indentify the end of the current document, allowing resynchronization
> > of the NETCONF exchange in the event of an XML syntax or parsing
> > error."
> > For me, that is ok; the 'illegal syntax' is part of the transport syntax not
> > part of the XML syntax and so is not illegal, if you follow me:-)
> > ...
> Why don't you use an illegal *character* instead, such as Formfeed?
> That's certainly easier to parse...
>
> Best regards, Julian

I agree, for XML, but my main concern is with UTF-8 encoded strings, where
FormFeed is a legal character, encoded as it would be in ASCII.

I was using the 'illegal syntax' to float an alternative approach, like using
%xC1 - which is illegal in UTF-8 - to delimit a UTF-8 string, but as I say, that
idea does not seem to have caught on within the IETF.

Tom Petch
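
For illustration only, the %xC1 idea mentioned above might be sketched in Python roughly as follows. This is a minimal sketch, not taken from any RFC or draft: the names DELIM, frame and split_frames are hypothetical, and it assumes the payload is well-formed UTF-8, in which the octet %xC1 never occurs. NUL and FormFeed pass through untouched because they are ordinary characters in the payload.

```python
DELIM = b"\xc1"  # assumed delimiter octet; never occurs in well-formed UTF-8


def frame(strings):
    """Encode each string as UTF-8 and terminate it with the delimiter octet."""
    out = bytearray()
    for s in strings:
        data = s.encode("utf-8")
        # Sanity check: a well-formed UTF-8 encoding cannot contain 0xC1.
        assert DELIM not in data
        out += data + DELIM
    return bytes(out)


def split_frames(stream):
    """Split a byte stream on the delimiter and decode each piece strictly."""
    pieces = stream.split(DELIM)
    if pieces[-1] != b"":
        raise ValueError("stream does not end with the delimiter")
    # Strict decoding rejects any other malformed octets in the payload.
    return [p.decode("utf-8") for p in pieces[:-1]]


# Usage: strings containing NUL, FormFeed and non-ASCII text survive a round trip.
messages = ["userid\x00", "page one\x0cpage two", "Gr\u00fc\u00dfe"]
wire = frame(messages)
assert split_frames(wire) == messages
```

The receiver here only has to scan for a single octet, which is the property Julian's FormFeed suggestion has for XML; the cost is that the stream as a whole is no longer valid UTF-8, so it has to be split before being handed to a UTF-8 decoder.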