Re: Troubles with UTF-8

"Tom.Petch" <sisyphus@xxxxxxxxxxxxxx> · Wed, 28 Dec 2005 17:06:05 +0100

----- Original Message -----
From: "Julian Reschke" <julian.reschke@xxxxxx>
To: "Tom.Petch" <sisyphus@xxxxxxxxxxxxxx>
Cc: "ietf" <ietf@xxxxxxxx>
Sent: Wednesday, December 28, 2005 4:16 PM
Subject: Re: Troubles with UTF-8

> Tom.Petch wrote:
> > ----- Original Message -----
> > From: "Harald Tveit Alvestrand" <harald@xxxxxxxxxxxxx>
> > To: "Tom.Petch" <sisyphus@xxxxxxxxxxxxxx>; "Ned Freed"
> > <ned.freed@xxxxxxxxxxx>
> > Cc: "ietf" <ietf@xxxxxxxx>
> > Sent: Wednesday, December 28, 2005 1:30 PM
> > Subject: Re: Troubles with UTF-8
> >> --On onsdag, desember 28, 2005 10:09:05 +0100 "Tom.Petch"
> >> <sisyphus@xxxxxxxxxxxxxx> wrote:
> >>
> >>> The Unicode data I am thinking of may have come from an upper layer
> >>> protocol and needs to be passed transparently (as with an error or hello
> >>> message, identity even); it may or may not already be NUL-terminated
> >>> (ever had that security foul-up where some userid/password are
> >>> entered/stored NUL-terminated and some are not?) - hence I see the need
> >>> to terminate the string in some other way, or to escape or in some other
> >>> way transfer encode (parts of) the string.  I looked at existing RFC,
> >>> found many different approaches, all viable but none that really said to
> >>> me 'this is good engineering, this is best practice'.  Hence, floating
> >>> the issue to see if there were any better ones out there. I think not,
> >>> which is of itself worth knowing.
> >> There are many strong opinions around "proper" treatment of XML and of
> >> text, and it would be a shame to ask for advice now, reach a seemingly
> >> reasonable conclusion, and then encounter violent objections at IETF Last
> >> Call.
> >>
> > The 'illegal syntax' is not yet an RFC but is in
> > draft-ietf-netconf-ssh-05.txt, which says
> >    "As the previous example illustrates, a special character sequence,
> >     ]]>]]>, MUST be sent by both the client and the server after each XML
> >     document in the NETCONF exchange.  This character sequence cannot
> >     legally appear in an XML document, so it can be unambigiously used to
> >     indentify the end of the current document, allowing resynchronization
> >     of the NETCONF exchange in the event of an XML syntax or parsing
> >     error."
> > For me, that is ok; the 'illegal syntax' is part of the transport syntax not
> > part of the XML syntax and so is not illegal, if you follow me:-)
> > ...
> Why don't you use an illegal *character* instead, such as Formfeed?
> That's certainly easier to parse...
>
> Best regards, Julian

I agree, for XML, but my main concern is with UTF-8 encoded strings, where
FormFeed is a legal character, encoded as it would be in ASCII.  I was using
the 'illegal syntax' to float an alternative approach, like using %xC1 - which
is illegal in UTF-8 - to delimit a UTF-8 string, but as I say, that idea does
not seem to have caught on within the IETF.
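
To make the %xC1 idea concrete, here is a minimal sketch in Python.  It is an
illustration only, not taken from any RFC or draft: the names DELIM, frame and
unframe are made up for this example, and it assumes the payload really is
well-formed UTF-8.  It relies on the fact that the octet C1 can never occur in
well-formed UTF-8, whereas FormFeed and NUL can.

```python
# Hypothetical framing sketch: terminate each UTF-8 string with the octet
# 0xC1, which never occurs in well-formed UTF-8 (it could only start an
# overlong two-octet sequence, which UTF-8 forbids).
DELIM = b"\xc1"


def frame(strings):
    """Encode each string as UTF-8 and terminate it with DELIM."""
    return b"".join(s.encode("utf-8") + DELIM for s in strings)


def unframe(stream):
    """Split a byte stream on DELIM and decode each piece as UTF-8.

    Strict decoding raises UnicodeDecodeError on malformed input, so a
    corrupted payload is reported rather than silently accepted.
    """
    parts = stream.split(DELIM)
    if parts[-1] != b"":
        raise ValueError("stream does not end with the delimiter")
    return [p.decode("utf-8") for p in parts[:-1]]


# FormFeed (U+000C) and NUL (U+0000) pass through as ordinary data.
messages = ["page one\x0cpage two", "nul\x00inside", "caf\u00e9"]
assert unframe(frame(messages)) == messages
```

The strings need no escaping and no transfer encoding, and it does not matter
whether they arrive NUL-terminated or not; the cost is that the framed stream
as a whole is no longer valid UTF-8.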

Tom Petch
