Re: Troubles with UTF-8

"Tom.Petch" <sisyphus@xxxxxxxxxxxxxx> · Wed, 28 Dec 2005 14:54:40 +0100

----- Original Message -----
From: "Harald Tveit Alvestrand" <harald@xxxxxxxxxxxxx>
To: "Tom.Petch" <sisyphus@xxxxxxxxxxxxxx>; "Ned Freed" <ned.freed@xxxxxxxxxxx>
Cc: "ietf" <ietf@xxxxxxxx>
Sent: Wednesday, December 28, 2005 1:30 PM
Subject: Re: Troubles with UTF-8
> --On Wednesday, December 28, 2005 10:09:05 +0100 "Tom.Petch"
> <sisyphus@xxxxxxxxxxxxxx> wrote:
>
> > The Unicode data I am thinking of may have come from an upper layer
> > protocol and needs to be passed transparently (as with an error or hello
> > message, identity even); it may or may not already be NUL-terminated
> > (ever had that security foul-up where some userid/password are
> > entered/stored NUL-terminated and some are not?) - hence I see the need
> > to terminate the string in some other way, or to escape or in some other
> > way transfer encode (parts of) the string.  I looked at existing RFC,
> > found many different approaches, all viable but none that really said to
> > me 'this is good engineering, this is best practice'.  Hence, floating
> > the issue to see if there were any better ones out there. I think not,
> > which is of itself worth knowing.
>
> There are many strong opinions around "proper" treatment of XML and of
> text, and it would be a shame to ask for advice now, reach a seemingly
> reasonable conclusion, and then encounter violent objections at IETF Last
> Call.
>
> (the WG that went for illegal syntax as "terminator" SHOULD have caused
> such a reaction, IMHO; I guess the people who care missed it....)

The 'illegal syntax' is not yet an RFC but is in draft-ietf-netconf-ssh-05.txt
which says
   "As the previous example illustrates, a special character sequence,
    ]]>]]>, MUST be sent by both the client and the server after each XML
    document in the NETCONF exchange.  This character sequence cannot
    legally appear in an XML document, so it can be unambiguously used to
    identify the end of the current document, allowing resynchronization
    of the NETCONF exchange in the event of an XML syntax or parsing
    error."
For me, that is ok; the 'illegal syntax' is part of the transport syntax not
part of the XML syntax and so is not illegal, if you follow me:-)
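
To make the transport-versus-XML distinction concrete, here is a minimal
sketch in Python of a receiver that splits a byte stream on the ]]>]]>
sequence before any XML parser sees it.  The function name and the sample
documents are my own illustration, not taken from the draft.

```python
DELIM = b"]]>]]>"  # end-of-document marker quoted from the draft above


def split_documents(buffer: bytes):
    """Split received bytes into complete documents plus unfinished leftover.

    The delimiter belongs to the transport framing, so it is removed here
    and the XML parser only ever sees the bytes between delimiters.
    """
    *documents, leftover = buffer.split(DELIM)
    return documents, leftover


docs, rest = split_documents(b"<hello/>]]>]]><rpc/>]]>]]><par")
```

The leftover is kept by the caller and prepended to the next bytes read, so
a delimiter that arrives split across two reads is still found.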

The differing treatments of NUL I see in UTF-8 strings, out of many such RFC,
are

  RFC3748 [EAP]       only the portion of the field prior to the null is
                      displayed, MUST NOT be null terminated
  RFC2444 [SASL]      terminated by a NUL (0) octet
  RFC2869 [RADIUS]    MUST be able to deal with embedded nulls
  RFC3315 [DHCP]      MUST NOT be null-terminated.

and, just to be different,

  RFC2595 [TLS+IMAP]  authorization identity followed by a US-ASCII NUL
                      followed by the authentication identity followed by a
                      US-ASCII NUL character followed by the clear-text
                      password  [this also subsets UTF-8]
  RFC3413 [SNMP tag]  Delimiter characters are defined to be one of the
                      following characters: space character (0x20), TAB
                      character (0x09), carriage return character (0x0D),
                      line feed character (0x0A)
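
To show how far apart these treatments are for an implementer, here is a
rough Python sketch of three of them, based only on the one-line summaries
above and not on a full reading of each RFC.  The function names are
invented for illustration.

```python
def displayable_prefix(field: bytes) -> str:
    """EAP-style: show only the portion of the field before the first NUL."""
    return field.split(b"\x00", 1)[0].decode("utf-8")


def strip_terminator(field: bytes) -> bytes:
    """SASL-style: the string must end with a single NUL octet; drop it."""
    if not field.endswith(b"\x00"):
        raise ValueError("missing NUL terminator")
    return field[:-1]


def split_plain(message: bytes):
    """RFC2595-style: authorization id NUL authentication id NUL password.

    Raises ValueError unless the message has exactly two NUL separators.
    """
    authzid, authcid, password = message.split(b"\x00")
    return authzid.decode("utf-8"), authcid.decode("utf-8"), password.decode("utf-8")
```

The same octet is a terminator in one function, a separator in another and
a display cut-off in the third, which is the inconsistency at issue.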

Each doubtless has good reason for being the way it is, but between them they
cover the spectrum.  As I said above, I quite like 'illegal syntax' and so would
happily terminate UTF-8 (and it is UTF-8 that IETF has mandated, not UTF-nnn)
with %xC1, an octet that cannot appear in valid UTF-8 - but I see that that
will not find favour.
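
For what it is worth, here is a minimal Python sketch of the %xC1 idea,
offered only as an illustration and not as anything an RFC specifies.  It
relies on the fact that the octet 0xC1 never occurs in well-formed UTF-8 (as
a lead byte it could only begin an overlong encoding), so embedded NULs pass
through untouched.  The names frame and unframe are hypothetical.

```python
TERMINATOR = b"\xc1"  # octet that well-formed UTF-8 never contains


def frame(text: str) -> bytes:
    """Encode text as UTF-8 and append the out-of-band terminator."""
    return text.encode("utf-8") + TERMINATOR


def unframe(stream: bytes):
    """Return (text, remaining_bytes) for the first terminated string."""
    payload, sep, rest = stream.partition(TERMINATOR)
    if not sep:
        raise ValueError("terminator not received yet")
    return payload.decode("utf-8"), rest


# An embedded NUL survives because it is ordinary payload here.
text, rest = unframe(frame("user\x00id") + b"next")
```

No escaping is needed, since no valid payload can contain the terminator.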

Tom Petch
