Re: Troubles with UTF-8

----- Original Message -----
From: "Randy Presuhn" <randy_presuhn@xxxxxxxxxxxxxx>
To: "ietf" <ietf@xxxxxxxx>
Sent: Wednesday, December 28, 2005 9:46 PM
Subject: Re: Troubles with UTF-8
>
> > From: "Tom.Petch" <sisyphus@xxxxxxxxxxxxxx>
> > To: "Julian Reschke" <julian.reschke@xxxxxx>
> > Cc: "ietf" <ietf@xxxxxxxx>
> > Sent: Wednesday, December 28, 2005 8:06 AM
> > Subject: Re: Troubles with UTF-8
> ...
> > I agree, for XML, but my main concern is with UTF-8 encoded strings, where
> > FormFeed is a legal character, encoded as it would be in ASCII.  I was using
> > the 'illegal syntax' to float an alternative approach, like using %xC1 -
> > which is illegal in UTF-8 - to delimit a UTF-8 string, but as I say, that
> > idea does not seem to have caught on within the IETF.
> ...
>
> I think the use of explicitly encoded length, rather than special terminator
> or delimiter sequences, is simpler to code and debug, as well as
> being more robust in avoiding buffer overflow problems, etc.  This
> is especially true given the variable-length encoding inherent in UTF-8,
> as well as the open-ended way that combining marks follow, rather than
> precede the characters to which they apply.  (I think this was the "state"
> that Masataka Ohta was referring to.)
>
> Reserving NUL as a special terminator is a C library-ism.  I think that
> history has shown that the use of this kind of mechanism, rather than
> explicitly tracking the string's length, was a mistake.
>
> Randy
>
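
To make the point about variable-length encoding and trailing combining marks concrete, here is a minimal sketch in Python. The helper name describe() is hypothetical, chosen only for this illustration. It lists, for each code point, its UTF-8 byte length and its canonical combining class (non-zero means the code point modifies the character before it).

```python
import unicodedata


def describe(text):
    """Return (code point, UTF-8 byte length, combining class) per code point."""
    return [
        (f"U+{ord(ch):04X}", len(ch.encode("utf-8")), unicodedata.combining(ch))
        for ch in text
    ]


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character,
# two code points, three bytes.
sample = "e\u0301"
print(describe(sample))
print(len(sample.encode("utf-8")))
```

A reader that has consumed only the first byte cannot tell whether the visible character is complete, because the combining mark arrives afterwards. That may be the kind of "state" being referred to above.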

I agree with you for 'binary' protocols, intended for machine consumption (OSPF,
SNMP), where the string is usually wrapped up in a binary-encoded TLV; but not
for character-based ones, intended for humans, in more-or-less printable
characters, with positional or keyword parameters (LDAP [RFC2254], SDP [RFC2327],
SASL OTP [RFC2444] or MIME), where a numeric length, in ASCII characters, would
look a little odd to me.
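
For comparison, a minimal sketch of the two framings under discussion, in Python. All function names, and the 2-byte network-order length field, are assumptions made for illustration and are not taken from any of the protocols named above. The delimiter variant relies on the fact that the octet 0xC1 never occurs in well-formed UTF-8.

```python
import struct
from typing import Tuple

DELIM = b"\xc1"  # never appears in well-formed UTF-8


def frame_with_length(text: str) -> bytes:
    """TLV-style framing: 2-byte big-endian byte count, then the UTF-8 payload."""
    payload = text.encode("utf-8")
    # struct.pack raises struct.error if the payload exceeds 65535 bytes.
    return struct.pack("!H", len(payload)) + payload


def unframe_with_length(buf: bytes) -> Tuple[str, bytes]:
    """Return (decoded string, remaining bytes) for a length-prefixed frame."""
    if len(buf) < 2:
        raise ValueError("truncated length field")
    (n,) = struct.unpack("!H", buf[:2])
    if len(buf) < 2 + n:
        raise ValueError("truncated payload")
    return buf[2:2 + n].decode("utf-8"), buf[2 + n:]


def frame_with_delimiter(text: str) -> bytes:
    """Delimiter framing: UTF-8 payload followed by the 0xC1 octet."""
    return text.encode("utf-8") + DELIM


def unframe_with_delimiter(buf: bytes) -> Tuple[str, bytes]:
    """Return (decoded string, remaining bytes) for a delimiter-terminated frame."""
    payload, sep, rest = buf.partition(DELIM)
    if not sep:
        raise ValueError("missing delimiter")
    return payload.decode("utf-8"), rest


# FormFeed (0x0C) is an ordinary character in both framings.
text = "caf\u00e9\x0cpage two"
print(frame_with_length(text))
print(frame_with_delimiter(text))
```

The length is counted in bytes, not characters, which is exactly what looks odd when written out in ASCII digits in a human-readable protocol.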

I always saw NUL as an **IX-ism more than a C-ism, and so of wider use; I could
be wrong on that.

Tom Petch

