Hi -

> From: "Tom.Petch" <sisyphus@xxxxxxxxxxxxxx>
> To: "Julian Reschke" <julian.reschke@xxxxxx>
> Cc: "ietf" <ietf@xxxxxxxx>
> Sent: Wednesday, December 28, 2005 8:06 AM
> Subject: Re: Troubles with UTF-8
...
> I agree, for XML, but my main concern is with UTF-8 encoded strings, where
> FormFeed is a legal character, encoded as it would be in ASCII. I was using the
> 'illegal syntax' to float an alternative approach, like using %xC1 - which is
> illegal in UTF-8 - to delimit a UTF-8 string, but as I say, that idea does not
> seem to have caught on within the IETF.
...

I think the use of an explicitly encoded length, rather than special
terminator or delimiter sequences, is simpler to code and debug, as well
as being more robust in avoiding buffer overflow problems, etc.

This is especially true given the variable-length encoding inherent in
UTF-8, as well as the open-ended way that combining marks follow, rather
than precede, the characters to which they apply. (I think this was the
"state" that Masataka Ohta was referring to.)

Reserving NUL as a special terminator is a C library-ism. I think that
history has shown that the use of this kind of mechanism, rather than
explicitly tracking the string's length, was a mistake.

Randy

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf
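
For illustration only, here is a minimal sketch of the explicit-length approach argued for above. The framing (a 2-byte big-endian byte count followed by the UTF-8 bytes) and the names `encode_lp` and `decode_lp` are assumptions made for this example; they do not come from the thread or from any particular protocol.

```python
import struct
from typing import Tuple

# Hypothetical framing for this sketch: 2-byte big-endian byte count,
# followed by exactly that many bytes of UTF-8.
_LEN = struct.Struct(">H")


def encode_lp(text: str) -> bytes:
    """Return text as a length-prefixed UTF-8 byte string."""
    data = text.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("string too long for a 16-bit length prefix")
    return _LEN.pack(len(data)) + data


def decode_lp(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Decode one string starting at offset; return (text, next_offset)."""
    if len(buf) - offset < _LEN.size:
        raise ValueError("truncated length prefix")
    (n,) = _LEN.unpack_from(buf, offset)
    start = offset + _LEN.size
    end = start + n
    # Bounds are checked before any payload byte is read, so a bad
    # length cannot cause a read past the end of the buffer.
    if end > len(buf):
        raise ValueError("declared length exceeds buffer")
    return buf[start:end].decode("utf-8"), end


# FormFeed, NUL, and a trailing combining mark all pass through untouched,
# because no byte value is reserved as a terminator or delimiter.
sample = "page1\x0cpage2\x00e\u0301"
wire = encode_lp(sample) + encode_lp("next")
first, pos = decode_lp(wire)
second, pos = decode_lp(wire, pos)
```

Because the reader knows the byte count up front, it never has to scan for a sentinel or decide whether a combining mark is still to come before treating the string as complete.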