Randy Presuhn wrote:

[Tom Petch said:]
>> I was using the 'illegal syntax' to float an alternative
>> approach, like using %xC1 - which is illegal in UTF-8

Illegal today; it wasn't for some time.  My UTF-8 "decoder"
script would return one SUB for a %xC1 plus the next octet.
%xFF and %xFE were always illegal; %xFD was the worst case,
5*6+1 bits for u+7FFFFFFF in UCS-4.

>> that idea does not seem to have caught on within the IETF.

u+FFFF (UTF-8 %xEFBFBF) is guaranteed to be no character; it
is AFAIK reserved for this purpose.  But not "on the wire".

> I think the use of explicitly encoded length, rather than
> special terminator or delimiter sequences, is simpler to
> code and debug, as well as being more robust in avoiding
> buffer overflow problems, etc.

Yes, abusing %xFF or similar tricks would be like a PDU with
an empty header and a constant trailer.  Your idea of "length
in the header" (and maybe a checksum as trailer?) is better.

If that hits the limit for encoded lengths, add a mechanism
for a "more" flag, or chunks where "length = 0 is the end",
etc.

> Reserving NUL as a special terminator is a C library-ism.

A leading length has its own drawbacks if you want a string
with more than 255 octets after one octet for the length. ;-)

> history has shown that the use of this kind of mechanism,
> rather than explicitly tracking the string's length, was a
> mistake.

<CRLF> or whatever isn't too bad with a decent maximal line
length (like 1000).  If you want arbitrary encoded lengths,
you would need a delimiter to separate the length from the
SDU, or another trick to the same effect.  Attackers could
then try their luck with huge encoded lengths.

 Bye, Frank
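
A minimal Python sketch of the lenient decoding described above
(Frank's actual script is not shown in the thread, so this is an
assumed illustration, not his code): one SUB (U+001A) replaces an
illegal lead octet such as %xC0/%xC1 together with the octet that
follows it, while %xFE/%xFF are replaced on their own.

# Illustrative sketch only, not the script mentioned in the message.
SUB = "\x1a"
ILLEGAL_LEADS = (0xC0, 0xC1, 0xFE, 0xFF)

def lenient_decode(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in (0xC0, 0xC1):        # overlong 2-octet leads, illegal today
            out.append(SUB)
            i += 2                   # one SUB for the lead plus the next octet
        elif b in (0xFE, 0xFF):      # never legal anywhere in UTF-8
            out.append(SUB)
            i += 1
        else:
            j = i                    # decode the next clean run strictly
            while j < len(data) and data[j] not in ILLEGAL_LEADS:
                j += 1
            out.append(data[i:j].decode("utf-8", errors="replace"))
            i = j
    return "".join(out)

print(lenient_decode(b"a\xc1\xbfb"))   # -> 'a\x1ab'

A strict decoder today would reject %xC0 and %xC1 outright, since
they can only begin overlong encodings.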
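
A sketch of the chunked length-prefix framing mentioned above, with
"length = 0 is the end" as the terminator.  The 2-octet network-order
length field is an assumption for the example; because it is fixed
size, no peer can announce more than 65535 octets in a single step,
which is one answer to the huge-encoded-length attack.

# Sketch of chunked, length-prefixed framing ("length = 0 is the end").
import struct

MAX_CHUNK = 0xFFFF                     # largest length a 2-octet field holds

def encode_chunked(payload: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(payload), MAX_CHUNK):
        chunk = payload[i:i + MAX_CHUNK]
        out += struct.pack("!H", len(chunk)) + chunk   # length, then data
    out += struct.pack("!H", 0)                        # terminator chunk
    return bytes(out)

def decode_chunked(data: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while True:
        (length,) = struct.unpack_from("!H", data, pos)
        pos += 2
        if length == 0:                                # length 0 ends the SDU
            return bytes(out)
        out += data[pos:pos + length]
        pos += length

assert decode_chunked(encode_chunked(b"x" * 100000)) == b"x" * 100000

Compared with a single arbitrary-size length field, the receiver never
has to trust more than MAX_CHUNK octets of announced length at a time.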