Valdis.Kletnieks@vt.edu writes:
> Hmm.. so you're saying that *ALL* that code out there that
> double-checked that things that claimed (possibly implicitly) to be
> USASCII were in fact in the 0-127 range are "crusty" code?
> Damn. Sendmail 8.12.3.Beta1 is crusty - it actually bothers checking.

Time for some facts.

Sendmail, by default, does _not_ enforce the 0-127 restriction for mail message headers. It allows bytes 160-255. Otherwise European users would be dumping Sendmail even more quickly than they are today; ISO 8859-1 Subject lines are extremely popular.

Sendmail _does_ discard bytes 128-159 in mail message headers, because it uses those bytes internally for macro handling. Those bytes aren't used in ISO 8859-1, but they are used in UTF-8. See http://pi.cr.yp.to for a concrete example.

I sent Allman some email in February 1999 suggesting that he convert

   128 -> 255 160
   129 -> 255 161
   ...
   159 -> 255 191
   255 -> 255 255

with the opposite conversion on output. There have been several security-fix releases of sendmail since then, so we could have had the 128-159 problem fixed on a huge number of machines. But he ignored the suggestion. Apparently he doesn't care about international users.

People proposed more than a decade ago that the IETF require 8-bit-clean mail software. (See, for example, Andre Pirard's ietf-smtp message dated Tue, 19 Feb 91 12:08:00 +0100.)

The only objection to this requirement was the claim that 8-bit support would take a long time to be deployed. Paul Vixie said that he had some seven-year-old sendmail binaries, for example, and concluded ``with near-certainty'' that ``any changes to the SMTP spec will take at least a decade to reach 90% of the critical server population.''

In fact, it took less than a decade for every critical server to add support for 8-bit message bodies, even though the IETF _still_ doesn't require this.
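For concreteness, here is a minimal sketch of the 128-159 conversion suggested to Allman above, written in Python. It is only an illustration of the byte mapping, not code from sendmail or from the 1999 message; the names escape_internal and unescape_internal are made up for this example, and the error handling for malformed input is an assumption.

```python
ESC = 255  # escape byte used by the proposed mapping

def escape_internal(data: bytes) -> bytes:
    """Input side: move bytes 128-159 out of the range reserved for
    internal use (128 -> 255 160, ..., 159 -> 255 191; 255 -> 255 255)."""
    out = bytearray()
    for b in data:
        if 128 <= b <= 159:
            out += bytes((ESC, b + 32))
        elif b == ESC:
            out += bytes((ESC, ESC))
        else:
            out.append(b)
    return bytes(out)

def unescape_internal(data: bytes) -> bytes:
    """Output side: the opposite conversion."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b != ESC:
            out.append(b)
            i += 1
            continue
        if i + 1 >= len(data):
            # Assumption: treat a trailing escape byte as an error.
            raise ValueError("dangling escape byte at end of input")
        nxt = data[i + 1]
        if nxt == ESC:
            out.append(ESC)
        elif 160 <= nxt <= 191:
            out.append(nxt - 32)
        else:
            # Assumption: reject pairs the input side never produces.
            raise ValueError("invalid byte after escape: %d" % nxt)
        i += 2
    return bytes(out)

# The UTF-8 encoding of the Greek letter pi is the two bytes CF 80.
# Byte 80 hex (128) falls in the discarded range; escaped, it becomes FF A0.
pi_utf8 = "\u03c0".encode("utf-8")
escaped = escape_internal(pi_utf8)
restored = unescape_internal(escaped)
```

The escaped form never contains a byte in 128-159, so that range stays free for internal use, and the output conversion restores the original header bytes exactly.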
If the SMTP specification had been changed in 1991 to require transparent 8-bit handling in both the header and the body, we wouldn't have Sendmail's UTF-8 problems today.

Sendmail's continued data corruption is an embarrassment to the Sendmail company. The fact that RFC 2821 and RFC 2822 allow this garbage is an embarrassment to the IETF.

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago