Should the IETF be condoning, even promoting, BOM pollution?

> The reason for the BOM was so that existing tools will load the file correctly in absence of character encoding information.
> 
> (AFAIR, the ability to make tools like Notepad "do the right thing" was an important step to actually get to the decision to allow non-ASCII characters).
> 
> And yes, this is only relevant for plain text (as opposed to HTML), served from the file system.

Employing the Byte Order Mark (BOM), which is needed in UTF-16 but not in UTF-8, as a file “signature” (magic number) to identify plain text files that use UTF-8 beyond ASCII, is well known to have caused many of the problems in migrating to UTF-8.

The problems come both from tools that would otherwise have had no trouble moving from ASCII to UTF-8 but now malfunction because of those BOMs, and from tools that now suddenly *expect* every UTF-8 file beyond ASCII to carry that signature and stop working when it is missing.  The first set of problems is compounded by tools that silently insert BOMs at various stages of processing UTF-8 files (BOM pollution), and by other tools that make any BOM in a plain text file invisible to casual examination, so that problems caused by BOM pollution are hard to recognize.
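
To make the invisibility point concrete, here is a minimal Python sketch of what a UTF-8 BOM looks like at the byte level and how a tool might detect or strip it.  The helper names has_utf8_bom and strip_utf8_bom are hypothetical, chosen only for this illustration; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the Python standard library.

```python
import codecs

# The UTF-8 encoding of U+FEFF: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with a UTF-8 BOM (hypothetical helper)."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present (hypothetical helper)."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data


# "utf-8-sig" prepends a BOM on encoding; plain "utf-8" does not.
polluted = "Grüße".encode("utf-8-sig")
clean = "Grüße".encode("utf-8")

# A plain UTF-8 decoder keeps the BOM as an invisible U+FEFF character,
# which is how it slips past casual inspection and into later processing.
decoded_naively = polluted.decode("utf-8")
```

A tool that compares the decoded string against "Grüße", or concatenates several such files, would see a stray U+FEFF that most editors and terminals do not display.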

The problems caused by BOM pollution were already well understood at the time when the various standards around UTF-8 were written.  Unicode itself recommends against it.  RFC 3629 has a whole section denouncing it.  RFC 5198 is careful to avoid BOM pollution in Net-Unicode.

So the standards message is clear.  No BOM pollution.

Yet, on the operational side, the IETF has failed for more than a decade to properly serve UTF-8 in its own systems.  Now that we finally provide RFCs with UTF-8 beyond ASCII, we go ahead and embrace BOM pollution as if we didn’t know what we are doing.

This sends the message that BOM pollution is actually OK, maybe even the right thing for everybody else to do as well, and that the standards documents are for preaching on Sundays but to be ignored when it comes to actual practice.  It’s as if we were running all our servers without security because that might be considered operationally more expedient.

Grüße, Carsten




