Re: Should the IETF be condoning, even promoting, BOM pollution?

Brian E Carpenter <brian.e.carpenter@xxxxxxxxx> · Wed, 20 Sep 2017 08:26:59 +1200

On 20/09/2017 07:19, Julian Reschke wrote:
> On 2017-09-19 20:35, Ted Lemon wrote:
>> On Sep 19, 2017, at 1:16 PM, Julian Reschke <julian.reschke@xxxxxx> wrote:
>>> Can you please point to *something* that says it's wrong to use the 
>>> BOM in UTF-8 encoded documents of type text/plain?
>>
>> It's pretty clearly wrong to download a document that's labeled 
>> "text/plain;charset=utf8" and then store it in a way that will result in 
>> it being treated as having a different encoding, or to display it 
>> directly using a different encoding, as Explorer does.   Since the BOM 
>> is not required or even encouraged by the Unicode Consortium, failing to 
>> get this right is clearly a bug.
> 
> I'm pretty sure that if the browser modified the downloaded file, 
> somebody else would claim that *that* would be a bug. For instance, it 
> would affect signatures.

Yes. If you consider that the exact content of a file is what you want
when you transfer it from another machine, hidden changes to the content
are plain wrong. Storing metadata with the file is another matter - in
an ideal world you would store "text/plain;charset=utf8" as part of the
metadata. But that isn't the world we live in; all we store for sure
is the filename, which may include the string ".txt", which may mean
the same as "text/plain" but certainly doesn't imply "charset=utf8".
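
To make the signature point concrete, here is a minimal sketch in
Python. It assumes SHA-256 as a stand-in for whatever digest a real
signature scheme would cover; the helper names and the sample text are
made up for illustration. Silently dropping the three BOM bytes gives
a different byte string, so any digest computed over the original no
longer matches.

```python
import codecs
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file content as received: BOM followed by UTF-8 text.
original = codecs.BOM_UTF8 + "Network Working Group\n".encode("utf-8")

# What a "helpful" tool might store instead.
modified = strip_utf8_bom(original)

# The stored copy is three bytes shorter and hashes differently.
print(len(original) - len(modified))
print(sha256_hex(original) == sha256_hex(modified))
```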

In any case, regardless of how we believe UTF-8 *strings* should be
embedded in protocols, the decision to prepend the equivalent of
"charset=utf8" to a file containing a UTF-8 string is not a protocol
issue.

Modest suggestion: store BOM-free versions named rfc8187.utx etc.
Start a trend that tool implementers can follow.
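
For what it's worth, producing such copies is only a few lines. The
following is a rough sketch in Python, not a description of any
existing tooling; the function name is hypothetical and ".utx" is just
the suffix floated above. It writes a separate BOM-free file and
leaves the original untouched, so anything computed over the original
bytes stays valid.

```python
import codecs
from pathlib import Path


def write_bom_free_copy(src: Path) -> Path:
    """Write a BOM-free copy of src next to it with a .utx suffix.

    The source file is never modified. Raises UnicodeDecodeError if
    the remaining bytes are not valid UTF-8.
    """
    data = src.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Validate only; the decoded text is discarded.
    data.decode("utf-8")
    dst = src.with_suffix(".utx")
    dst.write_bytes(data)
    return dst
```

Calling write_bom_free_copy(Path("rfc8187.txt")) would leave
rfc8187.txt as it was and create rfc8187.utx alongside it.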

   Brian