Re: BOMs

"Pete Cordell" <petejson@xxxxxxxxxxxxx> · Mon, 18 Nov 2013 13:36:13 -0000

----- Original Message ----- 
From: "Martin J. Dürst" <duerst@xxxxxxxxxxxxxxx>
On 2013/11/18 20:11, Henry S. Thompson wrote:
Pete Cordell writes:

Given the history below, would it be sensible to accept BOMs for UTF-8
encoding, but not for UTF-16 and UTF-32?  In other words, are BOMs needed
and/or used in the wild for UTF-16 and UTF-32?

Maybe the text can say something like "SHOULD accept BOMs for UTF-8,
and MAY accept BOMs for UTF-16 and / or UTF-32"?

My sense is that you'll see more UTF-16 BOMs than anything else.

Yes indeed. BOM means Byte Order Mark. It's crucial for over-the-wire 
UTF-16. (It's irrelevant for in-memory UTF-16, but that's not what we are 
discussing.)
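
To make the over-the-wire point concrete, a receiver can tell the encodings apart by looking at the first few bytes. Below is a minimal Python sketch, not anything mandated by a spec: `sniff_bom` is a hypothetical helper name, and the BOM constants come from the standard `codecs` module. The one subtlety is ordering, since the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).

```python
import codecs

# Check the 4-byte UTF-32 BOMs before the 2-byte UTF-16 ones, because
# the UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

With no BOM present the function returns `(None, 0)` and the caller has to fall back on some other rule to pick the encoding.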

The in-memory case is not entirely irrelevant because a number of JSON 
messages will be constructed in memory and then squirted straight onto the 
wire.

I did a little experiment with Visual Studio.  It will allow me to save in 
UTF-8 with or without a BOM (or BOM-like signature).  Saving in UTF-16 (or 
was it UCS-2?) is always with a BOM.  There didn't seem to be a UTF-32 
option.

JSON doesn't need BOMs.  However, there are cases where people might 
hand-edit messages, and if they choose to save in UTF-16 they will likely 
have a BOM.

Is it acceptable to tell people not to save hand-edited files in UTF-16, 
suggesting UTF-8 (possibly with an encoded BOM) as an alternative?

I would imagine that if someone did have a hand-edited UTF-8 file on 
Windows then the allowance of a BOM would help their sanity immeasurably, 
but it's not something I have firsthand knowledge of.
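
As an illustration of what "allowing a BOM" for UTF-8 might look like on the receiving side, here is a minimal Python sketch. `loads_lenient` is a hypothetical name; it relies on the standard `utf-8-sig` codec, which decodes UTF-8 and drops a leading BOM if one is present.

```python
import json


def loads_lenient(data: bytes):
    """Parse UTF-8 JSON bytes, tolerating an optional leading BOM.

    Hypothetical helper for illustration only.
    """
    # "utf-8-sig" behaves like "utf-8" but skips an initial EF BB BF.
    text = data.decode("utf-8-sig")
    return json.loads(text)
```

The same call handles files saved with or without the BOM, so a hand-edited file from a Windows editor and a BOM-less file from elsewhere would both parse.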

I believe Unix/Linux works with UTF-8 without BOMs.  Is this the case?

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com