----- Original Message -----
From: "Martin J. Dürst" <duerst@xxxxxxxxxxxxxxx>
On 2013/11/18 20:11, Henry S. Thompson wrote:
Pete Cordell writes:
Given the history below, would it be sensible to accept BOMs for UTF-8
encoding, but not for UTF-16 and UTF-32? In other words, are BOMs needed
and/or used in the wild for UTF-16 and UTF-32?
Maybe the text can say something like "SHOULD accept BOMs for UTF-8,
and MAY accept BOMs for UTF-16 and/or UTF-32"?
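
For illustration only, a receiver that wanted to apply a policy like the one suggested above would first have to tell the BOMs apart. The following is a minimal Python sketch, not taken from any draft; the helper name sniff_bom is hypothetical. The one subtlety is ordering: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE), so the UTF-32 patterns have to be tested first.

```python
import codecs

# Tested in this order on purpose: the UTF-32 LE BOM starts with the
# UTF-16 LE BOM, so the 4-byte patterns must be checked before the 2-byte ones.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is present.

    Hypothetical helper for illustration; it only inspects the leading bytes.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A parser could then accept or reject the input depending on which encoding name comes back, which is where a SHOULD/MAY distinction would be applied.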
My sense is that you'll see more UTF-16 BOMs than anything else.
Yes indeed. BOM means Byte Order Mark. It's crucial for over-the-wire
UTF-16. (It's irrelevant for in-memory UTF-16, but that's not what we are
discussing.)
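
To make the byte-order point concrete, here is a small Python sketch (the function name decode_utf16_by_bom is made up for this example). It assumes the sender put a BOM at the front and refuses to guess otherwise, since the same two bytes mean different characters depending on the order assumed.

```python
import codecs


def decode_utf16_by_bom(data: bytes) -> str:
    """Decode UTF-16 bytes, using the leading BOM to choose the byte order."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    raise ValueError("no UTF-16 BOM: byte order is ambiguous")


# The bytes 7B 00 are "{" in little-endian order, but U+7B00 in big-endian.
ambiguous = b"\x7b\x00"
as_le = ambiguous.decode("utf-16-le")
as_be = ambiguous.decode("utf-16-be")
```

Here as_le is "{" while as_be is the single character U+7B00, which is the ambiguity the BOM removes on the wire.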
The in-memory case is not entirely irrelevant, because a number of JSON
messages will be constructed in memory and then squirted onto the wire.
I did a little experiment with Visual Studio. It will allow me to save in
UTF-8 with or without a BOM (or BOM-like signature). Saving in UTF-16 (or
was it UCS-2?) is always with a BOM. There didn't seem to be a UTF-32 option.
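
As a point of comparison only (this is Python's standard codecs, not Visual Studio, and says nothing about what any particular editor does), the same split shows up in Python's encoders: plain "utf-8" writes no BOM, "utf-8-sig" writes one, and the generic "utf-16" codec always prepends a BOM in the machine's native byte order.

```python
import codecs

text = '{"a": 1}'

plain_utf8 = text.encode("utf-8")       # no BOM
signed_utf8 = text.encode("utf-8-sig")  # EF BB BF prepended
utf16 = text.encode("utf-16")           # BOM prepended, native byte order
utf16_le = text.encode("utf-16-le")     # explicit byte order, no BOM

print(signed_utf8[:3].hex())  # prints "efbbbf"
```

So whether a BOM ends up in the file depends on which encoder variant was chosen, which matches the "with or without" choice offered for UTF-8.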
JSON doesn't need BOMs. However, there are cases where people might hand
edit messages, and if they choose to save in UTF-16 they will likely have a
BOM.
Is it acceptable to tell people not to save hand-edited files in UTF-16,
suggesting UTF-8 (possibly with an encoded BOM) as an alternative?
I would imagine that if someone did have a hand-edited UTF-8 file on
Windows then the allowance of a BOM would help their sanity immeasurably,
but it's not something I have firsthand knowledge of.
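
A rough idea of what "allowing a BOM" could look like on the receiving side, sketched in Python. The function name load_json_bytes is hypothetical, and the sketch assumes the input is UTF-8. Python's "utf-8-sig" codec drops a leading EF BB BF if one is present and otherwise behaves like plain UTF-8, so both kinds of hand-edited file parse the same way.

```python
import json


def load_json_bytes(data: bytes):
    """Parse UTF-8 JSON bytes, tolerating an optional leading UTF-8 BOM."""
    # "utf-8-sig" strips EF BB BF when present; without it, this is plain UTF-8.
    return json.loads(data.decode("utf-8-sig"))


with_bom = load_json_bytes(b'\xef\xbb\xbf{"a": 1}')
without_bom = load_json_bytes(b'{"a": 1}')
```

Both calls give the same dictionary, so the person editing the file does not need to know which variant their editor saved.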
I believe Unix/Linux works with UTF-8 without BOMs. Is this the case?
Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com