Re: BOMs

On Mon, Nov 18, 2013 at 8:36 AM, Pete Cordell <petejson@xxxxxxxxxxxxx> wrote:
----- Original Message ----- From: "Martin J. Dürst" <duerst@xxxxxxxxxxxxxxx>

On 2013/11/18 20:11, Henry S. Thompson wrote:
Pete Cordell writes:

Given the history below, would it be sensible to accept BOMs for UTF-8
encoding, but not for UTF-16 and UTF-32?  In other words, are BOMs needed
and/or used in the wild for UTF-16 and UTF-32?

Maybe the text can say something like "SHOULD accept BOMs for UTF-8,
and MAY accept BOMs for UTF-16 and / or UTF-32"?

My sense is that you'll see more UTF-16 BOMs than anything else.

Yes indeed. BOM means Byte Order Mark. It's crucial for over-the-wire UTF-16. (It's irrelevant for in-memory UTF-16, but that's not what we are discussing.)
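As a rough illustration of why the mark matters on the wire, a receiver can tell the encodings apart from the first few bytes. The sketch below is only an assumption about how such a check might look; the names bom_kind and detect_bom are hypothetical and do not come from any JSON library or draft text.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum bom_kind {
    BOM_NONE,
    BOM_UTF8,      /* EF BB BF    */
    BOM_UTF16_BE,  /* FE FF       */
    BOM_UTF16_LE,  /* FF FE       */
    BOM_UTF32_BE,  /* 00 00 FE FF */
    BOM_UTF32_LE   /* FF FE 00 00 */
};

/* Classify the byte order mark, if any, at the start of buf. */
enum bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    /* UTF-32 LE must be tested before UTF-16 LE, because FF FE 00 00
     * also begins with FF FE. This assumes the text cannot start with
     * U+0000, which holds for JSON. */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00)
        return BOM_UTF32_LE;
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF)
        return BOM_UTF32_BE;
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;
    return BOM_NONE;
}
```

For UTF-8 the mark carries no byte-order information at all; it only acts as a signature, which is why the UTF-8 and UTF-16 cases are argued separately in this thread.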

The in-memory case is not entirely irrelevant because a number of JSON messages will be constructed in memory and then squirted to line.

I did a little experiment with Visual Studio.  It will allow me to save in UTF-8 with or without a BOM (or BOM-like signature).  Saving in UTF-16 (or was it UCS-2?) always adds a BOM.  There didn't seem to be a UTF-32 option.

JSON doesn't need BOMs.  However, there are cases where people might hand edit messages, and if they choose to save in UTF-16 they will likely have a BOM.

Is it acceptable to tell people not to save hand-edited files in UTF-16, suggesting UTF-8 (possibly with an encoded BOM) as an alternative?

I would imagine that if someone did have a hand-edited UTF-8 file on Windows then the allowance of a BOM would help their sanity immeasurably, but it's not something I have firsthand knowledge of.


I believe the opposite is true.

The failure of Windows to correctly process documents without a BOM is a constant pain when trying to use .NET to parse XML.

The ability to compose a JSON message by wrapping another JSON message is essential. That is, it has to be possible to write something like

printf("{\"Object\": %s}", Text);


I use the .NET platform heavily. Please do not let Microsoft off the hook here. The cost of doing so is having to write code to kick out spurious BOM sequences occurring at any random point in the text, which becomes really painful when dealing with strings where there might actually be a reason to put the BOM in.
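To make that cost concrete, here is a minimal sketch of the kind of cleanup code in question, assuming NUL-terminated UTF-8 input. The helper name strip_utf8_boms is hypothetical. It removes every encoded BOM (EF BB BF) from the buffer, wherever it occurs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: delete every UTF-8 encoded BOM (EF BB BF) from a
 * NUL-terminated buffer, in place. Returns the number of BOMs removed. */
size_t strip_utf8_boms(char *s)
{
    static const char bom[] = "\xEF\xBB\xBF";
    char *src = s;   /* next byte to read  */
    char *dst = s;   /* next byte to write */
    size_t removed = 0;

    while (*src != '\0') {
        /* strncmp stops at the terminating NUL, so this never reads
         * past the end of the string. */
        if (strncmp(src, bom, 3) == 0) {
            src += 3;
            removed++;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return removed;
}
```

If Text in the printf call above was read from a file saved with a BOM, the composed message carries that BOM in the middle, and a pass like this is needed before parsing. The sketch cannot tell a spurious BOM from a U+FEFF that was deliberately placed inside a JSON string, so it deletes both.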

The benefit of not doing so is that it might encourage Microsoft to fix their tools so that they don't insert spurious BOM sequences in documents where doing so breaks them.

 
--
Website: http://hallambaker.com/
