Re: [Json] BOMs (Was: Re: JSON: remove gap between Ecma-404 and IETF draft)

Tim Bray <tbray@xxxxxxxxxxxxxx> · Mon, 18 Nov 2013 07:54:58 -0800

This feels backward, because BOMs are actually useful for UTF-16 and UTF-32, but essentially useless for UTF-8.
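A rough illustration of that point, as a sketch only: UTF-8 has a single byte order, while UTF-16 can serialize each code unit in either of two, and a BOM is what lets a reader tell them apart without guessing. The sample text and the helper name `guess_utf16_order` are made up for this example, and the helper deliberately looks only at UTF-16 signatures.

```python
import codecs

text = "[1]"  # arbitrary sample JSON text, chosen for illustration

# UTF-8 has exactly one byte order, so a BOM adds no ordering information.
utf8 = text.encode("utf-8")

# UTF-16 writes each 16-bit code unit in one of two byte orders.
# The "-le" / "-be" codecs do not emit a BOM themselves.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")


def guess_utf16_order(data: bytes):
    """Return 'le' or 'be' if the data starts with a UTF-16 BOM, else None.

    Hypothetical helper: it only checks the two UTF-16 signatures and does
    not try to distinguish UTF-32.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "be"
    return None
```

Without the signature, `utf16_le` and `utf16_be` are different byte strings for the same text, and the reader has to infer the order some other way.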

On Mon, Nov 18, 2013 at 2:05 AM, Pete Cordell <petejson@xxxxxxxxxxxxx> wrote:

Given the history below, would it be sensible to accept BOMs for UTF-8 encoding, but not for UTF-16 and UTF-32?  In other words, are BOMs needed and/or used in the wild for UTF-16 and UTF-32?

Maybe the text can say something like "SHOULD accept BOMs for UTF-8, and MAY accept BOMs for UTF-16 and / or UTF-32"?

Thanks,

Pete Cordell
Codalogic Ltd
C++ tools for C++ programmers, http://codalogic.com
Read & write XML in C++, http://www.xml2cpp.com

----- Original Message -----
From: "Martin J. Dürst" <duerst@xxxxxxxxxxxxxxx>
To: "Henry S. Thompson" <ht@xxxxxxxxxxxx>
Cc: "John Cowan" <cowan@xxxxxxxxxxxxxxxx>; "IETF Discussion" <ietf@xxxxxxxx>; "Paul Hoffman" <paul.hoffman@xxxxxxxx>; "JSON WG" <json@xxxxxxxx>; "Joe Hildebrand (jhildebr)" <jhildebr@xxxxxxxxx>; "Anne van Kesteren" <annevk@xxxxxxxxx>; <www-tag@xxxxxx>; "es-discuss" <es-discuss@xxxxxxxxxxx>
Sent: Thursday, November 14, 2013 11:14 AM
Subject: Re: [Json] JSON: remove gap between Ecma-404 and IETF draft

Hello Henry, others,

On 2013/11/14 18:44, Henry S. Thompson wrote:

John Cowan writes:

Joe Hildebrand (jhildebr) scripsit:

If 404 doesn't allow [a BOM], I don't see a strong need to add it. Parsers can always be more forgiving of what they will parse than what the spec says, particularly since section 9 says "A JSON parser MAY accept non-JSON forms or extensions".

It's not clear that 404 disallows it, since 404 is defined in terms of characters, and a BOM is not a character but an out-of-band signal.

I think this is a crucial observation.

Yes, and I think it's based on the experience with XML. But while this experience may be applicable to JSON, Anne's original comment about the BOM and XMLHttpRequest suggests that 404 actually currently does not tolerate a BOM, and that implementations (except for XMLHttpRequest) also don't.

To give some historic background, the BOM for UTF-8 wasn't in the first edition of XML (http://www.w3.org/TR/1998/REC-xml-19980210#sec-guessing). It only later came in because Microsoft used it for Notepad to be able to quickly distinguish between UTF-8 and the legacy system encoding. Because many people were writing some XML by hand, and some of them were using Notepad, the pressure on XML to accept a BOM at the start of a UTF-8 file mounted, and it was included in the second edition of the XML Recommendation (http://www.w3.org/TR/2000/REC-xml-20001006#sec-guessing).

Compared to XML, JSON may be much less edited by hand, or much less edited in Notepad, or otherwise just have a different history from XML, but we have to make sure.

Regards,   Martin.

I note that XML approaches this problem in what might be a useful way.  The XML ABNF makes no mention of BOM, it's not part of any XML document as such.  But it _is_ allowed.  The relevant wording [1] is:

   Entities ... may begin with the Byte Order Mark described by Annex H
   of [ISO/IEC 10646:2000], section 16.8 of [Unicode] (the ZERO WIDTH
   NO-BREAK SPACE character, #xFEFF). _This is an encoding signature,_
   _not part of either the markup or the character data of the XML_
   _document._ XML processors must be able to use this character to
   differentiate between UTF-8 and UTF-16 encoded documents. [emphasis
   added]

ht

[1] http://www.w3.org/TR/REC-xml/#charencoding
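For what it's worth, the "encoding signature, not part of the document" reading quoted above might look something like the following for JSON. This is only a sketch under stated assumptions, not what any draft or existing parser specifies: the names `split_bom` and `loads_tolerating_bom` are invented here, and the fallback to UTF-8 when no BOM is present is an assumption of the example.

```python
import codecs
import json

# Longer signatures are listed first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE). A JSON text cannot start with
# U+0000, so FF FE 00 00 is treated as UTF-32 LE here.
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def split_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, payload) with any leading BOM removed.

    Hypothetical helper. The BOM is consumed as an out-of-band signature,
    so it never reaches the JSON grammar. The UTF-8 default for BOM-less
    input is an assumption of this sketch.
    """
    for bom, encoding in _SIGNATURES:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return default, data


def loads_tolerating_bom(data: bytes):
    """Decode bytes per the detected signature, then parse as JSON."""
    encoding, payload = split_bom(data)
    return json.loads(payload.decode(encoding))
```

Usage would be along the lines of `loads_tolerating_bom(codecs.BOM_UTF8 + b'{"a": 1}')`. A stricter policy (say, tolerating only the UTF-8 signature) amounts to trimming the table.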

_______________________________________________
json mailing list
json@xxxxxxxx
https://www.ietf.org/mailman/listinfo/json
