Re: [Json] BOMs

Tatu Saloranta <tsaloranta@xxxxxxxxx> · Tue, 19 Nov 2013 10:30:47 -0800

On Tue, Nov 19, 2013 at 4:31 AM, Bjoern Hoehrmann <derhoermi@xxxxxxx> wrote:

* Tatu Saloranta wrote:

>Dominant Java implementations support UTF-16 with BOM; either directly or
>through Java's Reader implementations that handle BOMs.
>String concatenation case seems irrelevant, since BOMs are not included in
>in-memory representation anyway, as opposed to byte stream serialization.

HTTP implementations cannot correctly determine whether an entity body
is text in a single character encoding and if so what that encoding is,
accordingly the dominant API deals in byte[] arrays, not text Strings;
furthermore, many programming languages default to byte[] arrays for
string literals. That often combines into forms of

  byte[] json = sprintf('{"x": %s, "y": %s}', GET(...), GET(...));

which works fine if all three byte[] arrays are UTF-8 encoded and use
no Unicode signature, which is the case 99% of the time.

My point was just that although many scripting languages may not handle BOMs properly, the same is not true on all platforms. Proper JSON APIs on the JVM accept both String and byte[] input. byte[] is preferred since it is more efficient, and the encoding can be auto-detected reliably, assuming that (as per the JSON specification) the only encoding with 8-bit code units in use is UTF-8.
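To make the auto-detection point concrete, here is a rough sketch in Java. It is not taken from any particular library; the class and method names (JsonEncodingSniffer, detect, decode) are made up for illustration. It assumes the input is JSON text in UTF-8 or UTF-16, so the first character is ASCII, and it leaves out UTF-32 to stay short (so a UTF-32LE BOM would be mistaken for UTF-16LE here).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only: guesses the encoding of a
// JSON byte[] from its first bytes, with or without a BOM.
class JsonEncodingSniffer {

    /** Detected charset plus the number of leading BOM bytes to skip. */
    static final class Detected {
        final Charset charset;
        final int bomLength;

        Detected(Charset charset, int bomLength) {
            this.charset = charset;
            this.bomLength = bomLength;
        }
    }

    static Detected detect(byte[] b) {
        // Unsigned values of the first three bytes, or -1 if missing.
        int b0 = b.length > 0 ? b[0] & 0xFF : -1;
        int b1 = b.length > 1 ? b[1] & 0xFF : -1;
        int b2 = b.length > 2 ? b[2] & 0xFF : -1;

        // An explicit BOM wins if present.
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            return new Detected(StandardCharsets.UTF_8, 3);
        }
        if (b0 == 0xFE && b1 == 0xFF) {
            return new Detected(StandardCharsets.UTF_16BE, 2);
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return new Detected(StandardCharsets.UTF_16LE, 2);
        }

        // No BOM: the first character is assumed to be ASCII, so in UTF-16
        // exactly one of the first two bytes is zero.
        if (b0 == 0x00 && b1 > 0) {
            return new Detected(StandardCharsets.UTF_16BE, 0);
        }
        if (b0 > 0 && b1 == 0x00) {
            return new Detected(StandardCharsets.UTF_16LE, 0);
        }

        // Otherwise fall back to UTF-8, the only 8-bit encoding expected.
        return new Detected(StandardCharsets.UTF_8, 0);
    }

    /** Decodes the bytes to a String, skipping any BOM that was found. */
    static String decode(byte[] b) {
        Detected d = detect(b);
        return new String(b, d.bomLength, b.length - d.bomLength, d.charset);
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '{', '}'};
        byte[] utf16 = "{\"x\":1}".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(decode(withBom));
        System.out.println(decode(utf16));
    }
}
```

The BOM is skipped by hand because the UTF-16BE/UTF-16LE and UTF-8 decoders in Java do not strip it themselves; they would hand back a leading U+FEFF in the String.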

-+ Tatu +-