On Thu, Nov 21, 2013 at 1:37 PM, Bjoern Hoehrmann <derhoermi@xxxxxxx> wrote:
> * John Cowan wrote:
>>Bjoern Hoehrmann scripsit:
>>
>>> Is there any chance, by the way, to change `JSON.stringify` so it does
>>> not output strings that cannot be encoded using UTF-8? Specifically,
>>>
>>> JSON.stringify(JSON.parse("\"\uD800\""))
>>>
>>> would need to escape the surrogate instead of emitting it literally.
>>
>>No, there isn't. We've been down this road repeatedly. People can and
>>do use JSON strings to encode arbitrary sequences of unsigned 16-bit
>>integers.
>
> The output of JSON.stringify("\uD800") contains no backslash character,
> if you call `utf8_encode(JSON.stringify("\uD800"))` you get an exception
> because UTF-8 cannot encode the lone surrogate and `utf8_encode` does
> not know it could encode it as `\uD800` without loss of information. If
> `JSON.stringify` produced an escape sequence instead, there would be no
> problem passing the output to `utf8_encode`.

That's just one implementation.

We have had hundreds of e-mails on this list about this, and well over
a thousand covering several issues like it.

I think the only area where we have [rough] consensus to revisit the
previous consensus is the top-level value restriction, which has led to
the whole UTF and byte-order detection sub-thread (which we have also
had before).

We're on much stronger ground revisiting that one matter than the whole
unpaired-surrogates matter, and we're much, much less likely to change
our consensus on the latter: the top-level value proposal is about
relaxing JSON to match ECMAScript's definition, while yours is to do
the opposite.

Nico