On 2017-03-11 03:08, John Cowan wrote:
> On Thu, Mar 9, 2017 at 12:53 AM, Benjamin Kaduk <kaduk@xxxxxxx> wrote:
>
>> If that's what's supposed to happen, it should probably be more clear, yes. (But aren't there texts that have valid interpretations in multiple encodings?)
>
> Not if the content is well-formed JSON and the only possible encodings are UTF-8, UTF-16, and UTF-32. It suffices to examine the first four bytes of the input. If there are no NUL bytes in the first four bytes, it is UTF-8; if there are two NUL bytes, it is UTF-16; if there are three NUL bytes, it is UTF-32. This works because the grammar requires the first character to be in the ASCII repertoire, and the NUL *character* (U+0000) is not allowed at all.
Good explanation. Maybe the spec should include it.

Best regards, Julian
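
For readers who want to see the heuristic from John's message spelled out, here is a minimal Python sketch. It is an illustration under stated assumptions, not text from any spec: the function name `detect_json_encoding` and the table `NUL_PATTERNS` are hypothetical, the input is assumed to have no byte order mark and to be at least four bytes long, and the first two characters are assumed to be ASCII (true when the text is an object or array). Using the position of the NUL bytes to tell little-endian from big-endian goes one step beyond the message, which only counts them.

```python
# Sketch of the "look at the first four bytes" heuristic from the thread.
# Assumptions (not from the original message): no BOM, at least four bytes,
# and the first two characters of the JSON text are ASCII.

# Each key marks which of the first four bytes are NUL (True = 0x00).
NUL_PATTERNS = {
    (False, False, False, False): "utf-8",      # xx xx xx xx
    (False, True, False, True): "utf-16-le",    # xx 00 xx 00
    (True, False, True, False): "utf-16-be",    # 00 xx 00 xx
    (False, True, True, True): "utf-32-le",     # xx 00 00 00
    (True, True, True, False): "utf-32-be",     # 00 00 00 xx
}


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from the NULs in its first four bytes."""
    if len(data) < 4:
        raise ValueError("need at least four bytes to apply the heuristic")
    pattern = tuple(byte == 0 for byte in data[:4])
    try:
        return NUL_PATTERNS[pattern]
    except KeyError:
        # The input does not satisfy the assumptions listed above.
        raise ValueError("unexpected NUL pattern in the first four bytes") from None


if __name__ == "__main__":
    sample = '{"a": 1}'
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        encoded = sample.encode(codec)
        print(codec, "->", detect_json_encoding(encoded))
```

The returned strings are standard Python codec names, so the result can be passed straight to `bytes.decode`. Inputs that break the assumptions (for example a very short text, or a top-level string whose second character is not ASCII) raise `ValueError` rather than being guessed at.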