On 2017-03-12 16:31, Carsten Bormann wrote:
> On 12 Mar 2017, at 10:14, Julian Reschke <julian.reschke@xxxxxx> wrote:
>> Does anybody recall why we removed <https://tools.ietf.org/html/rfc4627#section-3>:
> I seem to remember that the advice simply is no longer working since JSON was extended from 4627 to 7159. Instead of trying to come up with an updated algorithm, the WG recognized that this is not a real-world problem.
> ...
So the changes in RFC 7159 allow top-level strings, which means we can't
rely on the first *two* characters being US-ASCII. But we *can* rely on
the first one being US-ASCII, no?
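To make the concern concrete, here is a small Python sketch (the two sample texts are chosen only for illustration) that dumps the first four octets of an object and of a top-level string whose second character is U+0100. In UTF-16BE the string begins `00 22 01 00`: the first character still shows the expected null, but the second does not, which is why a rule built on the first *two* characters breaks.

```python
# Dump the first four octets of a few JSON texts in each encoding form.
# The sample texts are hypothetical; '"\u0100"' is a top-level string
# (legal since RFC 7159) whose second character is not US-ASCII.
import json

SAMPLES = ['{"a":1}', '"\u0100"']
CODECS = ["utf-32-be", "utf-16-be", "utf-32-le", "utf-16-le", "utf-8"]


def first_four(text: str, codec: str) -> str:
    """Hex dump of the first four octets of text in codec (Python adds no BOM here)."""
    return text.encode(codec)[:4].hex(" ")


for text in SAMPLES:
    json.loads(text)  # sanity check: each sample is a valid JSON text
    for codec in CODECS:
        print(f"{text:10} {codec:10} {first_four(text, codec)}")
```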
So the following should still be correct:
   Since the first character of a JSON text will always be an ASCII
   character [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

           00 00 00 xx  UTF-32BE
           00 xx xx xx  UTF-16BE
           xx 00 00 00  UTF-32LE
           xx 00 xx xx  UTF-16LE
           xx xx xx xx  UTF-8
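A minimal Python sketch of that null-pattern check, assuming BOM-less input. It tests the rows top to bottom so the first match wins, treats the trailing `xx xx` of the UTF-16 rows as unconstrained (after the first character those octets may hold any code unit), and also accepts inputs shorter than four octets, such as the JSON text `1`. The name `detect_json_encoding` is hypothetical, not an existing API.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding form of a BOM-less JSON text from its leading nulls.

    Relies only on the first character being US-ASCII (a JSON text starts
    with whitespace, a structural character, a literal, '-', a digit or '"').
    Rows are tested in table order; octets 3-4 are ignored for UTF-16.
    """
    b = data[:4]
    n = len(b)
    if n == 0:
        raise ValueError("empty input is not a JSON text")
    # 00 00 00 xx
    if n >= 4 and b[0] == 0 and b[1] == 0 and b[2] == 0 and b[3] != 0:
        return "UTF-32BE"
    # 00 xx .. ..
    if n >= 2 and b[0] == 0 and b[1] != 0:
        return "UTF-16BE"
    # xx 00 00 00 (UTF-16LE cannot look like this: the second code unit
    # would be U+0000, which never appears unescaped in a JSON text)
    if n >= 4 and b[0] != 0 and b[1] == 0 and b[2] == 0 and b[3] == 0:
        return "UTF-32LE"
    # xx 00 .. ..
    if n >= 2 and b[0] != 0 and b[1] == 0:
        return "UTF-16LE"
    # xx xx xx xx
    return "UTF-8"


if __name__ == "__main__":
    # Top-level string with a non-ASCII second character, in UTF-16LE.
    print(detect_json_encoding('"\u0100"'.encode("utf-16-le")))  # UTF-16LE
```

Checking the rows in order is what keeps the table unambiguous once octets 3-4 are left unconstrained: a UTF-32LE text would also match the UTF-16LE row, so the UTF-32 rows have to be tried first.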
Best regards, Julian