On 2017-03-12 10:06, Peter Cordell wrote:
...
This exact issue just came up in a media type review, where someone
specified a charset parameter because they weren't aware of this
algorithm.
It would be very helpful to have this text in the RFC, although it
does need slightly more detail to take into account endianness in
the case of UTF-16 and UTF-32.
...
Does anybody recall why we removed
<https://tools.ietf.org/html/rfc4627#section-3>:
3. Encoding
JSON text SHALL be encoded in Unicode. The default encoding is
UTF-8.
Since the first two characters of a JSON text will always be ASCII
characters [RFC0020], it is possible to determine whether an octet
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
at the pattern of nulls in the first four octets.
        00 00 00 xx  UTF-32BE
        00 xx 00 xx  UTF-16BE
        xx 00 00 00  UTF-32LE
        xx 00 xx 00  UTF-16LE
        xx xx xx xx  UTF-8
?
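
For illustration, the null-pattern check in the quoted section could be
sketched in Python roughly as follows. This is a minimal sketch and not
text from any RFC: the function name detect_json_encoding is
hypothetical, a leading byte order mark is not handled, and treating
input shorter than four octets as UTF-8 is an assumption of the sketch
(two ASCII characters already take four octets in UTF-16).

```python
def detect_json_encoding(octets: bytes) -> str:
    """Guess the encoding of a JSON text from the nulls in its first four octets.

    Hypothetical helper sketching the table in RFC 4627, section 3.
    Returns a Python codec name. No BOM handling.
    """
    if len(octets) < 4:
        # Assumption: anything this short can only be UTF-8, because two
        # ASCII characters need at least four octets in UTF-16 or UTF-32.
        return "utf-8"

    # True where the octet is a null, in the order the table lists them.
    nulls = tuple(b == 0 for b in octets[:4])

    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Every other pattern falls through to the default, UTF-8.
    return patterns.get(nulls, "utf-8")


if __name__ == "__main__":
    sample = '{"a":1}'
    for codec in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
        print(codec, detect_json_encoding(sample.encode(codec)))
```

The sketch relies on the same premise as the quoted text, namely that
the first two characters of the JSON text are ASCII, so it does not
apply as-is to texts that start with a non-ASCII character.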
Best regards, Julian