Hi all,

I have reviewed this document as part of the security directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written primarily for the benefit of the security area directors. Document editors and WG chairs should treat these comments just like any other last call comments.

It is rather surprising to find notable omissions in what is effectively rfc4627ter, though that seems to be my conclusion after performing this review. Luckily, they should be easy to resolve, for the most part, with just a touch more text in the security considerations.

As an overall summary, this document is a simple, clean, crisp writeup of the JSON interchange format, which is expected to match that of the ECMAScript standard. Since this is a data format and not a protocol, any security issues with this document are expected to be second-order effects, relating to how the format is used or could be (mis)used by consuming protocols. It is useful to give guidance on issues that may occur due to known-bad implementations, edge cases that can cause trouble, etc., and this document does so, noting that the fields in an object are unordered (but some implementations expose the order in their APIs), that the potential for escaped characters must be taken into account when comparing field names, that many implementations have limited precision/bounds for numbers, that non-numeric mathematical values cannot be expressed in JSON, etc.

However, I think that the security considerations section would benefit from some discussion of the potential/example consequences of failing to heed those issues. Doing name comparison without respect for escaping could let an attacker inject unsanitized data into what is supposed to be a trusted structure, potentially giving privilege escalation, compromising authentication and authorization, etc.
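To make the escaping concern concrete, here is a quick illustration (using Python's json module purely as one example implementation): two JSON texts whose member names differ byte-for-byte denote the same name once escape processing is done, so any check that compares raw name tokens can be bypassed.

```python
import json

# Two JSON texts whose member names differ byte-for-byte but denote
# the same string once the \uXXXX escape is processed.
raw_plain = '{"admin": true}'
raw_escaped = '{"\\u0061dmin": true}'

assert raw_plain != raw_escaped  # the raw texts differ

plain = json.loads(raw_plain)
escaped = json.loads(raw_escaped)
assert list(plain) == list(escaped) == ["admin"]  # decoded names agree

# A filter that blocks the raw token "admin" without unescaping first
# would pass raw_escaped through, even though it sets the same member.
```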
Considering only the first (or last) of duplicated name/value pairs can lead to vulnerabilities in mixed environments, by letting security checks be bypassed when different implementations disagree about which duplicate wins. Similarly, a reliance on fields being returned in order could cause issues (certainly denial of service, potentially worse) if an attacker uses a different order than expected. Perhaps the start of section 4 could refer forward to discussion in the security considerations, giving some exposition on the consequences of breaking the "names within an object SHOULD be unique" guidance.

Implementations should probably check that their numerical routines do not attempt to emit "NaN" or "Inf" or similar, though I do not have a good picture of how that would affect real systems (hopefully, just a parse error on the receiver, as those tokens are not permitted by the grammar). Similarly, the use of extremely large or precise numbers in mixed environments can lead to unexpected results. I expect that there could be vulnerabilities here, such as if code can be induced into a loop with increasingly large values, where a sender ends up producing numbers larger than can be represented on the receiver, which then saturates its strtonum-equivalent and hits an unexpected codepath.

I'm also concerned about the freewheeling use of Unicode. While this document does discuss the potential encodings and lists UTF-8 as the default (and most interoperable), I think it would benefit from a stricter warning that parties using JSON for communication must have some out-of-band way to agree on what encoding is to be used. I would expect that this will usually be done by the protocol using JSON, but I could see a place for the actual communicating peers to have out-of-band knowledge. (An application having to guess what encoding is in use based on heuristics is a recipe for disaster.) Additionally, the document makes no mention of Unicode normalization, which can be a minefield.
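The duplicate-name hazard is easy to demonstrate; again using Python's json module as one illustrative implementation (the first_wins helper below is my own sketch of a first-duplicate-wins parser, not anything in the standard library):

```python
import json

doc = '{"role": "user", "role": "admin"}'

# Python's json module silently keeps the *last* duplicate...
assert json.loads(doc) == {"role": "admin"}


# ...while an implementation that keeps the *first* duplicate sees a
# different value for the very same document. Sketched here via the
# object_pairs_hook extension point:
def first_wins(pairs):
    out = {}
    for name, value in pairs:
        out.setdefault(name, value)  # ignore later duplicates
    return out


assert json.loads(doc, object_pairs_hook=first_wins) == {"role": "user"}
```

If a security check runs on the first-wins view while the consumer acts on the last-wins view, the attacker controls which value each side sees.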
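On the saturation point, here is what at least one real implementation does with an out-of-range (but grammar-valid) number; Python's json module saturates to infinity on input and then cannot legally re-serialize the result:

```python
import json

# A number that is valid per the grammar but exceeds the range of an
# IEEE 754 binary64 double: the parser saturates to infinity rather
# than reporting an error.
big = json.loads('1e400')
assert big == float('inf')

# Re-serializing the saturated value fails outright, since the grammar
# has no token for infinity (with allow_nan=True it would emit the
# non-standard token "Infinity", which a strict peer rejects).
try:
    json.dumps(big, allow_nan=False)
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for out-of-range value")
```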
The precis working group has a lot of work in this area, from which the executive summary is: it's a lot of work to do things correctly, and being sloppy usually leads to vulnerabilities. The most obvious issue would be in (the comparison of) field names using strings that can be represented differently in different normalization forms (for example, e with acute accent, which can be either U+00E9 or U+0065 followed by the combining character U+0301). Simply converting to Unicode code points is insufficient to make such strings compare as equivalent; some normalization step is needed. I think this document should at least mention that Unicode normalization forms exist and should be considered by protocol designers when using JSON with characters outside of US-ASCII.

Section 9 (parsers) mentions that an implementation "may set limits" on various parameters. That seems like a good place to give some guidance on what limits are in common use and how protocols or protocol peers might discover and/or negotiate each other's implementation limits.

With the proposed additions to the security considerations out of the way, on to some other minutiae.

Section 3 describes that "[a] JSON value MUST be an object, array, number, or string, or one of the following three literal names: false null true"; in the introduction, this (equivalent) list is given as "object, array, number, string, boolean, or null", in the guise of "four primitive types (strings, numbers, booleans, and null) and two structured types (objects and arrays)". There may be some value in using consistent terminology between the two locations, but it would be pretty minor value and might not be worth the effort.

The format for numbers (Section 6) permits an explicit (redundant) plus sign in the exponent part, but not in front of the overall number. This feels slightly strange to me, but I don't know whether any implementations get this wrong. (I also don't really see security consequences of getting it wrong.)
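Returning briefly to the normalization point above, the e-acute example can be shown in a few lines (Python again, as one illustrative implementation; unicodedata.normalize is the standard-library NFC routine):

```python
import json
import unicodedata

# "e with acute accent" as one precomposed code point (U+00E9) versus
# "e" (U+0065) followed by the combining acute accent (U+0301):
precomposed = json.loads('"\\u00e9"')
decomposed = json.loads('"\\u0065\\u0301"')

# Code-point-by-code-point comparison says they differ...
assert precomposed != decomposed

# ...but after normalizing both to NFC they compare as equal.
assert unicodedata.normalize("NFC", precomposed) == \
       unicodedata.normalize("NFC", decomposed)
```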
Similarly, the fractional part must contain a digit (that is, a number ending in a decimal point is forbidden), which is probably harmless.

I'll also note that the number 3.141592653589793238462643383279 is used as an example of a precision unrepresentable in 64-bit IEEE 754 floating-point, but the next digit in the expansion of pi is a '5', which would round up to ...80. Maybe another digit should be added to keep overly pedantic readers from worrying about rounding modes :)

Should section 8.3 (string comparison) use any RFC 2119 language with respect to treating strings as equal when they use escapes (or even different normalization forms, as covered above)?

The IANA considerations (section 11) might be a little more explicit that the IANA "Media Types" registry is what is being modified. I'm also rather curious about the claim that no "charset" parameter is needed because it "really has no effect on compliant recipients". Why is this not a good way to communicate whether UTF-8, UTF-16, or UTF-32 is in use for a given text?

It might also be useful to have an example of an array that contains elements of different types, to reinforce that this is permitted.

With respect to section 1.3, I am personally more used to seeing a "changes since <previous document>" section that lays out the changes inline, as opposed to with (opaque) references to specific errata. But that is rather an editorial issue, so my personal preference should not bind you.

-Ben