--On Tuesday, April 2, 2024 18:21 -0700 Rob Sayre <sayrer@xxxxxxxxx> wrote:

> On Tue, Apr 2, 2024 at 5:44 PM John C Klensin
> <john-ietf@xxxxxxx> wrote:
>
>> It seems to me that your conclusion depends on an assumption
>> about "the JMAP community" that might be questionable. An
>> analysis of the JMAP specification leads me to a slightly
>> different conclusion; inline below.
>
> At the risk of repeating myself, I don't think PRECIS really
> does the trick here, even though it's easy to see it does work
> sometimes (I went and looked at the "Referenced By" stuff in
> the datatracker). The document is "Preparation, Enforcement,
> and Comparison of Internationalized Strings Representing
> Usernames and Passwords".

Rob,

I didn't mean to say "require PRECIS", only that we are under
some obligation to do better than saying "UTF-8" and giving
length boundaries. If PRECIS is a good answer, fine. If not,
that is fine too, as long as we are clear about the alternative.

Beyond that as far as PRECIS is concerned, I'm not sure what you
are talking about because both RFC 7564 and 8264 are
"Preparation, Enforcement, and Comparison of Internationalized
Strings in Application Protocols". RFC 8265 is about
"...Representing Usernames and Passwords", but that is
appreciably more restrictive.

> But there are other problems that will come up, like writing
> street addresses in Japanese, Thai, etc., as written here:
>
> https://datatracker.ietf.org/doc/draft-ietf-calext-jscontact/
>
> So, while PRECIS might apply cleanly to this draft, I think
> the JSON implementations will be facing much more varied
> content.

Ok. But this is a Last Call on this document. It isn't even
about "JSON implementations", it is about a very specific
application of JMAP.

> That's why I prefer the approach here:
>
> https://datatracker.ietf.org/doc/draft-bray-unichars/
>
> It's also realistic about the fact you will get the so-called
> "toxic waste" from whatever JavaScript's JSON.parse and
> JSON.stringify do.
> This way is even better as it relates to
> "any UTF-8 string", because the Unichars draft covers escape
> sequences. The issue in JSON or XML is that you can send
> perfectly valid UTF-8, but there might be escape sequences
> that represent total garbage from a Unicode perspective. The
> Unichars draft provides some usefully adversarial examples.

Understood. I'm reluctantly going to respond briefly on this
thread but then think the discussion should go elsewhere since
we are not, AFAICT, even close to a Last Call on, e.g.,
draft-bray-unichars.

I suggest that there is something of a spectrum with "just use
valid UTF-8" at one extreme and very restrictive,
application-specific and application-tuned, models like IDNA2008
at the other. In between lie specifications that try to avoid
the worst problems one can get into with Unicode, including, in
no particular order, draft-bray-unichars,
draft-bormann-dispatch-modern-network-unicode, the (IMO) badly
outdated RFC 5198, and UTR#36 and/or UTS#39. The PRECIS
framework document (RFC 8264), which is what I've had in mind
when I (and others) have said "just use PRECIS", is somewhat
further out on that spectrum, although perhaps not much further
than UTS#39 and UTR#36 with a good choice of options. And PRECIS
in its specific "Usernames and Passwords" (RFC 8265) and
"Nicknames" (RFC 8266) variations is somewhat further out again,
but still not as far as IDNA2008.

Now, if we can agree that "just use a UTF-8 string" is not good
enough (whether STD 63 / RFC 3629 is explicitly referenced or
not), the question is how far out on that spectrum one should go
in providing restrictions and/or advice ... with the
understanding that, while draft-bray-unichars and
draft-bormann-dispatch-modern-network-unicode eliminate
considerable "toxic waste" and other problem-prone
constructions, going further with trash and risk reduction may
be appropriate, at least as advice, in many situations.
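As a rough illustration of the low end of that spectrum, the
sketch below (Python, purely illustrative) shows a JSON text
that is valid UTF-8 on the wire yet decodes to a string holding
a lone surrogate and a NUL. The helper problem_code_points is a
hypothetical name, and the three categories it flags
(surrogates, noncharacters, most control characters) are only
loosely modeled on what draft-bray-unichars discusses. It is a
simplification under those assumptions, not an implementation
of that draft, of PRECIS, or of anything in the document under
Last Call.

```python
import json


def problem_code_points(s: str) -> list:
    """Return labels for code points a stricter profile would likely reject.

    Assumption: the categories are a simplified reading of the
    "problematic" code points discussed in draft-bray-unichars.
    """
    found = []
    for ch in s:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogates are not Unicode scalar values.
            found.append(f"surrogate U+{cp:04X}")
        elif 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            # U+FDD0..U+FDEF plus the last two code points of every plane.
            found.append(f"noncharacter U+{cp:04X}")
        elif (cp < 0x20 and ch not in "\t\n\r") or 0x7F <= cp <= 0x9F:
            # C0 controls other than tab/LF/CR, DEL, and C1 controls.
            found.append(f"control U+{cp:04X}")
    return found


# The payload is pure ASCII, so it is trivially valid UTF-8 on the wire;
# the trouble only appears once the JSON escape sequences are decoded.
wire = b'{"name": "\\ud800\\u0000ok"}'
value = json.loads(wire.decode("utf-8"))["name"]
print(problem_code_points(value))
```

A check of this kind only removes the worst material. It says
nothing about normalization, case mapping, or confusables,
which is where the more restrictive specifications further
along the spectrum come in.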
Those I-Ds each specify subsets of the possible collection of
strings. The more restrictive specs specify what should be
subsets of what those two allow (if they aren't proper subsets,
I think some careful review is in order).

But, again, as far as this particular draft is concerned, the
issue as I see it is moving beyond "UTF-8 string" with some
length limits. Something like draft-bray-unichars would be a
step in the right direction, leaving open the question of
whether it goes far enough. But, independent of that answer, it
suggests that the current I-D in IETF LC needs some work.

best,
   john

--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call